This is a talk that was given for the Scalable Internet Services Masters-level Computer Science class at UCLA and UCSB. It briefly discusses the server architecture for the game League of Legends before going into depth about how the data warehouse can hold petabytes of player data. Message queue architecture and scalability are discussed along the way.
2. SEAN MALONEY
BIG DATA ENGINEER
WHO IS THIS GUY?
Lead developer on Riot’s ETL tools
FUN FACT:
Was a student in this class 4 years ago
Intern at Appfolio
3. MOVING MOUNTAINS OF DATA
1. INTRODUCTION
2. THE GAME PLATFORM: OUR MAIN DATA SOURCE
3. HOW WE INGEST AND QUERY DATA
4. HOW WE SCALE IN AWS
5. CONCLUSION - SEAN’S PRO TIPS
14. CHAT, STORE, AUDIT, GAME, ETC.
ORACLE COHERENCE (IN MEMORY DB)
PRIMARY DB
HOT BACKUP DB
2nd BACKUP DB / ETL
18. INGESTION, STORAGE, QUERY / VIEWS, VIZ. TOOLS
INGESTION
- PUSH-BASED: HONU
  - Anything pushed to it
  - Server logs
- PULL-BASED / ETL: FuETL
  - OLTP game data
  - External Data Sources
STORAGE
- MASTER WAREHOUSE
QUERY / VIEWS
- BATCH QUERIES
- SINGLE-ROW QUERIES
- AGGREGATE QUERIES
DATA AUDITING
20. Distributed ETL Software written in Ruby.
Scales Horizontally
Same ETL applied to multiple regions / datacenters
Self-Service UI with SQL query templating.
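As a rough illustration of SQL query templating applied across regions, here is a minimal Python sketch. It is not FuETL's actual implementation (which the slides say is written in Ruby). The `${region}` and `${start_date}` placeholders, the region codes, and the `render_queries` helper are all assumptions made for the example; only the `games_played` / `games_played_na` table names come from later slides.

```python
from string import Template

# Hypothetical template; placeholder names are illustrative, not FuETL syntax.
QUERY_TEMPLATE = Template(
    "INSERT INTO games_played "
    "SELECT * FROM games_played_${region} "
    "WHERE date >= '${start_date}'"
)

REGIONS = ["na", "euw", "kr"]  # assumed region codes


def render_queries(regions, start_date):
    """Render one SQL statement per region from the shared template."""
    return {
        region: QUERY_TEMPLATE.substitute(region=region, start_date=start_date)
        for region in regions
    }


queries = render_queries(REGIONS, "2015-10-25")
for region, sql in queries.items():
    print(region, "->", sql)
```

Writing the query once and rendering it per region is what lets the same ETL run against every datacenter. A real tool would also need to validate or escape the substituted values rather than paste them in as plain strings.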
28. Webapp
Core Libraries
Task Service, Tasks, Task DAO
Helper Service, Helpers, Helper DAO
Environment Service, Environment DAO
Env. Task DAO, Env. Helper DAO
Scheduler Process
Worker Process
Task / Helper / Controllers
Command Line Tool
View
- backbone.js
- Bootstrap CSS
35. Idempotency
Idempotent - an operation that will produce the same results if executed once or multiple times
EXAMPLE:
Non-Idempotent:
- x = x * 5;
- Submitting a purchase
Idempotent:
- abs( abs(x) ) = abs(x)
- Cancelling a purchase
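The definition above can be checked mechanically: a function f is idempotent if f(f(x)) == f(x). The following is a small Python sketch. The helper names (`times_five`, `is_idempotent_on`, `submit_purchase`, `cancel_purchase`) are made up for illustration.

```python
def times_five(x):
    """Non-idempotent: applying it a second time changes the result."""
    return x * 5


def is_idempotent_on(f, samples):
    """Check f(f(x)) == f(x) for every sample value."""
    return all(f(f(x)) == f(x) for x in samples)


samples = [-3, -1, 2, 7]
abs_ok = is_idempotent_on(abs, samples)          # abs(abs(x)) == abs(x)
times_five_ok = is_idempotent_on(times_five, samples)

# The purchase examples, modelled with plain containers.
purchases = []       # submitting appends: each repeat adds another purchase
cancelled = set()    # cancelling adds to a set: repeats have no further effect


def submit_purchase(item):
    purchases.append(item)


def cancel_purchase(order_id):
    cancelled.add(order_id)


for _ in range(2):
    submit_purchase("skin")
    cancel_purchase("order-1")

print(abs_ok, times_five_ok, len(purchases), len(cancelled))
```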
36. Idempotent?
In the transactional OLTP world….
INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= '2015-10-25')
37. Idempotent?
In the big data / OLAP world….
INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= '2015-10-25')
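The slides pose the question without recording the answer. One common reading, offered here as an assumption, is that an OLTP table usually has a primary key that rejects a repeated insert, while a warehouse table typically has no such constraint, so rerunning the load silently duplicates rows. The Python sketch below uses SQLite purely as a stand-in (not Hive or the actual warehouse) to show the duplication and one way to make the load rerunnable: clear the target date range, then reload. The `game_id` column and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games_played_na (game_id INTEGER, date TEXT);
    CREATE TABLE games_played (game_id INTEGER, date TEXT);  -- no primary key
    INSERT INTO games_played_na VALUES (1, '2015-10-25'), (2, '2015-10-26');
""")

LOAD = """INSERT INTO games_played
          SELECT * FROM games_played_na WHERE date >= '2015-10-25'"""


def count():
    return conn.execute("SELECT COUNT(*) FROM games_played").fetchone()[0]


# Naive load: running it twice doubles the rows.
conn.execute(LOAD)
conn.execute(LOAD)
naive_count = count()


def idempotent_load():
    """Delete the target date range, then reload it, in one transaction."""
    with conn:
        conn.execute("DELETE FROM games_played WHERE date >= '2015-10-25'")
        conn.execute(LOAD)


idempotent_load()
idempotent_load()
idempotent_count = count()
print(naive_count, idempotent_count)
```

In a partitioned warehouse the same idea is often expressed as overwriting a whole date partition instead of appending to it.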
41. Message Queues
● AMAZON SIMPLE QUEUE SERVICE
● APACHE ACTIVEMQ
● RABBITMQ
● HORNETQ
● MICROSOFT MQ (MSMQ)
43. Honu: Self Service, Custom HTTP Edge Service (Java)
ELB in front of ~40 autoscaled m1.xlarge instances
Forwards JSON data indirectly to S3
The batches then need to be unpacked and converted into Hive tables
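To make the "unpack the batches" step concrete, here is a minimal Python sketch. The batch format is an assumption: the slides do not describe it, so this example pretends each batch is newline-delimited JSON in which every event names its target table in a hypothetical `table` field and carries its row in a hypothetical `data` field.

```python
import json
from collections import defaultdict


def unpack_batch(batch_text):
    """Group newline-delimited JSON events by their target table."""
    rows_by_table = defaultdict(list)
    for line in batch_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        rows_by_table[event["table"]].append(event["data"])
    return dict(rows_by_table)


# A made-up batch with events destined for two tables.
batch = "\n".join([
    json.dumps({"table": "logins", "data": {"player_id": 1, "region": "na"}}),
    json.dumps({"table": "purchases", "data": {"player_id": 1, "item": "skin"}}),
    json.dumps({"table": "logins", "data": {"player_id": 2, "region": "euw"}}),
])

tables = unpack_batch(batch)
print({name: len(rows) for name, rows in tables.items()})
```

Each per-table list of rows could then be written out in whatever layout the warehouse tables expect.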
44. Honu: Custom Collector Infrastructure (Java) - Derived from Netflix Suro
Deployed in every data center worldwide and also AWS
Self Service, Custom HTTP Edge Service (Java API)
50. Idempotency
Use application logic to make idempotent
msg = queue.pop();
if (processed_games.contains(msg.game_id)) {
    return; // already processed: do nothing
} else {
    process_game(msg);
    processed_games.add(msg.game_id); // remember it so a redelivery is a no-op
}
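The pseudocode above can be turned into a small runnable Python sketch. An in-memory `deque` stands in for the message queue and a `set` stands in for the record of processed games. Both are assumptions for illustration, and a real consumer would need durable, shared storage for the processed IDs.

```python
from collections import deque

processed_games = set()  # stand-in for a durable store of handled game IDs
results = []             # records which games were actually processed


def process_game(msg):
    results.append(msg["game_id"])


def handle(msg):
    """Process a message only if its game_id has not been seen before."""
    if msg["game_id"] in processed_games:
        return False  # duplicate delivery: do nothing
    process_game(msg)
    processed_games.add(msg["game_id"])
    return True


# Game 101 is delivered twice, as can happen with at-least-once queues.
queue = deque([{"game_id": 101}, {"game_id": 102}, {"game_id": 101}])
while queue:
    handle(queue.popleft())

print(results)  # [101, 102]
```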
51. THE DOWNSIDE
What’s in there?
Data team doesn’t know everything that is submitted
Compliance
Are we violating international data laws?
Inconsistent data structure
It’s formatted however the developer submits it
52. SELF SERVICE: HOW?
User Documentation
No one likes doing it, but it helps a lot.
Onboard training
Get new coworkers in-the-know
Familiar Protocols
Use REST or RPC so developers are on the same page
Focus on UX
Your tools need to be easy for non-technical people to use.
58. REST micro-service built with Java and Docker.
Reports and visualizations we can use to find problems.
Source and target comparison.
Warehouse
Auditing
Service
Platform
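A source and target comparison can be as simple as checking row counts per date partition on both sides. The Python sketch below shows that idea only. It is not the auditing service's actual logic (which the slides describe as a Java REST micro-service), and the `compare_counts` helper and the sample counts are hypothetical.

```python
def compare_counts(source_counts, target_counts):
    """Return the partitions whose row counts differ between source and target."""
    mismatches = {}
    for key in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(key, 0)
        tgt = target_counts.get(key, 0)
        if src != tgt:
            mismatches[key] = {"source": src, "target": tgt}
    return mismatches


# Made-up row counts per date on the platform (source) and in the warehouse (target).
source = {"2015-10-25": 1000, "2015-10-26": 1200, "2015-10-27": 900}
target = {"2015-10-25": 1000, "2015-10-26": 1150}

report = compare_counts(source, target)
for day, counts in report.items():
    print(day, counts)
```

Row counts catch missing or duplicated loads. Catching corrupted values would need a stronger check, such as per-partition checksums.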
68. AWS Infrastructure Today
RDS
EMR EC2 Storage
Data Science
Analytics / Hue
ETL Telemetry
Platfora
DynamoDB
Loading
Auditing ETL
Telemetry collectors
Data dictionary
Rocana (real time dashboard)
Solr (real time)
Point Data Service
Metastore
Data Science
Fraud
DYNAMODB
ETL App DB
Point Data Store
S3
Source of “Truth”
Networking
VPC
AWS Direct Connect (×4)
70. SEAN’S PRO TIPS OF THE DAY
DO
➔ Get an auditing solution for DW accuracy
➔ Allocate time for tuning AWS infrastructure
➔ Prepare for multiple data access patterns
➔ Keep idempotency in mind and use MQ architecture
DON’T
➔ Don’t wait. Create S3 permissions and naming standards early
➔ Don’t forget to track cost. AWS bills can surprise you
➔ Don’t underestimate simple problems in big data.
➔ Don’t stop. Believing
71. CHAMPION MASTERY
Custom rewards for mastering different champions
Intensive query that spans every game that every player has played
Improves player engagement
72. PLAYER SUPPORT
Full copy of our data warehouse in DynamoDB
Hive->DynamoDB Dynamic Partition
Support can answer questions faster than ever.
73. OFFENSIVE CHAT DETECTION
Data science team queries all chat messages in game
Sentiment analysis and classification
Identifies negative, offensive players and mutes them automatically.