SlideShare una empresa de Scribd logo
1 de 45
Descargar para leer sin conexión
Nitin Sharma
Real-Time Impression Store @ Netflix
Sep 2018
● Recommendations @ Netflix
● Distributed Impression Store
● Distributed Processing Infrastructure
Agenda
Recommendations @ Netflix
● 129M+ active members
● 190+ countries
● Increasing launch of originals
● 2 Trillion kafka events/day
Recommendations @Scale
● Signals:
○ Impressions, Plays, View History
○ Searches, My Lists etc.
● Compute:
○ Offline Compute
○ Online Compute
Netflix Recommendation Signals
Impression Store
● Raw impressions:
○ Large dataset (PBs every day)
○ 10s of trillions of rows
○ $$$$
● Lack of signal:
○ Too noisy
○ Custom aggregation - expensive
Storage Philosophy
● Impression fatigue , Familiarity effect
● Under vs over Impressing
● Use in Recommendations - Score Online
Goals
● Want: I want all impressions
● Need:
○ Meaningful way to capture impressions
○ (W/O storing full impressions)
● Idea:
○ EMA Counts # of times - <User, Video> over
multiple decay windows.
How many times has a show been recommended
to you?
● Representation:
○ Exponential Moving Average (EMA) Score
○ Rate of impressions over a given window
○ EMA score over Windows:
■ <user, video, location> => <last_seen_ts, [1d, 1w, 1m, 6m, 1y]>
■ E.g <a, narcos, toprow> => <today, [0.9, 0.2, 0, 0, 0]>
● Benefits:
○ Better Signal
○ Memory Footprint
○ Extensible
Data model Definition - EMA
EMA Visualization
Scale & SLA
Dimension Attribute Scale
Storage
Videos * locations * users ~600B + entries
Raw Data size 150+TB
Indexing Updates
Bulk 600B+ entries < 1.5 hours
Real time 50M entries < 5mins
Query 99th Percentile Latency < 10s of ms
Patterns Read/Update/Write
Operations
Cost $$ >> $$$$
Availability 99.99% [~4m downtime/30
days]
Distributed Data Store
Observation
Attributes Stringent Flexible
Predictably faster Reads
Bulk + RT writes
Strict Schema
Secondary Index
Easy Operations
Ser/Deser at Read (aka
blob storage)
Cache with props of a
DataStore?
● Format: <K,V> storage & Schema-less
● Underlying: Memcached + Cross Region support
● Storage: In memory + Ext Storage
Evcache
Architecture
Architecture
Server
EVCar
Application
Client Library
Client
Client Side Writing (set, delete, add, etc.)
us-west-2a us-west-2cus-west-2b
ClientClient Client
Client Side Reading (get)
us-west-2a us-west-2cus-west-2b
Client
Primary Secondary
● In Memory:
○ $$$$
○ No persistence
○ Lack of consistency - Node crash etc
● Layered Storage?
But wait...
● Scenario:
○ 10% of active items => 90% of hits
○ Large values eat up RAM
● Philosophy:
○ LRU <k,v> on disk; Index in RAM
○ MRU key + value in RAM
○ Values => NVMe; SSD
Ext Storage
Ext Storage
Ext Storage
● Effects:
○ Reduce overall RAM
○ Reduce Server Count
○ Increase overall Cache size
Cost : 1 TB of storage space
EVCache - RAM Only
20 r4.2xlarge
EVCache - Flash
1 i3.xlarge
Network Bandwidth : ~40 Gbps Network Bandwidth : ~2.5 Gbps
● Cross Region Replication
● Backup & Recovery
Data Store (ish) Features
Region BRegion A
APP APP
Repl Proxy
Repl Relay
1 mutate
2 send
metadata
3 poll msg
5
https send
m
sg
6 mutate4
get data
for set
Kafka Repl Relay Kafka
Repl Proxy
Cross-Region Replication
7 read
● Cross Region:
○ Relay: Event forward through kafka
○ Proxy: Event receiver and mutate cache
■ Stateless
■ Easy Throttle
● CAS:
■ Client-> Replica
■ Client -> Kafka
■ Write Retries (by proxy) until replica is consistent.
Characteristics
● Client Side
○ Writes: All replicas in a region
○ Compression: Gzip
○ Reads:
■ Quorum - Latches
■ Consistency All
● Server Side
○ Writes: Cross regions [replication]
○ Reads: Basic read repairs [Compare & Set]
○ Storage: Mem & External
Client & Server Responsibilities
Backup & Restore Architecture
Cache Warmer
(Spark)
Application
Client Library
Client
Control
S3
Data Flow
Metadata Flow
Control Flow
Data Representation
● Protobuf
● EMA Payload:
○ (last_seen_ts, Float[1d, 1w, 1m, 6m, 1y])
● Overall:
○ Map<VideoId,Map<locationId,EMA>>
EMA Representation
Client Side Write Trade-offs
ClientClient
● Writes:
○ Read->De-ser -> Update -> Ser -> Write
● Schema evolution
Impressions Infrastructure
Batch Indexing Infrastructure (V1)
us-east-1 us-west
eu-west
Precompute/Live
Compute
RT Indexing Infrastructure (V2)
Precompute/Live
Compute
Rocksdb
● Engine: Apache Flink
● Processing Semantics:
○ Exactly once
○ Late arriving events
● RocksDB:
○ Zero data loss:
■ Failures/Crashes - Checkpoints/Savepoints
○ State Management:
■ Application state
RT Indexing Infrastructure
Chaos Engineering
● Region Failovers
● Crash Proof ?:
○ Replicas
○ Tail Latency tests
● Cross Region Replication:
○ Kafka:
■ 2-3x data
○ Low Lag:
■ Scale up Replay Proxies
Tiered Storage:
○ Memory
○ SSD/Nvme
○ EBS (Nitro based)
■ 80,000 IOPS
■ 14Gbps
Future Work
https://www.linkedin.com/in/knitinsharma/
nitins@netflix.com
Speaker Info

Más contenido relacionado

La actualidad más candente

Amazon Webservice & Cloud Computing
Amazon Webservice & Cloud ComputingAmazon Webservice & Cloud Computing
Amazon Webservice & Cloud Computing
Jack Smith
 

La actualidad más candente (20)

Aerospike - fast and furious caching @ Burgasconf 2016
Aerospike - fast and furious caching @ Burgasconf 2016Aerospike - fast and furious caching @ Burgasconf 2016
Aerospike - fast and furious caching @ Burgasconf 2016
 
Improve search optimization engine with ssr in nextjs
Improve search optimization engine with ssr in nextjsImprove search optimization engine with ssr in nextjs
Improve search optimization engine with ssr in nextjs
 
Building realtime data pipeline with Apache Kafka
Building realtime data pipeline with Apache KafkaBuilding realtime data pipeline with Apache Kafka
Building realtime data pipeline with Apache Kafka
 
Building an Efficient AI Training Platform at bilibili with Alluxio
Building an Efficient AI Training Platform at bilibili with AlluxioBuilding an Efficient AI Training Platform at bilibili with Alluxio
Building an Efficient AI Training Platform at bilibili with Alluxio
 
Building Asynchronous Microservices with Armeria
Building Asynchronous Microservices with ArmeriaBuilding Asynchronous Microservices with Armeria
Building Asynchronous Microservices with Armeria
 
Dynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationDynamo db and Cross Region Migration
Dynamo db and Cross Region Migration
 
Search engine based on Elasticsearch
Search engine based on ElasticsearchSearch engine based on Elasticsearch
Search engine based on Elasticsearch
 
Scaling Islandora
Scaling IslandoraScaling Islandora
Scaling Islandora
 
Tales from production with postgreSQL at scale
Tales from production with postgreSQL at scaleTales from production with postgreSQL at scale
Tales from production with postgreSQL at scale
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
Beyond 1000 bosh Deployments
Beyond 1000 bosh DeploymentsBeyond 1000 bosh Deployments
Beyond 1000 bosh Deployments
 
OpenStreetMap Belarus Tile Server
OpenStreetMap Belarus Tile ServerOpenStreetMap Belarus Tile Server
OpenStreetMap Belarus Tile Server
 
Amazon WebServices lection 1
Amazon WebServices lection 1Amazon WebServices lection 1
Amazon WebServices lection 1
 
What Reika Taught us
What Reika Taught usWhat Reika Taught us
What Reika Taught us
 
Amazon Webservice & Cloud Computing
Amazon Webservice & Cloud ComputingAmazon Webservice & Cloud Computing
Amazon Webservice & Cloud Computing
 
Omaha Rails User Group - Ec2
Omaha Rails User Group - Ec2Omaha Rails User Group - Ec2
Omaha Rails User Group - Ec2
 
Docker for mac & local developer environment optimization
Docker for mac & local developer environment optimizationDocker for mac & local developer environment optimization
Docker for mac & local developer environment optimization
 
Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase
 
A Cheapskates Guide to AWS
A Cheapskates Guide to AWSA Cheapskates Guide to AWS
A Cheapskates Guide to AWS
 
Ruslan Gibaiev.Doing real-time stream processing in one of the fastest-growin...
Ruslan Gibaiev.Doing real-time stream processing in one of the fastest-growin...Ruslan Gibaiev.Doing real-time stream processing in one of the fastest-growin...
Ruslan Gibaiev.Doing real-time stream processing in one of the fastest-growin...
 

Similar a Netflix - Realtime Impression Store

Similar a Netflix - Realtime Impression Store (20)

EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 

Último

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 

Último (20)

UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

Netflix - Realtime Impression Store

  • 1.
  • 2. Nitin Sharma Real-Time Impression Store @ Netflix Sep 2018
  • 3. ● Recommendations @ Netflix ● Distributed Impression Store ● Distributed Processing Infrastructure Agenda
  • 5.
  • 6. ● 129M+ active members ● 190+ countries ● Increasing launch of originals ● 2 Trillion kafka events/day Recommendations @Scale
  • 7. ● Signals: ○ Impressions, Plays, View History ○ Searches, My Lists etc. ● Compute: ○ Offline Compute ○ Online Compute Netflix Recommendation Signals
  • 8.
  • 10. ● Raw impressions: ○ Large dataset (PBs every day) ○ 10s of trillions of rows ○ $$$$ ● Lack of signal: ○ Too noisy ○ Custom aggregation - expensive Storage Philosophy
  • 11.
  • 12. ● Impression fatigue , Familiarity effect ● Under vs over Impressing ● Use in Recommendations - Score Online Goals
  • 13. ● Want: I want all impressions ● Need: ○ Meaningful way to capture impressions ○ (W/O storing full impressions) ● Idea: ○ EMA Counts # of times - <User, Video> over multiple decay windows. How many times has a show been recommended to you?
  • 14. ● Representation: ○ Exponential Moving Average (EMA) Score ○ Rate of impressions over a given window ○ EMA score over Windows: ■ <user, video, location> => <last_seen_ts, [1d, 1w, 1m, 6m, 1y]> ■ E.g <a, narcos, toprow> => <today, [0.9, 0.2, 0, 0, 0]> ● Benefits: ○ Better Signal ○ Memory Footprint ○ Extensible Data model Definition - EMA
  • 16. Scale & SLA Dimension Attribute Scale Storage Videos * locations * users ~600B + entries Raw Data size 150+TB Indexing Updates Bulk 600B+ entries < 1.5 hours Real time 50M entries < 5mins Query 99th Percentile Latency < 10s of ms Patterns Read/Update/Write Operations Cost $$ >> $$$$ Availability 99.99% [~4m downtime/30 days]
  • 18. Observation Attributes Stringent Flexible Predictably faster Reads Bulk + RT writes Strict Schema Secondary Index Easy Operations Ser/Deser at Read (aka blob storage)
  • 19. Cache with props of a DataStore?
  • 20.
  • 21. ● Format: <K,V> storage & Schema-less ● Underlying: Memcached + Cross Region support ● Storage: In memory + Ext Storage Evcache
  • 24. Client Side Writing (set, delete, add, etc.) us-west-2a us-west-2cus-west-2b ClientClient Client
  • 25. Client Side Reading (get) us-west-2a us-west-2cus-west-2b Client Primary Secondary
  • 26. ● In Memory: ○ $$$$ ○ No persistence ○ Lack of consistency - Node crash etc ● Layered Storage? But wait...
  • 27. ● Scenario: ○ 10% of active items => 90% of hits ○ Large values eat up RAM ● Philosophy: ○ LRU <k,v> on disk; Index in RAM ○ MRU key + value in RAM ○ Values => NVMe; SSD Ext Storage
  • 29. Ext Storage ● Effects: ○ Reduce overall RAM ○ Reduce Server Count ○ Increase overall Cache size
  • 30. Cost : 1 TB of storage space EVCache - RAM Only 20 r4.2xlarge EVCache - Flash 1 i3.xlarge Network Bandwidth : ~40 Gbps Network Bandwidth : ~2.5 Gbps
  • 31. ● Cross Region Replication ● Backup & Recovery Data Store (ish) Features
  • 32. Region BRegion A APP APP Repl Proxy Repl Relay 1 mutate 2 send metadata 3 poll msg 5 https send m sg 6 mutate4 get data for set Kafka Repl Relay Kafka Repl Proxy Cross-Region Replication 7 read
  • 33. ● Cross Region: ○ Relay: Event forward through kafka ○ Proxy: Event receiver and mutate cache ■ Stateless ■ Easy Throttle ● CAS: ■ Client-> Replica ■ Client -> Kafka ■ Write Retries (by proxy) until replica is consistent. Characteristics
  • 34. ● Client Side ○ Writes: All replicas in a region ○ Compression: Gzip ○ Reads: ■ Quorum - Latches ■ Consistency All ● Server Side ○ Writes: Cross regions [replication] ○ Reads: Basic read repairs [Compare & Set] ○ Storage: Mem & External Client & Server Responsibilities
  • 35. Backup & Restore Architecture Cache Warmer (Spark) Application Client Library Client Control S3 Data Flow Metadata Flow Control Flow
  • 37. ● Protobuf ● EMA Payload: ○ (last_seen_ts, Float[1d, 1w, 1m, 6m, 1y]) ● Overall: ○ Map<VideoId,Map<locationId,EMA>> EMA Representation
  • 38. Client Side Write Trade-offs ClientClient ● Writes: ○ Read->De-ser -> Update -> Ser -> Write ● Schema evolution
  • 40. Batch Indexing Infrastructure (V1) us-east-1 us-west eu-west Precompute/Live Compute
  • 41. RT Indexing Infrastructure (V2) Precompute/Live Compute Rocksdb
  • 42. ● Engine: Apache Flink ● Processing Semantics: ○ Exactly once ○ Late arriving events ● RocksDB: ○ Zero data loss: ■ Failures/Crashes - Checkpoints/Savepoints ○ State Management: ■ Application state RT Indexing Infrastructure
  • 43. Chaos Engineering ● Region Failovers ● Crash Proof ?: ○ Replicas ○ Tail Latency tests ● Cross Region Replication: ○ Kafka: ■ 2-3x data ○ Low Lag: ■ Scale up Replay Proxies
  • 44. Tiered Storage: ○ Memory ○ SSD/Nvme ○ EBS (Nitro based) ■ 80,000 IOPS ■ 14Gbps Future Work