June 17, 2013
#Cassandra13
Axel Liljencrantz
liljencrantz@spotify.com
How not to use Cassandra
About me
The Spotify backend
•  Around 4000 servers in 4 datacenters
•  Volumes
-  We have ~ 12 soccer fields of music
-  Streaming ~ 4 Wikipedias/second
-  ~ 24 000 000 active users
The Spotify backend
•  Specialized software powering Spotify
-  ~ 70 services
-  Mostly Python, some Java
-  Small, simple services, each responsible for a single task
Storage needs
•  Used to be a pure PostgreSQL shop
•  Postgres is awesome, but...
-  Poor cross-site replication support
-  Write master failure requires manual intervention
-  Sharding throws most relational advantages out the
window
Cassandra @ Spotify
•  We started using Cassandra 2+ years ago
-  ~ 24 services use it by now
-  ~ 300 Cassandra nodes
-  ~ 50 TB of data
•  Back then, there was little information about how to design
efficient, scalable storage schemas for Cassandra
•  So we screwed up
•  A lot
How to misconfigure Cassandra
Read repair
•  Repairs inconsistencies left by outages during regular read operations
•  With RR, all reads request hash digests from all nodes
•  Result is still returned as soon as enough nodes have replied
•  If there is a mismatch, perform a repair
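The read repair flow above can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: replicas are plain dicts, `read_with_repair` and `digest` are hypothetical names, and the repair runs synchronously here, whereas the slide notes that the result is returned as soon as enough nodes have replied.

```python
import hashlib


def digest(value):
    """Hash a (timestamp, data) pair; stands in for a replica's read digest."""
    timestamp, data = value
    return hashlib.md5(f"{timestamp}:{data}".encode()).hexdigest()


def read_with_repair(replicas, key, required=1):
    """Return (data, repaired_count) for key, fixing stale replicas.

    replicas is a list of dicts mapping key -> (timestamp, data).
    These dicts are hypothetical in-memory stand-ins for nodes.
    """
    responses = [(replica, replica.get(key)) for replica in replicas]
    answered = [value for _, value in responses if value is not None]
    if len(answered) < required:
        raise RuntimeError("not enough replicas answered")

    # The value with the highest timestamp wins.
    newest = max(answered, key=lambda value: value[0])

    # A mismatch exists if digests differ or a replica has no value at all.
    digests = {digest(value) for value in answered}
    mismatch = len(digests) > 1 or len(answered) < len(replicas)

    repaired = 0
    if mismatch:
        for replica, value in responses:
            if value != newest:
                replica[key] = newest  # write the winning value back
                repaired += 1
    return newest[1], repaired
```

With three replicas where one is stale and one is empty, a single read brings all three in line.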
Read repair
•  Useful factoid: Read repair is performed across all data
centers
•  So in a multi-DC setup, all reads will result in requests being
sent to every data center
•  We've made this mistake a bunch of times
•  New in 1.1: dclocal_read_repair_chance
Row cache
•  Cassandra can be configured to cache entire data rows in
RAM
•  Intended as a memcache alternative
•  Let's enable it. What's the worst that could happen, right?
Row cache
NO!
•  Only stores full rows
•  All cache misses are silently promoted to full row slices
•  All writes invalidate entire row
•  Don't use unless you understand all use cases
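To make the cost concrete, here is a small sketch of a cache with the behaviour listed above: it stores only full rows, a miss loads the whole row, and any write invalidates the row. `RowCachedStore` and its counter are hypothetical illustrations, not Cassandra classes.

```python
class RowCachedStore:
    """Toy store whose cache holds entire rows, never single columns."""

    def __init__(self, rows):
        self.rows = rows  # row key -> {column: value}; stand-in for disk
        self.cache = {}
        self.columns_read_from_disk = 0

    def get(self, row_key, column):
        if row_key not in self.cache:
            # A miss is promoted to a full row slice, even for one column.
            full_row = dict(self.rows[row_key])
            self.columns_read_from_disk += len(full_row)
            self.cache[row_key] = full_row
        return self.cache[row_key][column]

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value
        # Any write throws away the entire cached row.
        self.cache.pop(row_key, None)
```

On a wide row that is written now and then, almost every read ends up paying for the whole row, which is the situation the slide warns about.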
Compression
•  Cassandra supports transparent compression of all data
•  Compression algorithm (snappy) is super fast
•  So you can just enable it and everything will be better, right?
•  NO!
•  Compression disables a bunch of fast paths, slowing down
fast reads
How to misuse Cassandra
Performance worse over time
•  A freshly loaded Cassandra cluster is usually snappy
•  But when you keep writing to the same columns for a
long time, the row will spread over more SSTables
•  And performance jumps off a cliff
•  We've seen clusters where reads touch a dozen SSTables on
average
•  nodetool cfhistograms is your friend
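A rough sketch of why this hurts reads: every SSTable that contains a fragment of the row has to be consulted and merged. SSTables are modelled here as dicts ordered oldest first, and both function names are hypothetical; real Cassandra merges by column timestamp rather than by list order.

```python
def sstables_touched(sstables, row_key):
    """Count how many SSTables hold at least one fragment of the row."""
    return sum(1 for table in sstables if row_key in table)


def read_row(sstables, row_key):
    """Merge the row's fragments, letting newer SSTables override older ones.

    sstables is a list of dicts (row key -> {column: value}), oldest first.
    """
    merged = {}
    for table in sstables:
        merged.update(table.get(row_key, {}))
    return merged
```

The read cost grows with `sstables_touched`, which is the kind of number the SSTables-per-read histogram reports.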
Performance worse over time
•  CASSANDRA-5514
•  Every SSTable stores first/last column of SSTable
•  Time series-like data is effectively partitioned
Few cross continent clusters
•  Few cross continent Cassandra users
•  We are kind of on our own when it comes to some problems
•  CASSANDRA-5148
•  Disable TCP nodelay
•  Reduced packet count by 20 %
How not to upgrade Cassandra
•  Very few total cluster outages
-  Clusters have been up and running since the early 0.7
days, been rolling upgraded, expanded, full hardware
replacements etc.
•  Never lost any data!
-  No matter how spectacularly Cassandra fails, it has
never written bad data
-  Immutable SSTables FTW
Upgrade from 0.7 to 0.8
•  This was the first big upgrade we did, 0.7.4 ⇾ 0.8.6
•  Everyone claimed rolling upgrade would work
-  It did not
•  One would expect 0.8.6 to have this fixed
•  Patched Cassandra and rolled it a day later
•  Takeaways:
-  ALWAYS try rolling upgrades in a testing environment
-  Don't believe what people on the Internet tell you
Upgrade from 0.8 to 1.0
•  We tried upgrading in test env, worked fine
•  Worked fine in production...
•  Except the last cluster
•  All data gone
•  Many keys per SSTable ⇾ corrupt bloom filters
•  Made Cassandra think it didn't have any keys
•  Scrub data ⇾ fixed
•  Takeaway: ALWAYS test upgrades using production data
Upgrade from 1.0 to 1.1
•  After the previous upgrades, we did all the tests with
production data and everything worked fine...
•  Until we redid it in production, and we had reports of missing
rows
•  Scrub ⇾ restart made them reappear
•  This was in December; we have not been able to reproduce it
•  PEBKAC?
•  Takeaway: ?
How not to deal with large clusters
Coordinator
•  Coordinator performs partitioning, passes on request to
the right nodes
•  Merges all responses
What happens if one node is slow?
Many reasons for temporary slowness:
•  Bad RAID battery
•  Sudden bursts of compaction/repair
•  Bursty load
•  Net hiccup
•  Major GC
•  Reality
What happens if one node is slow?
•  Coordinator has a request queue
•  If a node goes down completely, gossip will notice quickly
and drop the node
•  But what happens if a node is just super slow?
What happens if one node is slow?
•  Gossip doesn't react quickly to slow nodes
•  The request queue for the coordinator on every node in
the cluster fills up
•  And the entire cluster stops accepting requests
•  No single point of failure?
What happens if one node is slow?
•  Solution: Partitioner awareness in client
•  Max 3 nodes go down
•  Available in Astyanax
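Partitioner awareness means the client works out which nodes own a key and sends the request straight to one of them, so a slow node only affects the keys it holds. A minimal sketch, assuming MD5-based tokens and a replication factor of 3; `TokenRing` is a hypothetical class, and deriving a node's token from its name is a simplification, since real clusters assign tokens explicitly.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring mapping keys to the nodes that would hold them."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Assumption: each node's token is the MD5 of its name.
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        """Return the nodes owning key: the next rf nodes clockwise on the ring."""
        start = bisect.bisect_right(self.tokens, self._token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

A client would pick its coordinator from `replicas(key)` instead of a random node, so requests for unrelated keys never queue behind the slow node.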
How not to delete data
How is data deleted?
•  SSTables are immutable, we can't remove the data
•  Cassandra creates tombstones for deleted data
•  Tombstones are versioned the same way as any other
write
How not to delete data
Do tombstones ever go away?
•  During compactions, tombstones can get merged into
SSTables that hold the original data, making the
tombstones redundant
•  Once a tombstone is the only value for a specific column,
the tombstone can go away
•  Still need grace time to handle node downtime
How not to delete data
•  Tombstones can only be deleted once all non-tombstone
values have been deleted
•  Tombstones can only be deleted if all values for the
specified row are all being compacted
•  If you're using SizeTiered compaction, 'old' rows will
rarely get deleted
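The purge conditions above can be written down as a small predicate. This is a sketch of the rule as the slides state it, not Cassandra's code: SSTables are dicts with a `name` and a set of `rows`, and `can_purge_tombstone` and its parameters are hypothetical.

```python
def can_purge_tombstone(row_key, tombstone_time, now, gc_grace_seconds,
                        compacting, all_sstables):
    """Return True if a tombstone for row_key may be dropped in this compaction.

    compacting is the set of SSTable names taking part in the compaction;
    all_sstables is a list of {"name": str, "rows": set} dicts.
    """
    # The grace period must have passed, to cover nodes that were down.
    if now - tombstone_time < gc_grace_seconds:
        return False
    # Every SSTable holding a fragment of the row must be in the compaction.
    holders = [table for table in all_sstables if row_key in table["rows"]]
    return all(table["name"] in compacting for table in holders)
```

With size-tiered compaction, an old row tends to have fragments in large SSTables that rarely compact together, so the second condition seldom holds.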
How not to delete data
•  Tombstones are a problem even when using levelled
compaction
•  In theory, 90 % of all rows should live in a single SSTable
•  In production, we've found that only 50 - 80 % of all reads
hit only one SSTable
•  In fact, frequently updated columns will exist in most
levels, causing tombstones to stick around
How not to delete data
•  Deletions are messy
•  Unless you perform major compactions, tombstones will
rarely get deleted
•  The problem is much worse for «popular» rows
•  Avoid schemas that delete data!
TTL:ed data
•  Cassandra supports TTL:ed data
•  Once TTL:ed data expires, it should just be compacted
away, right?
•  We know we don't need the data anymore, no need for a
tombstone, so it should be fast, right?
•  Noooooo...
•  (Overwritten data could theoretically bounce back)
TTL:ed data
•  CASSANDRA-5228
•  Drop entire sstables when all columns are expired
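The idea can be sketched as a check over per-column expiry times. The SSTable layout and function names here are hypothetical, and the sketch deliberately ignores the check against other SSTables that may hold older data for the same columns, which is the "bounce back" concern mentioned earlier.

```python
def fully_expired(sstable, now):
    """True if every column in the SSTable has a TTL that has already passed.

    sstable["expiries"] lists one expiry timestamp per column, or None
    for a column written without a TTL.
    """
    return all(expiry is not None and expiry <= now
               for expiry in sstable["expiries"])


def drop_expired(sstables, now):
    """Keep only SSTables that still contain live data."""
    return [table for table in sstables if not fully_expired(table, now)]
```

Dropping a whole file this way avoids rewriting it through compaction, which is why it suits time-series data where columns written together also expire together.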
The Playlist service
Our most complex service
•  ~ 1 billion playlists
•  40 000 reads per second
•  22 TB of compressed data
The Playlist service
Our old playlist system had many problems:
•  Stored data across hundreds of millions of files, making
backup process really slow.
•  Home brewed replication model that didn't work very well
•  Frequent downtimes, huge scalability problems
•  Perfect test case for
Cassandra!
Playlist data model
•  Every playlist is a revisioned object
•  Think of it like a distributed versioning system
•  Allows concurrent modification on multiple offlined clients
•  We even have an automatic merge conflict resolver that
works really well!
•  "That's actually a really useful feature", said no one ever
Playlist data model
•  Sequence of changes
•  The changes are the authoritative data
•  Everything else is optimization
•  Cassandra is pretty neat for storing this kind of stuff
•  Can use consistency level ONE safely
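A sequence-of-changes model means the current playlist is whatever you get by replaying the change log from the start. A minimal sketch, assuming a hypothetical change format of `(op, index, track)` tuples; the slides do not describe the real format.

```python
def apply_change(tracks, change):
    """Apply one change to a track list and return the new list."""
    op, index, track = change
    result = list(tracks)
    if op == "add":
        result.insert(index, track)
    elif op == "remove":
        del result[index]
    else:
        raise ValueError(f"unknown op: {op}")
    return result


def rebuild(changes):
    """Replay the authoritative change log to get the current playlist."""
    tracks = []
    for change in changes:
        tracks = apply_change(tracks, change)
    return tracks
```

Anything derived from the log, such as a cached snapshot of the latest revision, can be thrown away and rebuilt, which is what makes it "optimization" rather than data.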
Tombstone hell
•  The HEAD column family stores the sequence ID of the latest
revision of each playlist
•  90 % of all reads go to HEAD
•  mlock
Tombstone hell
•  Noticed that HEAD requests took several seconds for some
lists
•  Easy to reproduce in cassandra-cli:
• get playlist_head[utf8('spotify:user...')];
•  1-15 seconds latency; should be < 0.1 s
•  Copy SSTables to development machine for investigation
•  Cassandra tool sstabletojson showed that the row contained
600 000 tombstones!
Tombstone hell
•  WAT‽
•  Data is in the column name
•  Used to detect forks
Tombstone hell
•  We expected tombstones would be deleted after 30 days
•  Nope, all tombstones from the past 1.5 years were still there
•  Revelation: Rows existing in 4+ SSTables never have
tombstones deleted during minor compactions
•  Frequently updated lists exist in nearly all SSTables
Solution:
•  Major compaction (CF size cut in half)
Zombie tombstones
•  Ran major compactions manually on all nodes over a few
days
•  All seemed well...
•  But a week later, the same lists took several seconds
again‽‽‽
Repair vs major compactions
A repair between the major compactions "resurrected" the
tombstones :(
New solution:
•  Repairs during Monday-Friday
•  Major compaction Saturday-Sunday
A (by now) well-known Cassandra anti-pattern:
Don't use Cassandra to store queues
Cassandra counters
•  There are lots of places in the Spotify UI where we count
things
•  # of followers of a playlist
•  # of followers of an artist
•  # of times a song has been played
•  Cassandra has a feature called distributed counters that
sounds suitable
•  Is this awesome?
Cassandra counters
•  Yep
•  They've actually worked pretty well for us.
Lessons
How not to fail
•  Treat Cassandra as a utility belt
•  Flash
Lots of one-off solutions:
•  Weekly major compactions
•  Delete all sstables and recreate from scratch every day
•  Memlock frequently used SSTables in RAM
Lessons
•  Cassandra read performance is heavily dependent on the
temporal patterns of your writes
•  Cassandra is initially snappy, but various write patterns
make read performance slowly decrease
•  Making benchmarks close to useless
Lessons
•  Avoid repeatedly writing data to the same row over very
long spans of time
•  Avoid deleting data
•  If you're working at scale, you'll need to know how
Cassandra works under the hood
•  nodetool cfhistograms is your friend
Lessons
•  There are still various esoteric problems with large scale
Cassandra installations
•  Debugging them is really interesting
•  If you agree with the above statements, you should totally
come work with us
spotify.com/jobs
Questions?

Más contenido relacionado

La actualidad más candente

Monitoring Cassandra with Riemann
Monitoring Cassandra with RiemannMonitoring Cassandra with Riemann
Monitoring Cassandra with RiemannPatricia Gorla
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...DataStax
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraMichael Kjellman
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...DataStax
 
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...DataStax Academy
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersJulien Anguenot
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodesaaronmorton
 
CrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsCrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsDataStax Academy
 
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...DataStax
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraDataStax
 

La actualidad más candente (20)

Cassandra Metrics
Cassandra MetricsCassandra Metrics
Cassandra Metrics
 
Monitoring Cassandra with Riemann
Monitoring Cassandra with RiemannMonitoring Cassandra with Riemann
Monitoring Cassandra with Riemann
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
Cassandra compaction
Cassandra compactionCassandra compaction
Cassandra compaction
 
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar...
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
CrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsCrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For Operators
 
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
 

Destacado

C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzDataStax Academy
 
High performance queues with Cassandra
High performance queues with CassandraHigh performance queues with Cassandra
High performance queues with CassandraMikalai Alimenkou
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionDataStax Academy
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Luke Tillman
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofrhatr
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsDuyhai Doan
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
More Datacenters, More Problems
More Datacenters, More ProblemsMore Datacenters, More Problems
More Datacenters, More ProblemsTodd Palino
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra SummitDon Marti
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Spotify: Automating Cassandra repairs
Spotify: Automating Cassandra repairsSpotify: Automating Cassandra repairs
Spotify: Automating Cassandra repairsDataStax Academy
 

Destacado (14)

C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
 
High performance queues with Cassandra
High performance queues with CassandraHigh performance queues with Cassandra
High performance queues with Cassandra
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear of
 
Cassandra queuing
Cassandra queuingCassandra queuing
Cassandra queuing
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Tombstones and Compaction
Tombstones and CompactionTombstones and Compaction
Tombstones and Compaction
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
More Datacenters, More Problems
More Datacenters, More ProblemsMore Datacenters, More Problems
More Datacenters, More Problems
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra Summit
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Spotify: Automating Cassandra repairs
Spotify: Automating Cassandra repairsSpotify: Automating Cassandra repairs
Spotify: Automating Cassandra repairs
 

Similar a Cassandra summit 2013 how not to use cassandra

Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoJon Haddad
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanDataStax Academy
 
LJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraChristopher Batey
 
Riak at Posterous
Riak at PosterousRiak at Posterous
Riak at Posterouscapotej
 
Cassandra from the trenches: migrating Netflix (update)
Cassandra from the trenches: migrating Netflix (update)Cassandra from the trenches: migrating Netflix (update)
Cassandra from the trenches: migrating Netflix (update)Jason Brown
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010jbellis
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseJoe Alex
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overviewPritamKathar
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 


Cassandra Summit 2013: How not to use Cassandra

  • 1. June 17, 2013 #Cassandra13 Axel Liljencrantz liljencrantz@spotify.com How not to use Cassandra
  • 4. #Cassandra13 The Spotify backend •  Around 4000 servers in 4 datacenters •  Volumes -  We have ~ 12 soccer fields of music -  Streaming ~ 4 Wikipedias/second -  ~ 24 000 000 active users
  • 5. #Cassandra13 The Spotify backend •  Specialized software powering Spotify -  ~ 70 services -  Mostly Python, some Java -  Small, simple services responsible for single task
  • 6. #Cassandra13 Storage needs •  Used to be a pure PostgreSQL shop •  Postgres is awesome, but... -  Poor cross-site replication support -  Write master failure requires manual intervention -  Sharding throws most relational advantages out the window
  • 8. #Cassandra13 Cassandra @ Spotify •  We started using Cassandra 2+ years ago -  ~ 24 services use it by now -  ~ 300 Cassandra nodes -  ~ 50 TB of data •  Back then, there was little information about how to design efficient, scalable storage schemas for Cassandra •  So we screwed up •  A lot
  • 10. #Cassandra13 Read repair •  Repair from outages during regular read operation •  With RR, all reads request hash digests from all nodes •  Result is still returned as soon as enough nodes have replied •  If there is a mismatch, perform a repair
  • 11. #Cassandra13 Read repair •  Useful factoid: Read repair is performed across all data centers •  So in a multi-DC setup, all reads will result in requests being sent to every data center •  We've made this mistake a bunch of times •  New in 1.1: dclocal_read_repair
  • 12. #Cassandra13 Row cache •  Cassandra can be configured to cache entire data rows in RAM •  Intended as a memcache alternative •  Let's enable it. What's the worst that could happen, right?
  • 13. #Cassandra13 Row cache NO! •  Only stores full rows •  All cache misses are silently promoted to full row slices •  All writes invalidate entire row •  Don't use unless you understand all use cases
  • 15. #Cassandra13 Compression •  Cassandra supports transparent compression of all data •  Compression algorithm (snappy) is super fast •  So you can just enable it and everything will be better, right? •  NO! •  Compression disables a bunch of fast paths, slowing down fast reads
  • 17. #Cassandra13 Performance worse over time •  A freshly loaded Cassandra cluster is usually snappy •  But when you keep writing to the same columns over a long time, the row will spread over more SSTables •  And performance jumps off a cliff •  We've seen clusters where reads touch a dozen SSTables on average •  nodetool cfhistograms is your friend
  • 18. #Cassandra13 Performance worse over time •  CASSANDRA-5514 •  Every SSTable stores first/last column of SSTable •  Time series-like data is effectively partitioned
  • 19. #Cassandra13 Few cross continent clusters •  Few cross continent Cassandra users •  We are kind of on our own when it comes to some problems •  CASSANDRA-5148 •  Disable TCP nodelay •  Reduced packet count by 20 %
  • 20. #Cassandra13 How not to upgrade Cassandra
  • 21. #Cassandra13 How not to upgrade Cassandra •  Very few total cluster outages -  Clusters have been up and running since the early 0.7 days, been rolling upgraded, expanded, full hardware replacements etc. •  Never lost any data! -  No matter how spectacularly Cassandra fails, it has never written bad data -  Immutable SSTables FTW
  • 22. #Cassandra13 Upgrade from 0.7 to 0.8 •  This was the first big upgrade we did, 0.7.4 ⇾ 0.8.6 •  Everyone claimed rolling upgrade would work -  It did not •  One would expect 0.8.6 to have this fixed •  Patched Cassandra and rolled it a day later •  Takeaways: -  ALWAYS try rolling upgrades in a testing environment -  Don't believe what people on the Internet tell you
  • 24. #Cassandra13 Upgrade from 0.8 to 1.0 •  We tried upgrading in test env, worked fine •  Worked fine in production... •  Except the last cluster •  All data gone •  Many keys per SSTable ⇾ corrupt bloom filters •  Made Cassandra think it didn't have any keys •  Scrub data ⇾ fixed •  Takeaway: ALWAYS test upgrades using production data
  • 25. #Cassandra13 Upgrade from 1.0 to 1.1 •  After the previous upgrades, we did all the tests with production data and everything worked fine... •  Until we redid it in production, and we had reports of missing rows •  Scrub ⇾ restart made them reappear •  This was in December, have not been able to reproduce •  PEBKAC? •  Takeaway: ?
  • 26. #Cassandra13 How not to deal with large clusters
  • 27. #Cassandra13 Coordinator •  Coordinator performs partitioning, passes on request to the right nodes •  Merges all responses
  • 28. #Cassandra13 What happens if one node is slow?
  • 29. #Cassandra13 What happens if one node is slow? Many reasons for temporary slowness: •  Bad raid battery •  Sudden bursts of compaction/repair •  Bursty load •  Net hiccup •  Major GC •  Reality
  • 30. #Cassandra13 What happens if one node is slow? •  Coordinator has a request queue •  If a node goes down completely, gossip will notice quickly and drop the node •  But what happens if a node is just super slow?
  • 32. #Cassandra13 What happens if one node is slow? •  Gossip doesn't react quickly to slow nodes •  The request queue for the coordinator on every node in the cluster fills up •  And the entire cluster stops accepting requests •  No single point of failure?
  • 33. #Cassandra13 What happens if one node is slow? •  Solution: Partitioner awareness in client •  Max 3 nodes go down •  Available in Astyanax
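
The idea behind partitioner awareness in the client can be sketched in a few lines. This is only an illustration under stated assumptions, not how Astyanax or Cassandra implements it: the MD5-based `token` function, the `TokenRing` class, the node names, and the replication factor of 3 are all hypothetical. The point is that a client which knows the ring sends each request straight to a node that owns the key, so a single slow node can only affect the keys it holds a replica of.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical client-side view of the ring: one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str):
        """Return the rf nodes that own `key`, walking clockwise on the ring."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


# Twelve made-up nodes; the client contacts only the owners of the key.
ring = TokenRing([f"node{i}" for i in range(12)])
owners = ring.replicas("spotify:user:example:playlist:1")
```

With this routing, a request for a given key never has to pass through an unrelated coordinator, which is what limits the blast radius of one slow node to the replica set.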
  • 34. #Cassandra13 How not to delete data
  • 35. #Cassandra13 How not to delete data How is data deleted? •  SSTables are immutable, we can't remove the data •  Cassandra creates tombstones for deleted data •  Tombstones are versioned the same way as any other write
  • 36. #Cassandra13 How not to delete data Do tombstones ever go away? •  During compactions, tombstones can get merged into SSTables that hold the original data, making the tombstones redundant •  Once a tombstone is the only value for a specific column, the tombstone can go away •  Still need grace time to handle node downtime
  • 37. #Cassandra13 How not to delete data •  Tombstones can only be deleted once all non-tombstone values have been deleted •  Tombstones can only be deleted if all values for the specified row are all being compacted •  If you're using SizeTiered compaction, 'old' rows will rarely get deleted
  • 38. #Cassandra13 How not to delete data •  Tombstones are a problem even when using levelled compaction •  In theory, 90 % of all rows should live in a single SSTable •  In production, we've found that only 50 - 80 % of all reads hit only one SSTable •  In fact, frequently updated columns will exist in most levels, causing tombstones to stick around
  • 39. #Cassandra13 How not to delete data •  Deletions are messy •  Unless you perform major compactions, tombstones will rarely get deleted •  The problem is much worse for «popular» rows •  Avoid schemas that delete data!
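
The tombstone rules from the slides above can be made concrete with a toy model. This is a deliberately simplified sketch, not Cassandra's actual compaction code: SSTables are plain dicts, and the `compact` function, its parameters, and the sample data are assumptions made for illustration. It shows why a minor compaction that leaves out any SSTable containing the same row has to keep the tombstone, while a major compaction can purge it.

```python
TOMBSTONE = object()  # marker for a deleted column


def compact(selected, others, now, gc_grace):
    """Merge `selected` SSTables into one, keeping the newest cell per column.

    Each SSTable is a dict: {(row, column): (timestamp, value)}.
    A tombstone is dropped only if it is older than gc_grace AND the row
    does not appear in any SSTable left out of this compaction (`others`).
    """
    rows_elsewhere = {row for table in others for (row, _col) in table}
    merged = {}
    for table in selected:
        for cell, (ts, value) in table.items():
            if cell not in merged or ts > merged[cell][0]:
                merged[cell] = (ts, value)
    result = {}
    for (row, col), (ts, value) in merged.items():
        expired = value is TOMBSTONE and now - ts > gc_grace
        if expired and row not in rows_elsewhere:
            continue  # safe to purge: no older value can resurface
        result[(row, col)] = (ts, value)
    return result


old = {("playlist1", "rev1"): (1, "data")}
newer = {("playlist1", "rev1"): (5, TOMBSTONE)}
unrelated = {("playlist1", "rev2"): (3, "data")}

# Minor compaction that leaves out an SSTable holding the same row:
minor = compact([old, newer], [unrelated], now=100, gc_grace=10)
# Major compaction over every SSTable:
major = compact([old, newer, unrelated], [], now=100, gc_grace=10)
```

In `minor` the tombstone survives even though it is long past the grace period, because the row also lives in an SSTable outside the compaction. In `major` it is dropped. A frequently updated row is present in most SSTables, so it almost always falls into the first case.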
  • 41. #Cassandra13 TTL:ed data •  Cassandra supports TTL:ed data •  Once TTL:ed data expires, it should just be compacted away, right? •  We know we don't need the data anymore, no need for a tombstone, so it should be fast, right? •  Noooooo... •  (Overwritten data could theoretically bounce back)
  • 42. #Cassandra13 TTL:ed data •  CASSANDRA-5228 •  Drop entire sstables when all columns are expired
  • 43. #Cassandra13 The Playlist service Our most complex service •  ~ 1 billion playlists •  40 000 reads per second •  22 TB of compressed data
  • 45. #Cassandra13 The Playlist service Our old playlist system had many problems: •  Stored data across hundreds of millions of files, making backup process really slow. •  Home brewed replication model that didn't work very well •  Frequent downtimes, huge scalability problems •  Perfect test case for Cassandra!
  • 46. #Cassandra13 Playlist data model •  Every playlist is a revisioned object •  Think of it like a distributed versioning system •  Allows concurrent modification on multiple offlined clients •  We even have an automatic merge conflict resolver that works really well! •  That's actually a really useful feature
  • 47. #Cassandra13 Playlist data model •  Every playlist is a revisioned object •  Think of it like a distributed versioning system •  Allows concurrent modification on multiple offlined clients •  We even have an automatic merge conflict resolver that works really well! •  That's actually a really useful feature said no one ever
  • 48. #Cassandra13 Playlist data model •  Sequence of changes •  The changes are the authoritative data •  Everything else is optimization •  Cassandra pretty neat for storing this kind of stuff •  Can use consistency level ONE safely
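
To illustrate "the changes are the authoritative data", here is a minimal sketch of rebuilding a playlist by replaying its change log. The change format (`("add", index, track)` and `("remove", index)`) and the function names are hypothetical; the slides do not describe the real encoding or the merge conflict resolver.

```python
def apply_change(tracks, change):
    """Apply one change to a list of track IDs and return a new list."""
    op = change[0]
    if op == "add":
        _, index, track = change
        return tracks[:index] + [track] + tracks[index:]
    if op == "remove":
        _, index = change
        return tracks[:index] + tracks[index + 1:]
    raise ValueError(f"unknown change type: {op!r}")


def replay(changes):
    """Rebuild the playlist by replaying every change in order."""
    tracks = []
    for change in changes:
        tracks = apply_change(tracks, change)
    return tracks


# A made-up change log for one playlist.
changes = [
    ("add", 0, "track:a"),
    ("add", 1, "track:b"),
    ("add", 1, "track:c"),
    ("remove", 0),
]
current = replay(changes)
```

Because the current state can always be recomputed from the log, anything stored besides the changes themselves (such as a cached head revision) is an optimization rather than a source of truth.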
  • 50. #Cassandra13 Tombstone hell •  The HEAD column family stores the sequence ID of the latest revision of each playlist •  90 % of all reads go to HEAD •  mlock
  • 52. #Cassandra13 Tombstone hell •  Noticed that HEAD requests took several seconds for some lists •  Easy to reproduce in cassandra-cli: • get playlist_head[utf8('spotify:user...')]; •  1-15 seconds latency; should be < 0.1 s •  Copy SSTables to development machine for investigation •  Cassandra tool sstabletojson showed that the row contained 600 000 tombstones!
  • 53. #Cassandra13 Tombstone hell •  WAT‽ •  Data is in the column name •  Used to detect forks
  • 54. #Cassandra13 Tombstone hell •  We expected tombstones would be deleted after 30 days •  Nope, all tombstones since 1.5 years ago were there •  Revelation: Rows existing in 4+ SSTables never have tombstones deleted during minor compactions •  Frequently updated lists exist in nearly all SSTables Solution: •  Major compaction (CF size cut in half)
  • 55. #Cassandra13 Zombie tombstones •  Ran major compaction manually on all nodes during a few days. •  All seemed well... •  But a week later, the same lists took several seconds again‽‽‽
  • 56. #Cassandra13 Repair vs major compactions A repair between the major compactions "resurrected" the tombstones :( New solution: •  Repairs during Monday-Friday •  Major compaction Saturday-Sunday A (by now) well-known Cassandra anti-pattern: Don't use Cassandra to store queues
  • 57. #Cassandra13 Cassandra counters •  There are lots of places in the Spotify UI where we count things •  # of followers of a playlist •  # of followers of an artist •  # of times a song has been played •  Cassandra has a feature called distributed counters that sounds suitable •  Is this awesome?
  • 58. #Cassandra13 Cassandra counters •  Yep •  They've actually worked pretty well for us.
  • 60. #Cassandra13 How not to fail •  Treat Cassandra as a utility belt •  Flash Lots of one-off solutions: •  Weekly major compactions •  Delete all sstables and recreate from scratch every day •  Memlock frequently used SSTables in RAM
  • 61. #Cassandra13 Lessons •  Cassandra read performance is heavily dependent on the temporal patterns of your writes •  Cassandra is initially snappy, but various write patterns make read performance slowly decrease •  Making benchmarks close to useless
  • 62. #Cassandra13 Lessons •  Avoid repeatedly writing data to the same row over very long spans of time •  Avoid deleting data •  If you're working at scale, you'll need to know how Cassandra works under the hood •  nodetool cfhistograms is your friend
  • 63. #Cassandra13 Lessons •  There are still various esoteric problems with large scale Cassandra installations •  Debugging them is really interesting •  If you agree with the above statements, you should totally come work with us