Slides from a talk given at the NYC Cassandra Meetup, discussing how Storm works and how it integrates with Apache Cassandra.
There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.
http://github.com/tjake/stormscraper
2. What is Storm?
• Distributed event processor
• Provides constructs to reliably process all events
• Simple conceptual model
• New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
3. Storm Concepts
Spout - Collects work and submits it to be processed.
Tracks success or failure of each tuple.
…
Tuple - A collection of data that is passed within Storm.
Bolt - Processes tuples and optionally emits more tuples.
Stream - Identifies outputs from a Spout/Bolt.
Forces tuples to have some declared structure.
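To make the four concepts concrete, here is a minimal sketch in plain Python. It does not use Storm's API; `ListSpout`, `UppercaseBolt`, `STREAM_FIELDS`, `make_tuple` and `run` are hypothetical names invented for illustration, and real Storm tracks tuple outcomes across a cluster rather than in one loop.

```python
from collections import namedtuple

# Hypothetical stand-ins for Storm's concepts; none of these names are Storm API.
Tuple = namedtuple("Tuple", ["id", "stream", "values"])

# Each stream declares the fields its tuples must carry.
STREAM_FIELDS = {"words": ("word",), "upper": ("word",)}


def make_tuple(tuple_id, stream, values):
    """Build a tuple, rejecting values that do not match the stream's fields."""
    if len(values) != len(STREAM_FIELDS[stream]):
        raise ValueError(f"stream {stream!r} expects {STREAM_FIELDS[stream]}")
    return Tuple(tuple_id, stream, tuple(values))


class ListSpout:
    """Collects work from a list and tracks success or failure per tuple."""

    def __init__(self, items):
        self.pending = dict(enumerate(items))
        self.acked, self.failed = [], []

    def emit_all(self):
        # Each tuple carries an id so its outcome can be reported back.
        return [make_tuple(i, "words", (item,)) for i, item in self.pending.items()]

    def ack(self, tuple_id):
        self.acked.append(self.pending.pop(tuple_id))

    def fail(self, tuple_id):
        # The item stays in `pending` so it could be replayed later.
        self.failed.append(self.pending[tuple_id])


class UppercaseBolt:
    """Processes a tuple and emits a new tuple on the "upper" stream."""

    def process(self, tup):
        (word,) = tup.values
        if not isinstance(word, str):
            raise ValueError("expected a string")
        return make_tuple(tup.id, "upper", (word.upper(),))


def run(spout, bolt):
    """Push every spout tuple through the bolt, acking or failing each one."""
    emitted = []
    for tup in spout.emit_all():
        try:
            emitted.append(bolt.process(tup))
            spout.ack(tup.id)
        except ValueError:
            spout.fail(tup.id)
    return emitted
```

Feeding the spout a list with one non-string item leaves that item failed and still pending, while the rest are acked and appear uppercased on the "upper" stream.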
4. Storm Topologies
A directed graph of spouts and bolts connected via streams
[Diagram: example topology. Labels: Firehose; A-F, G-P, Q-Z; Host A, Host B, Host C; Zookeeper; Cassandra (optional)]
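A topology can be sketched as a small directed graph plus a routing rule. The example below assumes, from the labels alone, that a "Firehose" spout fans out to three bolts split by letter range (A-F, G-P, Q-Z); that reading of the diagram is a guess, and `TOPOLOGY`, `route` and `run` are hypothetical names, not Storm API.

```python
# Hypothetical topology: component name -> names of downstream components.
TOPOLOGY = {
    "firehose": ["A-F", "G-P", "Q-Z"],
    "A-F": [],
    "G-P": [],
    "Q-Z": [],
}


def route(word):
    """Pick the bolt whose letter range covers the word's first letter."""
    first = word[0].upper()
    for name in TOPOLOGY["firehose"]:
        low, high = name.split("-")
        if low <= first <= high:
            return name
    raise ValueError(f"no bolt handles {word!r}")


def run(words):
    """Send each word from the spout to exactly one downstream bolt."""
    received = {name: [] for name in TOPOLOGY["firehose"]}
    for word in words:
        received[route(word)].append(word)
    return received
```

Routing on a key means every tuple with the same key reaches the same bolt, which is what lets each bolt keep consistent per-key state.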
6. Where does data end up?
• Storm supports built-in RPC, so client requests can effectively become a spout.
• Put the data into a database…
• Why Cassandra though?
7. Why Cassandra?
• Cassandra’s data model allows incremental modifications to rows.
• Different bolts can update different parts of a Cassandra row asynchronously.
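The idea of independent bolts filling in one row can be sketched with an in-memory stand-in. `ToyColumnStore` below is a hypothetical dictionary-backed model, not a Cassandra client, and `title_bolt`, `size_bolt` and `crawl` are invented names loosely modelled on the web crawler example; threads stand in for bolts running asynchronously.

```python
import threading
from collections import defaultdict


class ToyColumnStore:
    """In-memory stand-in for a Cassandra table: row key -> {column: value}."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._lock = threading.Lock()

    def update(self, row_key, **columns):
        # Writes only the named columns; other columns in the row are untouched.
        with self._lock:
            self._rows[row_key].update(columns)

    def get(self, row_key):
        with self._lock:
            return dict(self._rows[row_key])


def title_bolt(store, url, html):
    """Writes only the `title` column of the page's row."""
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    store.update(url, title=html[start:end])


def size_bolt(store, url, html):
    """Writes only the `size` column of the page's row."""
    store.update(url, size=len(html))


def crawl(store, url, html):
    """Run both bolts concurrently against the same row, then read it back."""
    threads = [
        threading.Thread(target=bolt, args=(store, url, html))
        for bolt in (title_bolt, size_bolt)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(url)
```

Because each bolt touches a different column, neither needs to read the row first or coordinate with the other, and the row ends up complete whichever write lands first.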