A Practical Guide to Selecting a Stream Processing Technology
Michael G. Noll
Product Manager, Confluent
Kafka Talk Series
Date    Title
Sep 27  Introduction To Streaming Data and Stream Processing with Apache Kafka
Oct 06  Deep Dive into Apache Kafka
Oct 27  Data Integration with Apache Kafka
Nov 17  Demystifying Stream Processing with Apache Kafka
Dec 01  A Practical Guide to Selecting a Stream Processing Technology
Dec 15  Streaming in Practice: Putting Apache Kafka in Production
https://www.confluent.io/apache-kafka-talk-series
Agenda
• Recap: What is Stream Processing?
• The Three Pillars of Stream Processing in Practice
• Key Selection Criteria
  • Organizational/Non-Technical Dimensions
  • Technical Dimensions
• Summary
Powered by Kafka (thousands more)
Spark Streaming API (2.0)
Kafka’s Streams API (0.10)
Example: Streams and Tables in Kafka
Word Count
hello 2
kafka 1
world 1
… …
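The word-count example can be sketched in plain Python to show how a stream turns into a table. This is an illustration only, not Kafka's Streams API: the two input lines and the helper names `word_count_updates` and `materialize` are assumptions, chosen so that the final table matches the counts shown above.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def word_count_updates(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Consume a stream of text lines and emit one (word, count) update per word."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


def materialize(updates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Build a table from a stream of updates: the latest value per key wins."""
    table: Dict[str, int] = {}
    for key, value in updates:
        table[key] = value
    return table


# Assumed input stream (not taken from the slides).
stream = ["hello world", "hello kafka"]
updates = list(word_count_updates(stream))  # the stream view: every change
table = materialize(updates)                # the table view: latest state per word
```

The list `updates` keeps every intermediate count, while `table` keeps only the most recent count per word.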
Streams & Databases
• A stream processing technology must have first-class support for Streams and Tables
  • With scalability, fault tolerance, …
• Why? Because most use cases require not just one, but both!
• Support – or lack thereof – strongly impacts the resulting technical architecture and development efforts
• No support means:
  • Painful Do-It-Yourself
  • Increased complexity, more moving pieces to juggle
Organizational/Non-Tech Dimensions
• Can your org understand and leverage the technology?
  • Familiarity with languages; intuitive concepts and APIs; training
• Are you permitted to use it in your organization?
  • Security features, licensing, open source vs. proprietary
• Can you continue to use it in the future?
  • Longevity of technology, licensing, vendor strength
Organizational/Non-Tech Dimensions
• Do you believe in the long-term vision?
  • Switching technologies in an organization is often expensive/slow: legacy migration, re-training, resistance to change, etc.
• What is the path and time to success?
  • Can you move smoothly and quickly from proof-of-concept to production?
• Areas and range of applicability in your organization
  • General-purpose vs. niche technology
  • Viable for S/M/L/XL use cases vs. for XL use cases only
  • Building core business apps vs. doing backend analytics
Organizational/Non-Tech Dimensions
• Licensing
• Vision/Roadmap
• ROI
• Impact on Organization
• Broad vs. Niche Applicability
• Time to Market
• Professional Services
• Documentation
• Examples
• User Community
• Learning Curve
• Impact on Tools, Infrastructure, …
Technical Dimensions
• Reprocessing
• Scalability & Elasticity
• Fault Tolerance
• API
• Dev/Ops Lifecycle
• Security
• Processing Model
• Out of Order Data
• Abstractions
• Time Model
• Windowing
• State
State
• Stateful processing of any kind requires… state
• Many (most?) use cases for stream processing are stateful
  • Joins, aggregations, windowing, counting, ...
• Is state performant? Local vs. remote state?
• Is state fault-tolerant? How fast is recovery/failover?
• Is state interactively queryable?
  • Kafka: ready for use (GA)
  • Spark, Flink: under development (alpha)
  • Storm, Samza, and others: not available
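To make the questions about local, fault-tolerant, and queryable state concrete, here is a minimal sketch of one common approach: fast local state backed by an append-only changelog that is replayed after a failure. The class `CountingStore` and its methods are hypothetical names for this illustration, and the in-memory list merely stands in for a durable, replicated log; this is not any framework's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


class CountingStore:
    """Local key-value state whose every update is also appended to a changelog."""

    def __init__(self, changelog: List[Tuple[str, int]]) -> None:
        self.changelog = changelog        # stands in for a durable, replicated log
        self.local: Dict[str, int] = {}   # fast local state, lost when the process dies

    def increment(self, key: str) -> int:
        value = self.local.get(key, 0) + 1
        self.local[key] = value
        self.changelog.append((key, value))  # record the change for later recovery
        return value

    def get(self, key: str) -> Optional[int]:
        """Interactive query against the local state."""
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog: List[Tuple[str, int]]) -> "CountingStore":
        """Rebuild local state on a new instance by replaying the changelog."""
        store = cls(changelog)
        for key, value in changelog:
            store.local[key] = value
        return store


durable_log: List[Tuple[str, int]] = []
store = CountingStore(durable_log)
for user in ["alice", "bob", "alice"]:
    store.increment(user)

# Simulate failover: the old instance is gone, a new one replays the log.
recovered = CountingStore.restore(durable_log)
```

In this sketch recovery time grows with the length of the changelog, which is one reason the question of how fast recovery and failover are matters in practice.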
Abstractions
• What are the data model and the available abstractions?
• Most common abstraction: stream of records, events
  • Kafka, Spark, Storm, Samza, Flink, Apex, ...
• New, very powerful: table of records
  • Currently unique to Kafka
  • Represents latest state and materialized views
• State must have a first-class abstraction because, as we just saw in the previous section, state is crucial for stream processing!
Time Model
• Different use cases require different time semantics
• Great majority of use cases require event-time semantics
• Other use cases may require processing-time (e.g. real-time monitoring) or special variants like ingestion-time
• A stream processing technology should, at a minimum, support event-time to cover most use cases in practice
  • Examples: Kafka, Beam, Flink
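The difference between event-time and processing-time can be illustrated with a small sketch that counts the same events per minute under both semantics. The `Event` type, the timestamps, and the function `per_minute_counts` are assumptions made up for this example.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    user: str
    event_time: int       # seconds; when the event happened at the source
    processing_time: int  # seconds; when the processor saw the event


def per_minute_counts(events: List[Event], time_field: str) -> Dict[int, int]:
    """Count events per one-minute bucket, using the chosen timestamp field."""
    counts: Counter = Counter()
    for event in events:
        minute = getattr(event, time_field) // 60
        counts[minute] += 1
    return dict(counts)


events = [
    Event("alice", event_time=10, processing_time=12),
    Event("bob", event_time=50, processing_time=65),  # delayed past the minute boundary
    Event("dave", event_time=70, processing_time=71),
]

by_event_time = per_minute_counts(events, "event_time")
by_processing_time = per_minute_counts(events, "processing_time")
```

Bob's event happened in minute 0 but was seen in minute 1, so the two results disagree; only the event-time result reflects when things actually happened.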
Figure: Windowing. Input data plotted by event-time vs. processing-time; colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows.
Windowing
• Windowing is an operation that groups events
• Most commonly needed: time windows, session windows
• Examples:
  • Real-time monitoring: 5-minute averages
  • Reader behavior on a website: user browsing sessions
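The two window types named above can be sketched as follows. This is a simplified, single-machine illustration; the function names, the sample readings, and the 30-minute inactivity gap for sessions are assumptions, and all timestamps are event times in seconds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tumbling_averages(readings: List[Tuple[int, float]], size: int = 300) -> Dict[int, float]:
    """Average (event_time, value) readings per fixed-size window, keyed by window start."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % size].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


def session_windows(timestamps: List[int], gap: int = 1800) -> List[Tuple[int, int]]:
    """Group one user's event times into sessions split by more than `gap` seconds of inactivity."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))             # start a new session
    return sessions


# 5-minute averages for monitoring (assumed sample data).
averages = tumbling_averages([(0, 10.0), (120, 20.0), (300, 40.0), (599, 60.0)])

# Browsing sessions for one reader (assumed sample data).
sessions = session_windows([0, 600, 1200, 7200, 7500])
```

Time windows have fixed boundaries known in advance, whereas session boundaries depend on the data itself, which is part of why session windows are harder to support.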
Out-of-order and late-arriving data
• Users with mobile phones enter airplane, lose Internet connectivity
• Emails are being written during the 10h flight
• Internet connectivity is restored, phones will send queued emails now
Out-of-order and late-arriving data
• Is very common in practice, not a rare corner case
  • Related to time model discussion
• We want control over how out-of-order data is handled
• Example:
  • We process data in 5-minute windows, e.g. compute statistics
  • When event arrives 1 minute late: update the original result!
  • When event arrives 2 hours late: discard it!
• Handling must be efficient because it happens so often
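The update-or-discard policy from the example can be sketched with an allowed-lateness threshold. The 10-minute `GRACE` value, the class `WindowedCounter`, and the notion of "stream time" as the highest event time seen so far are assumptions for this illustration, not a description of any particular product.

```python
from typing import Dict

WINDOW = 5 * 60   # 5-minute windows, as in the example above
GRACE = 10 * 60   # assumed allowed lateness before a window is final


class WindowedCounter:
    """Counts events per event-time window and tolerates moderately late events."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}
        self.stream_time = 0  # highest event time observed so far
        self.discarded = 0

    def process(self, event_time: int) -> bool:
        """Return True if the event updated a window, False if it was too late."""
        self.stream_time = max(self.stream_time, event_time)
        window_start = event_time - event_time % WINDOW
        window_close = window_start + WINDOW + GRACE
        if self.stream_time >= window_close:
            self.discarded += 1  # window already final: drop the event
            return False
        self.counts[window_start] = self.counts.get(window_start, 0) + 1
        return True


counter = WindowedCounter()
results = [
    counter.process(100),   # on time
    counter.process(330),   # on time, next window
    counter.process(270),   # one minute late: updates the first window
    counter.process(7500),  # stream time jumps ahead by about two hours
    counter.process(290),   # about two hours late: discarded
]
```

Each late event costs a single dictionary update or a counter increment here, which hints at why efficient handling is feasible when lateness is bounded.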
Reprocessing
• Re-process data by rewinding a stream back in time
• Use cases in practice include
  • Correcting output data after fixing a bug
  • Facilitating iterative and explorative development
  • A/B testing
  • Processing historical data
  • Walking through "What If?" scenarios
• Also: often used behind-the-scenes for fault tolerance
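Rewinding a stream to correct output after a bug fix can be sketched with an append-only log that retains its records. `ReplayableLog` and `run` are hypothetical helpers, and the price data is made up for this example.

```python
from typing import Callable, List, Union

Number = Union[int, float]


class ReplayableLog:
    """An append-only log that retains records so readers can rewind."""

    def __init__(self) -> None:
        self._records: List[int] = []

    def append(self, record: int) -> None:
        self._records.append(record)

    def read_from(self, offset: int) -> List[int]:
        """Return all records starting at the given offset."""
        return self._records[offset:]


def run(log: ReplayableLog, transform: Callable[[int], Number], start_offset: int = 0) -> List[Number]:
    """Apply a processing function to every record from start_offset onward."""
    return [transform(record) for record in log.read_from(start_offset)]


log = ReplayableLog()
for cents in [199, 250, 1000]:
    log.append(cents)

# First deployment contains a bug: integer division truncates the amounts.
buggy_output = run(log, lambda cents: cents // 100)

# After the fix, rewind to offset 0 and recompute the output from the same input.
fixed_output = run(log, lambda cents: cents / 100)
```

The same mechanism serves the other use cases listed above: two versions of the logic can read the same log for A/B testing, and a later `start_offset` resumes processing from a known position.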
Scalability, Elasticity, Fault Tolerance
• Can the technology scale according to your needs?
  • Desired latency, throughput?
  • Able to process millions of messages per second?
  • What is the minimum footprint?
• Expand/shrink capacity dynamically during operations?
  • Helps with resource utilization because most stream apps run continuously
• Resilience and fault tolerance
  • Which guarantees for data delivery and for state? "At-least-once", "exactly-once", "effectively-once", etc.
  • Failover behavior and recovery time? Automated or manual?
  • Any negative impact of fault tolerance features on performance?
Security
• To meet internal security policies, legal compliance, etc.
• Typical base requirements for stream processing applications:
  • Encrypt data-in-transit (e.g. from/to Kafka)
  • Authentication: "only some applications may talk to production"
  • Authorization: "access to sensitive data such as PII is restricted"
• The easier it is to use security features, the more likely they are actually being used in practice
Processing Model
• True stream processing is record-at-a-time processing
  • Benefits include low latency (millisecs), dealing efficiently with out-of-order data
  • Can provide both low latency and high throughput via internal optimizations
  • Examples: Kafka, Storm, Samza, Flink, Beam
• Some processing technologies opt for (micro)batching
  • Micro-batching has no true benefits: consider it a technical workaround to shoehorn stream-like functionality into a tool
  • Suffers from significant overhead when dealing with e.g. out-of-order/late-arriving data, when performing windowed analyses (e.g. session windows)
  • Typically a strong blocker for use cases such as fraud detection or anything where "a few seconds" of latency is prohibitive
  • Examples: Spark, Storm (Trident), Hadoop*
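The latency argument can be illustrated with a deliberately simplified model that ignores processing cost and only measures how long a record waits before processing can start. The arrival times, the 1-second batch interval, and both function names are assumptions for this sketch; real systems add scheduling and processing overhead on top.

```python
from typing import List


def record_at_a_time_latencies(arrivals_ms: List[int]) -> List[int]:
    """Each record is handled on arrival, so the added wait is zero in this model."""
    return [0 for _ in arrivals_ms]


def micro_batch_latencies(arrivals_ms: List[int], interval_ms: int) -> List[int]:
    """Each record waits until the batch interval that contains it closes."""
    latencies = []
    for arrival in arrivals_ms:
        batch_close = (arrival // interval_ms + 1) * interval_ms
        latencies.append(batch_close - arrival)
    return latencies


arrivals = [100, 900, 1500, 2200]  # assumed arrival times in milliseconds
streaming = record_at_a_time_latencies(arrivals)
batched = micro_batch_latencies(arrivals, interval_ms=1000)
```

In this model the wait under micro-batching is bounded only by the batch interval, so the interval puts a floor under the worst-case latency an application can promise.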
API
• Choice of API is a subjective matter – skills, preference, …
• Typical options
  • Declarative, expressive API: operations like map(), filter()
  • Imperative, lower-level API: callbacks like process(event)
  • Streaming SQL: STREAM SELECT … FROM … WHERE …
• In the best case you get not just one, but all three
  • "Abstractions are great!"
  • "Abstractions considered harmful!"
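The first two API styles can be contrasted on the same toy task: keep values greater than 5 and double them. This sketch uses plain Python built-ins rather than any stream processing library; the data and the class `DoubleLargeValues` are assumptions for illustration.

```python
from typing import List

events = [3, 8, 12, 5, 20]

# Declarative style: compose operations such as filter() and map().
declarative = list(map(lambda x: x * 2, filter(lambda x: x > 5, events)))


# Imperative style: a callback invoked once per event, with explicit control flow.
class DoubleLargeValues:
    def __init__(self) -> None:
        self.output: List[int] = []

    def process(self, event: int) -> None:
        if event > 5:
            self.output.append(event * 2)


processor = DoubleLargeValues()
for event in events:
    processor.process(event)

# A streaming SQL formulation of the same logic might read roughly like
# "SELECT value * 2 FROM events WHERE value > 5" (illustrative only).
```

Both styles produce the same result; the declarative form is shorter, while the callback form leaves room for custom state and control flow.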
Developer/Operations Lifecycle
• What should your daily work look and feel like?
  • "I like to do quick, iterative development" (modify/test/repeat)
  • "I want to decouple team roadmaps, project schedules"
• Big difference between App Model <-> Cluster Model
  • Testing, packaging, deployment, monitoring, operations
  • "Do I need to know Java (app) or YARN (cluster) for this?"
  • "I want reactive processing in containers that run on Mesos!"
• Rolling, no-downtime upgrades?
• Integration with existing Ops infra, tools, processes?
Summary
• What we covered is a good starting point
• But, no free lunch!
  • Understand what you need, and weigh criteria appropriately
  • Think end-to-end: idea, development, operations, troubleshooting
  • Think big-picture: future use cases, architecture, security, training, …
• Do your own internal hackathons, proof-of-concepts
• Do your own benchmarks
• If in doubt: simplicity beats complexity
  • Faster to learn, easier to understand, less likely to fail, …
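One simple way to weigh criteria is a weighted scoring matrix. Everything in this sketch is hypothetical: the criteria chosen, the weights, the 1-5 scores, and the candidate names `technology_a` and `technology_b` are placeholders to be replaced with results from your own evaluation and benchmarks.

```python
from typing import Dict


def weighted_score(scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Weighted average of per-criterion scores; a higher weight means more important."""
    total_weight = sum(weights.values())
    return sum(scores[criterion] * weight for criterion, weight in weights.items()) / total_weight


# Hypothetical weights reflecting one organization's priorities.
weights = {"state": 3, "event_time": 3, "security": 2, "learning_curve": 1}

# Hypothetical 1-5 scores for two placeholder candidates.
candidates = {
    "technology_a": {"state": 5, "event_time": 4, "security": 3, "learning_curve": 2},
    "technology_b": {"state": 3, "event_time": 3, "security": 5, "learning_curve": 5},
}

ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)
```

Changing the weights can flip the ranking, which is the point: the outcome should follow from what your organization needs rather than from a generic checklist.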
Q&A Session
Coming Up Next
Date    Title                                                      Speaker
Dec 15  Streaming in Practice: Putting Apache Kafka in Production  Roger Hoover
https://www.confluent.io/apache-kafka-talk-series

Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Lucas Jellema
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Eren Avşaroğulları
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsClaudiu Barbura
 
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...Lucas Jellema
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 
6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black FridayAli Hodroj
 
Ultra-scale e-Commerce Transaction Services with Lean Middleware
Ultra-scale e-Commerce Transaction Services with Lean Middleware Ultra-scale e-Commerce Transaction Services with Lean Middleware
Ultra-scale e-Commerce Transaction Services with Lean Middleware WSO2
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Oracle Forms Modernization Roadmap
Oracle Forms Modernization RoadmapOracle Forms Modernization Roadmap
Oracle Forms Modernization RoadmapKai-Uwe Möller
 
Oracle Sistemas Convergentes
Oracle Sistemas ConvergentesOracle Sistemas Convergentes
Oracle Sistemas ConvergentesFran Navarro
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!Richard Robinson
 
Top Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.comTop Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.comPawan Sharma
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko GlobalLogic Ukraine
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesNick Pentreath
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 

Similar a A Practical Guide to Selecting a Stream Processing Technology (20)

Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
 
Introduction to the Typesafe Reactive Platform
Introduction to the Typesafe Reactive PlatformIntroduction to the Typesafe Reactive Platform
Introduction to the Typesafe Reactive Platform
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatterns
 
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday6 GigaSpaces Principles to Survive Black Friday
6 GigaSpaces Principles to Survive Black Friday
 
Ultra-scale e-Commerce Transaction Services with Lean Middleware
Ultra-scale e-Commerce Transaction Services with Lean Middleware Ultra-scale e-Commerce Transaction Services with Lean Middleware
Ultra-scale e-Commerce Transaction Services with Lean Middleware
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Oracle Forms Modernization Roadmap
Oracle Forms Modernization RoadmapOracle Forms Modernization Roadmap
Oracle Forms Modernization Roadmap
 
Oracle Sistemas Convergentes
Oracle Sistemas ConvergentesOracle Sistemas Convergentes
Oracle Sistemas Convergentes
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
 
Top Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.comTop Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.com
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 

Más de confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Más de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 

Último (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 

A Practical Guide to Selecting a Stream Processing Technology

  • 1. A Practical Guide to Selecting a Stream Processing Technology. Michael G. Noll, Product Manager, Confluent
  • 2. Kafka Talk Series
       Date    Title
       Sep 27  Introduction To Streaming Data and Stream Processing with Apache Kafka
       Oct 06  Deep Dive into Apache Kafka
       Oct 27  Data Integration with Apache Kafka
       Nov 17  Demystifying Stream Processing with Apache Kafka
       Dec 01  A Practical Guide to Selecting a Stream Processing Technology
       Dec 15  Streaming in Practice: Putting Apache Kafka in Production
       https://www.confluent.io/apache-kafka-talk-series
  • 3. Agenda
      • Recap: What is Stream Processing?
      • The Three Pillars of Stream Processing in Practice
      • Key Selection Criteria
      • Organizational/Non-Technical Dimensions
      • Technical Dimensions
      • Summary
  • 14. Powered by Kafka (thousands more)
  • 20. Spark Streaming API (2.0)
  • 21. Kafka’s Streams API (0.10)
  • 37. Example: Streams and Tables in Kafka
        Word   Count
        hello  2
        kafka  1
        world  1
        …      …
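
The word-count table on this slide can be read as the latest state of a stream of updates. The following is a minimal Python sketch of that relationship, not Kafka's Streams API: the two input lines and the helper names `word_count_changelog` and `materialize` are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def word_count_changelog(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn a stream of text lines into a stream of (word, new_count) updates."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            # Every update is itself an event: the stream view of the table.
            yield word, counts[word]


def materialize(changelog: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Replay a changelog stream; the latest value per key is the table."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        table[key] = value
    return table


# Hypothetical input stream (not taken from the slides).
stream = ["hello kafka", "hello world"]
table = materialize(word_count_changelog(stream))
print(table)  # {'hello': 2, 'kafka': 1, 'world': 1}
```

Replaying the same update stream always rebuilds the same table, which is why an engine with first-class support for both abstractions can convert between them without external storage.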
  • 42. Streams & Databases
      • A stream processing technology must have first-class support for Streams and Tables
          • With scalability, fault tolerance, …
      • Why? Because most use cases require not just one, but both!
      • Support – or lack thereof – strongly impacts the resulting technical architecture and development efforts
      • No support means:
          • Painful Do-It-Yourself
          • Increased complexity, more moving pieces to juggle
  • 45. Organizational/Non-Tech Dimensions
      • Can your org understand and leverage the technology?
          • Familiarity with languages; intuitive concepts and APIs; training
      • Are you permitted to use it in your organization?
          • Security features, licensing, open source vs. proprietary
      • Can you continue to use it in the future?
          • Longevity of technology, licensing, vendor strength
  • 46. Organizational/Non-Tech Dimensions
      • Do you believe in the long-term vision?
          • Switching technologies in an organization is often expensive/slow: legacy migration, re-training, resistance to change, etc.
      • What is the path and time to success?
          • Can you move smoothly and quickly from proof-of-concept to production?
      • Areas and range of applicability in your organization
          • General-purpose vs. niche technology
          • Viable for S/M/L/XL use cases vs. for XL use cases only
          • Building core business apps vs. doing backend analytics
  • 47. Organizational/Non-Tech Dimensions • Licensing • Vision/Roadmap • ROI • Impact on Organization • Broad vs. Niche Applicability • Time to Market • Professional Services • Documentation • Examples • User Community • Learning Curve • Impact on Tools, Infrastructure, …
  • 49. Technical Dimensions • Reprocessing • Scalability & Elasticity • Fault Tolerance • API • Dev/Ops Lifecycle • Security • Processing Model • Out of Order Data • Abstractions • Time Model • Windowing • State
  • 50. State • Stateful processing of any kind requires…state • Many (most?) use cases for stream processing are stateful • Joins, aggregations, windowing, counting, ... • Is state performant? Local vs. remote state?
  • 51.
  • 52.
  • 53. State • Stateful processing of any kind requires…state • Many (most?) use cases for stream processing are stateful • Joins, aggregations, windowing, counting, ... • Is state performant? Local vs. remote state? • Is state fault-tolerant? How fast is recovery/failover?
  • 54.
  • 55. State • Stateful processing of any kind requires…state • Many (most?) use cases for stream processing are stateful • Joins, aggregations, windowing, counting, ... • Is state performant? Local vs. remote state? • Is state fault-tolerant? How fast is recovery/failover? • Is state interactively queryable? • Kafka: ready for use (GA) • Spark, Flink: under development (alpha) • Storm, Samza, and others: not available
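The three questions on this slide (local, fault-tolerant, queryable) can be illustrated with a toy state store. This is a sketch under stated assumptions, not how any particular engine implements state: the class `LocalStateStore` is hypothetical, and the "durable" changelog is just a Python list standing in for a replicated log.

```python
class LocalStateStore:
    """Toy key-value state kept in local memory and backed by a changelog."""

    def __init__(self):
        self._data = {}
        self.changelog = []  # stand-in for a durable, replicated log (assumption)

    def put(self, key, value):
        # Record the update in the changelog first so it can be replayed later.
        self.changelog.append((key, value))
        self._data[key] = value

    def get(self, key, default=None):
        # Interactive query: read the latest value straight from local state.
        return self._data.get(key, default)

    @classmethod
    def restore(cls, changelog):
        # Failover: rebuild local state by replaying the changelog in order.
        store = cls()
        for key, value in changelog:
            store.put(key, value)
        return store


store = LocalStateStore()
store.put("alice", 1)
store.put("bob", 1)
store.put("alice", 2)

recovered = LocalStateStore.restore(store.changelog)
print(recovered.get("alice"))
```

In this sketch, recovery time grows with the length of the changelog, which is why the slide asks how fast recovery and failover are.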
  • 56.
  • 58. Abstractions • What are the data model and the available abstractions? • Most common abstraction: stream of records, events • Kafka, Spark, Storm, Samza, Flink, Apex, ... • New, very powerful: table of records • Currently unique to Kafka • Represents latest state and materialized views • State must have a first-class abstraction because, as we just saw in the previous section, state is crucial for stream processing!
  • 60. Time model • Different use cases require different time semantics • Great majority of use cases require event-time semantics • Other use cases may require processing-time (e.g. real-time monitoring) or special variants like ingestion-time • A stream processing technology should, at a minimum, support event-time to cover most use cases in practice • Examples: Kafka, Beam, Flink
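To make the difference between event-time and processing-time concrete, the sketch below counts the same three events per 5-minute window under both semantics. The `Event` type, the timestamps, and the helper names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    event_time: int       # seconds: when the event happened at the source
    processing_time: int  # seconds: when the processor saw the event


WINDOW_SECONDS = 300  # 5-minute windows


def window_start(timestamp, size=WINDOW_SECONDS):
    """Align a timestamp to the start of its fixed-size window."""
    return timestamp - timestamp % size


def count_per_window(events, timestamp_of):
    """Count events per window, using whichever timestamp the caller selects."""
    counts = {}
    for event in events:
        start = window_start(timestamp_of(event))
        counts[start] = counts.get(start, 0) + 1
    return counts


# Hypothetical events: bob's event happened at 290s but was only seen at 305s.
events = [
    Event("alice", 10, 12),
    Event("bob", 290, 305),
    Event("dave", 310, 311),
]

by_event_time = count_per_window(events, lambda e: e.event_time)
by_processing_time = count_per_window(events, lambda e: e.processing_time)
print(by_event_time)
print(by_processing_time)
```

The same input yields different per-window counts, because bob's event falls into the first window by event-time but into the second by processing-time.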
  • 63. Windowing • Windowing is an operation that groups events
  • 64. Windowing • [Figure: windowing of input data, with axis labels processing-time and event-time] • Colors represent different users' events (alice, bob, dave) • Rectangles denote different event-time windows
  • 65. Windowing • Windowing is an operation that groups events • Most commonly needed: time windows, session windows • Examples: • Real-time monitoring: 5-minute averages • Reader behavior on a website: user browsing sessions
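The two window types named on this slide can be sketched in a few lines of plain Python. The function names, the inactivity gap, and the sample data are assumptions; real engines also have to deal with keys, state, and late data, which this sketch ignores.

```python
def tumbling_averages(readings, size):
    """Average of (timestamp, value) readings per fixed-size time window."""
    sums = {}
    for timestamp, value in readings:
        start = timestamp - timestamp % size
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return {start: total / count for start, (total, count) in sums.items()}


def session_windows(timestamps, gap):
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for timestamp in sorted(timestamps):
        if sessions and timestamp - sessions[-1][1] <= gap:
            sessions[-1][1] = timestamp  # extend the current session
        else:
            sessions.append([timestamp, timestamp])  # start a new session
    return [tuple(session) for session in sessions]


# Real-time monitoring: 5-minute (300 s) averages over hypothetical readings.
averages = tumbling_averages([(0, 10.0), (120, 20.0), (310, 30.0)], size=300)
print(averages)

# Browsing sessions: page views of one user, hypothetical 5-minute inactivity gap.
sessions = session_windows([0, 60, 100, 2000, 2030], gap=300)
print(sessions)
```

Time windows have boundaries fixed by the clock, whereas session windows have boundaries that depend on the data itself.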
  • 68. Out-of-order and late-arriving data • Is very common in practice, not a rare corner case • Related to time model discussion
  • 69. Out-of-order and late-arriving data • Users with mobile phones enter airplane, lose Internet connectivity • Emails are being written during the 10h flight • Internet connectivity is restored, phones will send queued emails now
  • 70. Out-of-order and late-arriving data • Is very common in practice, not a rare corner case • Related to time model discussion • We want control over how out-of-order data is handled • Example: • We process data in 5-minute windows, e.g. compute statistics • When event arrives 1 minute late: update the original result! • When event arrives 2 hours late: discard it! • Handling must be efficient because it happens so often
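The update-or-discard policy from this example can be sketched with a windowed counter that accepts late events only within a grace period. This is plain Python, not the API of any particular engine; the class `WindowedCounter`, the 10-minute grace period, and the timestamps are assumptions chosen to mirror the slide.

```python
class WindowedCounter:
    """Counts events per event-time window and tolerates bounded lateness."""

    def __init__(self, size, grace):
        self.size = size      # window length in seconds
        self.grace = grace    # how long after window end late events are accepted
        self.counts = {}      # window start -> count
        self.stream_time = 0  # highest event time observed so far
        self.dropped = 0

    def process(self, event_time):
        self.stream_time = max(self.stream_time, event_time)
        start = event_time - event_time % self.size
        window_end = start + self.size
        if self.stream_time > window_end + self.grace:
            self.dropped += 1  # too late: discard
            return False
        # On time, or late but within the grace period: update the result.
        self.counts[start] = self.counts.get(start, 0) + 1
        return True


counter = WindowedCounter(size=300, grace=600)
counter.process(100)
counter.process(350)
accepted_late = counter.process(290)   # about 1 minute late: updates window 0
counter.process(7500)
accepted_very_late = counter.process(295)  # about 2 hours late: discarded
print(counter.counts, counter.dropped)
```

The grace period is the control knob the slide asks for: it bounds how long old windows must be kept around, which is what keeps the handling efficient.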
  • 72. Reprocessing • Re-process data by rewinding a stream back in time • Use cases in practice include • Correcting output data after fixing a bug • Facilitating iterative and explorative development • A/B testing • Processing historical data • Walking through "What If?" scenarios • Also: often used behind-the-scenes for fault tolerance
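Rewinding is possible when the input is kept as a replayable log addressed by offsets. The sketch below shows the first use case, correcting output after a bug fix; the `Log` class, the `run` helper, and the country-code data are hypothetical.

```python
class Log:
    """Toy append-only log; records stay available after they have been read."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the appended record

    def read(self, from_offset=0):
        return self.records[from_offset:]


def run(log, transform, from_offset=0):
    """Process the log from the given offset with the given logic."""
    return [transform(record) for record in log.read(from_offset)]


log = Log()
for raw in [" de", "US ", "fr"]:
    log.append(raw)


def normalize_v1(code):
    return code.strip()          # bug: forgets to upper-case


def normalize_v2(code):
    return code.strip().upper()  # fixed logic


first_output = run(log, normalize_v1)
# After deploying the fix, rewind to offset 0 and recompute the output.
corrected_output = run(log, normalize_v2, from_offset=0)
print(first_output, corrected_output)
```

The same mechanism covers the other use cases on the slide: A/B testing runs two transforms over the same offsets, and historical processing simply starts from an old offset.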
  • 73.
  • 75. Scalability, Elasticity, Fault Tolerance • Can the technology scale according to your needs? • Desired latency, throughput? • Able to process millions of messages per second? • What is the minimum footprint? • Expand/shrink capacity dynamically during operations? • Helps with resource utilization because most stream apps run continuously • Resilience and fault tolerance • Which guarantees for data delivery and for state? "At-least-once", "exactly-once", "effectively-once", etc. • Failover behavior and recovery time? Automated or manual? • Any negative impact of fault tolerance features on performance?
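The difference between the delivery guarantees shows up when a failure causes records to be replayed. The sketch below simulates at-least-once delivery and compares a naive counter with one made idempotent by tracking offsets; the helper `deliveries_with_replay` and the crash scenario are assumptions for illustration, and deduplicating by offset is only one of several ways to get an effectively-once result.

```python
def deliveries_with_replay(records, crash_after):
    """Simulate at-least-once delivery.

    The consumer handles `crash_after` records, crashes before committing its
    position, and after restart receives everything again from the beginning.
    """
    return records[:crash_after] + records


records = [(0, "a"), (1, "b"), (2, "c")]  # (offset, value)
delivered = deliveries_with_replay(records, crash_after=2)

# Naive processing: duplicates inflate the result.
naive_count = 0
for _offset, _value in delivered:
    naive_count += 1

# Idempotent processing: remember handled offsets and skip duplicates.
seen_offsets = set()
idempotent_count = 0
for offset, _value in delivered:
    if offset in seen_offsets:
        continue
    seen_offsets.add(offset)
    idempotent_count += 1

print(naive_count, idempotent_count)
```

The bookkeeping in the idempotent variant is itself state, so the guarantee for state and the guarantee for delivery have to be evaluated together, as the slide suggests.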
  • 76.
  • 77.
  • 78.
  • 80. Security • To meet internal security policies, legal compliance, etc. • Typical base requirements for stream processing applications: • Encrypt data-in-transit (e.g. from/to Kafka) • Authentication: "only some applications may talk to production" • Authorization: "access to sensitive data such as PII is restricted" • The easier it is to use security features, the more likely they are actually being used in practice
  • 82. Processing Model • True stream processing is record-at-a-time processing • Benefits include low latency (milliseconds), dealing efficiently with out-of-order data • Can provide both low latency and high throughput via internal optimizations • Examples: Kafka, Storm, Samza, Flink, Beam • Some processing technologies opt for (micro)batching • Micro-batching has no true benefits: consider it a technical workaround to shoehorn stream-like functionality into a tool • Suffers from significant overhead when dealing with e.g. out-of-order/late-arriving data, when performing windowed analyses (e.g. session windows) • Typically a strong blocker for use cases such as fraud detection or anything where "a few seconds" of latency is prohibitive • Examples: Spark, Storm (Trident), Hadoop*
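The latency argument can be illustrated with a small calculation: under micro-batching, a record waits until its batch closes before any result is produced. The sketch below is a simplified model that ignores actual processing cost (an assumption), and the batch interval and arrival times are hypothetical.

```python
def micro_batch_latencies(arrivals_ms, interval_ms):
    """Waiting time of each record until the batch containing it closes."""
    latencies = []
    for arrival in arrivals_ms:
        batch_close = (arrival // interval_ms + 1) * interval_ms
        latencies.append(batch_close - arrival)
    return latencies


arrivals_ms = [100, 950, 1001]  # hypothetical arrival times in milliseconds
batch_latencies = micro_batch_latencies(arrivals_ms, interval_ms=1000)

# Record-at-a-time: each record is handled on arrival, so the added waiting
# time in this simplified model is zero.
record_latencies = [0 for _ in arrivals_ms]

print(batch_latencies, record_latencies)
```

In this model the added latency depends on where a record lands within the batch interval and can approach the full interval, which is the concern the slide raises for use cases such as fraud detection.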
  • 84. API • Choice of API is a subjective matter – skills, preference, … • Typical options • Declarative, expressive API: operations like map(), filter() • Imperative, lower-level API: callbacks like process(event) • Streaming SQL: STREAM SELECT … FROM … WHERE … • In the best case you get not just one, but all three • "Abstractions are great!" • "Abstractions considered harmful!"
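The first two API styles can be contrasted on the same task. The sketch below uses Python's built-in `map()` and `filter()` for the declarative style and a hypothetical `UppercaseLongWords` class with a `process(event)` callback for the imperative style; neither is the API of a specific stream processing technology.

```python
words = ["kafka", "is", "fast", "ok"]  # hypothetical input events

# Declarative style: describe what to compute by chaining operations.
declarative = list(map(str.upper, filter(lambda w: len(w) > 3, words)))


# Imperative style: a callback invoked once per event decides what to emit.
class UppercaseLongWords:
    def __init__(self):
        self.output = []

    def process(self, event):
        if len(event) > 3:
            self.output.append(event.upper())


processor = UppercaseLongWords()
for word in words:
    processor.process(word)

print(declarative, processor.output)
```

Both produce the same result; the declarative form is shorter, while the callback form gives direct control over when and what to emit.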
  • 86. Developer/Operations Lifecycle • What should your daily work look and feel like? • "I like to do quick, iterative development" (modify/test/repeat) • "I want to decouple team roadmaps, project schedules" • Big difference between App Model <-> Cluster Model • Testing, packaging, deployment, monitoring, operations • "Do I need to know Java (app) or YARN (cluster) for this?" • "I want reactive processing in containers that run on Mesos!" • Rolling, no-downtime upgrades? • Integration with existing Ops infra, tools, processes?
  • 88. Summary • What we covered is a good starting point • But, no free lunch! • Understand what you need, and weigh criteria appropriately • Think end-to-end: idea, development, operations, troubleshooting • Think big-picture: future use cases, architecture, security, training, … • Do your own internal hackathons, proofs of concept • Do your own benchmarks • If in doubt: simplicity beats complexity • Faster to learn, easier to understand, less likely to fail, …
  • 90. Coming Up Next • Dec 15: Streaming in Practice: Putting Apache Kafka in Production (Speaker: Roger Hoover) • https://www.confluent.io/apache-kafka-talk-series