2. @kimutansk
Self Introduction
• Kimura, Sotaro(@kimutansk)
– Data Engineer at DWANGO Co., Ltd.
• Maintenance of Data Analytics Infrastructure
• Development of ETL pipelines
• Construction of data marts
• And anything else related to the data analytics infrastructure
– Favorite technology fields
• Stream Processing technologies
• Distributed computing systems
– Favorite OSS products
• Apache Kafka
• Apache Beam
• Apache NiFi
I'm Sotaro Kimura (@kimutansk), working as a data engineer at DWANGO.
3. @kimutansk
In the beginning
• In this presentation, I use “Stream Processing”
instead of “Stream Data Processing”.
– "Stream Processing" is more commonly used in related articles.
The title says "Introduction to Stream Data Processing", but this material uses the term "Stream Processing".
4. @kimutansk
Agenda
• What is Stream Processing?
• Data processing patterns
• Problems in Stream Processing
• Stream Processing system structure and products
• Technical consideration points
• Real-world Stream Processing performance problems
• Stream Processing system misconceptions
I begin by explaining what Stream Processing is, then cover the products that realize it and the points to consider.
6. @kimutansk
In a nutshell
• A model of data processing designed for continuously
generated, unbounded data sets.
– More detail...
In a nutshell:
"a data processing model designed to process unbounded data".
7. @kimutansk
Stream Processing properties
• Unbounded data
– Ever-growing, essentially infinite data set.
– These are often referred to as “streaming data”.
• Ex) System logs, sensor data, activity logs, etc.
• Unbounded processing
– Data is unbounded, so processing is also unbounded.
– The word "unbounded" distinguishes it from batch processing.
• Low latency, approximate, speculative results
– Because of the problems inherent in Stream Processing,
systems often output approximate, speculative results.
– Batch processing is traditionally designed for high-latency,
complete results.
"Processes unbounded data", "processing continues indefinitely",
"low latency, often approximate or speculative output".
8. @kimutansk
Usage of Stream Processing
• Billing
– Ex) Cloud service billing. Mobile communication billing.
• Live cost estimating
– Ex) Cloud service usage cost. Mobile data usage.
• Anomaly/Change event detection
– Ex) Unauthorized login. System failure. Recommendation.
Weather data anomaly detection.
• Detection backfill
– Ex) System failure recovery (after notification).
Weather data anomaly progress notification.
Uses of Stream Processing: "billing", "live cost estimation",
"anomaly/change event detection", and "detection backfill".
10. @kimutansk
Big data processing patterns
• The typical big data processing patterns are listed below.
Three models are commonly cited for big data processing:
"batch processing", "interactive query", and "stream processing".
                  | Batch Processing    | Interactive Query      | Stream Processing
Execute timing    | Manual / periodical | Manual / periodical    | Continuous
Processing target | Archived data       | Archived data          | Unbounded stream data
Processing time   | Minutes ~ hours     | Seconds ~ minutes      | Permanent
Data size         | TBs ~ PBs           | GBs ~ TBs              | Bs ~ KBs (per event)
Latency           | Minutes ~ hours     | Seconds ~ minutes      | Milliseconds ~ seconds
Typical apps      | ETL, reporting,     | Business intelligence, | Anomaly detection,
                  | ML model generation | analytics              | recommendation, visualization
OSS products      | MapReduce, Spark    | Impala, Presto,        | (described later)
                  |                     | Drill, Hive            |
11. @kimutansk
Batch Processing
• Process "archived data" and output the results to a data store.
Batch processing: a model that transforms data accumulated in a data store in bulk and outputs the results.
Processed data destination
= data store.
12. @kimutansk
Interactive query
• Process "archived data"; the client retrieves the results.
Interactive query: a model that transforms data accumulated in a data store in bulk, and the client retrieves the results.
Processed data destination
= client.
14. @kimutansk
Difference between Batch vs Stream
• The difference is whether the input data is complete or not.
The difference between batch processing (and interactive query) and stream processing is "whether the input data is completely available".
(Figure: "Input data is complete!" vs. "Input data keeps streaming in!")
15. @kimutansk
Batch Processing’s premise
• Batch Processing's premises:
– When a batch executes, the data must be complete.
• The target data must be bounded.
– Basically, outputs that span several batches are difficult.
• Basic Batch Processing model
Batch processing presumes that "the data is complete at execution time" and that "output spanning batches is difficult".
MapReduce
17. @kimutansk
Batch Processing problems
• For jobs that process user sessions,
Batch Processing is not well suited.
– If a user session spans 2/27 and 2/28,
the 2/27 result needs to be re-output.
– And if the session continues even longer... really?
When calculating user sessions, sessions that cross midnight get cut off; re-reading the past and re-outputting is difficult.
(Figure: MapReduce runs for 2/27 and 2/28, with sessions Red / Yellow / Green split at the day boundary)
18. @kimutansk
Batch Processing is...?
• The multiple-output pattern can be transformed as below.
– Which means that...
The "execute the batch multiple times" figure can be transformed as on this slide. In other words...?
(Figure: three Map/Reduce executions for 2/26, 2/27, and 2/28 rearranged along a single timeline)
19. @kimutansk
Batch is subset of Stream!
• The multiple-output pattern is nothing but
"unbounded stream data" bounded by a fixed interval.
In other words, this is nothing other than the infinite stream data cut into fixed-time slices.
Bounded finite stream
Bounded by interval
2/26 2/27 2/28
Unbounded infinite stream
20. @kimutansk
Batch is subset of Stream!
• This means that Batch Processing is a
subset pattern of Stream Processing.
That is, batch processing is a limited processing model within stream processing.
Bounded finite stream
Bounded by interval
2/26 2/27 2/28
Unbounded infinite stream
Stream
Processing
Batch
Processing
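The relationship can be sketched without any framework. A small Python illustration (names and data are mine, not from any product) that turns a stream into daily batches simply by bucketing events per interval:

```python
from collections import defaultdict

def daily_batches(events):
    """Group an (unbounded) event stream into per-day buckets.

    Each event is a (day, value) pair; bucketing by day turns the
    stream into a sequence of bounded batches, i.e. a batch job is a
    stream job whose windows happen to be whole days.
    """
    batches = defaultdict(list)
    for day, value in events:
        batches[day].append(value)
    return dict(batches)

stream = [("2/26", 1), ("2/27", 2), ("2/27", 3), ("2/28", 4)]
print(daily_batches(stream))
# {'2/26': [1], '2/27': [2, 3], '2/28': [4]}
```

Each bucket is a bounded, finite input, exactly what a daily batch run would see.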
21. @kimutansk
The batch premises do not hold.
• In Stream Processing:
• Batch premise: data completeness
– Data is generated continuously, so completeness never holds.
• Batch premise: outputs spanning batches are difficult
– Processing is continuous, contrary to the premise.
• If data completeness is satisfied,
Stream Processing can perform the same processing as
Batch Processing.
The batch premises do not hold in stream processing. But when the data is complete, stream processing can do the same things as batch processing.
However...
23. @kimutansk
New problem in Stream Processing
• Data ingestion order differs from the order of occurrence!
– This is called "out of order".
– Ex) Phones configured in airplane mode for an entire flight.
• Example causes
– Network disconnects/delays.
– Time gaps between the servers that constitute the system.
• So there are typically two time domains within such systems.
– EventTime, the time when an event actually
occurred.
– ProcessingTime, the time when an event is
ingested by the system.
In stream processing, data does not arrive in the order it occurred; hence the two notions of time, "EventTime" and "ProcessingTime".
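A minimal Python sketch of the two time domains (the event records and field names are illustrative, not any product's API):

```python
# Each event records when it happened (event time) and when the
# system ingested it (processing time).
events = [
    {"id": "a", "event_time": "12:00", "processing_time": "12:01"},
    {"id": "b", "event_time": "12:02", "processing_time": "12:03"},
    # "c" was buffered on a phone in airplane mode and arrived late:
    {"id": "c", "event_time": "12:01", "processing_time": "12:05"},
]

# Order in which the system sees the events vs. order of occurrence.
arrival_order = [e["id"] for e in sorted(events, key=lambda e: e["processing_time"])]
actual_order = [e["id"] for e in sorted(events, key=lambda e: e["event_time"])]

print(arrival_order)  # ['a', 'b', 'c']
print(actual_order)   # ['a', 'c', 'b']  <- out of order relative to arrival
```

The gap between the two orderings is exactly the "out of order" problem.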
24. @kimutansk
Why is this a problem?
• If there is no relation between ingested data items,
"out of order" is not a problem.
– Except that the processing result is approximate.
• But real Stream Processing systems need a
method for grouping data.
– Ex) In anomaly event detection,
few cases can be detected from a single event.
"Many logins occur in a short time", etc.;
relations between earlier and later events are needed.
Even if data does not arrive in order of occurrence, it does not matter when there is no relation between items. In practice, though, relations are needed, e.g. for unauthorized-login detection.
25. @kimutansk
Data grouping concept
• Window: the concept for grouping data.
Window is the concept for grouping data; "tumbling (fixed)", "sliding", and "session" windows are the typical kinds.
(Figure: Tumbling Window, Sliding Window, and Session Window laid out along a time axis)
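As a framework-free illustration, tumbling and session windows can be sketched in a few lines of Python (function names and integer timestamps are my own; sliding windows work analogously but overlap):

```python
def tumbling(timestamps, size):
    """Assign each timestamp to a fixed, non-overlapping window
    [k*size, (k+1)*size)."""
    out = {}
    for t in timestamps:
        out.setdefault((t // size) * size, []).append(t)
    return out

def session(timestamps, gap):
    """Start a new window whenever the gap to the previous event
    exceeds `gap` (session windows have no fixed size)."""
    windows, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            windows.append(current)
            current = []
        current.append(t)
    if current:
        windows.append(current)
    return windows

ts = [1, 2, 3, 10, 11, 25]
print(tumbling(ts, 5))  # {0: [1, 2, 3], 10: [10, 11], 25: [25]}
print(session(ts, 5))   # [[1, 2, 3], [10, 11], [25]]
```

Note that the session variant naturally handles the midnight-crossing sessions that the batch model struggled with earlier.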
26. @kimutansk
Out of order trouble
• With windows, "out of order" becomes a real problem.
– What if, after the [00:00 ~ 06:00) result is output,
data stamped "05:55" arrives?
With windows, "data does not arrive in order" becomes a problem: what do you do when data for a time range arrives after that range's result has been output?
(Figure: [00:00 ~ 06:00) window; 1. Output, 2. Arrive...?)
27. @kimutansk
Countermeasure for trouble
• For these Stream Processing problems,
three countermeasures have been proposed.
• Watermark
– Up to what point in EventTime the ingested data
is complete.
• Trigger
– When to output aggregated results.
• Accumulation
– How successive refinements of a result relate.
For these problems, the concepts of "Watermark", "Trigger", and "Accumulation" have been proposed as countermeasures.
28. @kimutansk
What is watermark?
• Watermark is the notion of up to what time
processing is complete in the EventTime domain.
– EventTime and ProcessingTime advance independently,
so there is skew between them.
A concept indicating how far processing has progressed in EventTime; needed because the actual processing time and the progress point diverge.
(Figure: EventTime vs. ProcessingTime; the real system's progress (≒ watermark) lags the ideal line, creating skew)
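One common heuristic, similar to the bounded-out-of-orderness watermark generators found in real engines, is "maximum event time seen so far, minus an allowed lateness". A Python sketch assuming integer event times:

```python
def heuristic_watermark(event_times, max_out_of_orderness):
    """Emit a watermark after each event: the highest event time seen
    so far minus the lateness we are willing to tolerate."""
    watermark = float("-inf")
    marks = []
    for t in event_times:
        watermark = max(watermark, t - max_out_of_orderness)
        marks.append(watermark)
    return marks

# Events arrive slightly out of order (11 after 12); the watermark
# never goes backwards, and always trails the max event time by 3.
print(heuristic_watermark([10, 12, 11, 15], 3))  # [7, 9, 9, 12]
```

Any event older than the current watermark is "late"; exactly how triggers deal with such data is the next topic.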
29. @kimutansk
What is watermark?
• Usage of the Watermark
– If the watermark is at time "X",
all events whose EventTime is before "X" have been processed.
• But...
– The watermark cannot be perfect.
– Data arrival is out of order,
so the watermark is only an approximation.
• Nevertheless, the watermark is useful.
– It serves as an indicator of when to process.
Note that the watermark, too, is an approximation and not perfect; still, it is useful as a reference point for processing timing.
30. @kimutansk
What is trigger?
• A trigger is the mechanism that decides when
aggregated results should be output.
– With triggers, output timings can be flexible and multiple.
– In addition, the system can handle data arriving later
than the watermark.
• Example
– When the watermark reaches the end of the window, output the results.
A Trigger defines when to output aggregated results; it makes output timing flexible and allows multiple firings.
PCollection<KV<String, Integer>> wordCountResult =
    wordOneCountStream.apply("Fixed Windowing",
        Window.<KV<String, Integer>>into(
                FixedWindows.of(Duration.standardMinutes(2)))
            .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow())))
        .apply(Sum.integersPerKey());
31. @kimutansk
What is trigger?
• Example
– When data arrives later than the watermark, output results again.
With triggers, even data that arrives later than the watermark can be handled.
PCollection<KV<String, Integer>> wordCountResult =
    wordOneCountStream.apply("Late Firing",
        Window.<KV<String, Integer>>into(
                FixedWindows.of(Duration.standardMinutes(2)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withLateFirings(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(1)))))
        .apply(Sum.integersPerKey());
32. @kimutansk
What is trigger?
• Example
– When data arrives later than the watermark, output results again.
– The allowed lateness is limited to 5 minutes.
With triggers you can also specify how late data may be relative to the watermark before further data is discarded.
PCollection<KV<String, Integer>> wordCountResult =
    wordOneCountStream.apply("Late Firing until 5 min late",
        Window.<KV<String, Integer>>into(
                FixedWindows.of(Duration.standardMinutes(2)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withLateFirings(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(1))))
            .withAllowedLateness(Duration.standardMinutes(5))
            .accumulatingFiredPanes())
        .apply(Sum.integersPerKey());
33. @kimutansk
What is accumulation?
• Accumulation defines how multiple results
for the same window are refined.
– Triggers are used to produce multiple outputs for a window.
– So we need to decide how those refinements relate.
– The "how" depends on the target system:
• systems that have their own accumulation functions,
• systems that depend on a key-value datastore,
• systems composed of multiple components with
different persistence methods.
Accumulation is the policy for handling results that a Trigger outputs multiple times; which policy is best differs per system.
34. @kimutansk
What is accumulation?
• Three typical accumulation modes
– Discarding mode
• When an aggregated result is output, discard the previous result.
• The next result contains only data that arrived after the previous output.
– Accumulating mode
• When an aggregated result is output, keep the previous result.
• The next result accumulates data on top of the previous output.
– Accumulating & Retracting mode
• When an aggregated result is output, keep the previous result.
• The next result contains the accumulated result and a
retraction of the previous result.
The typical modes: "Discarding" (per-output), "Accumulating" (results accumulated), and "Retracting" (accumulation plus the difference).
35. @kimutansk
What is accumulation?
• Example output for each mode
– Aggregation result for [12:00 ~ 12:02)
• Arriving data
An example per mode: outputting the [12:00 ~ 12:02) aggregation result in each mode gives...

No | Processing Time | Event Time | Event Value
 1 | 12:05           | 12:01      | 7
 2 | 12:06           | 12:01      | 9
 3 | 12:06           | 12:00      | 3
 4 | 12:07           | 12:01      | 4
36. @kimutansk
What is accumulation?
• Example output for each mode
– Aggregation result for [12:00 ~ 12:02)
• Output data
Per output timing, the final value and total are as in the table; Accumulating & Retract works even when multiple systems are mixed.

Output Timing | Discard | Accumulating | Accumulating & Retract
12:05         | 7       | 7            | 7
12:06         | 12      | 19           | 19, -7
12:07         | 4       | 23           | 23, -19
Final Output  | 4       | 23           | 23
Total Output  | 23      | 49           | 23
(Final & total outputs are the same)
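The three modes can be reproduced with a small framework-free Python sketch; feeding it the panes from the table above ([7], then [9, 3], then [4]) yields exactly the Discard / Accumulating / Retract columns:

```python
def fire(panes):
    """Emit per-firing results under the three accumulation modes.

    `panes` is a list of firings; each firing is the list of values
    that arrived for the window since the previous firing.
    """
    discard, accumulate, retract = [], [], []
    total = 0
    for values in panes:
        pane_sum = sum(values)
        prev = total
        total += pane_sum
        discard.append(pane_sum)            # this pane only
        accumulate.append(total)            # running total
        # accumulated total plus a retraction of the previous output
        retract.append((total, -prev) if prev else (total,))
    return discard, accumulate, retract

d, a, r = fire([[7], [9, 3], [4]])
print(d)  # [7, 12, 4]
print(a)  # [7, 19, 23]
print(r)  # [(7,), (19, -7), (23, -19)]
```

Summing each column shows why retraction matters downstream: Discard totals 23, Accumulating double-counts to 49, and Accumulating & Retracting nets back out to 23.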
37. @kimutansk
With countermeasures, perfect?
• With Watermark, Trigger, and Accumulation,
can Stream Processing handle any case?
• ...No!
• Even with these countermeasures, problems
remain.
– How much lag should there be between the watermark and
ProcessingTime?
• The longer the lag, the higher the completeness, but also the
higher the latency.
– For accumulation, how long should intermediate data be held?
• The longer the holding time, the higher the system resource
requirements.
Introducing these concepts does not solve everything; how to set the watermark and how much lateness to allow remain open decisions.
38. @kimutansk
Data Processing Systems trade-off
• It is said that data processing systems (batch
and stream alike) face a trade-off among 3 elements.
– Completeness
– Low Latency
– Low Cost
• All 3 elements cannot be achieved at once.
– Any data processing system consists of a balance of the 3.
– That balance decides the real-world solution.
• "Cost" includes not only system resources but also data
transfer and communication paths.
• For real examples...
Data processing systems face a trade-off among "completeness", "low latency", and "low cost", which settles how the problems above are resolved.
39. @kimutansk
Trade-off example
• Billing system
– The most important element is completeness!
– Some latency or cost is acceptable.
For billing, for example, completeness comes first; some latency and cost are tolerable.
(Figure: Completeness is Important; Low Latency and Low Cost are Not Important)
40. @kimutansk
Trade-off example
• Anomaly / Change detection system
– The most important element is low latency!
– The other elements' priority is lower.
For anomaly detection, for example, low latency has top priority and the other elements matter less.
(Figure: Low Latency is Important; Completeness and Low Cost are Not Important)
43. @kimutansk
Each system element detail
The basic structure: buffer data in a message bus and process it with a stream processing engine.
• Message bus
– With unbounded data, the flow rate is sometimes very high.
– When trouble occurs, data may need to be reloaded.
– So data is temporarily buffered.
– Ex) Kafka, Kinesis, Cloud Pub/Sub, etc.
• Stream Processing Engine
– The engine that fetches data and processes it.
– It runs continuously, so high availability is needed.
– Products are described later.
• Output systems
– The systems that consume the Stream Processing output.
– Depends on the use case.
44. @kimutansk
Stream Processing Engine genealogy
Stream processing engines can be classified into several categories, as in the figure.
(Figure: engines plotted by time of release, grouped into Pure Stream Processing, With Dataflow Design UI, DSL, and Managed Service)
45. @kimutansk
Stream Processing Engine genealogy
Categories include "pure stream processing engines", "processing definable via UI", "DSLs", and "managed services".
• Pure Stream Processing
– Basic Stream Processing engines.
– Each differentiates itself from the others with specific functions.
• With Dataflow Design UI
– Products that provide a dataflow design UI.
– Users can design stream processing easily.
• DSL
– Write code once, run it on multiple Stream Processing engines.
– The DSL generates an abstract dataflow definition.
• Managed Service
– The execution environment is managed on a public cloud.
46. @kimutansk
Product Introduction:Storm
Effectively the first widely used OSS stream processing engine. It had many problems, but greatly influenced later products.
• In 2011, open sourced by Twitter.
– Developed in Clojure.
• Deep dives required Clojure skills.
– Practically the first widely used OSS Stream Processing engine.
• "At least once" semantics supported from the initial version.
– Being an early product, Storm had many problems.
• Latency is very low, but so is throughput.
• No back-pressure.
• Default process placement is inefficient.
• The message ack function runs per message.
• In the current version, most of these problems are solved.
– Storm influenced many later Stream Processing products.
47. @kimutansk
Product Introduction:Spark Streaming
Runs as micro-batches on top of a batch processing engine. Being able to use the Spark ecosystem and development style is a big plus.
• In 2013, open sourced by AMPLab.
– Developed in Scala.
– On top of the batch framework Spark, streaming is
pseudo-realized as sequentially executed mini-batches.
• Called "micro-batches".
– Throughput is high, but so is latency.
• Compared with Storm at the time.
• Compared with Flink or Apex...?
– The big advantage is executing on the Spark ecosystem.
• Spark components can be used:
Spark SQL, Spark MLlib, etc.
• The development style carries over as well.
48. @kimutansk
Product Introduction:NiFi
Define a dataflow on screen and build stream processing from it. Strong data management features, but configuration management is a challenge.
• In 2014, open sourced by the NSA.
– Developed in Java.
– Users design a dataflow in the UI, deploy it to a NiFi cluster,
and it executes on the cluster.
• Ex) Get from Kafka > Enrichment > Put to HDFS
– Between components NiFi places message queues,
with per-queue priority and QoS settings.
– NiFi traces each datum's provenance and modification history.
• Useful for data management.
– But managing dataflows as code is difficult.
50. @kimutansk
Product Introduction:Flink
A data processing engine supporting both batch and stream. Provides its own snapshot method and a rich set of APIs.
• In 2014, open sourced.
– In 2011, named Stratosphere.
– Developed in Scala.
– A data processing engine providing both batch and stream
APIs.
– For fault tolerance, it uses a "distributed snapshot" method.
• "Lightweight Asynchronous Snapshots for Distributed Dataflows"
• Flink can take snapshots asynchronously and efficiently.
– There are 3 kinds of APIs for developing Flink applications:
• High-level API
• Low-level API
• Table API (SQL-like)
51. @kimutansk
Product Introduction:Apex
A stream processing platform emphasizing fault tolerance. Rich in state management, runtime optimization, and auto-scaling features.
• In 2015, open sourced by DataTorrent.
– Developed in Java.
– Originally used in financial applications.
• Focuses on fault tolerance.
• Problem traceability in production environments.
• Message buffers sit between operators,
so when a failure occurs, its influence is limited.
– Both state management and optimization for the runtime
environment are considered.
• Apex uses HDFS like a KVS, for low latency with fault tolerance.
• A YARN-native application.
– Auto-scaling at runtime.
52. @kimutansk
Product Introduction:Gearpump
An Actor-based product developed with reference to Google's MillWheel. Excellent performance and extensibility, but aimed at experts.
• In 2015, open sourced by Intel.
– Developed in Scala.
– Developed with reference to the MillWheel design.
• Google's Stream Processing paper:
• "MillWheel: Fault-Tolerant Stream Processing at Internet Scale"
– A lean Stream Processing engine with high extensibility.
• But application developers must write state management code.
• Performance is high, but so is development cost.
– Based on "Reactive Streams", with standardized back-pressure
functions.
– With an akka-streams-like syntax, users can build dataflow
graphs intuitively.
53. @kimutansk
Product Introduction:Kafka Streams
A component for building stream processing in combination with Kafka. Built on a concept unifying Streams and Tables; simple but full-featured.
• In 2016, produced by Confluent.
– Developed in Java.
– A component of Kafka.
– A library for implementing Stream Processing applications.
• Kafka Streams does not include process clustering or
high-availability features.
• Those elements are up to the user.
– Practically, it is dedicated to Kafka.
• Input sources and output destinations are Kafka.
– The key concepts are Streams and Tables.
• There is a close relationship between Streams and Tables.
– Simple, but the functionality is pretty powerful.
• Dual APIs (declarative, imperative), queryable state, windowing.
54. @kimutansk
Product Introduction:Beam
A DSL providing a unified stream/batch processing model. Runs in various environments, but product-specific functions cannot be used.
• In 2016, open sourced by Google.
– Developed in Java.
– Unified stream/batch processing for big data.
• Beam provides a data processing abstraction.
– An application developed with Beam can execute on
multiple Stream Processing engines:
• Local executor
• Google Cloud Dataflow (Google Cloud Platform)
• Spark, Flink, Apex, Gearpump
– In exchange for high portability,
users cannot use each product's specific libraries.
• Machine learning, graph processing, etc.
• Those functions must be executed separately (ex. TensorFlow).
55. @kimutansk
Product Introduction:Cloud Dataflow
A fully managed stream/batch service provided on GCP. Being managed, its dynamic adjustment and optimization are powerful.
• In 2015, produced by Google.
– Applications developed with Beam
(the Dataflow API at the time)
execute on a Google Cloud Platform managed service.
– Compared with other managed Stream Processing services,
it suits a wide range of applications.
• Developed as Stream Processing applications.
– Because it is a managed service, it
auto-scales and optimizes resource allocation.
56. @kimutansk
Product Introduction:KinesisAnalytics
A service on AWS that runs continuous queries against streams. Its functions are limited, but it is very easy to use.
• In 2016, produced by Amazon.
– With SQL, users can apply continuous queries
to streaming data.
• The concept of "data in motion".
– Each Kinesis Analytics application has one input stream
and at most 3 output streams.
• Input sources and output targets are limited to the Kinesis family.
– Functionality is limited, but it is very easy to start.
– It distinguishes EventTime and ProcessingTime.
• So users can use proper windowing functions.
– It auto-scales, but an application's resource usage is difficult
to predict.
57. @kimutansk
Which product should you use?
To start with, Flink or Apex is well balanced and safe; Gearpump is for advanced users. For the rest, choose by situation, execution environment, and existing systems.
• ※ Just my opinion.
• Flink or Apex: for a first Stream Processing app.
– Good balance of functions, performance, and ease of use.
• Gearpump: for akka experts.
– Good performance/extensibility, but difficult at first.
• Spark Streaming: for Spark users.
– High compatibility with other Spark components.
• Beam / Cloud Dataflow: for public cloud users.
– Good portability between on-premise and public cloud.
• NiFi: for users with many small Stream Processing apps.
• Kafka Streams: auxiliary use alongside other products.
60. @kimutansk
Consideration point list
Representative technical consideration points, roughly divided into problem area, availability, system management, and development method.
• For developing Stream Processing systems,
there are many technical consideration points.
– It needs to be clarified whether the Stream Processing
product has each function or not.
• Target problem area
– Time model
– Windowing
– Out of order processing
• System reliability
– State management
– Fault tolerance
– Re-execute
– Message delivery semantics
• System management
– UI
– Logging
– Back-pressure
– Scale out / Scale in
– Data security
• Development method
– Api
– Specific library
– Environment, operation
61. @kimutansk
Target problem area
As explained so far, the time model, windowing, and out-of-order handling determine what the system can support.
• Time model
– Is it necessary to handle EventTime, or is ProcessingTime
enough?
– If ProcessingTime alone is OK, the system is simpler.
• But changing this later is difficult.
• Windowing
– Which window methods are needed?
• Tumbling, sliding, session.
• Out of order processing
– Relates to "time model" and "windowing".
• How late may data arrive?
• How is it handled?
62. @kimutansk
System reliability(1)
Items that need consideration to ensure system reliability: state management, fault tolerance, and re-executability when bugs occur.
• State management
– Is "state" saved to the local machine or a remote datastore?
– In which format is state serialized?
• A trade-off between reliability and performance.
• Fault tolerance
– When the system fails, how wide is the impact? The latency?
• On failure, is repair automatic or manual?
• How long is the mean time to repair (MTTR)?
• Re-execution
– On system failure or a program bug,
is re-execution necessary?
• For re-execution, messages need to be stored long-term.
63. @kimutansk
System reliability(2)
Message delivery semantics also need consideration. Note: an "exactly once" that covers everything by itself cannot be realized.
• Message delivery semantics
– Which semantics apply to message processing:
• At most once
• At least once
• Exactly once
– Premise: an exactly once that covers every pattern is impossible!
– A Stream Processing system can guarantee only that
"its own state is processed exactly once."
– If the system outputs to external systems,
deduplication or an "accumulation" function is needed on
the external side.
64. @kimutansk
Exactly once NG pattern
Because access to the outside and state persistence are not "atomic". Consider a system that raises an alarm when a message is notified.
• Because external access and persisting state are
not atomic.
(Figure: consider sending an alarm when a
message is notified.)
65. @kimutansk
Exactly once NG pattern
What if a failure occurs after notification, but before the state is updated?
• Because external access and persisting state are
not atomic.
(Figure: after the alarm is sent, a failure occurs
before the state is persisted?)
66. @kimutansk
Exactly once NG pattern
The message is notified again, and a duplicated alarm is raised!
• Because external access and persisting state are
not atomic.
(Figure: the message is re-notified, and a
duplicated alarm is sent!)
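The failure sequence in these three slides can be condensed into a framework-free Python sketch (class and method names are illustrative):

```python
class AlarmNotifier:
    """Why a non-atomic pair of (external side effect, state persist)
    duplicates the side effect on recovery."""
    def __init__(self):
        self.sent = []          # the external system (alarm receiver)
        self.persisted = set()  # durable state: message ids handled

    def handle(self, msg_id, crash_before_persist=False):
        if msg_id in self.persisted:
            return                    # state says "done", skip
        self.sent.append(msg_id)      # 1. external access (send alarm)
        if crash_before_persist:
            return                    # 2. failure before state is saved
        self.persisted.add(msg_id)    # 3. persist state

n = AlarmNotifier()
n.handle("m1", crash_before_persist=True)  # alarm sent, state lost
n.handle("m1")                             # replay after recovery
print(n.sent)  # ['m1', 'm1']  <- the alarm went out twice
```

The engine's own state can be replayed exactly once, but the alarm receiver saw two sends; deduplication must happen on the receiving side.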
67. @kimutansk
System management(1)
Most systems have a UI, and what it offers matters. A mechanism to collect and analyze logs is essential.
• UI
– Most Stream Processing products have a custom UI.
• What information can users see in the UI?
• What operations can users execute from the UI?
– Important information: the execution graph.
• If shuffle status is displayed, diagnosis is easy.
• Logging
– For distributed systems, logging in to each server to check
logs is unrealistic.
– So a log collection function is important for error analysis.
68. @kimutansk
System management(2)
Without back-pressure, up-front capacity estimation and monitoring are needed. Since these systems basically run all the time, being able to grow and shrink while running helps.
• Back-pressure
– If performance is unbalanced between components,
a back-pressure function is needed.
– Without it, each component's performance must be estimated
in advance, but such estimates are difficult and unreliable.
• Scale out / Scale in
– Can the system scale out/in at runtime?
– Or does it allow a system restart?
– What is the minimum execution unit?
• Does it depend on an external system's partitions?
• Stream Processing systems basically execute continuously,
so this needs to be clarified in advance.
70. @kimutansk
Development method(1)
Which APIs are available for development? (development speed vs. customizability) Depending on the product, multiple APIs can coexist in one system.
• API
– Which development abstractions can be selected?
• It depends on team members' skills and the system's
priorities (customizability? development speed?).
– The 3 major development API abstractions are below.
• Declarative, expressive API
– Development speed: ○, Customizability: ○
– like: map(), filter(), join(), sort(), etc.
• Imperative, lower-level API
– Development speed: △, Customizability: ◎
– like: process(event)
• Streaming SQL
– Development speed: ◎, Customizability: △
– like: STREAM SELECT ... FROM ... WHERE ... TO ...
71. @kimutansk
Development method(2)
Declarative application implementation: transform the stream with functions, each performing one processing step.
• Declarative, expressive API Example
– By Flink
case class CarEvent(carId: Int, speed: Int, distance: Double, time: Long)

val carEventStream: DataStream[CarEvent] = ...
val topSpeed = carEventStream
  .assignAscendingTimestamps( _.time )
  .keyBy("carId")
  .window(GlobalWindows.create)
  .evictor(TimeEvictor.of(Time.of(evictionSec * 1000, TimeUnit.MILLISECONDS)))
  .trigger(DeltaTrigger.of(triggerMeters, new DeltaFunction[CarEvent] {
    def getDelta(oldSp: CarEvent, newSp: CarEvent): Double = newSp.distance - oldSp.distance
  }, carEventStream.getType().createSerializer(env.getConfig)))
  .maxBy("speed")
72. @kimutansk
Development method(3)
Imperative application implementation: apply a process method to the stream, performing each step by hand.
• Imperative, lower-level API Example
– By Flink
val stream: DataStream[(String, String)] = ...
val result: DataStream[(String, Long)] =
  stream
    .keyBy(0)
    .process(new CountWithTimeoutFunction())

case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

class CountWithTimeoutFunction extends ProcessFunction[(String, String), (String, Long)] {
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext()
    .getState(new ValueStateDescriptor("myState", classOf[CountWithTimestamp]))
  override def processElement(value: (String, String),
      ctx: Context, out: Collector[(String, Long)]): Unit = ...
  override def onTimer(timestamp: Long, ctx: OnTimerContext,
      out: Collector[(String, Long)]): Unit = ...
}
73. @kimutansk
Development method(4)
Application implementation with Streaming SQL: set a schema on the stream and process it with SQL.
• Streaming SQL Example
– By Flink
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)
// read a DataStream from an external source
val ds: DataStream[(Long, String, Integer)] = env.addSource(...)
// register the DataStream under the name "Orders"
tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount)
// run a SQL query on the Table and retrieve the result as a new Table
val result = tableEnv.sql(
"SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")
74. @kimutansk
Development method(5)
Does the product have the specific libraries you need? A simple local execution environment and affinity with existing processes also matter.
• Specific libraries
– Each product has its own specific libraries:
• Machine learning
• Graph processing
• Adapters for external components.
– If the product lacks a needed library, it must be developed.
• Environment, operation
– Can it execute on a local machine?
– Install in advance, or deploy via a resource manager only?
– For program updates, is a restart allowed, or rolling updates?
– Does it coexist with the team's existing development tools?
– In what language is the product developed? (for diagnosis)
76. @kimutansk
Performance problems list
Representative problems that exist regardless of product: "file access", "inter-process communication", "external limits", and "GC".
• Typical Stream Processing system performance
problems are below:
– File access
– Communication between processes
– External system performance limits
– GC
• Of course, depending on the product, many other
performance problems exist.
– In those cases, analyze case by case, referring to the
previous chapter "System management".
77. @kimutansk
File access
Beginners often get caught by synchronous access to local files; use a cache, or redesign so that asynchronous access is fine.
• ※ This problem is often encountered by beginner
users.
– If a Stream Processing component accesses the local file
system synchronously for each event, that component
becomes a bottleneck.
– Avoid synchronous file system access with a cache.
– Or change the system design to avoid synchronous access.
• For example, batch-update the file system asynchronously.
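A minimal sketch of the batching idea (names and numbers are illustrative): buffering N records per flush cuts the number of file operations by a factor of N:

```python
class BufferedWriter:
    """Buffer per-event updates and flush them in batches, instead of
    one synchronous file write per event."""
    def __init__(self, flush_every):
        self.flush_every = flush_every
        self.buffer = []
        self.flushes = 0

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        # in a real system this would be one file append / fsync
        self.flushes += 1
        self.buffer.clear()

w = BufferedWriter(flush_every=100)
for i in range(1000):
    w.write(i)
print(w.flushes)  # 10 flushes instead of 1000 synchronous writes
```

The trade-off, as with accumulation state, is that buffered records can be lost on failure, so this fits outputs that may be approximate or replayed.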
78. @kimutansk
Communication between processes
As in batch processing, the cost of inter-process communication is high; it mainly occurs at shuffle time, so measures to reduce it are needed.
• As in Batch Processing, the performance impact of
communication between processes is high.
– Inter-process communication mainly occurs during
"shuffle".
– For example, reduce communication cost by pre-aggregating
each partition's data before the shuffle.
– Or consolidate components to reduce communication
between processes.
– But excessive consolidation raises component
maintenance cost and reduces performance-tuning flexibility.
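The effect of pre-aggregation (a combiner) can be illustrated with a framework-free Python sketch of a word-count shuffle (names and data are mine):

```python
from collections import Counter

def shuffle_records(partitions, combine):
    """Count records a word-count job sends across the shuffle,
    with or without per-partition pre-aggregation (a combiner)."""
    sent = 0
    merged = Counter()
    for part in partitions:
        # without a combiner, every word is its own (word, 1) record
        records = Counter(part).items() if combine else [(w, 1) for w in part]
        for word, n in records:
            sent += 1          # one record over the network
            merged[word] += n  # reduce side
    return sent, dict(merged)

parts = [["a", "a", "b"], ["a", "b", "b"]]
print(shuffle_records(parts, combine=False))  # (6, {'a': 3, 'b': 3})
print(shuffle_records(parts, combine=True))   # (4, {'a': 3, 'b': 3})
```

The final counts are identical, but the combiner sends one record per distinct key per partition rather than one per event, and the saving grows with the number of duplicate keys.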
79. @kimutansk
External system performance limit
Because the stream processing system itself works in memory and in parallel, it often exceeds the throughput of other systems.
• Reaching the performance limits of external
systems (message bus, output systems).
– In general, Stream Processing systems process in memory
and execute concurrently/in parallel. They tend to produce
high throughput, so they sometimes overwhelm external systems.
• In the generation of Storm or Spark Streaming,
generally Kafka > Stream Processing engine.
• But in the current generation (Flink or Apex),
sometimes Kafka < Stream Processing engine.
• Tune the cluster size or replication settings of the
message bus and output systems.
80. @kimutansk
GC
GC problems are unavoidable as long as you run on the JVM. Tune first; if that is not enough, the only option is risky brute-force measures...
• An unavoidable problem when executing on the JVM.
• Stream Processing systems handle a huge number of
events, so they generate a huge number of objects.
• First, tune the JVM.
• If that still does not help...
– In application code, suppress object generation as much as
possible.
– For objects contained in events, use byte arrays instead of
individual objects.
• But these countermeasures have a bad impact on system
maintainability and quality.
– Better not to do it if you can avoid it.
84. @kimutansk
Common misconceptions
Partly because early products had many problems, some common misconceptions exist about stream processing systems.
• From this history, some misconceptions persist
about Stream Processing systems:
– Stream Processing is only applicable for approximations.
– Latency and throughput: we must choose one.
– Micro-batching means better throughput.
– Exactly once semantics are completely impossible.
– Stream Processing only applies to "real-time".
– Stream Processing is too hard anyway.
85. @kimutansk
Answer for misconceptions(1)
"Only usable for approximations" is a misconception dating from early Storm; likewise, today the trade-off is not latency vs. throughput but a different one.
• Stream Processing is only applicable for
approximations.
– In early Storm this was true, so combined use with
Batch Processing was required (the Lambda Architecture).
– Currently, this is controllable via Watermarks and Triggers.
• Latency and throughput: we must choose one.
– This also stems from the early Storm vs. Spark Streaming
comparisons.
– Currently, there is a 3-axis trade-off:
• Completeness
• Low Latency
• Low Cost
86. @kimutansk
Answer for misconceptions(2)
"It is micro-batching" is not a reason for better throughput. For exactly once, identify the patterns where it is achievable and handle them.
• Micro-batching means better throughput.
– In real systems, data is buffered at the hardware layer anyway,
so micro-batching by itself is not a reason for good
throughput.
– On the contrary, managing micro-batch jobs can hurt
performance.
• Exactly once semantics are completely impossible.
– A Stream Processing system can guarantee that
"its own state is processed exactly once."
– If the system outputs to external systems,
deduplication or an "accumulation" function is needed on
the external side.
87. @kimutansk
Answer for misconceptions(3)
Stream processing systems are applicable beyond real-time processing; development cost has also fallen as easier APIs have appeared.
• Stream Processing only applies to "real-time".
– As shown in a previous chapter, "Batch Processing is a
subset pattern of Stream Processing."
– So Stream Processing can handle batch jobs.
• But performance efficiency needs to be confirmed.
• Stream Processing is too hard anyway.
– With an unbounded data source and very frequent data
updates, it is easier to adopt than Batch Processing.
– Early-generation Stream Processing products had only
imperative, low-level APIs, so development cost was high.
– But current products also have declarative, expressive APIs
and Streaming SQL, so development cost has become lower.
88. @kimutansk
Summary
I introduced what a stream processing system is, then covered products, consideration points, and common misconceptions.
• Stream Processing is a superset of Batch Processing.
– But a new problem occurs: "out of order".
• There are countermeasures for the new problems:
– Watermark / Trigger / Accumulation.
• A typical Stream Processing system consists of:
– Message bus / Stream Processing engine / Output systems.
• There are many Stream Processing products.
– Product selection needs consideration.
– Consideration points were introduced.
• Plus typical problems and misconceptions.
89. @kimutansk
Reference materials
• The world beyond batch: Streaming 101
– https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
• The world beyond batch: Streaming 102
– https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
• MillWheel: Fault-Tolerant Stream Processing at Internet Scale
– https://research.google.com/pubs/pub41378.html
• The Dataflow Model: A Practical Approach to Balancing Correctness, Latency,
and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
– https://research.google.com/pubs/pub43864.html
• The Evolution of Massive-Scale Data Processing
– https://goo.gl/jg4UAb
• Streaming Engines for Big Data
– http://www.slideshare.net/stavroskontopoulos/voxxed-days-thessaloniki-21102016-streaming-engines-for-big-data
• Introduction to Streaming Analytics
– http://www.slideshare.net/gschmutz/introduction-to-streaming-analytics-69120031
90. @kimutansk
Reference materials
• Stream Processing Myths Debunked: Six Common Streaming Misconceptions
– http://data-artisans.com/stream-processing-myths-debunked/
• A Practical Guide to Selecting a Stream Processing Technology
– http://www.slideshare.net/ConfluentInc/a-practical-guide-to-selecting-a-stream-processing-technology
• Apache Beam and Google Cloud Dataflow
– http://www.slideshare.net/SzabolcsFeczak/apache-beam-and-google-cloud-dataflow-idg-final-64440998
• The Beam Model
– https://goo.gl/6ApbHV
• Throughput, Latency, and Yahoo! Performance Benchmarks. Is There a Winner?
– https://www.datatorrent.com/blog/throughput-latency-and-yahoo/
• Lightweight Asynchronous Snapshots for Distributed Dataflows
– https://arxiv.org/abs/1506.08603
91. Thank you for your attention!
Enjoy Stream Processing!
https://www.flickr.com/photos/neokratz/4913885458