59. @kimutansk
Product overview: Flink
• Implementation example using the procedural, low-level API
– The process method is applied to each event in the stream to handle events one at a time.
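import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// the source data stream
val stream: DataStream[Tuple2[String, String]] = ...
// apply the process function to the keyed stream
val result: DataStream[Tuple2[String, Long]] = stream
  .keyBy(0)
  .process(new CountWithTimeoutFunction())

// per-key state: the key, its running count, and the last modification time
case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

class CountWithTimeoutFunction extends ProcessFunction[(String, String), (String, Long)] {
  // state maintained by this process function
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
    .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", classOf[CountWithTimestamp]))
  // called for every event in the stream
  override def processElement(value: (String, String),
      ctx: Context, out: Collector[(String, Long)]): Unit = ...
  // called when a timer registered through ctx fires
  override def onTimer(timestamp: Long, ctx: OnTimerContext,
      out: Collector[(String, Long)]): Unit = ...
}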
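The bodies of processElement and onTimer are elided on the slide. As a rough illustration of the kind of logic such a function might hold, the sketch below simulates per-key counting with an inactivity timeout in plain Scala, without any Flink dependency. The CountWithTimeoutSim class, the mutable Map standing in for keyed state, and the 60-second timeout are all assumptions made for this illustration, not the author's implementation.

```scala
import scala.collection.mutable

// Same shape as the case class on the slide.
case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

// Hypothetical stand-in for a keyed process function: a mutable Map plays the
// role of per-key state, and timers are fired explicitly by the caller.
class CountWithTimeoutSim(timeoutMs: Long) {
  private val state = mutable.Map.empty[String, CountWithTimestamp]

  // Called once per event: bump the count, remember the event time, and
  // return the timestamp at which a timer should fire for this key.
  def processElement(key: String, eventTime: Long): Long = {
    val current = state.getOrElse(key, CountWithTimestamp(key, 0L, 0L))
    state(key) = current.copy(count = current.count + 1, lastModified = eventTime)
    eventTime + timeoutMs
  }

  // Called when a timer fires: emit (key, count) only if no newer event has
  // arrived for the key since that timer was registered.
  def onTimer(key: String, timestamp: Long): Option[(String, Long)] =
    state.get(key).collect {
      case s if timestamp == s.lastModified + timeoutMs => (s.key, s.count)
    }
}

object CountWithTimeoutDemo {
  def main(args: Array[String]): Unit = {
    val fn = new CountWithTimeoutSim(timeoutMs = 60000L)
    val t1 = fn.processElement("a", 1000L)
    val t2 = fn.processElement("a", 2000L)
    println(fn.onTimer("a", t1)) // None: a newer event made this timer stale
    println(fn.onTimer("a", t2)) // Some((a,2)): the key stayed quiet for the full timeout
  }
}
```

In this sketch the stale-timer check (timestamp == lastModified + timeout) is what lets every event register its own timer without cancelling earlier ones; only the timer belonging to the most recent event produces output.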
60. @kimutansk
Product overview: Flink
• Implementation example using the Streaming SQL API
– Assign a schema to the stream and describe the processing in SQL.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)
// read a DataStream from an external source
val ds: DataStream[(Long, String, Integer)] = env.addSource(...)
// register the DataStream under the name "Orders"
tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount)
// run a SQL query on the Table and retrieve the result as a new Table
val result = tableEnv.sql(
"SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")
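To make the query's semantics concrete, the following plain-Scala sketch applies the same filter and projection to an in-memory collection. It does not use the Flink Table API; the Order case class, the rubberOrders helper, and the sample rows are hypothetical and exist only for this illustration.

```scala
// Hypothetical row type mirroring the 'user, 'product, 'amount fields above.
case class Order(user: Long, product: String, amount: Int)

object OrdersQuerySketch {
  // Equivalent of:
  //   SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'
  def rubberOrders(orders: Seq[Order]): Seq[(String, Int)] =
    orders
      .filter(_.product.contains("Rubber")) // LIKE '%Rubber%' is a substring match
      .map(o => (o.product, o.amount))      // project the two selected columns

  def main(args: Array[String]): Unit = {
    val orders = Seq(
      Order(1L, "Rubber Duck", 3),
      Order(2L, "Steel Pipe", 5),
      Order(3L, "Red Rubber Ball", 2)
    )
    rubberOrders(orders).foreach(println)
  }
}
```

The difference on a real stream is that the query is evaluated continuously as events arrive, rather than once over a finite collection as in this sketch.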
70. @kimutansk
References (excluding those cited in the slides)
• The world beyond batch: Streaming 101
– https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
• The world beyond batch: Streaming 102
– https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
• MillWheel: Fault-Tolerant Stream Processing at Internet Scale
– https://research.google.com/pubs/pub41378.html
• The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
– https://research.google.com/pubs/pub43864.html
• The Evolution of Massive-Scale Data Processing
– https://goo.gl/jg4UAb
• Streaming Engines for Big Data
– http://www.slideshare.net/stavroskontopoulos/voxxed-days-thessaloniki-21102016-streaming-engines-for-big-data
• Introduction to Streaming Analytics
– http://www.slideshare.net/gschmutz/introduction-to-streaming-analytics-69120031
71. @kimutansk
References (excluding those cited in the slides)
• Stream Processing Myths Debunked: Six Common Streaming Misconceptions
– http://data-artisans.com/stream-processing-myths-debunked/
• A Practical Guide to Selecting a Stream Processing Technology
– http://www.slideshare.net/ConfluentInc/a-practical-guide-to-selecting-a-stream-processing-technology
• Apache Beam and Google Cloud Dataflow
– http://www.slideshare.net/SzabolcsFeczak/apache-beam-and-google-cloud-dataflow-idg-final-64440998
• The Beam Model
– https://goo.gl/6ApbHV
• THROUGHPUT, LATENCY, AND YAHOO! PERFORMANCE BENCHMARKS. IS THERE A WINNER?
– https://www.datatorrent.com/blog/throughput-latency-and-yahoo/
• Lightweight Asynchronous Snapshots for Distributed Dataflows
– https://arxiv.org/abs/1506.08603
72. Thank you for your attention!
Enjoy Stream Processing!
https://www.flickr.com/photos/neokratz/4913885458