Apache Flume (NG)
Alexander Alten-Lorenz
Apache Flume NG presentation, held in Stuttgart. Includes coding and configuration examples.
1.
Apache Flume (NG)
March 2012
Alexander Lorenz | Customer Operations Engineer
2.
Overview
• Stream data (events, not files) from clients to sinks
• Clients: files, syslog, avro, …
• Sinks: HDFS files, HBase, …
• Configurable reliability levels
  – Best effort: “Fast and loose”
  – Guaranteed delivery: “Deliver no matter what”
• Configurable routing / topology
©2012 Cloudera, Inc. All Rights Reserved.
3.
Architecture
Component | Function
Agent | The JVM running Flume. One per machine. Runs many sources and sinks.
Client | Produces data in the form of events. Runs in a separate thread.
Sink | Receives events from a channel. Runs in a separate thread.
Channel | Connects sources to sinks (like a queue). Implements the reliability semantics.
Event | A single datum; a log record, an avro object, etc. Normally around 4 KB.
4.
Agent
• Runs many clients and sinks
• Java properties-based configuration
• Low overhead (-Xmx20m)
  – But adding RAM increases performance
5.
Sources
• Plugin interface
• Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based)
• Included: exec, avro, syslog, …
6.
Sources
public class MySource implements PollableSource {
    public Status process() {
        // Do something to create an Event..
        Event e = EventBuilder.withBody(…).build();

        // A channel instance is injected by Flume.
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            channel.put(e);
            tx.commit();
        } catch (ChannelException ex) {
            tx.rollback();
            return Status.BACKOFF;
        } finally {
            tx.close();
        }
        return Status.READY;
    }
}
7.
Channel
• Plugin interface
• Transactional
• Provide queuing between source / sink
• Provide reliability semantics
  – MemoryChannel: Basically a Java BlockingQueue.
  – JDBC
  – WAL
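The "basically a Java BlockingQueue" description can be illustrated with a small sketch. This is not Flume's MemoryChannel: the class `SketchChannel`, its nested `Tx` type, and the use of plain `String` events are hypothetical names chosen for illustration, and the code assumes each channel is used from a single thread. It only shows the idea that puts and takes are staged inside a transaction and become final on commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical illustration only; this is NOT Flume's MemoryChannel.
// Assumption: each channel instance is used from one thread at a time.
final class SketchChannel {
    // A BlockingDeque is a BlockingQueue that also allows re-inserting at the head.
    private final BlockingDeque<String> queue;

    SketchChannel(int capacity) {
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    Tx newTransaction() {
        return new Tx();
    }

    int size() {
        return queue.size();
    }

    // One unit of work: puts are staged, takes are remembered, until commit or rollback.
    final class Tx {
        private final Deque<String> stagedPuts = new ArrayDeque<>();
        private final Deque<String> takenEvents = new ArrayDeque<>();

        void put(String event) {
            stagedPuts.addLast(event);
        }

        // Removes the head event (or returns null if the queue is empty)
        // and remembers it so that rollback can restore it.
        String take() {
            String event = queue.pollFirst();
            if (event != null) {
                takenEvents.addLast(event);
            }
            return event;
        }

        // Makes staged puts visible and forgets taken events for good.
        void commit() {
            if (queue.remainingCapacity() < stagedPuts.size()) {
                throw new IllegalStateException("channel full; roll back and retry later");
            }
            for (String event : stagedPuts) {
                queue.addLast(event);
            }
            stagedPuts.clear();
            takenEvents.clear();
        }

        // Discards staged puts and returns taken events to the head in their original order.
        void rollback() {
            stagedPuts.clear();
            Iterator<String> it = takenEvents.descendingIterator();
            while (it.hasNext()) {
                queue.addFirst(it.next());
            }
            takenEvents.clear();
        }
    }
}
```

Because everything lives in memory, a sketch like this loses its events when the JVM stops, which is the trade-off a durable channel (the JDBC or WAL entries above) is meant to avoid.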
8.
Sinks
• Plugin interface
• Managed by a SinkRunner that controls threading and execution model
• Included: HDFS files (various formats)
9.
Sinks Code
public class MySink implements PollableSink {
    public Status process() {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event e = channel.take();
            if (e != null) {
                // …
                tx.commit();
            } else {
                return Status.BACKOFF;
            }
        } catch (ChannelException ex) {
            tx.rollback();
            return Status.BACKOFF;
        } finally {
            tx.close();
        }
        return Status.READY;
    }
}
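The READY / BACKOFF return values only make sense together with a runner that reacts to them. The following is a minimal sketch of how such a polling loop might behave, assuming a linear backoff that grows with each consecutive BACKOFF and is capped at a maximum. The class `RunnerSketch`, both method names, and the millisecond parameters are hypothetical and are not taken from the slides or from Flume's runner classes. The sketch records the delays instead of sleeping so that it stays easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of a polling runner; not Flume's SinkRunner or SourceRunner.
final class RunnerSketch {
    enum Status { READY, BACKOFF }

    // Assumed policy: the delay grows linearly with consecutive backoffs, up to a cap.
    static long backoffDelayMs(int consecutiveBackoffs, long stepMs, long maxMs) {
        return Math.min(consecutiveBackoffs * stepMs, maxMs);
    }

    // Calls process() a fixed number of times and returns the delays a real
    // runner would have slept for after each BACKOFF result.
    static List<Long> simulate(Supplier<Status> process, int calls, long stepMs, long maxMs) {
        List<Long> delays = new ArrayList<>();
        int consecutive = 0;
        for (int i = 0; i < calls; i++) {
            if (process.get() == Status.BACKOFF) {
                consecutive++;
                delays.add(backoffDelayMs(consecutive, stepMs, maxMs));
            } else {
                // Work was done, so poll again immediately and reset the counter.
                consecutive = 0;
            }
        }
        return delays;
    }
}
```

A real runner would call `Thread.sleep` with the computed delay and run the loop on its own thread until the component is stopped.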
10.
Tiered collection
• Send events from agents to another tier of agents to aggregate
• Use an Avro sink (really just a client) to send events to an Avro source (really just a server) in another machine
• Failover supported
• Load balancing (soon)
• Transactions guarantee handoff
11.
Tiered collection – The handoff
• Agent 1: Tx begin
• Agent 1: Channel take event
• Agent 1: Sink send
• Agent 2: Tx begin
• Agent 2: Channel put
• Agent 2: Tx commit, respond OK
• Agent 1: Tx commit (or rollback)
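The handoff sequence can be traced in a few lines of plain Java. This is a single-process sketch under simplifying assumptions: both "agents" are just method calls, each channel is a plain deque, and the names `HandoffSketch`, `Channel`, `receive`, `forwardOne`, and the `simulateFailure` flag are all hypothetical. It is meant to show the ordering of the steps, not the Avro RPC that carries them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-process illustration of the two-agent handoff.
final class HandoffSketch {
    // Minimal stand-in for a channel; a real one would be transactional and thread-safe.
    static final class Channel {
        final Deque<String> events = new ArrayDeque<>();
    }

    // Agent 2 side (stand-in for the Avro source).
    // Returns true ("OK") only after the event is safely in its own channel.
    static boolean receive(Channel downstream, String event, boolean simulateFailure) {
        // Agent 2: Tx begin
        if (simulateFailure) {
            return false; // nothing was put, so Agent 2 has nothing to undo
        }
        downstream.events.addLast(event); // Agent 2: Channel put
        return true;                      // Agent 2: Tx commit, respond OK
    }

    // Agent 1 side (stand-in for the Avro sink). Returns true if one event was handed off.
    static boolean forwardOne(Channel upstream, Channel downstream, boolean simulateFailure) {
        // Agent 1: Tx begin, Channel take event
        String event = upstream.events.pollFirst();
        if (event == null) {
            return false; // nothing to send
        }
        // Agent 1: Sink send
        boolean ok = receive(downstream, event, simulateFailure);
        if (ok) {
            return true; // Agent 1: Tx commit, the event now belongs to Agent 2
        }
        upstream.events.addFirst(event); // Agent 1: Tx rollback, the event stays with Agent 1
        return false;
    }
}
```

At every point the event is held by at least one of the two channels. If Agent 2 commits but its OK response is lost, Agent 1 would roll back and send again, so a scheme like this should be expected to deliver duplicates occasionally rather than lose events.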
12.
Configuration
• done in a single file
• identifier for flows
• <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent
• flow has a client, type, channel, sink
• can have multiple channels in one flow
13.
Simple Configuration Example
syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = Console

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1

syslog-agent.sinks.Console.channel = MemoryChannel-1
syslog-agent.sinks.Console.type = logger

syslog-agent.channels.MemoryChannel-1.type = memory
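Since the configuration is an ordinary Java properties file, its structure can be inspected with `java.util.Properties`. The sketch below is not Flume's configuration loader. The class `AgentConfigSketch` and its helpers `parse`, `names`, and `sinksUseDeclaredChannels` are hypothetical, and the check it performs (every sink points at a channel the agent declares) is just one sanity check one might want before starting an agent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for inspecting an agent configuration; not part of Flume.
final class AgentConfigSketch {
    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the whitespace-separated component names listed under
    // "<agent>.<kind>", where kind is "sources", "channels" or "sinks".
    static List<String> names(Properties props, String agent, String kind) {
        String value = props.getProperty(agent + "." + kind, "").trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split("\\s+"));
    }

    // True if every sink of the agent names a channel that the agent declares.
    static boolean sinksUseDeclaredChannels(Properties props, String agent) {
        List<String> channels = names(props, agent, "channels");
        for (String sink : names(props, agent, "sinks")) {
            String channel = props.getProperty(agent + ".sinks." + sink + ".channel");
            if (channel == null || !channels.contains(channel.trim())) {
                return false;
            }
        }
        return true;
    }
}
```

Run against the example above with the agent name `syslog-agent`, `names(props, "syslog-agent", "sinks")` would contain the single entry `Console`, and the channel check would pass because `Console` is wired to `MemoryChannel-1`.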
14.
HDFS Configuration Example
syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1

syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/flumetest/'%{host}'
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles
syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60
syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile

syslog-agent.channels.MemoryChannel-1.type = memory
15.
Features
• Fan out: One source, many channels
• Fan in: Many sources, one channel
• Processors (aka: decorators)
• Auto-batching of events in RPCs…
• Multiplexing channels for datamining
• Avro implementation in both ways
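The difference between plain fan out and multiplexing can be sketched as two small routing functions. This is an illustration under assumptions, not Flume's selector implementation: the class `SelectorSketch`, the method names, and the idea of routing on a single header value with a default fallback are hypothetical choices made for the example, and channels are represented only by their names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical routing helpers; not Flume's channel selector classes.
final class SelectorSketch {
    // Fan out: every event is copied to all channels attached to the source.
    static List<String> replicate(List<String> allChannels) {
        return allChannels;
    }

    // Multiplexing: the value of one event header decides which channels
    // receive the event; unmapped or missing values go to the defaults.
    static List<String> multiplex(Map<String, String> headers,
                                  String headerName,
                                  Map<String, List<String>> mapping,
                                  List<String> defaults) {
        String value = headers.get(headerName);
        List<String> chosen = (value == null) ? null : mapping.get(value);
        return (chosen != null) ? chosen : defaults;
    }
}
```

With a mapping from a hypothetical `severity` header value `error` to the channels `alerts` and `archive`, an error event would reach both channels while all other events would go only to the default list.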
16.
Thank You
• Web: https://cwiki.apache.org/FLUME/getting-started.html
• ML: flume-user@incubator.apache.org
• Mail: alexander@cloudera.com
• Blog: mapredit.blogspot.com
• Twitter: @mapredit