SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
April 19, 2018
Artem Ervits – Hortonworks
Clay Baenziger – Bloomberg
Breathing New Life into Apache Oozie
with Apache Ambari Workflow Manager
© 2018 Bloomberg Finance L.P. All rights reserved.
Poll:
• Who here uses Oozie?
— In production?
— Do you use HUE with Oozie?
— How many workflows have you in production?
1-10? 10-50? 50+?
— How many actions does the largest workflow contain?
1-10? 10-50? 50+?
— Do you use Oozie with (or want to)?
HBase? Spark? Python? Deployment Automation?
• Do you like XML?
— Do you have a favorite editor for Oozie workflows?
© 2018 Bloomberg Finance L.P. All rights reserved.
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
© 2018 Bloomberg Finance L.P. All rights reserved.
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggerd by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as
system specific jobs out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
Actions:
• Map/Reduce
• Hive
• Pig
• HDFS
• Java
• Shell
• Spark
• Sub-Workflow
• E-Mail
• Decision
• Fork
• Join
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high
availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead
of MR launcher, new actions, code clean up.
- Adopted from: Apache Oozie by
Mohammad Kamrul Islam and Aravind Srinivasan
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model – workflows, coordinators, bundles
• Complicated setup
• XML
• DAG visualization
• SLA alerting
• Fine grained authorization
• Easy access to log files
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Complaints Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Launcher – Prior to Release 5.0
• MR launcher job
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
— Removes MR launcher job
• Design Doc
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Workflow Visualiztion – Prior to 5.0 and After
Jung GraphViz
OOZIE-2406: Completely rewrite GraphGenerator code
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Fluent Job API – Apache Oozie 5.X (Preview)
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
© 2018 Bloomberg Finance L.P. All rights reserved.
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
— A Metrics System for metrics collection
— An Alert Framework
— A dashboard for monitoring the Hadoop cluster
-Paraphrased from http://ambari.apache.org
© 2018 Bloomberg Finance L.P. All rights reserved.
Ambari Views
• Ambari Views ”offer a systematic way to plug-in UI capabilities to surface custom
visualization, management and monitoring features in Ambari Web. A "view" is a way of
extending Ambari that allows 3rd parties to plug in new resource types along with the
APIs, providers and UI to support them. In other words, a view is an application that is
deployed into the Ambari container.”
• Key take-aways:
— One does not need an Ambari managed (administrated) cluster
— Third parties can build views packages to run in the Ambari framework too
— Major views available:
(YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN
ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Motivation
• Oozie workflows are defined in XML – too verbose
— Provide GUI workflow builder and editor
— Reduce possibility of user introduced errors
— Provide browser based workflow manager
• Integration with File
Browser (includes S3 support)
— Tighter integration with
Ambari in future
— Can replace existing
Oozie web UI
• Oozie is hard-coded to
display only 25 actions
— WFM doesn’t have this
limit; tested with 300+
action nodes
• Oozie is scalable
— Can scale WFM by
standing-up multiple
Ambari Views servers
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Designer Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Execution View
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Design Component
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
• Setup
• Data Definition
• Compactions
Workflow Manager Examples with
HBase
© 2018 Bloomberg Finance L.P. All rights reserved.
HBase – Setup
Oozie needs HBase Confguration:
• Oozie Server Code (to support Hbase delegation tokens)
— In libexec (see Server JARs list)
— In oozie-site.xml
<name>oozie.credentials.credentialclasses</name>
<value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
</name>
• Client Workflow Code:
— Add to workflow.xml:
<credentials>
<credential name=”myhbase_creds” type=”hbase”>
[…]
</credential>
</credentials>
— All your normal HBase security settings in the credential section
• Server JARs:
(Copy the following to Oozie’s libexec)
— hbase-common.jar
— hbase-client.jar
— hbase-server.jar
— hbase-protocol.jar
— hbase-hadoop2-compat.jar
© 2018 Bloomberg Finance L.P. All rights reserved.
create_my_table.rb:
tables = list
tables.select { |table|
table.eql?('my_table') }
if tables.empty?
create 'my_table',
{NAME => 'my_col'}
end
exit
HBase – Data Definition
HBase Shell:
<action name="HBASE-Shell" cred="hbase_creds">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>hbase</exec>
<argument>shell</argument>
<argument>-n</argument>
<argument>create_my_table.rb</argument>
</shell>
<ok to="do_more_things"/>
<error to="fail"/>
</action>
© 2018 Bloomberg Finance L.P. All rights reserved.
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to selected number of servers
• Requires read ability to /hbase
usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg>
[...]
Usage instructions
-cf <arg> column families: comma separated eg: a,b,c
-dryRun Dry run, will just output a list of regions that
require compaction based on parameters passed
-minModTime <arg> Compact if store files have
modification time < minModTime
-servers <arg> Concurrent servers compacting
-table <arg> table name
...
© 2018 Bloomberg Finance L.P. All rights reserved.
More Resources
• Oozie Examples: https://github.com/dbist/oozie-examples
• Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh
• Clay’s Past Oozie presentations:
— Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
— HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
Demo!
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
Questions?

Más contenido relacionado

La actualidad más candente

Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Lucas Jellema
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Obiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementObiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementStewart Bryson
 
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)Lucas Jellema
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationDataWorks Summit
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceJohn Archer
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Exploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudExploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudAlex Zaballa
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doHortonworks
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in EnterpriseDataWorks Summit
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...DataWorks Summit
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easyDataWorks Summit
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionTravis Wright
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)Kay Lerch
 

La actualidad más candente (20)

Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Obiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementObiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle Management
 
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference application
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Exploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudExploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle Cloud
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we do
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
 

Similar a Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerArtem Ervits
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API LanguagesRestlet
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17Phil Wilkins
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)Daniel Toomey
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Edward Wilde
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersSimon Haslam
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIsHolger Reinhardt
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxapidays
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDataWorks Summit
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsApigee | Google Cloud
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationOCTO Technology
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineCsaba Toth
 

Similar a Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (20)

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API Languages
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite Customers
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIs
 
Apache Oozie
Apache OozieApache Oozie
Apache Oozie
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARI
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIs
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementation
 
Octo API-days 2015
Octo API-days 2015Octo API-days 2015
Octo API-days 2015
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
 
yii framework
yii frameworkyii framework
yii framework
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 April 19, 2018 Artem Ervits – Hortonworks Clay Baenziger – Bloomberg Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
  • 2. © 2018 Bloomberg Finance L.P. All rights reserved. Poll: • Who here uses Oozie? — In production? — Do you use HUE with Oozie? — How many workflows have you in production? 1-10? 10-50? 50+? — How many actions does the largest workflow contain? 1-10? 10-50? 50+? — Do you use Oozie with (or want to)? HBase? Spark? Python? Deployment Automation? • Do you like XML? — Do you have a favorite editor for Oozie workflows?
  • 3. © 2018 Bloomberg Finance L.P. All rights reserved. Open Source Workflow Managers • Apache Airflow (Incubating) • Luigi by Spotify • Azkaban by LinkedIn • (And of course) Apache Oozie
  • 4. © 2018 Bloomberg Finance L.P. All rights reserved. Introduction to Oozie • Oozie is a workflow scheduler system to manage Apache Hadoop jobs. • Oozie workflow jobs are Directed Acyclical Graphs (DAGs) of actions. • Oozie coordinator jobs are recurrent Oozie workflow jobs triggerd by time and data availability. • Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as system specific jobs out of the box. • Oozie is a scalable, reliable and extensible system. - Paraphrased from http://oozie.apache.org Actions: • Map/Reduce • Hive • Pig • HDFS • Java • Shell • Spark • Sub-Workflow • E-Mail • Decision • Fork • Join
  • 5. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Release Timeline • 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs. • 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs. • 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials. • 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability. • 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code clean up. - Adopted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
  • 6. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints • Launcher jobs as map tasks • Dated UI • Confusing object model – workflows, coordinators, bundles • Complicated setup • XML • DAG visualization • SLA alerting • Fine grained authorization • Easy access to log files
  • 7. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints Improvements • Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770 • Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today) • Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339 • Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666 • XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339 • DAG visualization – solved by Oozie 5.0.0, OOZIE-2406 • SLA alerting – since Oozie 4.0.0, OOZIE-1294 • Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196 • Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
  • 8. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Prior to Release 5.0 • MR launcher job
  • 9. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Release 5.0 • OYA: OOZIE-1770: Create Oozie Application Master for YARN — Removes MR launcher job • Design Doc
  • 10. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Documentation – Before Release 5.0 and After Documentation redesign OOZIE-3163: Improve documentation rendering: use fluido skin and better config
  • 11. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Workflow Visualiztion – Prior to 5.0 and After Jung GraphViz OOZIE-2406: Completely rewrite GraphGenerator code
  • 12. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Fluent Job API – Apache Oozie 5.X (Preview) OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
  • 13. © 2018 Bloomberg Finance L.P. All rights reserved. Apache Ambari Ambari Provides: • Provisioning of a Hadoop Cluster • Management of a Hadoop Cluster • Monitoring of a Hadoop Cluster — A Metrics System for metrics collection — An Alert Framework — A dashboard for monitoring the Hadoop cluster -Paraphrased from http://ambari.apache.org
  • 14. © 2018 Bloomberg Finance L.P. All rights reserved. Ambari Views • Ambari Views ”offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.” • Key take-aways: — One does not need an Ambari managed (administrated) cluster — Third parties can build views packages to run in the Ambari framework too — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager • Alternatives: Cloudera Hue, bespoke applications
  • 15. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Motivation • Oozie workflows are defined in XML – too verbose — Provide GUI workflow builder and editor — Reduce possibility of user introduced errors — Provide browser based workflow manager • Integration with File Browser (includes S3 support) — Tighter integration with Ambari in future — Can replace existing Oozie web UI • Oozie is hard-coded to display only 25 actions — WFM doesn’t have this limit; tested with 300+ action nodes • Oozie is scalable — Can scale WFM by standing-up multiple Ambari Views servers
  • 16. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Designer Example Workflow Manager: • Available as an Ambari View • Enables visual editing of Oozie workflows • Integrated with file browser • Reduces user input errors • Minimal input required
  • 17. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Execution View • Integrated Dashboard with Workflow Manager View • Manage Oozie jobs • Drill down to logs
  • 18. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Design Component
  • 19. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Dashboard Component Good Documentation: HDP 2.6 – Workflow Manager Basics
  • 20. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 • Setup • Data Definition • Compactions Workflow Manager Examples with HBase
  • 21. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Setup Oozie needs HBase Confguration: • Oozie Server Code (to support Hbase delegation tokens) — In libexec (see Server JARs list) — In oozie-site.xml <name>oozie.credentials.credentialclasses</name> <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value> </name> • Client Workflow Code: — Add to workflow.xml: <credentials> <credential name=”myhbase_creds” type=”hbase”> […] </credential> </credentials> — All your normal HBase security settings in the credential section • Server JARs: (Copy the following to Oozie’s libexec) — hbase-common.jar — hbase-client.jar — hbase-server.jar — hbase-protocol.jar — hbase-hadoop2-compat.jar
  • 22. © 2018 Bloomberg Finance L.P. All rights reserved. create_my_table.rb: tables = list tables.select { |table| table.eql?('my_table') } if tables.empty? create 'my_table', {NAME => 'my_col'} end exit HBase – Data Definition HBase Shell: <action name="HBASE-Shell" cred="hbase_creds"> <shell xmlns="uri:oozie:shell-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <exec>hbase</exec> <argument>shell</argument> <argument>-n</argument> <argument>create_my_table.rb</argument> </shell> <ok to="do_more_things"/> <error to="fail"/> </action>
  • 23. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Compactions HBASE-19528: Major Compaction Tool • Automatically scales compaction to selected number of servers • Requires read ability to /hbase usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg> [...] Usage instructions -cf <arg> column families: comma separated eg: a,b,c -dryRun Dry run, will just output a list of regions that require compaction based on parameters passed -minModTime <arg> Compact if store files have modification time < minModTime -servers <arg> Concurrent servers compacting -table <arg> table name ...
  • 24. © 2018 Bloomberg Finance L.P. All rights reserved. More Resources • Oozie Examples: https://github.com/dbist/oozie-examples • Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html • Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh • Clay’s Past Oozie presentations: — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
  • 25. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Demo!
  • 26. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Questions?