© Cloudera, Inc. All rights reserved.
More Data in Less Time
Deploying an Operational Data Store with Cloudera
Trends in the Market
Trends Driving Change
Internet of Things: 16 billion connected devices generating more data (Source: Forbes)
Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times)
Resource-Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort)
Customers are augmenting their traditional architectures for modern business needs.
Operational Data Store (ODS):
Ingesting, storing, and preparing data for both operational and analytical use.
(AKA: Operational Data Warehouse, RDBMS, Storage)
ODS Use Cases
ETL Offload: Offload resource-intensive ETL workloads from existing systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historical data
Goals of an Operational Data Store: Ingest Data, Store Data, Prepare Data
[Diagram: Traditional Architecture. Data Sources (structured, unstructured) → Ingest → Operational Data Store (Storage #1, Storage #2, …, Storage N; Ingest, Process, Load; ETL) → ELT → Enterprise Data Warehouse → Serve → Applications (BI System, Modeling, Reporting); Archive]
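The three goals on this slide (ingest data, store it, prepare it) can be illustrated with a small standard-library sketch. It is not a Cloudera API. The class `OperationalDataStore`, the function `load_to_edw`, and the required `id`/`amount` fields are all hypothetical. Raw records are kept exactly as ingested. The ETL-style preparation runs inside the ODS, and only the prepared, structured rows are loaded into the warehouse.

```python
import json
from dataclasses import dataclass, field


@dataclass
class OperationalDataStore:
    """Hypothetical in-memory stand-in for an ODS (not a Cloudera API)."""

    raw: list = field(default_factory=list)       # everything ingested, as received
    prepared: list = field(default_factory=list)  # cleaned, structured rows

    def ingest(self, record):
        # Land data as-is: structured (dict) or unstructured (str) alike.
        self.raw.append(record)

    def prepare(self):
        # The ETL step runs here, inside the ODS, instead of inside the EDW.
        self.prepared = []
        for rec in self.raw:
            if isinstance(rec, str):
                try:
                    rec = json.loads(rec)  # text that happens to be JSON
                except json.JSONDecodeError:
                    continue  # free text stays in the raw zone only
            if isinstance(rec, dict) and all(k in rec for k in ("id", "amount")):
                self.prepared.append({"id": rec["id"], "amount": float(rec["amount"])})
        return self.prepared


def load_to_edw(ods, edw_rows):
    """Load only prepared, structured rows into a (hypothetical) EDW table."""
    edw_rows.extend(ods.prepared)
    return len(ods.prepared)


if __name__ == "__main__":
    ods = OperationalDataStore()
    ods.ingest({"id": 1, "amount": "12.50"})
    ods.ingest('{"id": 2, "amount": 7}')
    ods.ingest("free-text note from a call center")
    ods.prepare()
    edw = []
    # prints: 2 rows loaded; 3 raw records kept
    print(load_to_edw(ods, edw), "rows loaded;", len(ods.raw), "raw records kept")
```

The unstructured note never reaches the warehouse, but it stays in the raw zone and can be reprocessed later. That separation is what lets the ODS absorb transformation work the EDW would otherwise carry.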
Challenges with a Traditional Architecture
1) Limited Data Ingest
2) Inefficient Data Processing
3) Data Archived
[Diagram: Traditional Architecture, as on the previous slide (Data Sources, Operational Data Store with Storage #1 … Storage N, ELT, Enterprise Data Warehouse, Archive, BI System, Modeling, Reporting), with numbered callouts marking each challenge]
A New Way Forward
1) Ingest More Data
2) Optimize Data Processing
3) Automated Secure Archive
[Diagram: Modern Architecture. Data Sources (structured, unstructured) → Ingest → Operational Data Store on an EDH (enterprise data hub) with Active Archive and ETL/ELT processing → structured data loaded into the Enterprise Data Warehouse → Serve → Applications (BI System, Modeling, Reporting); numbered callouts mark each improvement]
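The "active archive" idea (older data moves out of the hot tier but stays online and queryable) can be sketched as an age-based tiering policy. This covers only the automated tiering. Security controls are not modeled. The 90-day cutoff, the `claim_id`/`date` fields, and the function names `tier_records` and `query` are hypothetical.

```python
from datetime import date, timedelta


def tier_records(records, today, hot_days=90):
    """Split records into a hot tier and an active-archive tier by age.

    `hot_days` is a hypothetical retention policy, not a value from the talk.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in records if r["date"] >= cutoff]
    archive = [r for r in records if r["date"] < cutoff]
    return hot, archive


def query(hot, archive, predicate):
    """Query both tiers in one call: archived rows stay online, not on tape."""
    return [r for r in hot + archive if predicate(r)]


if __name__ == "__main__":
    records = [
        {"claim_id": 1, "date": date(2017, 6, 1)},
        {"claim_id": 2, "date": date(2015, 1, 15)},
    ]
    hot, archive = tier_records(records, today=date(2017, 6, 30))
    old = query(hot, archive, lambda r: r["date"].year == 2015)
    print([r["claim_id"] for r in old])  # prints: [2]
```

The design point is that `query` spans both tiers, so an analyst asking about historical data does not need a restore step first.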
RelayHealth Customer Story
About RelayHealth (A McKesson Business)
What does RelayHealth do?
RelayHealth is a McKesson financial solution that automates 2.4 billion financial transactions per year
200K physicians, 2K hospitals, 1.9K payers/health plans
Who is McKesson?
Largest healthcare solutions company in the world, with $103+ billion in revenue
Headquartered in San Francisco; established in 1833
32K employees
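For scale, 2.4 billion transactions per year works out to an average of roughly 76 transactions per second. This assumes a 365-day year and spreads the load evenly, so peak rates are necessarily at least this high:

$$
\frac{2.4 \times 10^{9}\ \text{transactions}}{365 \times 86{,}400\ \text{s}} = \frac{2.4 \times 10^{9}}{3.1536 \times 10^{7}}\ \text{transactions/s} \approx 76\ \text{transactions/s}
$$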
RelayHealth’s Objectives
ETL Offload: Offload resource-intensive ETL workloads from existing systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historical data
The Pre-Hadoop Environment
Challenges
1) Deleted & archived information
2) Batch wasn’t cutting it
3) Application & report latency
[Diagram: RelayHealth Transaction BATCH Processing System. Components: Claim Submitters, OLTP, Various Applications, RDBMS, EDW, Reports, Archive; numbered callouts mark each challenge]
RelayHealth’s Modern Hadoop Architecture
Improvements
1) Active archive on Hadoop
2) Stream & batch processing
3) Prepared for future use cases
[Diagram: RelayHealth Transaction Processing System, contrasting traditional BATCH processing with Hadoop STREAM processing. Claim Submitters → Ingest (Kafka) → Process (Spark Streaming) → Store (HBase) → Access (Search, Spark, Modeling) → Payer Application, Reports]
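The stream path on this slide (Kafka for ingest, Spark Streaming for processing, HBase for storage) can be illustrated with a standard-library sketch. It does not use the Kafka, Spark, or HBase APIs. Instead, a `deque` stands in for a Kafka topic, a generator stands in for Spark Streaming's micro-batches, and a dict keyed by a composite row key stands in for an HBase table. The function names, the `submitter#claim_id` key layout, and the claim statuses are hypothetical.

```python
from collections import deque


def micro_batches(queue, batch_size):
    """Drain a queue in small batches, the way a micro-batch stream processor
    consumes a topic. The deque is a stand-in for a Kafka topic partition."""
    while queue:
        batch = []
        while queue and len(batch) < batch_size:
            batch.append(queue.popleft())
        yield batch


def row_key(event):
    # Hypothetical HBase-style composite row key: submitter, then claim id.
    return f"{event['submitter']}#{event['claim_id']}"


def process_stream(events, batch_size=2):
    """Upsert each micro-batch into a dict that stands in for an HBase table."""
    table = {}
    batches = 0
    for batch in micro_batches(deque(events), batch_size):
        batches += 1
        for event in batch:
            # A later event for the same claim overwrites the stored status.
            table.setdefault(row_key(event), {})["status"] = event["status"]
    return table, batches


if __name__ == "__main__":
    events = [
        {"submitter": "s1", "claim_id": 100, "status": "received"},
        {"submitter": "s2", "claim_id": 200, "status": "received"},
        {"submitter": "s1", "claim_id": 100, "status": "adjudicated"},
    ]
    table, batches = process_stream(events)
    print(batches, table["s1#100"])  # prints: 2 {'status': 'adjudicated'}
```

Each claim's latest status becomes visible as soon as its micro-batch is processed, not after an end-of-day batch run. That is the latency difference between the two processing styles contrasted on the slide.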
Business and Technical ROI
Technology ROI
Business ROI
1) Active archive and Navigator for HIPAA compliance
2) Prepared for future use cases
3) Data ingest goes from end of day to near real-time
1) Transactions processed in 20 ms vs. 1 hour previously
2) $250k in licensing and hardware savings per year
3) Greater flexibility with data ingest
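This calculation takes the two processing-time figures at face value and assumes both measure the same per-transaction path. On that basis, the improvement is a factor of 180,000:

$$
\frac{1\ \text{h}}{20\ \text{ms}} = \frac{3{,}600{,}000\ \text{ms}}{20\ \text{ms}} = 180{,}000
$$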
Key Learnings
Crawl, walk, run
It takes time; start now
Lean on experts in the community
Thank you

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Último (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Breakout: Hadoop and the Operational Data Store

  • 1. © Cloudera, Inc. All rights reserved. More Data in Less Time: Deploying an Operational Data Store with Cloudera
  • 2. Trends in the Market: Trends Driving Change. Internet of Things: 16 billion connected devices generating more data (Source: Forbes). Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times). Resource-Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort).
  • 3. Customers are augmenting their traditional architectures for modern business needs.
  • 4. Operational Data Store (ODS): ingesting, storing, and preparing data for both operational and analytical use. (AKA: Operational Data Warehouse, RDBMS, Storage)
  • 5. ODS Use Cases. ETL Offload: offload resource-intensive ETL workloads from systems. EDW Optimization: migrate old data and ELT workloads off the EDW. Active Archive: store old data online so analysts can access historic data.
  • 6. Goals of an Operational Data Store: Ingest Data, Store Data, Prepare Data. [Diagram: Traditional Architecture. Data Sources (structured, unstructured) → Ingest → Operational Data Store (Storage #1, Storage #2, … Storage N; ingest, process, load; ETL) → Enterprise Data Warehouse (ELT, archive) → Serve → Applications (BI system, modeling, reporting).]
  • 7. Challenges with a Traditional Architecture: 1) Limited Data Ingest. [Diagram: Traditional Architecture, as on slide 6, with callout 1 marked.]
  • 8. Challenges with a Traditional Architecture: 1) Limited Data Ingest; 2) Inefficient Data Processing. [Diagram: Traditional Architecture, as on slide 6, with callouts 1 and 2 marked.]
  • 9. Challenges with a Traditional Architecture: 1) Limited Data Ingest; 2) Inefficient Data Processing; 3) Data Archived. [Diagram: Traditional Architecture, as on slide 6, with callouts 1 to 3 marked.]
  • 10. A New Way Forward: 1) Ingest More Data. [Diagram: Modern Architecture. Data Sources (structured, unstructured) are ingested into an Operational Data Store built on an EDH; active structured data is loaded into the Enterprise Data Warehouse; both serve Applications (BI system, modeling, reporting). Other labels: ETL, ELT, Archive. Callout 1 marked.]
  • 11. A New Way Forward: 1) Ingest More Data; 2) Optimize Data Processing. [Diagram: Modern Architecture, as on slide 10, with callouts 1 and 2 marked.]
  • 12. A New Way Forward: 1) Ingest More Data; 2) Optimize Data Processing; 3) Automated Secure Archive. [Diagram: Modern Architecture, as on slide 10, with callouts 1 to 3 marked.]
  • 13. RelayHealth Customer Story
  • 14. About RelayHealth (A McKesson Business). What does RelayHealth do? RelayHealth is a McKesson financial solution used to automate 2.4 billion financial transactions per year; 200K physicians, 2K hospitals, 1.9K payers/health plans. Who is McKesson? The largest healthcare solutions company in the world, with $103+ billion in revenue; headquartered in San Francisco; established in 1833; 32K employees.
  • 15. RelayHealth’s Objectives. ETL Offload: offload resource-intensive ETL workloads from systems. EDW Optimization: migrate old data and ELT workloads off the EDW. Active Archive: store old data online so analysts can access historic data.
  • 16. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information. [Diagram: RelayHealth Transaction BATCH Processing System, showing Claim Submitters, OLTP, Various Applications, RDBMS, EDW, Reports, and Archive; callout 1 marked.]
  • 17. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information; 2) Batch wasn’t cutting it. [Diagram: as on slide 16, with callouts 1 and 2 marked.]
  • 18. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information; 2) Batch wasn’t cutting it; 3) Application & report latency. [Diagram: as on slide 16, with callouts 1 to 3 marked.]
  • 19. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop. [Diagram: Claim Submitters and the RelayHealth Transaction Processing System; Traditional BATCH Processing alongside Hadoop STREAM Processing, with Ingest, Store, Process, and Access stages using Kafka, HBase, Spark Streaming, Spark, and Search; also shown: Payer Application, Reports, Modeling. Callout 1 marked.]
  • 20. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop; 2) Stream & batch processing. [Diagram: as on slide 19, with callouts 1 and 2 marked.]
  • 21. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop; 2) Stream & batch processing; 3) Prepared for future use cases. [Diagram: as on slide 19, with callouts 1 to 3 marked.]
  • 22. Business and Technical ROI. Technology ROI: 1) Active archive and Navigator for HIPAA compliance; 2) prepared for future use cases; 3) data ingest goes from end of day to near real time. Business ROI: 1) transactions processed in 20 ms vs. 1 hour prior; 2) $250K in licensing and hardware savings per year; 3) greater flexibility with data ingest.
  • 23. Key Learnings: Crawl, walk, run. It takes time, so start now. Lean on experts in the community.
  • 24. INSERT PARTNER SLIDES
  • 25. Thank you
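
Slides 19 to 22 describe moving RelayHealth from end-of-day batch processing to Spark Streaming, with data ingest going "from end of day to near real time." One way to see why that matters is to treat a nightly batch as a single, very long processing window. The sketch below is a toy latency model, not RelayHealth's implementation. It assumes each record waits until the end of the fixed window it arrives in, which is the micro-batch model that Spark Streaming's DStream API uses. The one-day and one-second intervals, the arrival times, and the function name `wait_until_processed` are all hypothetical.

```python
import math
from statistics import mean

DAY_S = 24 * 3600    # nightly batch: one window per day (illustrative)
MICRO_BATCH_S = 1.0  # hypothetical streaming micro-batch interval


def wait_until_processed(arrival_s: float, interval_s: float) -> float:
    """Seconds a record waits before the window it arrived in is processed.

    Windows are [k * interval_s, (k + 1) * interval_s), and each window is
    processed at its end. Processing time itself is ignored in this toy model.
    """
    window_end = (math.floor(arrival_s / interval_s) + 1) * interval_s
    return window_end - arrival_s


# Hypothetical arrival times (seconds after midnight): 01:00, 12:00, 23:00.
arrivals = [3600.0, 43200.0, 82800.0]

batch_waits = [wait_until_processed(t, DAY_S) for t in arrivals]
stream_waits = [wait_until_processed(t, MICRO_BATCH_S) for t in arrivals]

print(batch_waits)         # [82800.0, 43200.0, 3600.0]
print(mean(batch_waits))   # 43200.0  (12 hours on average)
print(mean(stream_waits))  # 1.0
```

Shrinking the window interval is what turns "end of day" into "near real time." In practice, per-batch processing time and scheduling overhead put a floor on how small the interval can usefully be.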

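The headline numbers on slides 14 and 22 are easier to compare with some simple arithmetic. The sketch below computes two figures. The first is the speedup implied by "20 ms vs. 1 hour." The second is the average transaction rate implied by 2.4 billion transactions per year. It assumes a 365-day year and says nothing about peak load, which the deck does not report. The helper names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; ignores leap years


def speedup(before_ms: int, after_ms: int) -> float:
    """How many times faster the new latency is than the old one."""
    return before_ms / after_ms


def average_rate_per_second(events_per_year: float) -> float:
    """Mean events per second if events were spread evenly over the year."""
    return events_per_year / SECONDS_PER_YEAR


one_hour_ms = 60 * 60 * 1000
print(speedup(one_hour_ms, 20))                  # 180000.0
print(round(average_rate_per_second(2.4e9), 1))  # 76.1
```

Real transaction traffic is likely bursty, so the peak rate a stream path must sustain would be at least this average and probably well above it.
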
Speaker Notes

  1. Data storage costs: http://thecaucus.blogs.nytimes.com/2012/08/14/advances-in-data-storage-have-implications-for-government-surveillance/ ; IoT: http://www.forbes.com/sites/gilpress/2014/08/22/internet-of-things-by-the-numbers-market-estimates-and-forecasts/ ; Resource-intensive ELT: http://www.syncsort.com/getattachment/45696aa9-1e40-43cb-8905-b9fc7e2519f7/Syncsort-Data-Warehouse-Offload-Solution.aspx
  2. An Operational Data Store provides a staging environment in which to ingest, store, and process data in preparation for operational and analytical use. Depending on whether the data is structured or unstructured, different systems can be used to optimize data pipelines. The challenge is that as your organization asks for ever larger volumes of diverse data, traditional systems start to run into problems.
  3–5. These challenges arise specifically around data storage and processing. The first challenge is limited data access. Collecting and ingesting a wide variety of data is not a simple task and usually means adding systems or capacity to the architecture. As the business asks for more data, the strain on IT keeps growing. To avoid these problems, only the most valuable data is brought in, which limits the business's access to other data that could also be extremely valuable. The second challenge organizations face is processing data volumes. These organizations have already collected and operationalized large volumes of data and need to process it efficiently to meet SLAs. If data doesn't reach employees in time, they carry on without the most recent information. The third and final challenge is archiving data. When systems reach capacity as an organization uses larger volumes of diverse data, IT professionals are forced to archive or delete data deemed low-value. Moving data offline to an archive significantly reduces the return on that data and can hurt the business: analysts trying to find patterns in historic data can't access it because it's offline. However, just as the external and internal data environment has changed over the years, so has the data management space.
  6–8. We have been working closely with leading organizations to create a platform that complements their current architecture and avoids these common challenges, which in turn prepares them for future data growth. Ingest More Data: Cloudera allows you to collect and ingest any type or volume of data, in full fidelity, giving your current systems and end users complete data access. This has let organizations collect and access more diverse data, expanding what data can do for the business, without compromising system performance or straining existing resources. Efficiently Process & Store Data Volumes: by offloading heavy processing workloads to Cloudera, organizations can use parallel processing to significantly reduce processing time on large data volumes. Because Cloudera scales out, performance holds up as the amount of stored data grows. Automated Secure Archive: using Cloudera as an ODS and a centralized staging environment for new data automatically creates a secure archive. Because the platform scales, there is no need to move data to an offline archive. Historic data can remain on the platform, giving analysts complete access without degrading system performance. Meanwhile, smaller volumes of well-defined active data can flow directly into the appropriate systems, with outdated data offloaded to Cloudera. Leading data organizations have already seen these benefits.
  9. Arrow from batch to stream processing
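
Notes 6 to 8 describe an automated active archive. Recent, well-defined data flows into the operational systems, while older data is offloaded to Cloudera but stays online for analysts. The sketch below models that routing, with plain Python lists standing in for the two tiers. The `Record` type, the `split_tiers` and `query_history` functions, the 90-day cutoff, and the sample records are all hypothetical illustrations, not Cloudera APIs.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    record_id: str
    created: date


def split_tiers(records, today: date, active_days: int = 90):
    """Route recent records to the active tier and older ones to the archive tier.

    Both tiers stay online; nothing is deleted or moved to offline storage.
    """
    cutoff = today - timedelta(days=active_days)
    active = [r for r in records if r.created >= cutoff]
    archive = [r for r in records if r.created < cutoff]
    return active, archive


def query_history(active, archive, since: date):
    """Analysts' historic queries read both tiers, oldest first."""
    hits = [r for r in active + archive if r.created >= since]
    return sorted(hits, key=lambda r: r.created)


# Made-up sample records for illustration.
records = [
    Record("a", date(2015, 1, 10)),
    Record("b", date(2015, 6, 1)),
    Record("c", date(2015, 9, 20)),
]
active, archive = split_tiers(records, today=date(2015, 10, 1))
print([r.record_id for r in active])   # ['c']
print([r.record_id for r in archive])  # ['a', 'b']
print([r.record_id for r in query_history(active, archive, date(2015, 1, 1))])  # ['a', 'b', 'c']
```

The design point taken from the notes is that the cutoff decides only where data lives, not whether it stays queryable.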