Cloudera Impala: A Modern SQL Query Engine for Hadoop
Justin Erickson | Product Manager
January 2013
  
Agenda
    •  Intro to Impala
    •  Impala’s Architecture
    •  Comparisons
  




Confidential. ©2013 Cloudera, Inc. All Rights Reserved.
Why Hadoop?
•    Scalability
          •    Scales simply by adding nodes
          •    Local processing to avoid network bottlenecks

•    Flexibility
          •    All kinds of data (blobs, documents, records, etc.)
          •    In all forms (structured, semi-structured, unstructured)
          •    Store anything, then later analyze what you need

•    Efficiency
          •    Cost efficiency (<$1k/TB) on commodity hardware
          •    Unified storage, metadata, security (no duplication or synchronization)
  

What’s Impala?
    •    Interactive SQL
               •    Typically 4-35x faster than Hive (observed up to 100x faster)
               •    Responses in seconds instead of minutes (sometimes sub-second)

    •    Nearly ANSI-92 standard SQL queries with HiveQL
               •    Compatible SQL interface for existing Hadoop/CDH applications
               •    Based on industry-standard SQL

    •    Natively on Hadoop/HBase storage and metadata
               •    Flexibility, scale, and cost advantages of Hadoop
               •    No duplication/synchronization of data and metadata
               •    Local processing to avoid network bottlenecks

    •    Separate runtime from MapReduce
               •    MapReduce is designed for, and great at, batch
               •    Impala is purpose-built for low-latency SQL queries on Hadoop
  

So what?
    •  Interactive BI/analytics
          •  BI tools impractical on Hadoop before Impala
          •  Move from 10s of Hadoop users per cluster to 100s of SQL users
          •  More and faster value from “big data”

    •  ELT/data processing with tight SLAs
          •  Sub-minute SLAs now possible

    •  Cost efficiency
          •  Fewer nodes to meet response time SLAs
  


Impala Architecture
•  Two binaries: impalad and statestored
•  Impala daemon (impalad)
     •  one Impala daemon on each node with data
     •  handles external client requests and all internal requests related to query execution
•  State store daemon (statestored)
     •  provides name service and metadata distribution
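
The split between the two daemons can be pictured with a small Python sketch. It is only an illustration of the "name service" role described above, not Impala's actual protocol: the `StateStore` and `ImpalaDaemon` classes, their methods, and the host addresses are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy stand-in for statestored: tracks members and pushes updates."""
    members: dict = field(default_factory=dict)      # daemon name -> address
    subscribers: list = field(default_factory=list)  # callbacks to notify

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def register(self, name, address):
        # Name service: record the daemon, then broadcast the new membership.
        self.members[name] = address
        self._publish()

    def _publish(self):
        snapshot = dict(self.members)
        for notify in self.subscribers:
            notify(snapshot)


@dataclass
class ImpalaDaemon:
    """Toy stand-in for impalad: keeps a local copy of cluster membership."""
    name: str
    address: str
    known_peers: dict = field(default_factory=dict)

    def join(self, store):
        store.subscribe(self._on_update)
        store.register(self.name, self.address)

    def _on_update(self, membership):
        # A local copy means a daemon can plan without a remote lookup.
        self.known_peers = membership


store = StateStore()
d1 = ImpalaDaemon("impalad-1", "node1:22000")  # hypothetical addresses
d2 = ImpalaDaemon("impalad-2", "node2:22000")
d1.join(store)
d2.join(store)
```

After both daemons join, each one holds the full membership list, which is the property a coordinator would rely on when choosing where to run plan fragments.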
  

Impala Architecture: Query Execution Phases
•  Request arrives via ODBC/JDBC/Beeswax/Shell
•  Planner turns request into collections of plan fragments
•  Coordinator initiates execution on impalads local to the data
•  During execution:
      •  intermediate results are streamed between executors
      •  query results are streamed back to client
      •  subject to limitations imposed by blocking operators (top-n, aggregation)
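
The difference between streaming and blocking operators can be sketched with Python generators. This is a conceptual model rather than Impala code: `scan`, `filter_rows`, `top_n`, and `traced` are hypothetical helpers, and the integer rows are made-up data.

```python
import heapq


def scan(rows):
    """Streaming operator: yields each row as soon as it is read."""
    for row in rows:
        yield row


def filter_rows(rows, predicate):
    """Streaming operator: passes matching rows through one at a time."""
    for row in rows:
        if predicate(row):
            yield row


def top_n(rows, n, key):
    """Blocking operator: consumes all input before returning anything."""
    return heapq.nlargest(n, rows, key=key)


events = []


def traced(rows):
    """Records every row read from the source so reads can be counted."""
    for row in rows:
        events.append(row)
        yield row


data = [5, 1, 9, 3, 7]

# Streaming pipeline: the first result is ready after one input row.
streamed = filter_rows(scan(traced(data)), lambda r: r > 2)
first = next(streamed)
reads_before_first = len(events)

# Adding top-n forces the whole input to be read before any output.
events.clear()
best = top_n(filter_rows(scan(traced(data)), lambda r: r > 2), 2, key=lambda r: r)
reads_for_top_n = len(events)
```

Here the streaming pipeline produces its first row after a single read, while the top-n pipeline reads all five rows before it can return, which is the limitation the last bullet refers to.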
  
Impala Architecture: Planner
•    Example: query with join and aggregation
     SELECT state, SUM(revenue)
     FROM HdfsTbl h JOIN HbaseTbl b ON (...)
     GROUP BY 1 ORDER BY 2 desc LIMIT 10

[Diagram: two query plan trees. Left: TopN, Agg, Hash Join, Hdfs Scan, Hbase Scan. Right: the same operators plus a second Agg and two Exch nodes, split into fragments labeled "at coordinator", "at DataNodes", and "at region servers".]
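
A rough Python sketch of how a plan with two Agg nodes and an exchange might evaluate the GROUP BY / ORDER BY / LIMIT part of the example query. It assumes (the slide does not spell this out) that each data node pre-aggregates its own rows and the coordinator merges the partial sums before taking the top N. The function names and the sample rows are hypothetical, and the join is taken as already done.

```python
from collections import defaultdict
import heapq


def partial_agg(rows):
    """Per-node pre-aggregation: SUM(revenue) grouped by state."""
    sums = defaultdict(float)
    for state, revenue in rows:
        sums[state] += revenue
    return dict(sums)


def merge_agg(partials):
    """Coordinator-side merge of the partial sums received via exchange."""
    totals = defaultdict(float)
    for partial in partials:
        for state, subtotal in partial.items():
            totals[state] += subtotal
    return dict(totals)


def top_n(totals, n):
    """ORDER BY 2 DESC LIMIT n over (state, total) pairs."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


# Hypothetical joined rows as they might sit on two data nodes.
node_a = [("CA", 10.0), ("NY", 5.0), ("CA", 2.5)]
node_b = [("NY", 20.0), ("TX", 1.0)]

result = top_n(merge_agg([partial_agg(node_a), partial_agg(node_b)]), 2)
print(result)  # [('NY', 25.0), ('CA', 12.5)]
```

Pre-aggregating before the exchange means only one row per state per node crosses the network, instead of every joined row.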
  


Impala Architecture: Query Execution
•    Request arrives via ODBC/JDBC/Beeswax/Shell

[Diagram: a SQL App sends a SQL request over ODBC to one of three nodes. Each node runs a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. The diagram also shows the Hive Metastore, HDFS NN, and Statestore.]
  



Impala Architecture: Query Execution
•    Planner turns request into collections of plan fragments
•    Coordinator initiates execution on impalads local to the data

[Diagram: a SQL App connected over ODBC to one of three nodes. Each node runs a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. The diagram also shows the Hive Metastore, HDFS NN, and Statestore.]
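
One way to picture "execution on impalads local to the data" is a greedy assignment of HDFS blocks to daemons that run on a host holding a replica. The sketch below is a simplified assumption, not Impala's actual scheduler: `assign_scan_ranges`, the block IDs, and the host names are all hypothetical.

```python
def assign_scan_ranges(block_replicas, impalads):
    """Map each block to an impalad on a host holding a replica, if any.

    block_replicas: {block_id: [hosts holding a replica]}
    impalads: set of hosts running an Impala daemon
    Returns {host: [block_ids]}; blocks with no local daemon go under None.
    """
    assignment = {}
    load = {host: 0 for host in impalads}
    for block_id in sorted(block_replicas):
        local = [h for h in block_replicas[block_id] if h in impalads]
        # Among local candidates, pick the least-loaded host (ties by name).
        host = min(local, key=lambda h: (load[h], h)) if local else None
        if host is not None:
            load[host] += 1
        assignment.setdefault(host, []).append(block_id)
    return assignment


replicas = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node1", "node3"],
    "blk_3": ["node2", "node3"],
    "blk_4": ["node9"],  # no impalad runs on this host
}
plan = assign_scan_ranges(replicas, {"node1", "node2", "node3"})
```

Blocks that end up under `None` would have to be read remotely, which is the network cost that local processing is meant to avoid.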
  



Impala Architecture: Query Execution
•    Intermediate results are streamed between impalads
•    Query results are streamed back to client

[Diagram: query results flow back over ODBC to the SQL App from one of three nodes. Each node runs a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. The diagram also shows the Hive Metastore, HDFS NN, and Statestore.]
  



Impala and Hive
•    Shared with Hive:
          •    Metadata (table definitions)
          •    ODBC/JDBC drivers
          •    Hue Beeswax
          •    SQL syntax (HiveQL)
          •    Flexible file formats
          •    Machine pool

•    Improvements:
          •    Purpose-built query engine direct on HDFS and HBase
          •    No JVM startup and no MapReduce
          •    In-memory data transfers
          •    Native distributed relational query engine
  
What about an EDW/RDBMS?
     •    “Right tool for the right job”
     •    EDW/RDBMS great for:
                 •    OLTP’s complex transactions
                 •    Highly planned and optimized known workloads
                 •    Operational reports and drill into repeated known queries

     •    Impala’s great for:
                 •    Exploratory analytics with new previously-unknown queries
                 •    Queries on big and growing data sets

     •    EDW/RDBMS can’t:
                 •    Dump in raw data then later define schema and query what you want
                 •    Evolve schemas without an expensive schema upgrade planning process
                 •    Scale simply by adding nodes
                 •    Store at < $1k/TB instead of $10-150k/TB
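
To put the per-terabyte figures in perspective, the short calculation below applies them to a hypothetical 100 TB data set. The data set size is an assumption chosen for illustration. The per-TB prices are the bounds quoted above, with $1k/TB used as the Hadoop upper bound.

```python
def storage_cost(terabytes, cost_per_tb):
    """Total storage cost in dollars for a given per-terabyte price."""
    return terabytes * cost_per_tb


data_tb = 100  # hypothetical data set size

hadoop_upper = storage_cost(data_tb, 1_000)   # "< $1k/TB" upper bound
edw_low = storage_cost(data_tb, 10_000)       # low end of $10-150k/TB
edw_high = storage_cost(data_tb, 150_000)     # high end of $10-150k/TB

print(f"Hadoop: under ${hadoop_upper:,}")
print(f"EDW/RDBMS: ${edw_low:,} to ${edw_high:,}")
print(f"Ratio: at least {edw_low // hadoop_upper}x to {edw_high // hadoop_upper}x")
```

At these prices the EDW/RDBMS figure comes out at least 10 to 150 times higher, whatever data set size is chosen, since both costs scale linearly.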
  

Alternative Hadoop Query Approaches
[Diagram: one architecture per approach, built from Query Node, Hive, MR, NN, DN, HDFS, and Query Engine boxes.]

•    MapReduce
          •    High-latency MR
          •    Separate nodes for SQL/MR
          •    Duplicate metadata, security, SQL, MR, etc.

•    Remote Query
          •    Network bottleneck
          •    Separate nodes for SQL/MR
          •    Duplicate metadata, security, SQL, MR, etc.

•    Side Storage
          •    Query subset of data
          •    RDBMS rigid schema
          •    Duplicate storage, metadata, security, SQL, etc.
  
                                             	
  
Comparing Impala to Dremel
•    What is Dremel:
          •    columnar storage for data with nested structures
          •    distributed scalable aggregation on top of that

•    Columnar storage in Hadoop: joint project between Cloudera and Twitter
          •    new columnar format, derived from Doug Cutting's Trevni
          •    stores data in appropriate native/binary types
          •    can also store nested structures similar to Dremel's ColumnIO

•    Distributed aggregation: Impala

•    Impala plus columnar format: a superset of the published version of Dremel
     (which didn't support joins and multiple file formats)
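
As a minimal illustration of why a columnar layout suits aggregation, the Python sketch below pivots row-oriented records into per-column lists and sums a single column. It covers flat records only and does not model nested structures or any real file format. The `to_columns` helper and the sample records are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical row-oriented records.
rows = [
    {"state": "CA", "revenue": 10.0, "comment": "first order"},
    {"state": "NY", "revenue": 5.0, "comment": "repeat customer"},
    {"state": "CA", "revenue": 2.5, "comment": "discounted"},
]

columns = to_columns(rows)

# An aggregate over one column touches only that column's values;
# the "state" and "comment" data never need to be read.
total_revenue = sum(columns["revenue"])
print(total_revenue)  # 17.5
```

Keeping each column's values together also means they share one type, which fits the bullet about storing data in native/binary types.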
  

Impala Roadmap
     •    GA (target April 2013)
                •    All CDH4 OSes: RHEL/CentOS, Ubuntu, Debian, SLES
                •    JDBC driver
                •    More formats: Avro, LZO-compressed
                •    Columnar format
                •    MR/Impala resource isolation
                •    Perf (joins, aggregations, SQL features)
                •    Automatic metadata distribution

     •    Post-GA top requests:
                •    UDFs
                •    Memory caching
                •    Nested data
                •    Window functions
  


Validated Beta Partners
[Image: "Powered by Impala" badge]


Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 

Último (20)

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 

Cloudera Impala: A modern SQL Query Engine for Hadoop

  • 1. Cloudera Impala: A Modern SQL Query Engine for Hadoop
       Justin Erickson | Product Manager
       January 2013
  • 2. Agenda
      • Intro to Impala
      • Impala’s Architecture
      • Comparisons
    Confidential. ©2013 Cloudera, Inc. All Rights Reserved.
  • 3. Why Hadoop?
      • Scalability
          • Scales simply by adding nodes
          • Local processing to avoid network bottlenecks
      • Flexibility
          • All kinds of data (blobs, documents, records, etc.)
          • In all forms (structured, semi-structured, unstructured)
          • Store anything, then later analyze what you need
      • Efficiency
          • Cost efficiency (<$1k/TB) on commodity hardware
          • Unified storage, metadata, security (no duplication or synchronization)
  • 4. What’s Impala?
      • Interactive SQL
          • Typically 4-35x faster than Hive (observed up to 100x faster)
          • Responses in seconds instead of minutes (sometimes sub-second)
      • Nearly ANSI-92 standard SQL queries with HiveQL
          • Compatible SQL interface for existing Hadoop/CDH applications
          • Based on industry standard SQL
      • Natively on Hadoop/HBase storage and metadata
          • Flexibility, scale, and cost advantages of Hadoop
          • No duplication/synchronization of data and metadata
          • Local processing to avoid network bottlenecks
      • Separate runtime from MapReduce
          • MapReduce is designed for, and great at, batch processing
          • Impala is purpose-built for low-latency SQL queries on Hadoop
  • 5. So what?
      • Interactive BI/analytics
          • BI tools impractical on Hadoop before Impala
          • Move from 10s of Hadoop users per cluster to 100s of SQL users
          • More and faster value from “big data”
      • ELT/data processing with tight SLAs
          • Sub-minute SLAs now possible
      • Cost efficiency
          • Fewer nodes to meet response time SLAs
  • 6. Impala Architecture
      • Two binaries: impalad and statestored
      • Impala daemon (impalad)
          • one Impala daemon on each node with data
          • handles external client requests and all internal requests related to query execution
      • State store daemon (statestored)
          • provides name service and metadata distribution
  • 7. Impala Architecture: Query Execution Phases
      • Request arrives via ODBC/JDBC/Beeswax/Shell
      • Planner turns request into collections of plan fragments
      • Coordinator initiates execution on the impalads local to the data
      • During execution:
          • intermediate results are streamed between executors
          • query results are streamed back to client
          • subject to limitations imposed by blocking operators (top-n, aggregation)
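The difference between streamed results and blocking operators can be sketched with Python generators. This is only an illustration, not Impala's implementation: the function names, the `(state, revenue)` row shape, and the sample data are all assumptions. A streaming operator hands each row on as soon as it sees it, while top-n has to consume its whole input before it can emit anything.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (state, revenue); hypothetical row shape


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming: yields each row as soon as it is read."""
    for row in rows:
        yield row


def keep_positive(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming filter: passes rows downstream one at a time."""
    for state, revenue in rows:
        if revenue > 0:
            yield (state, revenue)


def top_n(rows: Iterable[Row], n: int) -> List[Row]:
    """Blocking: must see every input row before returning any result."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


data = [("CA", 5), ("NY", -1), ("TX", 9), ("WA", 3)]

# The first filtered row is available after reading a single input row.
first_streamed = next(keep_positive(scan(data)))

# top_n drains the whole pipeline before producing its answer.
top_two = top_n(keep_positive(scan(data)), 2)
```

Here `first_streamed` is produced without touching the rest of the input, whereas `top_two` cannot be returned until the last row has been scanned.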
  • 8. Impala Architecture: Planner
      • Example: query with join and aggregation
            SELECT state, SUM(revenue)
            FROM HdfsTbl h JOIN HbaseTbl b ON (...)
            GROUP BY 1 ORDER BY 2 desc LIMIT 10
      • [Figure: the single-node plan (TopN over Agg over Hash Join over an HDFS scan and an HBase scan) shown next to its distributed version, whose operators are placed at the coordinator, at DataNodes, and at region servers and connected by Exch (exchange) nodes]
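One way to picture the distributed plan for this query is a per-node fragment that joins and pre-aggregates, followed by a merge and top-n at the coordinator. The sketch below is a simplification under stated assumptions: the join condition is elided on the slide, so the `cust_id` key is hypothetical, as is the choice of which table carries `state` and which carries `revenue`. It also assumes every node receives the full HBase side.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def local_fragment(hdfs_rows: Iterable[Tuple[int, int]],
                   hbase_rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Runs on one node: hash join, then partial SUM(revenue) per state."""
    # Build side of the hash join: hypothetical cust_id -> state mapping.
    build = {cust_id: state for cust_id, state in hbase_rows}
    partial: Dict[str, int] = defaultdict(int)
    # Probe side: this node's local (cust_id, revenue) rows.
    for cust_id, revenue in hdfs_rows:
        state = build.get(cust_id)
        if state is not None:  # inner join: drop unmatched rows
            partial[state] += revenue
    return dict(partial)


def coordinator(partials: Iterable[Dict[str, int]],
                limit: int = 10) -> List[Tuple[str, int]]:
    """Merges the partial aggregates, then applies ORDER BY 2 DESC LIMIT n."""
    merged: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for state, total in partial.items():
            merged[state] += total
    return heapq.nlargest(limit, merged.items(), key=lambda kv: kv[1])


hbase = [(1, "CA"), (2, "NY"), (3, "TX")]
node_a = [(1, 10), (2, 5)]
node_b = [(1, 7), (3, 20), (4, 99)]  # cust_id 4 has no match and is dropped

result = coordinator(
    [local_fragment(node_a, hbase), local_fragment(node_b, hbase)], limit=2)
```

Pre-aggregating on each node means only one row per state crosses the network, instead of every joined row.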
  • 9. Impala Architecture: Query Execution
      • Request arrives via ODBC/JDBC/Beeswax/Shell
      • [Figure: a SQL app sends a SQL request over ODBC to one of three nodes, each running a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase; the Hive Metastore, HDFS NN, and Statestore are shown as shared services]
  • 10. Impala Architecture: Query Execution
      • Planner turns request into collections of plan fragments
      • Coordinator initiates execution on the impalads local to the data
      • [Figure: the same cluster diagram: SQL app, ODBC, Hive Metastore, HDFS NN, Statestore, and three nodes each with Query Planner, Query Coordinator, Query Executor, HDFS DN, and HBase]
  • 11. Impala Architecture: Query Execution
      • Intermediate results are streamed between impalads
      • Query results are streamed back to client
      • [Figure: the same cluster diagram, with query results returned to the SQL app over ODBC]
  • 12. Impala and Hive
      • Shared with Hive:
          • Metadata (table definitions)
          • ODBC/JDBC drivers
          • Hue Beeswax
          • SQL syntax (HiveQL)
          • Flexible file formats
          • Machine pool
      • Improvements:
          • Purpose-built query engine direct on HDFS and HBase
          • No JVM startup and no MapReduce
          • In-memory data transfers
          • Native distributed relational query engine
  • 13. What about an EDW/RDBMS?
      • “Right tool for the right job”
      • EDW/RDBMS great for:
          • Complex OLTP transactions
          • Highly planned and optimized known workloads
          • Operational reports and drill-downs into repeated, known queries
      • Impala’s great for:
          • Exploratory analytics with new, previously unknown queries
          • Queries on big and growing data sets
      • EDW/RDBMS can’t:
          • Dump in raw data, then later define a schema and query what you want
          • Evolve schemas without an expensive schema upgrade planning process
          • Scale simply by adding nodes
          • Store at < $1k/TB instead of $10-150k/TB
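To make the storage figures concrete, the per-TB numbers from the slide can be multiplied out for a given data volume. The 100 TB volume below is an arbitrary assumption for illustration, and the function name is hypothetical; the per-TB bounds are the only values taken from the slide.

```python
# Per-TB figures quoted on the slide (USD).
HADOOP_MAX_PER_TB = 1_000            # "< $1k/TB", so this is an upper bound
EDW_RANGE_PER_TB = (10_000, 150_000)  # "$10-150k/TB"


def storage_cost_bounds(terabytes: int) -> dict:
    """Returns cost bounds in USD for storing the given number of terabytes."""
    edw_low, edw_high = EDW_RANGE_PER_TB
    return {
        "hadoop_max": terabytes * HADOOP_MAX_PER_TB,
        "edw_min": terabytes * edw_low,
        "edw_max": terabytes * edw_high,
    }


# Hypothetical 100 TB data set.
bounds = storage_cost_bounds(100)
```

For 100 TB this gives under $100,000 on Hadoop against $1,000,000 to $15,000,000 on an EDW/RDBMS, a gap of 10x to 150x.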
  • 14. Alternative Hadoop Query Approaches
      • Approaches compared: MapReduce, Remote Query, Side Storage
      • [Figure: node diagrams for the three approaches, annotated with their drawbacks: high-latency MR; network bottleneck; querying only a subset of the data; separate nodes for SQL/MR; rigid RDBMS schema; duplicate storage, metadata, security, SQL, MR, etc.]
  • 15. Comparing Impala to Dremel
      • What is Dremel:
          • columnar storage for data with nested structures
          • distributed scalable aggregation on top of that
      • Columnar storage in Hadoop: joint project between Cloudera and Twitter
          • new columnar format, derived from Doug Cutting's Trevni
          • stores data in appropriate native/binary types
          • can also store nested structures similar to Dremel's ColumnIO
      • Distributed aggregation: Impala
      • Impala plus columnar format: a superset of the published version of Dremel (which didn't support joins or multiple file formats)
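The appeal of columnar storage for aggregation can be shown with a toy in-memory layout. This is a minimal sketch, not the on-disk format the slide refers to: the `to_columns` helper and the sample rows are assumptions, every row is assumed to have the same fields, and nested structures are left out.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivots row-oriented records into one list of values per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"state": "CA", "revenue": 10, "note": "first order"},
    {"state": "NY", "revenue": 5, "note": "repeat customer"},
    {"state": "CA", "revenue": 7, "note": "discounted"},
]

columns = to_columns(rows)

# An aggregate over one column touches only that column's values;
# the "state" and "note" data are never read.
total_revenue = sum(columns["revenue"])
```

Because each column holds values of a single type, a real format can also store them in native binary form rather than as text, which is the second point the slide makes.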
  • 16. Impala Roadmap
      • GA (target April 2013)
          • All CDH4 OSes: RHEL/CentOS, Ubuntu, Debian, SLES
          • JDBC driver
          • More formats: Avro, LZO-compressed
          • Columnar format
          • MR/Impala resource isolation
          • Perf (joins, aggregations, SQL features)
          • Automatic metadata distribution
      • Post-GA top requests:
          • UDFs
          • Memory caching
          • Nested data
          • Window functions
  • 17. Validated Beta Partners
      • [Figure: partner logos under a "Powered by Impala" badge]