2. What are we trying to achieve?
- Scalable log analysis to gain business insights
  - Logs from website streaming (phase 1)
  - All logs from the web (phase 2)
- Output required:
  - Engineer access: ad-hoc query and reporting
  - BI access: flat files to be loaded into the BI system for cross-functional reporting
9. Workflow to Hive (phase 2)
- Continuous log collection via Chukwa
- Generic, continuous parse/merge/load into a 'real-time' Hive warehouse
- Merge at the hourly boundary and load into the public Hive warehouse; the SLA on merged data is 2 hours
- Daily/hourly jobs:
  - Summaries
  - Publishing data to BI for reporting
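The hourly merge-and-load step above can be sketched in HiveQL. The table name, partition columns, and HDFS path here are hypothetical placeholders, not Netflix's actual schema:

```sql
-- Sketch of the hourly load into the public warehouse.
-- 'streaming_logs', its partition columns, and the HDFS path are
-- illustrative assumptions, not the actual production names.
ALTER TABLE streaming_logs
  ADD IF NOT EXISTS PARTITION (dateint=20100301, hour=12);

LOAD DATA INPATH '/chukwa/merged/20100301/12'
  INTO TABLE streaming_logs PARTITION (dateint=20100301, hour=12);
```

Partitioning by date and hour keeps each merge aligned with the hourly boundary, so the 2-hour SLA can be tracked per partition.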
10. Today’s Hive usage at Netflix
- Streaming summary data:
  - CDN performance
  - Number of streams per day
  - Number of errors per session
  - Test cell analysis
- Ad-hoc queries for further analysis, such as:
  - Raw log inspection
  - Detailed inspection of one stream session
  - Simple summaries (e.g., percentile, count, max, min, bucketing) for operational metrics
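A simple operational summary of the kind listed above might look like the following HiveQL. The schema (`session_logs`, `error_count`, `dateint`) is assumed for illustration; `percentile` is Hive's built-in UDAF:

```sql
-- Hypothetical schema: session_logs(session_id, error_count, dateint).
-- Count, max, min, and a percentile over one day of sessions:
SELECT count(*)                      AS sessions,
       max(error_count)              AS max_errors,
       min(error_count)              AS min_errors,
       percentile(error_count, 0.95) AS p95_errors
FROM   session_logs
WHERE  dateint = 20100301;
```

Bucketing would add a `GROUP BY` over a derived bucket key (e.g., `floor(error_count / 10)`) to turn the same aggregates into a distribution.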
11. Challenges
- Hive UI (for query building)
- Multi-DB support (HIVE-675) and user access management
- Hive queries on a subset of partition files, to handle late-arriving files (HIVE-837 or HIVE-951)
- Merging small files (can't use hive.merge.mapfiles)
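When hive.merge.mapfiles cannot be applied, one commonly used workaround is to rewrite the partition through a reduce stage so its many small files are coalesced. This is a sketch of that technique, not the approach the slide's authors necessarily used; the table, columns, and partition values are hypothetical:

```sql
-- Workaround sketch: coalesce a partition's small files by rewriting
-- it through a single reducer. Names below are illustrative only.
SET mapred.reduce.tasks=1;

INSERT OVERWRITE TABLE streaming_logs
  PARTITION (dateint=20100301, hour=12)
SELECT session_id, error_count
FROM   streaming_logs
WHERE  dateint = 20100301 AND hour = 12
DISTRIBUTE BY session_id;  -- forces a reduce stage, so output file
                           -- count equals the reducer count (here, 1)
```

The trade-off is an extra full rewrite of the partition's data, which is why native merge support (as in the listed JIRAs) is preferable.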