SlideShare una empresa de Scribd logo
1 de 31
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ian Robinson, Specialist SA, Data and Analytics, EMEA
20 September 2017
Serverless Big Data Analytics with
Amazon Athena and QuickSight
data answers
COLLECT STORE
PROCESS/
ANALYZE
CONSUME
time to first answer
Agile Analytics
• Experiment
• Invest in promising experiments
• Fail fast
• React quickly
Serverless Analytics
Amazon S3 Highly durable object storage
AWS Glue Data catalog and managed ETL
Amazon Athena Serverless interactive SQL queries
Amazon QuickSight Business analytics service
Raw S3
Data
Canonical
Data
Amazon
Athena
Amazon
Quicksight
ETL Job
Data
Catalog
describes
describes
uses
AWS Glue
AWS Glue: Components
Data Catalog
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and creates tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum
Job Execution
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring and alerting
Job Authoring
 Auto-generates ETL code
 Build on open frameworks – Python and Spark
 Developer-centric – editing, debugging, sharing
Taxi
csv
Limo
csv Taxi ETL Job
1.6 GB 94.8 MB
Limo ETL Job
220.3 MB 18 MB
Canonical
Data
parquet
Amazon
Athena
Amazon
Quicksight
Review Database and Table Definitions
Amazon Athena
Introducing Amazon Athena
Amazon Athena is an interactive query service
that makes it easy to analyze data directly from
Amazon S3 using standard SQL
Athena is Serverless
• No Infrastructure or
administration
• Zero Spin up time
• Transparent upgrades
Amazon Athena is Easy To Use
• Log into the Console
• Create a table
• Type in a Hive DDL Statement
• Use the console Add Table wizard
• Use tables from Glue’s Data Catalog
• Start querying
Amazon Athena is Highly Available
• You connect to a service endpoint or log into the console
• Athena uses warm compute pools across multiple
Availability Zones
• Your data is in Amazon S3, which is also highly available
and designed for 99.999999999% durability
Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
• Avro, Text, CSV, JSON, weblogs, AWS service logs
• Convert to an optimized form like ORC or Parquet for the best
performance and lowest cost
• No ETL required
• Stream data directly from Amazon S3
• Take advantage of Amazon S3 durability and availability
Familiar Technologies Under the Covers
Used for SQL Queries
In-memory distributed query engine
ANSI-SQL compatible with extensions
Used for DDL functionality
Complex data types
Multitude of formats
Supports data partitioning
EXTERNAL tables – no impact on underlying data
Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested
queries & window functions
• Support for complex data types
(arrays, structs)
• Support for partitioning of data by
any key
• (date, time, custom keys)
• e.g., Year, Month, Day, Hour or
Customer Key, Date
Amazon Athena Supports Multiple Data Formats
• Text files, e.g. CSV, TSV, custom delimiter
• Apache Web Logs, CloudTrail logs
• JSON (simple, nested), AVRO
• Columnar formats, e.g. Apache Parquet & Apache ORC
• Logstash Grok for unstructured text files
• Compressed files (Snappy, Zlib, GZIP, and LZO)
• Encrypted data (SSE-S3, SSE-KMS, CSE-KMS)
Use large (128MB – 1GB) compressed files
New!
Amazon Athena is Fast
• Tuned for performance
• Automatically parallelizes queries
• Results are streamed to console
• Results also stored in S3
• Improve query performance:
• Compress your data
• Use columnar formats
• Partition your data
Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
• Reduce costs:
• Compress your data
• Use columnar formats
• Partition your data
PARQUET
• Columnar format
• Schema segregated into footer
• Column major format
• All data is pushed to the leaf
• Integrated compression and
indexes
• Support for predicate pushdown
Apache Parquet and Apache ORC – Columnar Formats
ORC
• Apache top level project
• Schema segregated into footer
• Column major with stripes
• Integrated compression,
indexes, and stats
• Support for predicate pushdown
Pay By the Query - $5/TB Scanned
• Pay by the amount of data scanned per query
• Ways to save costs
• Compress
• Convert to Columnar format
• Use partitioning
• Free: DDL Queries, Failed Queries
Dataset Size on Amazon S3 Query Run time Data Scanned Cost
Logs stored as Text
files
1 TB 237 seconds 1.15TB $5.75
Logs stored in
Apache Parquet
format*
130 GB 5.13 seconds 2.69 GB $0.013
Savings 87% less with Parquet 34x faster 99% less data scanned 99.7% cheaper
Converting to ORC and PARQUET
• Use Glue!
• You can use Hive CTAS to convert data
• CREATE TABLE new_key_value_store
• STORED AS PARQUET
• AS
• SELECT col_1, col2, col3 FROM noncolumartable
• SORT BY new_key, key_value_pair;
• You can also use Spark to convert the file into PARQUET / ORC
• 20 lines of Pyspark code, running on EMR
• Converts 1TB of text data into 130 GB of Parquet with Snappy compression
• Total cost $5
https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
Start Querying with Amazon Athena
• Review console
• Run Glue crawler to create canonical table definition
• Change datatypes
• Run some simple queries
Raw S3
Data
Canonical
Data
Amazon
Athena
Amazon
QuicksightData
Catalog
describes
describes
uses
ETL Job
ETL Uber Data into Parquet
glueContext.write_dynamic_frame.from_options(
frame = datasource1,
connection_type = "s3",
connection_options =
{"path": "s3://ianrob-nyc-transportation"},
format = "parquet",
transformation_ctx = "datasink4")
Amazon QuickSight
QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-
premises sources including Amazon Athena
Amazon RDS
Amazon S3
Amazon Redshift
Amazon Athena
Using Amazon Athena with Amazon QuickSight
AOB
AWS Glue regional availability plan
Planned schedule Regions
At launch US East (N. Virginia)
Q3 2017 US East (Ohio), US West (Oregon)
Q4 2017 EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Sydney)
2018 Rest of the public regions
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
Serverlesss Big Data Analytics with Amazon Athena and Quicksight

Más contenido relacionado

La actualidad más candente

VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023
VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023 VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023
VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023 Amazon Web Services Korea
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Amazon Web Services
 
Network Security and Access Control in AWS
Network Security and Access Control in AWSNetwork Security and Access Control in AWS
Network Security and Access Control in AWSAmazon Web Services
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSAmazon Web Services
 
Azure Storage
Azure StorageAzure Storage
Azure StorageMustafa
 
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon Web Services
 
Data saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukData saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukErwin de Kreuk
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
Azure Security Overview
Azure Security OverviewAzure Security Overview
Azure Security OverviewAllen Brokken
 
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...Amazon Web Services Korea
 
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018Amazon Web Services Korea
 
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...Amazon Web Services Korea
 

La actualidad más candente (20)

VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023
VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023 VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023
VUCA 시대의 디지털 네이티브 리더가 알아야할 AWS의 기술 ::: AWS ExecLeaders Korea 2023
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)
 
AWS Cloud Security Fundamentals
AWS Cloud Security FundamentalsAWS Cloud Security Fundamentals
AWS Cloud Security Fundamentals
 
Network Security and Access Control in AWS
Network Security and Access Control in AWSNetwork Security and Access Control in AWS
Network Security and Access Control in AWS
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWS
 
Azure Storage
Azure StorageAzure Storage
Azure Storage
 
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
 
Data saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukData saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de Kreuk
 
AWS core services
AWS core servicesAWS core services
AWS core services
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
Azure Security Overview
Azure Security OverviewAzure Security Overview
Azure Security Overview
 
Intro to AWS: Storage Services
Intro to AWS: Storage ServicesIntro to AWS: Storage Services
Intro to AWS: Storage Services
 
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
 
Deep dive into AWS IAM
Deep dive into AWS IAMDeep dive into AWS IAM
Deep dive into AWS IAM
 
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018
다양한 솔루션으로 만들어가는 AWS 네트워크 보안::이경수::AWS Summit Seoul 2018
 
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
 
Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
 

Similar a Serverlesss Big Data Analytics with Amazon Athena and Quicksight

Serverless Big Data Analytics with Amazon Athena and QuickSight
Serverless Big Data Analytics with Amazon Athena and QuickSightServerless Big Data Analytics with Amazon Athena and QuickSight
Serverless Big Data Analytics with Amazon Athena and QuickSightAmazon Web Services
 
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveKevin Epstein
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017David McDaniel
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services
 
Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Amazon Web Services
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Rukmani Gopalan
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 

Similar a Serverlesss Big Data Analytics with Amazon Athena and Quicksight (20)

Serverless Big Data Analytics with Amazon Athena and QuickSight
Serverless Big Data Analytics with Amazon Athena and QuickSightServerless Big Data Analytics with Amazon Athena and QuickSight
Serverless Big Data Analytics with Amazon Athena and QuickSight
 
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep Dive
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWS
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Serverlesss Big Data Analytics with Amazon Athena and Quicksight

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ian Robinson, Specialist SA, Data and Analytics, EMEA 20 September 2017 Serverless Big Data Analytics with Amazon Athena and QuickSight
  • 3. Agile Analytics • Experiment • Invest in promising experiments • Fail fast • React quickly
  • 4. Serverless Analytics Amazon S3 Highly durable object storage AWS Glue Data catalog and managed ETL Amazon Athena Serverless interactive SQL queries Amazon QuickSight Business analytics service
  • 7. AWS Glue: Components Data Catalog  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and creates tables  Integrated with Amazon Athena, Amazon Redshift Spectrum Job Execution  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring and alerting Job Authoring  Auto-generates ETL code  Build on open frameworks – Python and Spark  Developer-centric – editing, debugging, sharing
  • 8. Taxi csv Limo csv Taxi ETL Job 1.6 GB 94.8 MB Limo ETL Job 220.3 MB 18 MB Canonical Data parquet Amazon Athena Amazon Quicksight
  • 9. Review Database and Table Definitions
  • 11. Introducing Amazon Athena Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL
  • 12. Athena is Serverless • No Infrastructure or administration • Zero Spin up time • Transparent upgrades
  • 13. Amazon Athena is Easy To Use • Log into the Console • Create a table • Type in a Hive DDL Statement • Use the console Add Table wizard • Use tables from Glue’s Data Catalog • Start querying
  • 14. Amazon Athena is Highly Available • You connect to a service endpoint or log into the console • Athena uses warm compute pools across multiple Availability Zones • Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability
  • 15. Query Data Directly from Amazon S3 • No loading of data • Query data in its raw format • Avro, Text, CSV, JSON, weblogs, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No ETL required • Stream data directly from Amazon S3 • Take advantage of Amazon S3 durability and availability
  • 16. Familiar Technologies Under the Covers Used for SQL Queries In-memory distributed query engine ANSI-SQL compatible with extensions Used for DDL functionality Complex data types Multitude of formats Supports data partitioning EXTERNAL tables – no impact on underlying data
  • 17. Use ANSI SQL • Start writing ANSI SQL • Support for complex joins, nested queries & window functions • Support for complex data types (arrays, structs) • Support for partitioning of data by any key • (date, time, custom keys) • e.g., Year, Month, Day, Hour or Customer Key, Date
  • 18. Amazon Athena Supports Multiple Data Formats • Text files, e.g. CSV, TSV, custom delimiter • Apache Web Logs, CloudTrail logs • JSON (simple, nested), AVRO • Columnar formats, e.g. Apache Parquet & Apache ORC • Logstash Grok for unstructured text files • Compressed files (Snappy, Zlib, GZIP, and LZO) • Encrypted data (SSE-S3, SSE-KMS, CSE-KMS) Use large (128MB – 1GB) compressed files New!
  • 19. Amazon Athena is Fast • Tuned for performance • Automatically parallelizes queries • Results are streamed to console • Results also stored in S3 • Improve query performance: • Compress your data • Use columnar formats • Partition your data
  • 20. Amazon Athena is Cost Effective • Pay per query • $5 per TB scanned from S3 • DDL Queries and failed queries are free • Reduce costs: • Compress your data • Use columnar formats • Partition your data
  • 21. PARQUET • Columnar format • Schema segregated into footer • Column major format • All data is pushed to the leaf • Integrated compression and indexes • Support for predicate pushdown Apache Parquet and Apache ORC – Columnar Formats ORC • Apache top level project • Schema segregated into footer • Column major with stripes • Integrated compression, indexes, and stats • Support for predicate pushdown
  • 22. Pay By the Query - $5/TB Scanned • Pay by the amount of data scanned per query • Ways to save costs • Compress • Convert to Columnar format • Use partitioning • Free: DDL Queries, Failed Queries Dataset Size on Amazon S3 Query Run time Data Scanned Cost Logs stored as Text files 1 TB 237 seconds 1.15TB $5.75 Logs stored in Apache Parquet format* 130 GB 5.13 seconds 2.69 GB $0.013 Savings 87% less with Parquet 34x faster 99% less data scanned 99.7% cheaper
  • 23. Converting to ORC and PARQUET • Use Glue! • You can use Hive CTAS to convert data • CREATE TABLE new_key_value_store • STORED AS PARQUET • AS • SELECT col_1, col2, col3 FROM noncolumartable • SORT BY new_key, key_value_pair; • You can also use Spark to convert the file into PARQUET / ORC • 20 lines of Pyspark code, running on EMR • Converts 1TB of text data into 130 GB of Parquet with Snappy compression • Total cost $5 https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
  • 24. Start Querying with Amazon Athena • Review console • Run Glue crawler to create canonical table definition • Change datatypes • Run some simple queries Raw S3 Data Canonical Data Amazon Athena Amazon QuicksightData Catalog describes describes uses ETL Job
  • 25. ETL Uber Data into Parquet glueContext.write_dynamic_frame.from_options( frame = datasource1, connection_type = "s3", connection_options = {"path": "s3://ianrob-nyc-transportation"}, format = "parquet", transformation_ctx = "datasink4")
  • 27. QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on- premises sources including Amazon Athena Amazon RDS Amazon S3 Amazon Redshift Amazon Athena Using Amazon Athena with Amazon QuickSight
  • 28. AOB
  • 29. AWS Glue regional availability plan Planned schedule Regions At launch US East (N. Virginia) Q3 2017 US East (Ohio), US West (Oregon) Q4 2017 EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Sydney) 2018 Rest of the public regions