USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION
Tools to build your big data application
Ameya Kanitkar
Ameya Kanitkar – That's me!
• Big Data Infrastructure Engineer @ Groupon, Palo Alto, USA (working on Deal Relevance & Personalization Systems)
ameya.kanitkar@gmail.com
http://www.linkedin.com/in/ameyakanitkar
@aktwits
Agenda
• Basics of Hadoop & HBase
• How you can use Hadoop & HBase for big data applications
• Case Study: Deal Relevance and Personalization Systems at Groupon with Hadoop & HBase
Big Data Application Examples
• Recommendation Systems
• Ad Targeting
• Personalization Systems
• BI / DW
• Log Analysis
• Natural Language Processing
So what is Hadoop?
• General-purpose framework for processing huge amounts of data
• Open source
• Batch / offline oriented
Hadoop - HDFS
• Open source distributed file system
• Stores large files, which applications built on top of HDFS can easily access
• Data is distributed and replicated over multiple machines
• Linux-style commands, e.g. ls, cp, mv, touchz, etc.
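The idea of splitting files into blocks and replicating each block across machines can be pictured with a toy placement function. This is a simplified sketch: real HDFS places replicas with a rack-aware policy decided by the NameNode, whereas the round-robin scheme and the `place_blocks` helper here are purely illustrative. The 128 MB block size and replication factor of 3 used in the demo match the Hadoop 2 defaults (`dfs.blocksize`, `dfs.replication`).

```python
# Toy model of block splitting and replica placement in a distributed file system.
# Simplification: replicas are placed round-robin; real HDFS uses rack-aware placement.
from typing import Dict, List


def place_blocks(file_size: int, block_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Return a mapping block_index -> list of nodes holding a replica of that block."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    placement: Dict[int, List[str]] = {}
    for b in range(num_blocks):
        # Pick `replication` distinct nodes, starting at a different node for each block.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    mb = 1024 * 1024
    nodes = ["node1", "node2", "node3", "node4"]
    # A 300 MB file with 128 MB blocks -> 3 blocks (the last one only partially filled).
    for block, replicas in place_blocks(300 * mb, 128 * mb, nodes).items():
        print(block, replicas)
```

Losing any single node in this layout still leaves two copies of every block, which is the point of replication.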
Hadoop – HDFS
• Example:
hadoop fs -dus /data/
185453399927478 bytes ≈ 168 TB
(One of the folders on one of our Hadoop clusters. On newer Hadoop releases -dus is deprecated; use hadoop fs -du -s instead.)
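The "≈ 168 TB" figure is in binary units (1 TiB = 1024^4 bytes); in decimal terabytes the same byte count is about 185 TB. A small conversion helper makes the difference explicit; the function names are illustrative.

```python
# Convert a raw byte count (as printed by `hadoop fs -du -s`) to terabytes.
def bytes_to_tib(num_bytes: int) -> float:
    # Binary terabytes: 1 TiB = 1024**4 bytes.
    return num_bytes / 1024 ** 4


def bytes_to_tb(num_bytes: int) -> float:
    # Decimal terabytes: 1 TB = 10**12 bytes.
    return num_bytes / 10 ** 12


if __name__ == "__main__":
    size = 185453399927478
    print(f"{bytes_to_tib(size):.1f} TiB")  # 168.7 TiB
    print(f"{bytes_to_tb(size):.1f} TB")    # 185.5 TB
```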
Hadoop – Map Reduce
• Application framework built on top of HDFS to process your big data
• Operates on key-value pairs
• Mappers filter and transform input data
• Reducers aggregate mapper output
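The key-value contract described above (mappers emit pairs, the framework groups them by key, reducers aggregate each group) can be simulated in a single process. This sketch mirrors only the data flow, not Hadoop's distributed execution; `run_mapreduce` is a hypothetical helper, and word count is used simply because it is the classic minimal example.

```python
# Single-process simulation of the MapReduce data flow: map -> shuffle (group by key) -> reduce.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterable[Tuple[K, V]]],
                  reducer: Callable[[K, List[V]], R]) -> Dict[K, R]:
    grouped: Dict[K, List[V]] = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            grouped[key].append(value)      # shuffle: collect all values for each key
    # Reduce phase: Hadoop hands each reducer its keys in sorted order.
    return {key: reducer(key, grouped[key]) for key in sorted(grouped)}


def word_mapper(line: str):
    # Mappers transform input: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(word: str, counts: List[int]) -> int:
    # Reducers aggregate: sum the counts for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data big clusters", "Data everywhere"]
    print(run_mapreduce(lines, word_mapper, sum_reducer))
    # {'big': 2, 'clusters': 1, 'data': 2, 'everywhere': 1}
```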
Example
• Given web logs, calculate the landing page conversion rate for each product
• So we need to count how many impressions (and conversions) each product received, and then calculate the conversion rate (conversions ÷ impressions) for each product
Map Reduce Example
Map Phase
• Map 1: Process log file. Output: Key (Product ID), Value (Impression Count)
• Map 2: Process log file. Output: Key (Product ID), Value (Impression Count)
• Map N: Process log file. Output: Key (Product ID), Value (Impression Count)
Reduce Phase
• Reducer: Here we receive all data for a given product. Just run a simple for loop to calculate the conversion rate. (Output: Product ID, Conversion Rate)
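The map and reduce phases above can be sketched in a few lines of single-process Python. This is a hedged illustration, not Hadoop code: the log format `<product_id>,<event>` and the event names are hypothetical, and because a conversion rate needs conversions as well as impressions, the mapper here emits both counts. In a real job, a combiner would typically pre-sum the counts on each mapper before the shuffle, which matches the slide's per-mapper "Impression Count" value.

```python
# Conversion rate per product, following the map/reduce split on the slide.
# Assumption (hypothetical log format): each line is "<product_id>,<event>",
# where <event> is "impression" or "conversion".
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def log_mapper(line: str) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Emit (product_id, (impressions, conversions)) for one log line."""
    product_id, event = line.strip().split(",")
    if event == "impression":
        yield product_id, (1, 0)
    elif event == "conversion":
        yield product_id, (0, 1)
    # Any other event type is filtered out.


def conversion_reducer(values: List[Tuple[int, int]]) -> float:
    """All counts for one product arrive here; a simple loop sums them."""
    impressions = conversions = 0
    for imp, conv in values:
        impressions += imp
        conversions += conv
    return conversions / impressions if impressions else 0.0


def conversion_rates(log_lines: Iterable[str]) -> Dict[str, float]:
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in log_lines:                       # map phase
        for product_id, counts in log_mapper(line):
            grouped[product_id].append(counts)   # shuffle: group by product ID
    return {pid: conversion_reducer(vals) for pid, vals in grouped.items()}  # reduce phase


if __name__ == "__main__":
    logs = ["p1,impression", "p1,impression", "p1,conversion",
            "p2,impression", "p2,impression", "p2,impression", "p2,impression"]
    print(conversion_rates(logs))  # {'p1': 0.5, 'p2': 0.0}
```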
Recap
• We just processed terabytes of data and calculated conversion rates across millions of products.
• Note: This is a batch process only. It takes time, so you cannot start it after someone visits your website.
How about we generate recommendations in a batch process and serve them in real time?
HBase
• Provides real-time random read/write access over HDFS
• Built on Google's Bigtable design
• Open source
This is not an RDBMS, so there are no joins. Access patterns are generally simple, like get(key), put(key, value), etc.
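The get/put access pattern can be pictured with a toy in-memory table: each row key maps to a sparse dict of `family:qualifier` columns, and there is no join operation. This is not the HBase client API (the real Java client works with `Get` and `Put` objects); `ToyTable` is a hypothetical class that only mirrors the data model, including the rule that column families are fixed when the table is created while qualifiers are not.

```python
# Toy model of HBase's data model and get/put access pattern (not the real client API).
from typing import Dict, Iterable, Optional


class ToyTable:
    def __init__(self, families: Iterable[str]) -> None:
        # Column families are fixed when the table is created...
        self._families = set(families)
        # ...but each row stores only the columns it actually has: row -> {"family:qualifier": value}
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, row: str, column: str, value: bytes) -> None:
        family = column.split(":", 1)[0]
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are dynamic: a new qualifier can be written without a schema change.
        self._rows.setdefault(row, {})[column] = value

    def get(self, row: str) -> Optional[Dict[str, bytes]]:
        cells = self._rows.get(row)
        return dict(cells) if cells is not None else None


if __name__ == "__main__":
    users = ToyTable(families=["cf1"])
    users.put("user1", "cf1:click_history", b"{...}")
    users.put("user1", "cf1:purchases", b"{...}")
    users.put("user11", "cf1:purchases", b"{...}")
    print(users.get("user11"))  # {'cf1:purchases': b'{...}'}
```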
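(Diagram: a generic HBase table. Each row key has its own set of Cf:<qual> columns, e.g. Row 1: Cf1:qual1, Cf1:qual2, …; Row 11: Cf1:qual2, Cf1:qual22; Row 2: Cf1:qual3, Cf2:qual1; …; Row N.)
• Dynamic column names: no need to define column qualifiers upfront (only column families are declared when the table is created).
• Both rows and columns are sorted lexicographically.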
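Because HBase compares row keys as raw bytes, ordering is lexicographic rather than numeric, which is why "Row 11" sits between "Row 1" and "Row 2" in the diagram. The sketch below reproduces that ordering and a start/stop range scan (start inclusive, stop exclusive, as with an HBase `Scan`); zero-padding the numeric part of a key is a common workaround when numeric order matters. The helper names are hypothetical.

```python
# Byte-wise (lexicographic) row-key ordering, as HBase uses, plus a range scan.
from typing import List


def hbase_order(keys: List[str]) -> List[str]:
    # Sort by UTF-8 bytes, mirroring HBase's byte-wise key comparison.
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def scan(keys: List[str], start: str, stop: str) -> List[str]:
    """Return row keys in [start, stop), in sorted order."""
    lo, hi = start.encode("utf-8"), stop.encode("utf-8")
    return [k for k in hbase_order(keys) if lo <= k.encode("utf-8") < hi]


if __name__ == "__main__":
    print(hbase_order(["Row 2", "Row 11", "Row 1"]))    # ['Row 1', 'Row 11', 'Row 2']
    print(hbase_order(["Row 02", "Row 11", "Row 01"]))  # ['Row 01', 'Row 02', 'Row 11']
    print(scan(["Row 1", "Row 11", "Row 2", "Row 3"], "Row 1", "Row 2"))  # ['Row 1', 'Row 11']
```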

Example:
user1: Cf1:click_history:{actual_clicks_data}, Cf1:purchases:{actual_purchases}
user11: Cf1:purchases:{actual_purchases}
user20: Cf1:mobile_impressions:{actual mobile impressions}, Cf1:purchases:{actual_purchases}
Note: Each row has different columns, so think of this as a hash map rather than a table with rows and columns.
Putting it all together
(Diagram components: Web and Mobile clients; Store data in HDFS; Analyze Data (Map Reduce); Generate Recommendations (Map Reduce); Serve Real Time Requests (HBase).)
Do offline analysis in Hadoop, and serve real-time requests with HBase.
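The batch-then-serve split above can be sketched as two functions: an offline step that ranks candidate deals per user and writes the results to a key-value store (a dict standing in for HBase), and an online step that is a single lookup by user key. The scores, deal names, top-N cutoff, and fallback list are all hypothetical.

```python
# "Generate in batch, serve in real time": precompute per-user recommendations offline,
# then answer requests with a single key lookup. All names and scores are hypothetical.
from typing import Dict, List, Tuple


def batch_generate(scores: Dict[str, List[Tuple[str, float]]],
                   top_n: int = 3) -> Dict[str, List[str]]:
    """Offline step (a MapReduce job in the real system): rank candidate deals per user."""
    store: Dict[str, List[str]] = {}
    for user, candidates in scores.items():
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        store[user] = [deal for deal, _ in ranked[:top_n]]
    return store


def serve(store: Dict[str, List[str]], user: str, fallback: List[str]) -> List[str]:
    """Online step (HBase in the real system): one lookup by user key, no heavy computation."""
    return store.get(user, fallback)


if __name__ == "__main__":
    scores = {"user1": [("spa", 0.2), ("pizza", 0.9), ("yoga", 0.5), ("golf", 0.1)]}
    store = batch_generate(scores)
    print(serve(store, "user1", fallback=["popular_deal"]))     # ['pizza', 'yoga', 'spa']
    print(serve(store, "new_user", fallback=["popular_deal"]))  # ['popular_deal']
```

The expensive ranking runs on the batch schedule, so request latency depends only on the lookup.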
Use Case: Deal Relevance & Personalization @ Groupon
What are Groupon Deals?
Our Relevance Scenario
How do we surface relevant deals to users?
• Deals are perishable (deals expire or sell out)
• No direct user intent (as there is in traditional search advertising)
• Relatively limited user information
• Deals are highly local
Two Sides to the Relevance Problem
• Algorithmic Issues: How to find relevant deals for individual users given a set of optimization criteria
• Scaling Issues: How to handle relevance for all users across multiple delivery platforms
Developing Deal Ranking Algorithms
• Exploring Data
  • Understanding signals, finding patterns
• Building Models/Heuristics
  • Employ both classical machine learning techniques and heuristic adjustments to estimate user purchasing behavior
• Conduct Experiments
  • Try out ideas on real users and evaluate their effect
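One common way to try out ideas on real users is to assign each user deterministically to a control or treatment arm by hashing their ID, then compare conversion rates per arm. This is a generic sketch, not a description of Groupon's experiment system: the function names, the SHA-256-based bucketing, and the 50/50 default split are assumptions, and a real evaluation would also test whether the observed difference is statistically significant.

```python
# Deterministic experiment assignment by hashing (a generic technique; names are hypothetical).
import hashlib
from typing import Dict, Iterable, List, Tuple


def assign_variant(user_id: str, experiment: str, treatment_pct: int = 50) -> str:
    """Map a user to 'treatment' or 'control'; the same inputs always give the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"


def conversion_by_variant(events: Iterable[Tuple[str, bool]],
                          experiment: str) -> Dict[str, float]:
    """events: (user_id, converted) pairs. Returns the conversion rate per arm."""
    counts: Dict[str, List[int]] = {"control": [0, 0], "treatment": [0, 0]}  # [events, conversions]
    for user_id, converted in events:
        arm = counts[assign_variant(user_id, experiment)]
        arm[0] += 1
        arm[1] += int(converted)
    return {name: (conv / n if n else 0.0) for name, (n, conv) in counts.items()}


if __name__ == "__main__":
    events = [(f"user{i}", i % 4 == 0) for i in range(1000)]
    print(conversion_by_variant(events, experiment="new_ranking_model"))
```

Including the experiment name in the hash keeps assignments independent across experiments.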
Data Infrastructure
(Chart: Growing Deals and Growing Users, 2011 to 2013; values shown: 20+, 400+, 2000+)
• 100 Million+ subscribers
• We need to store data like user click history, email records, service logs, etc. This turns into billions of data points and TBs of data.
Deal Personalization Infrastructure Use Cases
• Deliver personalized emails
• Deliver personalized website & mobile experience
Email (Offline System): Personalize billions of emails for hundreds of millions of users
Online System: Personalize one of the most popular e-commerce mobile & web apps for hundreds of millions of users & page views
Architecture
(Diagram components: Data Pipeline; Relevance Map/Reduce; HBase Offline System; Email; Replication; HBase for Online System; Real Time Relevance.)
• We can now maintain different SLAs for the online and offline systems
• We can tune each HBase cluster differently for online and offline workloads
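The replication link in this architecture can be pictured with a simplified, hypothetical model: writes land on the offline cluster, and a replicator later ships them to the online cluster, so the online side can lag until the next shipment. Real HBase replication works by asynchronously shipping write-ahead-log edits between clusters; the `Cluster` and `Replicator` classes and the explicit `ship()` call here are illustrative only.

```python
# Simplified model of offline -> online replication (illustrative, not HBase internals).
from typing import Dict, List, Tuple

Edit = Tuple[str, str, str]  # (row, column, value)


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, str]] = {}
        self.edit_log: List[Edit] = []  # stands in for the write-ahead log

    def put(self, row: str, column: str, value: str) -> None:
        self.rows.setdefault(row, {})[column] = value
        self.edit_log.append((row, column, value))

    def get(self, row: str) -> Dict[str, str]:
        return dict(self.rows.get(row, {}))


class Replicator:
    def __init__(self, source: Cluster, sink: Cluster) -> None:
        self.source, self.sink = source, sink
        self.position = 0  # how far into the source's edit log we have shipped

    def ship(self) -> int:
        """Apply all not-yet-shipped edits to the sink; return how many were shipped."""
        pending = self.source.edit_log[self.position:]
        for row, column, value in pending:
            self.sink.rows.setdefault(row, {})[column] = value
        self.position += len(pending)
        return len(pending)


if __name__ == "__main__":
    offline, online = Cluster("offline"), Cluster("online")
    replicator = Replicator(offline, online)
    offline.put("user1", "cf1:recs", "pizza,yoga")  # heavy batch writes hit the offline cluster
    print(online.get("user1"))                      # {} : online cluster has not caught up yet
    replicator.ship()
    print(online.get("user1"))                      # {'cf1:recs': 'pizza,yoga'}
```

Separating the clusters this way is what lets each side have its own SLA and tuning.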
HBase Schema Design
• Row key: User ID, a unique identifier for users
• Column Family 1: User history and profile information. User history and profile info are overwritten.
• Column Family 2: Email history for users. Email history for each day is appended as a separate column (on average, each row has over 200 columns).
• Most of our data access patterns are via the “User Key”
• This makes it easy to design the HBase schema
• The actual data is kept in JSON
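A hypothetical user row under this schema might be built as below, with a dict standing in for one HBase row. The family names `cf1`/`cf2`, the `profile` qualifier, and the `email:<YYYY-MM-DD>` qualifier naming are assumptions for illustration; the overwrite-versus-append behavior and the JSON values follow the slide. ISO dates are a convenient choice because lexicographic column ordering then matches chronological order.

```python
# One user's row under the schema above; family and qualifier names are hypothetical.
import json
from typing import Any, Dict


def write_profile(row: Dict[str, str], profile: Dict[str, Any]) -> None:
    # Column Family 1: a single column, overwritten in place on every update.
    row["cf1:profile"] = json.dumps(profile, sort_keys=True)


def append_email_day(row: Dict[str, str], day: str, emails: Any) -> None:
    # Column Family 2: a new column per day, so email history accumulates in the row.
    row[f"cf2:email:{day}"] = json.dumps(emails, sort_keys=True)


def email_history(row: Dict[str, str]) -> Dict[str, Any]:
    """Return {day: emails} in chronological (= lexicographic) column order."""
    prefix = "cf2:email:"
    return {col[len(prefix):]: json.loads(val)
            for col, val in sorted(row.items()) if col.startswith(prefix)}


if __name__ == "__main__":
    row: Dict[str, str] = {}
    write_profile(row, {"city": "Palo Alto"})
    write_profile(row, {"city": "Chicago"})          # overwrites the previous profile
    append_email_day(row, "2013-05-02", ["deal2", "deal3"])
    append_email_day(row, "2013-05-01", ["deal1"])   # appended as its own column
    print(email_history(row))
```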
Cluster Sizing
Hadoop + HBase Cluster
• 100+ machine Hadoop cluster that runs heavy MapReduce jobs
• The same cluster also hosts a 15-node HBase cluster
HBase Replication (between the two clusters)
Online HBase Cluster
• 10-machine dedicated HBase cluster to serve the real-time SLA
• Machine Profile
  • 96 GB RAM (HBase: 25 GB)
  • 24 virtual CPU cores
  • 8 × 2 TB disks
• Data Profile
  • 100 Million+ records
  • 2 TB+ data
  • Over 4.2 billion data points
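Some rough arithmetic on the figures above, as a sanity check on scale. This is a back-of-envelope sketch under stated assumptions: the slide's numbers are lower bounds ("100 Million+", "2TB+"), the machine profile is assumed to apply to the 10 online machines, and the calculation ignores HDFS replication and compression, so treat the results as order-of-magnitude estimates.

```python
# Rough sizing arithmetic from the slide's figures (lower bounds; ignores replication/compression).
RECORDS = 100_000_000            # "100 Million+ records"
DATA_BYTES = 2 * 10 ** 12        # "2 TB+ data", decimal terabytes
DISKS_PER_MACHINE = 8            # "8 x 2 TB disks"
DISK_TB = 2
ONLINE_MACHINES = 10             # assumption: machine profile applies to the online cluster


def sizing_summary() -> dict:
    raw_tb_per_machine = DISKS_PER_MACHINE * DISK_TB
    return {
        "bytes_per_record": DATA_BYTES // RECORDS,                      # about 20 KB per record
        "raw_tb_per_machine": raw_tb_per_machine,                       # 16 TB of raw disk
        "raw_tb_online_cluster": raw_tb_per_machine * ONLINE_MACHINES,  # 160 TB of raw disk
    }


if __name__ == "__main__":
    for name, value in sizing_summary().items():
        print(name, value)
```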
Questions?

Thank You!
(We are hiring!)
www.groupon.com/techjobs

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Recently uploaded (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Ameya Kanitkar: Using Hadoop and HBase to Personalize Web, Mobile and Email Experience for Millions of Users

  • 1. USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION. Tools to build your big data application. Ameya Kanitkar
  • 2. Ameya Kanitkar: that's me!
      • Big Data Infrastructure Engineer @ Groupon, Palo Alto, USA (working on Deal Relevance & Personalization Systems)
      • ameya.kanitkar@gmail.com
      • http://www.linkedin.com/in/ameyakanitkar
      • @aktwits
  • 3. Agenda
      • Basics of Hadoop & HBase
      • How you can use Hadoop & HBase for big data applications
      • Case study: Deal Relevance and Personalization Systems at Groupon with Hadoop & HBase
  • 4. Big Data Application Examples
      • Recommendation systems
      • Ad targeting
      • Personalization systems
      • BI/DW
      • Log analysis
      • Natural language processing
  • 5. So what is Hadoop?
      • General-purpose framework for processing huge amounts of data
      • Open source
      • Batch/offline oriented
  • 6. Hadoop - HDFS
      • Open-source distributed file system
      • Stores large files, which applications built on top of HDFS can easily access
      • Data is distributed and replicated over multiple machines
      • Linux-style commands, e.g. ls, cp, mv, touchz
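  • 7. Hadoop - HDFS
      • Example: hadoop fs -dus /data/ (newer Hadoop releases deprecate -dus in favor of hadoop fs -du -s /data/)
      • Output: 185453399927478 bytes ≈ 168 TiB (about 185 TB in decimal units)
      • (One of the folders from one of our Hadoop clusters)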
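The short Python sketch below checks the slide's arithmetic. The "≈ 168 TB" figure uses binary units (1 TiB = 2^40 bytes). The sketch converts the reported byte count into both binary and decimal units. It is plain arithmetic and does not talk to HDFS.

```python
# Byte count reported by `hadoop fs -dus /data/` on the slide.
REPORTED_BYTES = 185_453_399_927_478

def to_units(num_bytes: int) -> dict:
    """Express a byte count in binary (TiB) and decimal (TB) units."""
    return {
        "TiB": num_bytes / 2**40,   # 1 TiB = 1,099,511,627,776 bytes
        "TB": num_bytes / 10**12,   # 1 TB  = 1,000,000,000,000 bytes
    }

sizes = to_units(REPORTED_BYTES)
print(f"{sizes['TiB']:.1f} TiB = {sizes['TB']:.1f} TB")  # 168.7 TiB = 185.5 TB
```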
  • 8. Hadoop - MapReduce
      • Application framework built on top of HDFS to process your big data
      • Operates on key-value pairs
      • Mappers filter and transform input data
      • Reducers aggregate mapper output
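Here is a minimal, single-process sketch of the map, shuffle and reduce steps, using only the Python standard library. It is not the Hadoop API: real jobs implement Mapper and Reducer classes in Java or use Hadoop Streaming. The function names and the "<timestamp> <url>" log format are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator, List, Tuple

KeyValue = Tuple[Any, Any]

def run_mapreduce(records: Iterable[Any],
                  mapper: Callable[[Any], Iterator[KeyValue]],
                  reducer: Callable[[Any, List[Any]], Iterator[KeyValue]]) -> List[KeyValue]:
    """Run map -> shuffle -> reduce in one process (no distribution, no fault tolerance)."""
    grouped = defaultdict(list)
    for record in records:
        # Map: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect every value emitted for the same key.
            grouped[key].append(value)
    output: List[KeyValue] = []
    # Reduce: each key is handled once, in sorted key order
    # (Hadoop likewise presents keys to each reducer in sorted order).
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output

# Hypothetical input: "<timestamp> <url>" log lines; count requests per URL.
def url_mapper(line: str) -> Iterator[KeyValue]:
    _timestamp, url = line.split()
    yield url, 1

def sum_reducer(url: str, counts: List[int]) -> Iterator[KeyValue]:
    yield url, sum(counts)

logs = ["t1 /home", "t2 /deal/42", "t3 /home"]
print(run_mapreduce(logs, url_mapper, sum_reducer))  # [('/deal/42', 1), ('/home', 2)]
```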
  • 9. Example
      • Given web logs, calculate the landing-page conversion rate for each product
      • So we need to count how many impressions (and how many conversions) each product received, then calculate the conversion rate for each product
  • 10. MapReduce Example
      • Map phase: each mapper (Map 1, Map 2, …, Map N) processes a log file and outputs Key (Product ID), Value (impression and conversion counts)
      • Reduce phase: the reducer receives all data for a given product and runs a simple for loop to calculate the conversion rate (output: Product ID, Conversion Rate)
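The job on this slide can be sketched in plain Python. The mapper emits a pair of counts rather than impressions alone, because the reducer needs both numbers to compute a rate. The "<product_id> <event>" log format and the sample lines are hypothetical, and a dictionary stands in for Hadoop's shuffle.

```python
from collections import defaultdict

# Hypothetical log format: "<product_id> <event>", event is "impression" or "conversion".
LOG_LINES = [
    "p1 impression", "p1 impression", "p1 conversion",
    "p2 impression", "p2 impression", "p2 impression", "p2 impression",
    "p2 conversion",
]

def mapper(line):
    """Map: emit (product_id, (impressions, conversions)) for one log line."""
    product_id, event = line.split()
    if event == "impression":
        yield product_id, (1, 0)
    elif event == "conversion":
        yield product_id, (0, 1)

def reducer(product_id, values):
    """Reduce: sum both counts for one product, then divide (the 'simple for loop')."""
    impressions = conversions = 0
    for imp, conv in values:
        impressions += imp
        conversions += conv
    rate = conversions / impressions if impressions else 0.0
    return product_id, rate

def run(lines):
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle/sort between phases
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))

print(run(LOG_LINES))  # {'p1': 0.5, 'p2': 0.25}
```

Guarding against zero impressions keeps the reducer safe for products that only appear in conversion events.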
  • 11. Recap
      • We just processed terabytes of data and calculated the conversion rate across millions of products.
      • Note: this is a batch process only, and it takes time. You cannot start this process after someone visits your website.
      • How about we generate recommendations in a batch process and serve them in real time?
  • 12. HBase
      • Provides real-time random read/write access over HDFS
      • Built on Google's Bigtable design
      • Open source
      • This is not an RDBMS, so there are no joins. Access patterns are generally simple, like get(key), put(key, value), etc.
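Because there are no joins, an application that needs data from two tables issues two keyed lookups and combines the results itself. The toy class below mimics only that get/put access pattern with a dictionary. It is not the HBase client API, and the table names, columns and values are made up.

```python
from typing import Dict, Optional

class ToyTable:
    """In-memory stand-in for a table: row key -> {"family:qualifier": value}.
    Mirrors only the get/put access pattern, not the real HBase client."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row: str, column: str, value: str) -> None:
        self._rows.setdefault(row, {})[column] = value

    def get(self, row: str) -> Optional[Dict[str, str]]:
        row_data = self._rows.get(row)
        return dict(row_data) if row_data is not None else None

users = ToyTable()
deals = ToyTable()
users.put("user1", "cf1:city", "Palo Alto")
users.put("user1", "cf1:last_purchase", "deal42")
deals.put("deal42", "cf1:title", "Pizza for two")

# No joins: read a key out of one row, then issue a second get() with it.
user = users.get("user1")
deal = deals.get(user["cf1:last_purchase"])
print(deal["cf1:title"])  # Pizza for two
```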
  • 13. Example table:
      Row      Cf:<qual>    Cf:<qual>    …
      Row 1    Cf1:qual1    Cf1:qual2
      Row 11   Cf1:qual2    Cf1:qual22
      Row 2    Cf2:qual1    Cf1:qual3
      ….
      Row N
      • Dynamic column names: no need to define columns up front.
      • Both rows and columns are sorted lexicographically.
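HBase compares row keys as raw bytes, which is why "Row 11" sorts before "Row 2" in the table above. The sketch below reproduces that ordering and adds a prefix scan over the sorted keys. The scan is a simplified stand-in for an HBase range scan, and the helper name is hypothetical.

```python
import bisect

row_keys = ["Row 2", "Row 11", "Row 1", "Row N"]

# Row keys are compared as raw bytes, so sort on the UTF-8 encoding.
sorted_keys = sorted(row_keys, key=lambda k: k.encode("utf-8"))
print(sorted_keys)  # ['Row 1', 'Row 11', 'Row 2', 'Row N']

def prefix_scan(keys, prefix):
    """Return the contiguous run of keys starting with `prefix` from a byte-sorted list."""
    encoded = [k.encode("utf-8") for k in keys]
    start = bisect.bisect_left(encoded, prefix.encode("utf-8"))
    result = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        result.append(key)
    return result

print(prefix_scan(sorted_keys, "Row 1"))  # ['Row 1', 'Row 11']
```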
  • 14. Example rows:
      Row      Columns
      user1    Cf1:click_history:{actual_clicks_data}, Cf1:purchases:{actual_purchases}
      user11   Cf1:purchases:{actual_purchases}
      user20   Cf1:mobile_impressions:{actual mobile impressions}, Cf1:purchases:{actual_purchases}
      …
      • Note: each row has different columns, so think of this as a hash map rather than a table with rows and columns.
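The "hash map" view can be written literally as a map of maps. The values below are placeholders, since the slide does not show the actual data.

```python
# Rows from the slide as a map of maps: row key -> {column: value}.
# Values are placeholders; the slide elides the real data.
table = {
    "user1": {"cf1:click_history": "<clicks>", "cf1:purchases": "<purchases>"},
    "user11": {"cf1:purchases": "<purchases>"},
    "user20": {"cf1:mobile_impressions": "<mobile impressions>",
               "cf1:purchases": "<purchases>"},
}

# A column a row never wrote has no entry at all; nothing is stored for it.
print(table["user11"].get("cf1:click_history"))  # None

# Each row carries its own set of columns.
for row_key, columns in table.items():
    print(row_key, sorted(columns))
```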
  • 15. Putting it all together
      • (Diagram: Web and Mobile; Store data in HDFS; Generate recommendations (MapReduce); Analyze data (MapReduce); Serve real-time requests (HBase))
      • Do offline analysis in Hadoop, and serve real-time requests with HBase
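The batch-then-serve split can be sketched in a few lines. The sketch assumes a hypothetical purchase log, and a plain dictionary stands in for HBase. The "top category" logic is only a placeholder for whatever the real recommendation job computes.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: (user_id, category).
purchases = [
    ("u1", "food"), ("u1", "food"), ("u1", "spa"),
    ("u2", "travel"),
]

def batch_generate(purchase_log, top_n=1):
    """Offline step (the MapReduce role): precompute each user's top categories."""
    per_user = defaultdict(Counter)
    for user_id, category in purchase_log:
        per_user[user_id][category] += 1
    return {user: [category for category, _ in counts.most_common(top_n)]
            for user, counts in per_user.items()}

# Load the precomputed results into a key-value store (the HBase role).
serving_store = batch_generate(purchases)

def serve(user_id, default=("popular",)):
    """Online step: one keyed lookup per request, no heavy computation."""
    return serving_store.get(user_id, list(default))

print(serve("u1"))  # ['food']
print(serve("u3"))  # ['popular']
```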
  • 16. Use Case: Deal Relevance & Personalization @ Groupon
  • 19. Our Relevance Scenario
      • How do we surface relevant deals to users?
      • Deals are perishable (deals expire or sell out)
      • No direct user intent (as in traditional search advertising)
      • Relatively limited user information
      • Deals are highly local
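Because deals are perishable and highly local, one plausible first step is to discard ineligible deals before any ranking runs. This is an illustrative sketch, not Groupon's method. The Deal fields, the 25 km radius and the haversine distance check are all assumptions.

```python
import math
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Deal:
    # Hypothetical fields, for illustration only.
    deal_id: str
    expires_at: datetime
    remaining_quantity: int
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def eligible_deals(deals, now, user_lat, user_lon, max_km=25.0):
    """Drop expired, sold-out and far-away deals before ranking."""
    return [
        d for d in deals
        if d.expires_at > now
        and d.remaining_quantity > 0
        and haversine_km(user_lat, user_lon, d.lat, d.lon) <= max_km
    ]

deals = [
    Deal("local", datetime(2013, 6, 1), 10, 37.44, -122.16),
    Deal("expired", datetime(2013, 1, 1), 10, 37.44, -122.16),
    Deal("sold_out", datetime(2013, 6, 1), 0, 37.44, -122.16),
    Deal("far", datetime(2013, 6, 1), 10, 40.71, -74.01),
]
now = datetime(2013, 3, 1)
print([d.deal_id for d in eligible_deals(deals, now, 37.44, -122.14)])  # ['local']
```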
  • 20. Two Sides to the Relevance Problem
      • Algorithmic issues: how to find relevant deals for individual users given a set of optimization criteria
      • Scaling issues: how to handle relevance for all users across multiple delivery platforms
  • 21. Developing Deal Ranking Algorithms
      • Exploring data: understanding signals, finding patterns
      • Building models/heuristics: employ both classical machine learning techniques and heuristic adjustments to estimate user purchasing behavior
      • Conducting experiments: try out ideas on real users and evaluate their effect
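A common way to split real users into control and treatment groups is to hash a stable user ID, so that a given user always sees the same variant. The sketch below assumes that approach. The talk does not say how Groupon assigned users, and the experiment name is hypothetical.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, treatment_percent: int = 50) -> str:
    """Deterministically place a user in 'treatment' or 'control' for one experiment.

    Hashing (experiment, user_id) keeps the assignment stable across requests,
    and different experiment names give roughly independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "treatment" if bucket < treatment_percent else "control"

assignments = [assign_bucket(f"user{i}", "new_ranker") for i in range(10_000)]
share = assignments.count("treatment") / len(assignments)
print(f"treatment share: {share:.3f}")
```

The effect of the idea is then evaluated by comparing a metric, such as the conversion rate, between the two groups.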
  • 22. Data Infrastructure
      • (Chart: growing deals and growing users, 2011 to 2013)
      • 100 million+ subscribers
      • We need to store data such as user click history, email records and service logs. This amounts to billions of data points and TBs of data.
  • 23. Deal Personalization Infrastructure Use Cases
      • Deliver personalized emails (offline system): personalize billions of emails for hundreds of millions of users
      • Deliver a personalized website & mobile experience (online system): personalize one of the most popular e-commerce mobile & web apps for hundreds of millions of users & page views
  • 24. Architecture
      • (Diagram: an offline system, with HBase and Relevance Map/Reduce, produces Email; a separate HBase cluster for the online system, kept in sync via replication, serves Real Time Relevance; a data pipeline feeds the clusters)
      • We can now maintain different SLAs for the online and offline systems
      • We can tune each HBase cluster differently for the online and offline systems
  • 25. HBase Schema Design
      Row key: User ID (unique identifier for users)
      Column Family 1: user history and profile information (overwritten on update)
      Column Family 2: email history for users (each day's email history is appended as a separate column; on average each row has over 200 columns)
      • Most of our data access patterns are via the "user key"
      • This makes it easy to design the HBase schema
      • The actual data is kept in JSON
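The schema can be mimicked with nested dictionaries to show its two write patterns. The first family overwrites one column, and the second adds one date-named column per day. The family names, the "data" qualifier and the ISO-date qualifiers are assumptions, not the actual Groupon schema. Values are stored as JSON strings, as the slide describes.

```python
import json
from datetime import date

# Hypothetical names: the slide only says "Column Family 1" and "Column Family 2".
PROFILE_CF = "profile"
EMAIL_CF = "email"

table = {}  # row key (user ID) -> {"family:qualifier": JSON string}

def write_profile(user_id, profile):
    """Family 1: one fixed column, overwritten on every update."""
    table.setdefault(user_id, {})[f"{PROFILE_CF}:data"] = json.dumps(profile)

def append_email_day(user_id, day, deal_ids):
    """Family 2: one new column per day, so the history grows as columns."""
    table.setdefault(user_id, {})[f"{EMAIL_CF}:{day.isoformat()}"] = json.dumps(deal_ids)

def email_history(user_id):
    """Read family 2 back in column order; ISO dates sort chronologically."""
    prefix = f"{EMAIL_CF}:"
    row = table.get(user_id, {})
    return [(column[len(prefix):], json.loads(value))
            for column, value in sorted(row.items()) if column.startswith(prefix)]

write_profile("user1", {"city": "Palo Alto"})
write_profile("user1", {"city": "Chicago"})  # replaces the previous profile
append_email_day("user1", date(2013, 5, 1), ["deal42"])
append_email_day("user1", date(2013, 5, 2), ["deal7", "deal9"])
print(email_history("user1"))
# [('2013-05-01', ['deal42']), ('2013-05-02', ['deal7', 'deal9'])]
```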
  • 26. Cluster Sizing
      • Hadoop + HBase cluster: a 100+ machine Hadoop cluster that runs heavy MapReduce jobs; the same cluster also hosts a 15-node HBase cluster
      • HBase replication to the online HBase cluster: a 10-machine dedicated HBase cluster to serve the real-time SLA
      • Machine profile: 96 GB RAM (25 GB for HBase), 24 virtual CPU cores, 8 × 2 TB disks
      • Data profile: 100 million+ records, 2 TB+ of data, over 4.2 billion data points
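Here are some back-of-the-envelope numbers derived from the slide's figures. The replication factor of 3 is the HDFS default (dfs.replication) and is an assumption here, as is treating "100+" machines as exactly 100. The usable figure is an upper bound that ignores OS, reserved and intermediate MapReduce space.

```python
# Figures taken from the slide.
MACHINES = 100                  # "100+ machine Hadoop cluster", treated as exactly 100
DISKS_PER_MACHINE = 8
TB_PER_DISK = 2
RECORDS = 100_000_000           # "100 Million+ Records"
DATA_BYTES = 2 * 10**12         # "2TB+ Data", decimal terabytes

# Assumption: HDFS default replication factor (dfs.replication = 3).
REPLICATION = 3

raw_tb = MACHINES * DISKS_PER_MACHINE * TB_PER_DISK  # raw disk across the cluster
usable_tb = raw_tb / REPLICATION                     # upper bound on unique HDFS data
bytes_per_record = DATA_BYTES / RECORDS              # average record size

print(f"raw: {raw_tb} TB, usable: ~{usable_tb:.0f} TB, "
      f"avg record: ~{bytes_per_record / 1000:.0f} KB")
# raw: 1600 TB, usable: ~533 TB, avg record: ~20 KB
```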
  • 27. Questions? Thank You! (We are hiring!) www.groupon.com/techjobs

Editor's Notes

  1. The relevance problem can coarsely be divided into two conceptual parts: algorithmic aspects and scale-related issues. We'll start on the algorithmic side of things.