As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve this problem.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
Florian has worked in the “data” field since 2001, back when it was not yet big. He has worked in successful startups in the search engine, advertising, and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains encountered by data teams everywhere.
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
2. Hi! I’m FLORIAN DOUETTEAU, CEO of Dataiku
It’s me!
It’s our software!
3. …and our software is
The most complete Data Science platform
Deployment
4. Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys
• $10B+ Revenue
• 100M+ customers
• 100+ Data Scientists
Hal Alowne
BI Manager
Dim’s Private Showroom
“Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!”
Average E-commerce Website
• $100M Revenue
• 1 Million customers
• 1 Data Analyst (Hal himself)
Dim Sum
CEO & Founder
Dim’s Private Showroom
Big Data Copy Cat Project
7. LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your brand new Hadoop cluster is perceived as slow, little used, and unreliable
8. TECHNO MISMATCH ANTI-PATTERN
Embrace Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
9. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
11. Classic Business Intelligence Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
12. Data Science Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data System Engineer / Data Architect
Specs
Data Scientist
13. Built From Scratch
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
DBA / IT Data Owner
Specs
DATA SCIENTISTS EVERYWHERE
14. Built From Engineering
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Specs
DATA ENGINEERS
DATA ANALYSTS
22. What is the main reason for data projects to fail?
DATA NOT AVAILABLE
23. BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance (bar chart, values in legend order):
• Business Goal Definition and Data: 50%
• Feature Engineering: 30%
• Algorithm: 20%
24. How to Get Data if you don’t have it
THE CICADA / THE SPIDER / THE FOX
26. The Cicada: Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
27. The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
28. The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
31. The Age Of Distributed Intelligence
Global, Personalised and Real Time Data Driven Services
32. Data to Visualize or Data to Automate?
Chart, 2013 to 2018: Automated Decision vs. Visualize To Decide
Moving to a world of automated decision making
33. Involve product team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only when Needed
Product Feature: Historical Data For Path Optimisation
Have Product Management Deeply Involved In the Data Team
34. Where is your added value?
Is the problem at the core of my business process?
Is it a common problem / with shared data?
Go for a best-of-breed SaaS solution
Can I solve it on my own? Really?
Built by the data team
Built by the data team?
Built by the data team
Hire consultants and learn
Branch labels: Yes / Yes / No / I can’t / Ok, I can try / Yes! / No! / No
35. Be aware of the comfort zone
Mission Critical
Small, Structured
Large, Diverse
Sheer Curiosity
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation for E-Merchant
Ticket Data for Discounts in Retail
Phone Call Logs for Security
RTB Data for Advertising
Customer Consumption for Anti-Churn in Utilities
Optimization
Filings for Fraud in Insurance
Not enough data to learn from?
Not enough “hard” examples so that you can learn
36. Create an "API" Culture
Do not share
• Random Piece of Code
• Flat File
Do share
• Reproducible documented workflows
• Clean, documented APIs
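To make the contrast above concrete, here is a minimal sketch of what a clean, documented API might look like in place of a random piece of code or a flat file. Everything in it is an assumption made for illustration: the Customer record, its fields, and the churn_rate function are hypothetical and do not come from the talk or from any Dataiku product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Customer:
    """One customer record (hypothetical schema, for illustration only)."""

    customer_id: str
    active_at_start: bool  # was the customer active when the period began?
    active_at_end: bool    # is the customer still active when the period ends?


def churn_rate(customers: Iterable[Customer]) -> float:
    """Return the share of customers active at the start of a period
    who are no longer active at its end.

    Customers who were not active at the start are ignored.
    Raises ValueError if no customer was active at the start.
    """
    starters = [c for c in customers if c.active_at_start]
    if not starters:
        raise ValueError("no customers active at period start")
    churned = sum(1 for c in starters if not c.active_at_end)
    return churned / len(starters)
```

A teammate can call churn_rate on any iterable of Customer records without knowing where the data came from, and the docstrings state the contract that a flat file or an undocumented script would leave implicit.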
What do we do?
We help insurers develop new analytic, data-driven products and platforms without having to pay the technical debt of leaving an existing platform (one that dates from the 90s)
We help people build Data Labs