Big Data: The Magic to Attain New Heights
Ken Johnston, Principal Data Science Manager
Twitter – @rkjohnston
Blog – http://linkedin.com/in/rkjohnston
Email – kenj@Microsoft.com
LinkedIn - http://linkedin.com/in/rkjohnston
@rkjohnston #DataMagic
About Ken
Data Scientist in Core Data Science Team
Office Live, WebApps, Office Online
Cosmos, AutoPilot, Local, Shopping
Kanban and Data Science series on LinkedIn
EaaSy & MVQ – Everything as a Service & Minimum Viable Quality
Write Books and Blog and some fiction
I have a lot of love in my life
My Kids
Team of Amazing Magicians
Getting hands dirty in the data
Connect the Dots
Create Deep Insights
Taking on Sudden Infant Death Syndrome
Big Data and Magic
So, my son gets this kids' “Magic Kit in a Box” for his 8th birthday
Open our own Magic Show
Six Keys to a “Big” Magic Show
• Try, Try, Try Again (The Tyranny of Counting)
• Magic Tricks (A/B Testing, Runtime Flags)
• The Venue (Big Data Infrastructure)
• Foundation (Tools for Big Data)
• Security (Protection, Privacy, Fraud)
• The Assistant (Recruit, Train, & Retain)
“Big Data” Search Trends
The Venue
Your Big Data Infrastructure
Common Design Patterns
Good Paper to Read
IDC: Six Patterns of Big Data and Analytics Adoption: The Importance of the Information Architecture
Ingest
From Services, IOT, Apps
Via Streams
Into Storage
Process
Build Pipelines
Reduce, Transform, Join
Pipe out
Analyze
From Services, IOT, Apps
Via Streams
Into Storage
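The ingest, process, analyze pattern can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the event fields (`source`, `device_id`, `latency_ms`), the in-memory list standing in for storage, and the `devices` lookup table are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def ingest(stream: Iterable[Event], storage: List[Event]) -> None:
    """Ingest: read events from a stream and land them in storage."""
    for event in stream:
        storage.append(event)


def process(storage: Iterable[Event], devices: Dict[str, str]) -> Iterator[Event]:
    """Process: reduce, transform, and join, then pipe rows out."""
    for event in storage:
        # Reduce: drop rows that cannot be analyzed.
        if "device_id" not in event or "latency_ms" not in event:
            continue
        yield {
            "source": event.get("source", "unknown"),
            # Transform: milliseconds to seconds.
            "latency_s": float(event["latency_ms"]) / 1000.0,
            # Join: attach the OS from a (hypothetical) device table.
            "os": devices.get(str(event["device_id"]), "unknown"),
        }


def analyze(rows: Iterable[Event]) -> Dict[str, float]:
    """Analyze: mean latency in seconds per OS."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[row["os"]]
        acc[0] += row["latency_s"]
        acc[1] += 1
    return {os_name: total / count for os_name, (total, count) in totals.items()}
```

Calling `ingest(stream, storage)` and then `analyze(process(storage, devices))` runs the three stages in order. In a real platform each stage would be a separate service or job.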
Azure Model
Cindy Gross – Technical Fellow: Big Data and Cloud
Twitter: @SQLCindy cindyg@NealAnalytics.com
Ingest
Process
Analyze
Hybrid: Azure and Hadoop Model
Ingest Process Analyze
Amazon Model
Ingest Process Analyze
How we do it in Windows
Prototypical Big Data Platform
Client 1, Client 2, Client 3
Telemetry Front End Service
Fast pipeline for high priority Data
Alerting DB
Alerting Dashboard
Big Data Map Reduce Cloud
PII Scrubbing Service
Data Extraction Service
Insights DB 1
Insights DB N
Additional Reporting Dashboards
Management of Personally Identifiable Information (PII) is critical.
Data Driven Quality (DDQ) and big data pipelines will need a cloud platform.
The superfast pipeline typically (not always) bypasses the cloud. It is also void of PII.
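A PII scrubbing step and the split between the fast pipeline and the cloud pipeline might look roughly like the sketch below. Everything specific here is an assumption for illustration: the `user_id` and `priority` fields, the salted hash used as a pseudonym, and the two regular expressions, which catch only simple email and IPv4 patterns and are far from a complete PII policy.

```python
import hashlib
import re
from typing import Dict, List, Tuple

# Deliberately simple patterns; a real scrubber needs a much broader rule set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub(event: Dict[str, object], salt: str) -> Dict[str, object]:
    """Return a copy of the event with obvious PII masked."""
    clean = dict(event)
    if "user_id" in clean:
        # Replace the raw identifier with a salted, truncated hash.
        raw = (salt + str(clean["user_id"])).encode("utf-8")
        clean["user_id"] = hashlib.sha256(raw).hexdigest()[:16]
    for key, value in list(clean.items()):
        if key != "user_id" and isinstance(value, str):
            value = EMAIL_RE.sub("<email>", value)
            clean[key] = IPV4_RE.sub("<ip>", value)
    return clean


def route(events: List[Dict[str, object]], salt: str) -> Tuple[list, list]:
    """Scrub every event, then send high priority ones down the fast path."""
    fast, cloud = [], []
    for event in events:
        clean = scrub(event, salt)
        if clean.get("priority") == "high":
            fast.append(clean)   # alerting DB and dashboard
        else:
            cloud.append(clean)  # map reduce cloud and insights DBs
    return fast, cloud
```

Scrubbing before routing keeps both paths void of raw identifiers, which matches the note that the superfast pipeline carries no PII.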
Big Data & ML Model Orchestration
Ingest Process Analyze
User Segmentation Approaches
• Risk Tolerance Model
• Users Segment themselves
• Opt in for greater risk with a reward in mind
• Profile Based
• Usage behaviors
• new vs. power users
• Browser type
• Connection Type
• Device and Device OS
Balancing Speed and Risk with Rings
Ring 0: Buddy Build
Ring 1: My Team
Ring 2: Company & NDA
Ring 3: External Beta Users
Ring 4: Everyone
Red line demarks disclosure risk and possible loss of patent rights
Risk tolerance is highest
No desire for risk
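A ring rollout can be sketched as a small promotion rule: a build is visible to its current ring and every ring inside it, and it only moves outward while its telemetry stays healthy. The numbering 0 through 4 (with External Beta Users taken as ring 3), the `error_rate` signal, and the 1% threshold are assumptions for this example, not values from the talk.

```python
from typing import List

# Rings from innermost (highest risk tolerance) to outermost (no desire for risk).
RINGS = {
    0: "Buddy Build",
    1: "My Team",
    2: "Company & NDA",
    3: "External Beta Users",  # assumed numbering
    4: "Everyone",
}


def audience(release_ring: int) -> List[str]:
    """Everyone who can see a build that has reached release_ring."""
    return [RINGS[r] for r in sorted(RINGS) if r <= release_ring]


def next_ring(current_ring: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Promote one ring outward only if telemetry looks healthy."""
    if error_rate > max_error_rate:
        return current_ring  # hold the build where it is
    return min(current_ring + 1, max(RINGS))
```

The hold branch is where the rings act as a buffer: a regression caught in ring 1 never reaches ring 4.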
Security
Protection, Privacy, Fraud
Office 365 Advanced Threat Protection
Big Data Only Solution
Safe Link is powered by Cloud Exchange & Bing data
AI Model powered by data from thousands of companies and attachments
Short-lived identifiers
Increase transparency and control for users
Build privacy into the OS and all apps
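One common way to build a short-lived identifier is to derive it from a stable ID plus the current time window, so the value rotates automatically. This is an illustrative sketch, not a description of how Windows does it: the function name, the 24-hour lifetime, and the use of an HMAC with a server-side secret are all assumptions.

```python
import hashlib
import hmac


def short_lived_id(device_id: str, secret: bytes, now_s: int, lifetime_s: int = 86400) -> str:
    """Derive an identifier that changes every lifetime_s seconds.

    Within one window the same device always maps to the same value, so
    short-term analysis still works. Across windows the values cannot be
    linked without the secret.
    """
    window = now_s // lifetime_s
    message = f"{device_id}:{window}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]
```

Passing `int(time.time())` as `now_s` gives the identifier for the current window.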
How the Windows Store Security Team made the Insights Leap
App Store Data Architecture
App Certification and Analysis Pipeline
Store Services Log and Telemetry
Bing Spam and Malware
Windows Services Safety Platform (MSA, SmartScreen, etc.)
MMPC/Spynet
Network IPs
File Hashes
PhotoDNA
Strings
API Called
User Install Data
Ratings and Reviews
Purchases
Geographic Data
Account Reputation
Bad URLs
Botnet infected Clients
Cosmos Storage and Compute
BTW this was not Big Data
NoName was Learning basic DS
Look at how I did this k-means clustering and found these weird outliers in buying circles from Dev accounts created the same week from the same IP address
Check it out, I found this guy's FB page. We have his picture!
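The signal described here, developer accounts created in the same week from the same IP address, can be illustrated without any clustering library. The sketch below uses plain grouping rather than k-means, and the tuple layout, the `min_size` threshold, and the sample values in the usage note are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Account = Tuple[str, date, str]  # (account_id, creation date, IP address)


def suspicious_clusters(accounts: Iterable[Account], min_size: int = 3) -> Dict[tuple, List[str]]:
    """Group accounts by (ISO year, ISO week, IP) and keep the large groups."""
    groups = defaultdict(list)
    for account_id, created, ip in accounts:
        iso = created.isocalendar()  # (year, week, weekday)
        groups[(iso[0], iso[1], ip)].append(account_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}
```

For example, three accounts created on consecutive days of one week from `203.0.113.7` would come back as a single cluster, while a lone account from another address would be filtered out.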
NoName and I were Spitballing Ideas
Bad Dev
‘N’
Bad Dev
‘N’
Fraud Network Identification
Bad Dev 1
Payment Instruments
App Similarity
Social Networks
3rd party app stores
Bad Dev 2
XXXDeveloper
Created 40 Different Store Developer Accounts and 100s of Apps
App Metadata
(URL, Websites)
Developer Watering
Holes
Shared Fraudulent Payment Instruments
Bad Dev
‘N’
New Identity
Metadata
Shared Fraudulent Payment Instruments
App Similarity
App Similarity
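Linking developer accounts through shared payment instruments is a connected-components problem. The sketch below uses a small union-find over (account, instrument) pairs. The input shape and the account and card names in the usage note are hypothetical, and the other link types on the slide (app similarity, metadata, social networks) could be fed in the same way as extra pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fraud_networks(payments: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return groups of developer accounts linked by a shared payment instrument."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_dev_for: Dict[str, str] = {}
    for dev, instrument in payments:
        find(dev)  # register the account even if it never links to another
        if instrument in first_dev_for:
            union(first_dev_for[instrument], dev)
        else:
            first_dev_for[instrument] = dev

    groups = defaultdict(set)
    for dev in list(parent):
        groups[find(dev)].add(dev)
    # Only groups with more than one account form a network.
    networks = [sorted(group) for group in groups.values() if len(group) > 1]
    return sorted(networks, key=lambda group: group[0])
```

If `dev1` and `dev2` share `card_A` while `dev2` and `dev3` share `card_B`, all three land in one network even though `dev1` and `dev3` never used the same card.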
lights out
Foundation
Tools and Skills for Big Data
The Big Red Switch
This used to require humans
Sidebar: I had an Epiphany
Speed is your friend because…
Code Churn Example 1
Six week coding milestone
Code churn is cumulative
Imagine this as part of a larger multi-layered project
Layer 1
Layer 2
Layer 3
• Tightly coupled layers
• Long stabilization phase
• Complicated end-to-end integration
Sim-ship increases risk
Maximum point of instability is at end of milestone
Code Churn Example 2 (Continuous Deployment)
Layer 1
Layer 2
Layer 3
Layer N
• Risk per release decreases because of more incremental change
• You still must be careful of risk within production but…
• Total risk over time can be less with incremental change
Rapid release cadence (weekly or daily)
Max risk is production
As Speed Accelerates
Up Front & Post Deploy Testing Decreases
Measures = Test Cases
• We do Measures
• What is a post release test case?
• Automation validates the golden path
• We measure the golden path
• Measures are the same as test cases
• Monitor the golden path
>1.5*IQR = Outlier = Bug (probably)
• What is a Test Case?
• What I expect to happen vs. what does happen
• A Test Case is binary
• Measures can observe success and failure
• Measures have a history of pass/fail
• When pass or fail rates drift from the standard expected rates we find outliers
• Outliers are often bugs
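The 1.5 × IQR rule can be applied directly to the history of a measure, for example the daily failure rate on the golden path. This is a minimal sketch: the function name is made up for the example, the sample rates in the usage note are invented, and the choice of the "inclusive" quartile method is an assumption, since other quartile definitions give slightly different fences.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

With a week of failure rates near 2% and one day at 15%, only the 15% day falls outside the fences, and that day is the one worth investigating as a probable bug.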
Rings + Speed + Data = Success
• When speed increases the need for telemetry increases
• The rings model provides a buffer
Tricks
Flighting and A/B testing are mostly the same thing
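Both flighting and A/B testing need a stable way to decide which users see the new experience. A common approach, sketched here under assumptions (the function name, the experiment key, and the percentage split are all illustrative), is to hash the user and experiment name into one of 100 buckets.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_pct: int = 10) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hashing the experiment name together with the user ID means the same
    user always gets the same answer for one experiment, while separate
    experiments split the population independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "treatment" if bucket < treatment_pct else "control"
```

Raising `treatment_pct` over time turns the same function into a flighting control: users already in the treatment group stay there as the percentage grows.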
Runtime Flags = Continuous Deployment
Generic Service Stack
Service UX Front Door
Service Auth/Identity
Layer A vCurrent
Layer B vCurrent
Service Layer C (Persistent Data Store)
Default Path
Production Traffic
Front door servers for logging and access management
UX rendering layers
Identity or authentication layers
Persistent data layers
Runtime Flags Example 1
Side-by-Side Deployments
Service UX Front Door
Service Auth/Identity
Runtime Flags
• Flags direct traffic through the stack
• Used to test vNext before full release
Layer A vCurrent
Layer B vCurrent
Layer B vNext
Service Layer C (Persistent Data Store)
Default Path
Production Traffic
Test or Forked Traffic
Runtime
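The side-by-side deployment can be illustrated with a router that reads a flag carried by the request and picks the matching version of Layer B. The layer names come from the slide. The flag name `layer_b`, its value `vNext`, and the idea of returning the path as a list are assumptions made to keep the sketch small.

```python
from typing import Dict, List


def route_request(flags: Dict[str, str]) -> List[str]:
    """Return the ordered layers a request passes through.

    Production traffic carries no flags and follows the default path.
    Test or forked traffic sets layer_b=vNext and is steered to the new
    build, which is deployed alongside the current one.
    """
    if flags.get("layer_b") == "vNext":
        layer_b = "Layer B vNext"
    else:
        layer_b = "Layer B vCurrent"
    return [
        "Service UX Front Door",
        "Service Auth/Identity",
        "Layer A vCurrent",
        layer_b,
        "Service Layer C (Persistent Data Store)",
    ]
```

Because the default branch ignores unknown flag values, a mistyped flag falls back to the production path instead of failing the request.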
Runtime Flags Example 2
N Test Environments
Service UX Front Door
Service Auth/Identity
Layer A vCurrent
Layer B vCurrent
Service Layer C (Persistent Data Store)
Layer A Test Path
Layer B Test Path
Default Path
Production Traffic
Test Case
Check-in Tests
Runtime
Apps as a Service: Facebook
How Facebook secretly redesigned its iPhone app with your help
…a system for creating alternate versions… within the native app.
The team could then turn on certain new features for a subset of its users, directly,
…a system of "different types of Legos... and see the results on the server in real time."
From article on The Verge by Dieter Bohn, September 18, 2013
All Magicians need an Assistant
Typical Industry Staffing
Data Scientist
Data Engineer
Visualization
Machine Learning
Extract Load Transform
Data Architecture
Operations and Monitoring
Big Data Infrastructure & Storage
DB Administration
Statistics
Math
Programming
Modeling
Story Telling
Data Exploration
http://www.datasciencecentral.com/profiles/blogs/difference-between-data-engineers-and-data-scientists
Blended Role for Agile
Data Scientist/Data Engineer
Visualization
Machine Learning
Extract Load Transform
Data Architecture
Operations and Monitoring
Big Data Infrastructure & Storage
DB Administration
Statistics
Math
Programming
Modeling
Story Telling
Data Exploration
Backlog | Doing | Validation | Done
• LDA vs PCA vs A13 before stratified sampling
• MLADS ARPD Rehearsal
• Submit Abstract to Strata + Hadoop World
• Edge Experiment 1 Data Processing
• Edge Experiment 2
• Customer Sat and Post Sales Monetization Factors Analysis
• Install Base Decay Rate estimation using Bayesian Model
• Friday Review Slides for Edge Experiment 1
• Edge Experiment 1 Insights Analysis
• Top Enterprise DSAT list from textual analysis
• Business Entity Graph with DUNS, Domain Name, & TaxIDs
• Open Source Entity Graph visualization technology research
• Submit Paper to INFORMS 2016
• ARPD V3 Model with FFF
• MLADS ARPD Slides Draft 1
• Device Lifetime Value (LTV) model 2
Process and Culture impact Retention
• Kanban for Project Management
• Balance long and short term impact
• Participate in Industry papers and reviews
Trying Again & Again
Advantages and Disadvantages of the counting culture
KPIs drive companies and behavior
The 5 Vs of Big Data
Nine months ago there were only three Vs
Variety, Velocity, Volume, Verify
Verification – managing data quality and access control at all points
Value
Must Count More
Counting More Granular
Make it go up and to the right
Is vs Likely
Business Impact is a Given
Drives behavior (especially if tied to compensation)
MVP in a Nutshell
Possible Features
Minimum
Viable
Minimum + Viable
Good features to test the users' responses
Bad user experience. Too minimal a set or wrong set of features. Will not engage users enough to gain valuable insights
The product you want to build but to deliver all features will take too long
Wasted work adding features that do not add critical value for winning and retaining customers
Minimum Viable Model (MVM)
Possible Data
Minimum
Viable
Model should provide enough coverage that it can be used for core insights.
Many models try to include all data and large numbers of attributes but that slows down innovation
If precision is too low then the model can’t be trusted for even first level insights.
More features can increase complexity without significant improvement in precision and recall
Possible Features
Minimum + Viable
An ideal MVM uses a modest amount of data, implements a relatively simple initial algorithm, has good precision (we aim for 98% or more) and enough recall to be used for core insights.
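Precision and recall are simple to compute from predicted and actual labels, which makes the MVM bar easy to check. In the sketch below the 98% precision target comes from the slide. The 50% recall floor is a placeholder assumption, since the talk only says "enough recall", and the function names are made up for the example.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def is_minimum_viable(precision: float, recall: float,
                      min_precision: float = 0.98,
                      min_recall: float = 0.5) -> bool:
    """True when the model clears both the precision and recall bars."""
    return precision >= min_precision and recall >= min_recall
```

A model with high recall but 90% precision fails this check, which reflects the point that low precision makes even first level insights untrustworthy.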
Keep your eye on the target
The goal is not to get a bull's-eye every time
The goal is to get the data and learn
Test & Ops = Data Science

Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 

Big Data: The Magic to Attain New Heights

  • 1. Big Data: The Magic to Attain New Heights Ken Johnston Principal Data Science Manager Twitter – @rkjohnston Blog – http://linkedin.com/in/rkjohnston Email – kenj@Microsoft.com LinkedIn - http://linkedin.com/in/rkjohnston @rkjohnston #DataMagic
  • 2. Data Scientist in Core Data Science Team Office Live, WebApps, Office Online Cosmos, AutoPilot, Local, Shopping About Ken Kanban and Data Science series on LinkedIn EaaSy&MVQ – Everything as a Service & Minimum Viable Quality Write Books and Blog and some fiction
  • 3. I have a lot of love in my life
  • 7. Team of Amazing Magicians
  • 12. Taking on Sudden Infant Death Syndrome
  • 13. Big Data and Magic
  • 14. So, My son gets this kids “Magic Kit in a Box” for his 8th birthday
  • 15. Open our own Magic Show
  • 16. Six Keys to a “Big” Magic Show: Try, Try, Try Again (The Tyranny of Counting); Magic Tricks (A/B Testing, Runtime Flags); The Venue (Big Data Infrastructure); Foundation (Tools for Big Data); Security (Protection, Privacy, Fraud); The Assistant (Recruit, Train, & Retain). “Big Data” Search Trends
  • 17. The Venue Your Big Data Infrastructure
  • 18. Common Design Patterns. Good Paper to Read: IDC, “Six Patterns of Big Data and Analytics Adoption: The Importance of the Information Architecture”. Ingest: From Services, IoT, Apps; Via Streams; Into Storage. Process: Build Pipelines; Reduce, Transform, Join; Pipe out. Analyze: From Services, IoT, Apps; Via Streams; Into Storage.
  • 19. Azure Model Cindy Gross – Technical Fellow: Big Data and Cloud Twitter: @SQLCindy cindyg@NealAnalytics.com Ingest Process Analyze
  • 20. Hybrid: Azure and Hadoop Model Ingest Process Analyze
  • 22. How we do it in Windows
  • 23. Prototypical Big Data Platform. Components: Client1, Client2, Client3; TelemetryFrontEndService; Fast pipeline for high priority data; Alerting DB; Alerting Dashboard; Big Data Map Reduce Cloud; PIIScrubbingService; DataExtractionService; Insights DB 1, Insights DB N; Additional Reporting Dashboards; Big Data & ML Model Orchestration. Callouts: Personally Identifiable Information (PII) management is very critical. Data Driven Quality (DDQ) and big data pipelines will need a cloud platform. The superfast pipeline typically (not always) bypasses the cloud and is also void of PII.
  • 24. Prototypical Big Data Platform. Components: Client1, Client2, Client3; TelemetryFrontEndService; Fast pipeline for high priority data; Alerting DB; Alerting Dashboard; Big Data Map Reduce Cloud; PIIScrubbingService; DataExtractionService; Insights DB 1, Insights DB N; Additional Reporting Dashboards; Big Data & ML Model Orchestration. Stages: Ingest, Process, Analyze.
  • 25. User Segmentation Approaches • Risk Tolerance Model • Users Segment themselves • Opt in for greater risk with a reward in mind • Profile Based • Usage behaviors • new vs. power users • Browser type • Connection Type • Device and Device OS @rkjohnston #DataMagic
  • 26. Balancing Speed and Risk with Rings. Ring 0: Buddy Build; Ring 1: My Team; Ring 2: Company & NDA; Ring 3: External Beta Users; Ring 4: Everyone. The red line marks disclosure risk and possible loss of patent rights. Labels: Risk tolerance is highest; No desire for risk.
  • 29. Office 365 Advanced Threat Protection: Big Data Only Solution. Safe Link is powered by Cloud Exchange & Bing data. AI model powered by data from thousands of companies and attachments.
  • 30. Short lived identifiers Increase transparency and control for users Build privacy into the OS and all apps
  • 32. How the Windows Store Security Team made the Insights Leap @rkjohnston #DataMagic
  • 33. App Store Data Architecture: App Certification and Analysis Pipeline; Store Services Log and Telemetry; Bing Spam and Malware; Windows Services Safety Platform (MSA, SmartScreen, etc.); MMPC/Spynet; Network IPs; File Hashes; PhotoDNA; Strings; API Called; User Install Data; Ratings and Reviews; Purchases; Geographic Data; Account Reputation; Bad URLs; Botnet infected Clients; Cosmos Storage and Compute. BTW this was not Big Data.
  • 34. NoName and I were Spitballing Ideas. NoName was learning basic DS. “Look at how I did this k-means clustering and found these weird outliers in buying circles from Dev accounts created the same week and same IP address.” “Check it out, I found this guy's FB page. We have his picture!”
  • 35. Fraud Network Identification. XXXDeveloper created 40 different Store developer accounts and 100s of apps. Nodes: Bad Dev 1, Bad Dev 2, Bad Dev ‘N’. Links: Shared Fraudulent Payment Instruments; Payment Instruments; App Similarity; App Metadata (URL, Websites); New Identity Metadata; Social Networks; 3rd party app stores; Developer Watering Holes.
  • 38. The Big Red Switch This used to require humans
  • 40. Speed is your friend because…
  • 41. Code Churn Example 1: Six week coding milestone. Code churn is cumulative. Imagine this as part of a larger multi-layered project (Layer 1, Layer 2, Layer 3) • Tightly coupled layers • Long stabilization phase • Complicated end-to-end integration. Sim-ship increases risk. Maximum point of instability is at end of milestone.
  • 42. Code Churn Example 2 (Continuous Deployment): Layer 1, Layer 2, Layer 3, Layer N. Rapid release cadence (weekly or daily). Max Risk is Production. • Risk per release decreases because of more incremental change • You still must be careful of Risk within Production but… • Total risk over time can be less with incremental change
  • 43. As Speed Accelerates Up Front & Post Deploy Testing Decreases
  • 45. Measures = Test Cases • We do Measures • What is a post release test case? • Automation validates the golden path • We measure the golden path • Measures are the same as test cases • Monitor the golden path @rkjohnston #DataMagic
  • 46. >1.5*IQR = Outlier = Bug (probably) • What is a Test Case? • What I expect to happen vs. what does happen • A Test Case is binary • Measures can observe success and failure • Measures have a history of pass/fail • When pass or fail rates drift from standard expected rates we find outliers • Outliers are often bugs
  • 47. Rings + Speed + Data = Success • When speed increases the need for telemetry increases • The rings model provides a buffer @rkjohnston #DataMagic
  • 49. Flighting and A/B testing are mostly the same thing @rkjohnston #DataMagic
  • 51. Generic Service Stack Service UX Front Door Service Auth/Identity Layer A vCurrent Layer B vCurrent Service Layer C (Persistent Data Store) DefaultPath Production Traffic Front door servers for logging and access management UX rendering layers Identity or authentication layers Persistent data layers @rkjohnston #DataMagic
  • 52. Runtime Flags Example 1: Side-by-Side Deployments. Runtime Flags: • Flags direct traffic through the stack • Used to test vNext before full release. Stack: Service UX Front Door Service Auth/Identity; Layer A vCurrent; Layer B vCurrent and Layer B vNext; Service Layer C (Persistent Data Store). Paths: Default Path (Production Traffic); Runtime (Test or Forked Traffic).
  • 53. Runtime Flags Example 2: N Test Environments. Stack: Service UX Front Door Service Auth/Identity; Layer A vCurrent; Layer B vCurrent; Service Layer C (Persistent Data Store). Paths: Default Path (Production Traffic); Runtime (Test Case, Checkin Tests) via Layer A Test Path and Layer B Test Path.
  • 54. Apps as a Service: Facebook How Facebook secretly redesigned its iPhone app with your help …a system for creating alternate versions… within the native app. The team could then turn on certain new features for a subset of its users, directly, …a system of "different types of Legos... and see the results on the server in real time." From article on The Verge by Dieter Bohn September 18, 2013 @rkjohnston #DataMagic
  • 56. Typical Industry Staffing. Data Scientist: Visualization, Machine Learning, Statistics, Math, Programming, Modeling, Story Telling, Data Exploration. Data Engineer: Extract Load Transform, Data Architecture, Operations and Monitoring, Big Data Infrastructure & Storage, DB Administration. http://www.datasciencecentral.com/profiles/blogs/difference-between-data-engineers-and-data-scientists
  • 57. Blended Role for Agile Visualization Machine Learning Data Scientist/Data Engineer Extract Load Transform Data Architecture Operations and Monitoring Big Data Infrastructure & Storage DB Administration Statistics Math Programming Modeling Story Telling Data Exploration @rkjohnston #DataMagic
  • 58. LDA vs PCA vs A13 before stratified sampling Backlog Doing Validation Done MLADS ARPD Rehearsal Submit Abstract to Strata + Hadoop World Edge Experiment 1 Data Processing Edge Experiment 2 Customer Sat and Post Sales Monetization Factors Analysis Install Base Decay Rate estimation using Bayesian Model Friday Review Slides for Edge Experiment 1 Edge Experiment 1 Insights Analysis Top Enterprise DSAT list from textual analysis Business Entity Graph with DUNS, Domain Name, & TaxIDs Open Source Entity Graph visualization technology research Submit Paper to Informs 2016 ARPD V3 Model with FFF MLADS ARPD Slides Draft 1 Device Lifetime Value (LTV) model 2 Process and Culture impact Retention • Kanban for Project Management • Balance long and short term impact • Participate in Industry papers and reviews
  • 59. Trying Again & Again Advantages and Disadvantages of the counting culture
  • 60. KPIs drive companies and behavior
  • 61. The 5 Vs of Big Data. Nine months ago there were only three Vs: Variety, Velocity, Volume. Verify: Verification – managing data quality and access control at all points. Value.
  • 62. Must Count More Counting More Granular Make it go up and to the right Is vs Likely Business Impact is a Given Drives behavior (especially if tied to compensation)
  • 63. MVP in a Nutshell. Labels: Minimum, Viable, Possible Features, Minimum + Viable. Good features to test the users' responses. Bad user experience: too minimal a set or wrong set of features; will not engage users enough to gain valuable insights. The product you want to build, but to deliver all features will take too long. Wasted work adding features that do not add critical value for winning and retaining customers.
  • 64. Minimum Viable Model (MVM). Labels: Minimum, Viable, Possible Data, Possible Features, Minimum + Viable. Model should provide enough coverage that it can be used for core insights. Many models try to include all data and large numbers of attributes, but that slows down innovation. If precision is too low then the model can’t be trusted for even first level insights. More features can increase complexity without significant improvement in precision and recall. An ideal MVM uses a modest amount of data, implements a relatively simple initial algorithm, has good precision (we aim for 98% or more) and enough recall to be used for core insights.
  • 65. Keep your eye on the target The goal is not to get a bulls eye every time The goal is to get the data and Learn
  • 66. Test & Ops = Data Science
  • 67. Six Keys to a “Big” Magic Show: Try, Try, Try Again (The Tyranny of Counting); Magic Tricks (A/B Testing, Runtime Flags); The Venue (Big Data Infrastructure); Foundation (Tools for Big Data); Security (Protection, Privacy, Fraud); The Assistant (Recruit, Train, & Retain). “Big Data” Search Trends
  • 68. Big Data: The Magic to Attain New Heights Ken Johnston Principal Data Science Manager Twitter – @rkjohnston Blog – http://linkedin.com/in/rkjohnston Email – kenj@Microsoft.com LinkedIn - http://linkedin.com/in/rkjohnston @rkjohnston #DataMagic
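
Slide 46 treats a measure whose pass or fail rate drifts more than 1.5 × IQR from the expected range as an outlier, and probably a bug. A minimal Python sketch of that rule follows. The function name `iqr_outliers`, the choice of the inclusive quartile method, and the sample failure rates are illustrative assumptions, not taken from the deck.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Quartile method is an assumption; the slide does not specify one.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily failure rates (%) for one golden-path measure.
daily_failure_rates = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 6.5, 1.0]

# Days flagged here are candidates for investigation, not confirmed bugs.
suspect_days = iqr_outliers(daily_failure_rates)
print(suspect_days)
```

In practice the history window and the multiplier `k` would be tuned per measure; 1.5 is simply the value named on the slide.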

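Slides 52 and 53 describe runtime flags that direct traffic through the service stack so that a vNext layer can be tested before full release. One way this could look in code is sketched below in Python. The `Request` type, the layer names, and the flag format are all hypothetical; the deck does not show an implementation.

```python
from dataclasses import dataclass, field

# Default path taken by production traffic (layer names are illustrative).
DEFAULT_PATH = {"layer_a": "vCurrent", "layer_b": "vCurrent"}


@dataclass
class Request:
    user_id: str
    # Hypothetical runtime flags, e.g. {"layer_b": "vNext"}.
    flags: dict = field(default_factory=dict)


def resolve_path(request):
    """Pick the version of each layer that should serve this request."""
    path = dict(DEFAULT_PATH)
    for layer, version in request.flags.items():
        if layer in path:  # flags for unknown layers are ignored
            path[layer] = version
    return path


production = resolve_path(Request("user-1"))
forked = resolve_path(Request("user-2", {"layer_b": "vNext"}))
print(production, forked)
```

Requests without flags stay on the default path, so test or forked traffic can share the same stack as production traffic.
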
Editor's notes

  1. They aren’t afraid to get their hands dirty in the data.
  2. They are uniquely gifted at connecting the dots.
  3. Through data they make original and deep insights.
  4. I have to tell them all the time just how amazing they are.
  5. My son gets this magic kit in a box. Within an hour of playing with it he comes to tell me how he’s going to be a magician and we have to throw a magic show.
  6. I thought I’d use his idea of creating a magic show as a way to talk about the magic of data science.
  7. Presenter guidance: Share how we think about the data platform in the cloud. Today, we’ll specifically talk about SQL in a VM (briefly), SQL DB, DocumentDB, HBase on HDInsight, and Tables/Blobs. There are lots of other adjacent services such as Redis Cache, Event Hubs, HDInsight, Azure ML, Data Factory, and Stream Analytics that will not be addressed in this deck. Slide talk track: The top row is Power BI: you’re making decisions based on data. The middle row is ML, Stream Analytics, HDInsight, and Data Factory: processing and making sense of the data. The bottom row is where you ingest and store data. With Azure, organizations have access to a whole range of services that allow them to use the right tool for the right job when developing applications. In the cloud, organizations can collect and manage data in the form in which it’s born and store it in the form that best suits an application’s needs.
  8. Clients: Common library, but support for multiple OSes. Front End: Telemetry and debug data come through the Front End. PII Scrubbing: Happens at the client and again upon ingestion. Cloud Platform: Large scale, many developers, shared structured data; the cloud allows for elastic scaling. APIs and Query Service: Allow access to refined data; often data is piped to a SQL Server for KPIs and deep analysis. Databases and Reporting Services: Deep analysis is usually done with tools like R Studio and Power Pivot for visualization. Dashboards monitor well-known KPIs but are not insights.
  9. Clients: Common library, but support for multiple OSes. Front End: Telemetry and debug data come through the Front End. PII Scrubbing: Happens at the client and again upon ingestion. Cloud Platform: Large scale, many developers, shared structured data; the cloud allows for elastic scaling. APIs and Query Service: Allow access to refined data; often data is piped to a SQL Server for KPIs and deep analysis. Databases and Reporting Services: Deep analysis is usually done with tools like R Studio and Power Pivot for visualization. Dashboards monitor well-known KPIs but are not insights.
  10. Ring 0: Buddy Build – Build may not have been checked in; pass component to a buddy developer. Ring 1: My Team – Should pass unit and check-in tests. Ring 2: Company and NDA – Pushing to these users is based upon quality gates and telemetry measures. Further progression is all telemetry based. Ring 3: External Beta Users – Release based upon telemetry results. Release is metered by % and device models. Ring 4: Everyone – Product is available for general adoption but may still use metered rollout. Rings 2-4 leverage rolling deployments (small % at a time) with metrics to stop and roll back.
  11. Volume – How much data do you have and how much do you really need? Variety – What data sources do you have and how can they be combined for more value? Velocity – Speed of data to insight impacts how you use it. Verification – Managing data quality and access control at all points. Value – Big Data can be expensive and must produce valuable insights.
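
Note 10 says that Rings 2-4 use rolling deployments, a small percentage at a time, with metrics that can stop the rollout and roll it back. A rough Python sketch of such a gate is shown below. The stage percentages, the 2% failure threshold, and the `observe_failure_rate` callback are assumptions made for illustration; the deck does not give concrete values.

```python
def metered_rollout(stages, observe_failure_rate, max_failure_rate=0.02):
    """Advance through rollout stages until done or a telemetry gate trips.

    stages: increasing percentages of users, e.g. [1, 5, 25, 100].
    observe_failure_rate: callable returning the measured failure rate
        (0.0 to 1.0) once the build is live for that percentage.
    """
    deployed = 0
    for percent in stages:
        deployed = percent
        rate = observe_failure_rate(percent)
        if rate > max_failure_rate:
            # Gate tripped: stop here and roll everyone back.
            return {"status": "rolled_back", "stopped_at": percent,
                    "deployed_percent": 0}
    return {"status": "complete", "stopped_at": deployed,
            "deployed_percent": deployed}


# Hypothetical telemetry: the build degrades once 25% of users have it.
telemetry = {1: 0.01, 5: 0.01, 25: 0.08, 100: 0.0}
result = metered_rollout([1, 5, 25, 100], telemetry.get)
print(result)
```

A real system would also wait for enough telemetry to accumulate at each stage before reading the measure; that delay is omitted here to keep the sketch short.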