SlideShare una empresa de Scribd logo
1 de 43
Descargar para leer sin conexión
How to Use Survival Analytics
to Predict Churn
(Or, Why You Shouldn't Use Logistic Regression to Predict Attrition)
Talent Analytics, Corp.
Pasha Roberts, Chief Scientist
June 20, 2017
Chicago USA
Demo Code: https://github.com/talentanalytics/class_survival_101/
www.talentanalytics.com © 2017 Talent Analytics, Corp 1 / 41
▸
▸
▹
▹
▸
▸
Who is Talent Analytics?
Predictive Modeling Platform – Advisor™
We predict employee attrition and performance pre-hire.
Much like credit risk modeling:
Predict likelihood to pay / default on mortgage, before extending credit
Predict likelihood to perform / leave role or company early, before extending job offer
PAAS (Prediction As A Service)
Seamless deployment of predictions into talent acquisition process
www.talentanalytics.com © 2017 Talent Analytics, Corp 2 / 41
Turnover and Tenure:
A Timey-Wimey Relationship
www.talentanalytics.com © 2017 Talent Analytics, Corp 3 / 41
▸
▸
Two Different (But Related) Things
Turnover: Percent of employees (customers)
that terminate within a period of time
Tenure: Employees’ (customers’) length of
time with company
Is it a Wave or a Particle?
www.talentanalytics.com © 2017 Talent Analytics, Corp 4 / 41
Low Tenure, High Turnover
www.talentanalytics.com © 2017 Talent Analytics, Corp 5 / 41
High Tenure, Low Turnover
www.talentanalytics.com © 2017 Talent Analytics, Corp 6 / 41
High Tenure, High Turnover (in 2015)
www.talentanalytics.com © 2017 Talent Analytics, Corp 7 / 41
Low Tenure, Low Turnover (in Nov, Dec)
www.talentanalytics.com © 2017 Talent Analytics, Corp 8 / 41
▸
▸
▸
Main Problem: We Can't See the Future
Everyone will terminate.
Someday, somehow... But when?
Technical Lingo: Right Censoring
www.talentanalytics.com © 2017 Talent Analytics, Corp 9 / 41
▸
▸
▸
Each Side Has Missing Information
Turnover Has No Information About Tenure
Treats a temp the same as a seasoned veteran
Ignores patterns - such as people quitting right after a bonus
Tenure Has No Information About Turnover
High Tenure = staying power or retirement risk?
Each Side Varies Depending on When You Look
www.talentanalytics.com © 2017 Talent Analytics, Corp 10 / 41
▸
▸
Annual Attrition Timing is Arbitrary
What if you could see attrition
at every point in time?
“One Year” only matters to
accountants and astronomers
Cumulative Breakeven matters
more to the business
www.talentanalytics.com © 2017 Talent Analytics, Corp 11 / 41
Survival 101
www.talentanalytics.com © 2017 Talent Analytics, Corp 12 / 41
▸
▸
▸
▸
▸
▸
Survival Analytics
Medical Roots - How long will a patient survive a disease?
Engineering Roots - How long until a machine fails?
Social Science - How long do people live?
Employment - How long until an employee terminates? or promotes?
In General - “Expected duration of time until one or more events happen”
Artfully combines turnover and tenure
Lots of competing jargon and syntax from many application domains:
Failure Rate, Hazard Rate, Force of Mortality, ...
www.talentanalytics.com © 2017 Talent Analytics, Corp 13 / 41
▸
▹
▹
▹
▸
▸
▸
Hazard Rate:
Conditional Daily Risk of Termination
Every day in tenure there is a (small)
probability that the employee will terminate
or the customer will drop...
or the patient will die...
or the event will occur
Conditional on survival to the prior day
Like an actuarial life table
Can rise at different times:
e.g. training, review or bonus time
www.talentanalytics.com © 2017 Talent Analytics, Corp 14 / 41
▸
▸
▸
▸
Survival Rate: Daily Probability
of Being Employed at Time t
How likely are you to be working here,
t days after being hired?
Near 100% on Day 1
(some people actually never show up)
Downhill from there - usually not linear
One role at a time
www.talentanalytics.com © 2017 Talent Analytics, Corp 15 / 41
▸
▸
▸
▸
▸
▸
▹
▹
Survival Rate: Daily Probability
of Being Employed at Time t
How likely are you to be working here,
t days after being hired?
Near 100% on Day 1
(some people actually never show up)
Downhill from there - usually not linear
One role at a time
This is the full picture
Simple transformation of the Hazard Curve
Best to use statistical tool (R, SAS)
Apparently possible in Excel
www.talentanalytics.com © 2017 Talent Analytics, Corp 15 / 41
Attrition is Built Into Survival Curves
www.talentanalytics.com © 2017 Talent Analytics, Corp 16 / 41
Shelf Pattern Sparse Pattern
Many Shapes
www.talentanalytics.com © 2017 Talent Analytics, Corp 17 / 41
Heavy Terms During Training Heavy Terms After 1-Year Bonus
Cliffs or Kinks in the Survival Curve
www.talentanalytics.com © 2017 Talent Analytics, Corp 18 / 41
Everyone Has a Survival Curve
www.talentanalytics.com © 2017 Talent Analytics, Corp 19 / 41
Predicting Survival
To the Individual
www.talentanalytics.com © 2017 Talent Analytics, Corp 20 / 41
▸
▸
▸
▹
▸
▸
▸
Don't Use Logistic Methods
to Predict Attrition!
Commonly Done:
Predict one-year attrition with logistic
Use tenure as a variable along with others
Common Result:
“The biggest cause of termination is tenure”
Not an actionable coef cient
Ignores nuances available to survival methods
Mishandles current employee tenure
Less accurate
www.talentanalytics.com © 2017 Talent Analytics, Corp 21 / 41
▸
▸
▸
▸
Proportional Hazard
Most Survival models are “Proportional Hazard”
Start with Base Survival rate
Convert to Base Cumulative Hazard rate
Multiply Cumulative Hazard by a single factor
(hence “Proportional Hazard")
Linear changes to Hazard lead
to a rotation of Survival Curve
Predictive Goal:
Predict the amount of Survival Curve
rotation for each candidate
www.talentanalytics.com © 2017 Talent Analytics, Corp 22 / 41
▸
▹
▹
▹
▹
▸
▸
▸
▸
Proportional Hazard Prediction Methods
Predicting a Single Number
Cox regression
Vanilla Cox
Stepwise
ElasticNet or LASSO
Variable selection with Random Forest
Decision Trees
Random Forests
Neural Networks
SVM, etc
There are also other survival algorithms,
including parametric methods
www.talentanalytics.com © 2017 Talent Analytics, Corp 23 / 41
▸
▸
▸
▸
Build/Validate a Survival Model
With Thresholds
The Model Predicts:
“Dark Blue” hires will survive the longest
“Light Blue” hires will survive above-median
“Grey” survival curve is current median
survival.
“Light Red” and “Dark Red” candidates are not
likely to survive.
www.talentanalytics.com © 2017 Talent Analytics, Corp 24 / 41
Deeper Dive:
Predicting Survival
with Proportional Hazard
Follow Demo Code on GitHub:
https://github.com/talentanalytics/class_survival_101/
www.talentanalytics.com © 2017 Talent Analytics, Corp 25 / 41
▸
▸
▸
▸
▸
▸
Formulas and Jargon
Hazard Function:   
Probability of termination at time t, conditional on survival up to time t
Ranges from 0 to 1 (typically very small)
Cumulative Hazard Function:  
Cumulative conditional probability of termination up to time t
Ranges from 0 to 4ish (theoretically to In nity)
Survival Function:   
Probability that termination will be later than time t
Ranges from 1 to 0
Theoretically the formulas above are continuous measures, but in practice are discrete daily.
www.talentanalytics.com © 2017 Talent Analytics, Corp
h(t)
H(t) = h(x) dx = −log(S(t))∫
t
0
S(t) = e
−H(t)
26 / 41
Calculating Survival in R
Four Simple Steps with R survival package:
1. Gather and Prepare the Data
2. Calculate Baseline Survival Function
3. Model Proportional Hazard with Cox Regression
4. Validate Model
www.talentanalytics.com © 2017 Talent Analytics, Corp
(t)S0
27 / 41
▸
▸
▸
Calculating Survival in R
Four Simple Steps with R survival package:
1. Gather and Prepare the Data
2. Calculate Baseline Survival Function
3. Model Proportional Hazard with Cox Regression
4. Validate Model
5. Deploy Model into hiring
6. Hire people likely to stay on the job longer
7. Save and earn money
Avoid attributing savings to soft factors like ripple effect, morale, engagement
Hard savings from reduced Replacement Cost
Hard earnings from increased Employee Lifetime Value
www.talentanalytics.com © 2017 Talent Analytics, Corp
(t)S0
27 / 41
▸
▹
▸
▹
▹
▹
1) Gather Employee Data
> attr.data
emp.id hire.date term.date factor.x factor.y factor.z
<int> <dttm> <dttm> <dbl> <dbl> <dbl>
1 52 2016-02-23 <NA> 0.09555369 1.515028 102.3811
2 261 2014-10-30 <NA> 2.40144024 9.065855 100.5727
3 193 2013-12-25 2015-03-28 1.88463382 4.331642 105.6887
4 150 2013-03-27 2013-09-19 3.81625316 7.363454 110.6842
5 31 2015-03-25 <NA> 5.17392728 11.824601 110.0590
Simple Requirements
Just hire.date and term.date is all you need from HRIS
Leave term.date as NA if the employee haven't terminated yet
Merge with predictive input (independent) variables
e.g. assessment results, experience, CV detail, social media, semantic
Here we use factor.x, factor.y and factor.z
Are these variables available pre-hire?
www.talentanalytics.com © 2017 Talent Analytics, Corp 28 / 41
▸
▸
▹
▹
▸
▹
Calculate Fields for Survival
> attr.data %>% dplyr::select(-dplyr::contains("factor"))
emp.id hire.date term.date is.term end.date tenure.years
<int> <dttm> <dttm> <lgl> <dttm> <dbl>
1 52 2016-02-23 <NA> FALSE 2017-05-01 1.1854894
2 261 2014-10-30 <NA> FALSE 2017-05-01 2.5023956
3 193 2013-12-25 2015-03-28 TRUE 2015-03-28 1.2539357
4 150 2013-03-27 2013-09-19 TRUE 2013-09-19 0.4818617
5 31 2015-03-25 <NA> FALSE 2017-05-01 2.1026694
Simple Derivations
is.term: Have they terminated? Is term.date NA?
end.date: Either term.date, or censor date, depending on is.term
Could put transfer.date here with no is.term
The last date we know the employee was at work
tenure.years: What is employee tenure from hire.date to end.date?
All we know about the employee's tenure, as of censor date
www.talentanalytics.com © 2017 Talent Analytics, Corp 29 / 41
▸
▸
▸
Dataset is Ready
> glimpse(attr.data)
Observations: 400
Variables: 14
$ label <fctr> a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,...
$ hire.date <dttm> 2013-04-20, 2016-01-14, 2013-09-20, 2014-05-17, 2016-...
$ term.date <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2016-...
$ is.term <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE...
$ end.date <dttm> 2017-05-01, 2017-05-01, 2017-05-01, 2017-05-01, 2017-...
$ tenure.years <dbl> 4.02975576, 1.29469635, 3.61029191, 2.95626651, 1.1463...
$ factor.x <dbl> 7.44846858, 1.08376058, 2.81983987, 0.56880027, 0.7868...
$ factor.y <dbl> 16.838313, 9.335225, 9.519951, 12.697147, 12.912898, 1...
$ factor.z <dbl> 103.10732, 100.37171, 99.76382, 95.88645, 97.61509, 10...
$ emp.id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
$ scale.x <dbl> 2.41351313, -0.24754913, 0.47829958, -0.46285224, -0.3...
$ scale.y <dbl> 1.31773198, -0.57718839, -0.53053544, 0.27187182, 0.32...
$ scale.z <dbl> 0.3407242, -0.1849069, -0.3017096, -1.0467221, -0.7145...
$ is.training <lgl> TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRU...
Example is 400-row simulated dataset
Scaled the input factors to consistent z-scale
Separated into 80% training and 20% validation datasets
www.talentanalytics.com © 2017 Talent Analytics, Corp 30 / 41
▸
2) Calculate Baseline Survival ( )
survival::Surv is a single object to hold Tenure and Turnover at once
> surv.obj <- survival::Surv(training.data$tenure.years, training.data$is.term)
Needs “time variable” tenure.years and “event variable” is.term
survival::survfit is the Kaplan-Meier Estimator of
> surv.fit <- survival::survfit(surv.obj ~ 1)
> summary(surv.fit)
Call: survfit(formula = surv.obj ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0.000 320 1 0.997 0.00312 0.991 1.000
0.249 308 1 0.994 0.00448 0.985 1.000
0.298 303 1 0.990 0.00554 0.980 1.000
0.320 298 1 0.987 0.00644 0.974 1.000
www.talentanalytics.com © 2017 Talent Analytics, Corp
(t)S0
(t)S0
31 / 41
Native Survival Plot Crafted ggplot2 Plot
Plot Baseline Survival
www.talentanalytics.com © 2017 Talent Analytics, Corp 32 / 41
▸
▸
▸
▸
▸
3) Back to Proportional Hazard
Consider a multiplier x that centers on 1
Multiply
for proportional (linear) changes to Hazard
Survival Curve rotates in response:
Beware - inverse relationship:
x ← 1.1 will increase , decrease
x ← 0.9 will decrease , increase
So we just need to predict x to predict Survival
One number drives the entire curve, based on baseline
www.talentanalytics.com © 2017 Talent Analytics, Corp
(t) = (t)xHx H0
(t) =Sx e
− (t)xH0
(t)Hx (t)Sx
(t)Hx (t)Sx
(t)H0
33 / 41
Model Proportional Hazard
with Cox Regression
> cox.model <- survival::coxph(formula = surv.obj ~ scale.x + scale.y + scale.z,
data = training.data)
> cox.model
Call:
survival::coxph(formula = surv.obj ~ scale.x + scale.y + scale.z,
data = training.data)
coef exp(coef) se(coef) z p
scale.x -0.4298 0.6506 0.0918 -4.68 0.0000028
scale.y -0.3204 0.7258 0.0957 -3.35 0.00082
scale.z -0.1967 0.8215 0.0984 -2.00 0.04558
Likelihood ratio test=38.2 on 3 df, p=0.000000025
n= 320, number of events= 125
www.talentanalytics.com © 2017 Talent Analytics, Corp 34 / 41
▸
▹
▹
▸
▹
▹
4) Predict validation.data
with New Cox Model
> cox.pred <- predict(cox.model, newdata = validation.data, type = "lp")
Note: we built model with training.data, now we predict against validation.data
Our model has never seen anything in validation.data
Great test to see how closely our predictions match known outcomes
The predict.coxph() function returns 5 different types of predictions
We want type = "lp" which is log(multiplier) to baseline
Other types won't work with survivalROC()
www.talentanalytics.com © 2017 Talent Analytics, Corp
(t)H0
35 / 41
▸
▸
▸
▸
How does ROC Work with Survival?
> roc.obj <- survivalROC::survivalROC(Stime = validation.data$tenure.years,
status = validation.data$is.term,
marker = cox.pred,
predict.time = 1,
lambda = 0.003)
> roc.obj$AUC
[1] 0.7102442
ROC typically evaluates classification methods
So, we make this a classi cation problem
Classify: Was the employee terminated at t = 1 or not?
Compare actual vs. predicted classi cations at all cutpoints
Our AUC is 0.71
Not bad for a censored attrition prediction!
www.talentanalytics.com © 2017 Talent Analytics, Corp 36 / 41
Plot AUC with ggplot2
www.talentanalytics.com © 2017 Talent Analytics, Corp 37 / 41
▸
▸
▸
▸
Don't Get Fooled by Randomness
We are predicting AUC, too
Let's try predicting 1000 random samples of the same underlying pattern
> test.repl <- replicate(1000, demoPrediction(verbose = FALSE))
> summary(test.repl)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3470 0.6191 0.6833 0.6771 0.7356 0.9446
AUC generally fell between 0.619 and 0.735, but 50% were outside this range
The likely AUC is closer to 0.68 - still good
What if you got a lucky (or unlucky) 80/20 validation split?
www.talentanalytics.com © 2017 Talent Analytics, Corp 38 / 41
▸
▸
Multiple Cross-Validation
Raises the Bar
Against Getting Fooled
Typically run 20-100 CV iterations
Can consume lots of core-hours if nested for
hyperparameters
www.talentanalytics.com © 2017 Talent Analytics, Corp 39 / 41
▸
▸
▸
▸
▸
Download and Experiment With this Code
These slides will be on PAW website
Download demo code and doc from
https://github.com/talentanalytics/class_survival_101/
Clone it, fork it, run it
Send bug xes
Let me know how it goes
www.talentanalytics.com © 2017 Talent Analytics, Corp 40 / 41
Questions and Discussion
Pasha Roberts
pasha@talentanalytics.com
Follow Demo Code on GitHub:
https://github.com/talentanalytics/class_survival_101/
www.talentanalytics.com © 2017 Talent Analytics, Corp 41 / 41

Más contenido relacionado

La actualidad más candente

1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his macRising Media, Inc.
 
925 plenary rexer_using our laptop
925 plenary rexer_using our laptop925 plenary rexer_using our laptop
925 plenary rexer_using our laptopRising Media, Inc.
 
1530 track 3 gunther_using our laptop
1530 track 3 gunther_using our laptop1530 track 3 gunther_using our laptop
1530 track 3 gunther_using our laptopRising Media, Inc.
 
1000 track 1 groves_using our laptop
1000 track 1 groves_using our laptop1000 track 1 groves_using our laptop
1000 track 1 groves_using our laptopRising Media, Inc.
 
840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptopRising Media, Inc.
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itDomino Data Lab
 
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFestIzam Ryan
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)Open Analytics
 
The Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingThe Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingDecision Management Solutions
 

La actualidad más candente (20)

1120 track1 taylor
1120 track1 taylor1120 track1 taylor
1120 track1 taylor
 
1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his mac
 
1530 track2 reid
1530 track2 reid1530 track2 reid
1530 track2 reid
 
1615 track1 schleicher
1615 track1 schleicher1615 track1 schleicher
1615 track1 schleicher
 
925 plenary rexer_using our laptop
925 plenary rexer_using our laptop925 plenary rexer_using our laptop
925 plenary rexer_using our laptop
 
1055 track3 soules
1055 track3 soules1055 track3 soules
1055 track3 soules
 
940 diamond sponsor sengupta
940 diamond sponsor sengupta940 diamond sponsor sengupta
940 diamond sponsor sengupta
 
1120 track1 grossman
1120 track1 grossman1120 track1 grossman
1120 track1 grossman
 
1530 track 3 gunther_using our laptop
1530 track 3 gunther_using our laptop1530 track 3 gunther_using our laptop
1530 track 3 gunther_using our laptop
 
1315 keynote jopia_shareable
1315 keynote jopia_shareable1315 keynote jopia_shareable
1315 keynote jopia_shareable
 
1000 track 1 groves_using our laptop
1000 track 1 groves_using our laptop1000 track 1 groves_using our laptop
1000 track 1 groves_using our laptop
 
1645 track 3 porter
1645 track 3 porter1645 track 3 porter
1645 track 3 porter
 
1620 track1 lee
1620 track1 lee1620 track1 lee
1620 track1 lee
 
1530 track1 rosenbaum
1530 track1 rosenbaum1530 track1 rosenbaum
1530 track1 rosenbaum
 
840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptop
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
 
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest
20131108 - Big Data presentation at the 2nd GDG Brunei Darussalam DevFest
 
920 plenary elder
920 plenary elder920 plenary elder
920 plenary elder
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)
 
The Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingThe Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision Modeling
 

Similar a 1440 track2 roberts

1505 Statistical Thinking course extract
1505 Statistical Thinking course extract1505 Statistical Thinking course extract
1505 Statistical Thinking course extractJefferson Lynch
 
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptx
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptxRethinking Risk-Based Project Management in the Emerging IT initiatives.pptx
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptxInflectra
 
The Datafication of HR: Graduating from Metrics to Analytics
The Datafication of HR: Graduating from Metrics to AnalyticsThe Datafication of HR: Graduating from Metrics to Analytics
The Datafication of HR: Graduating from Metrics to AnalyticsVisier
 
Cloud: Risk profiles, Preference and Decisions in Cloud
Cloud: Risk profiles, Preference and Decisions in CloudCloud: Risk profiles, Preference and Decisions in Cloud
Cloud: Risk profiles, Preference and Decisions in CloudAmazon Web Services
 
Scalable agile metrics to optimize flow and reduce delivery risks
Scalable agile metrics to optimize flow and reduce delivery risksScalable agile metrics to optimize flow and reduce delivery risks
Scalable agile metrics to optimize flow and reduce delivery risksAdonis El Fakih
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4Salford Systems
 
Siemens Resource Planning Case Study
Siemens Resource Planning Case StudySiemens Resource Planning Case Study
Siemens Resource Planning Case StudyGreg Bailey
 
project managment.ppt
project managment.pptproject managment.ppt
project managment.pptHinaAsghar16
 
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...Stop Talking About What You Do and Start Talking About What You Deliver - Pro...
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...Tim Creasey
 
Asking When, Not If in Predictive Modeling
Asking When, Not If in Predictive ModelingAsking When, Not If in Predictive Modeling
Asking When, Not If in Predictive ModelingAndrea Kropp
 

Similar a 1440 track2 roberts (20)

1505 Statistical Thinking course extract
1505 Statistical Thinking course extract1505 Statistical Thinking course extract
1505 Statistical Thinking course extract
 
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptx
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptxRethinking Risk-Based Project Management in the Emerging IT initiatives.pptx
Rethinking Risk-Based Project Management in the Emerging IT initiatives.pptx
 
The Datafication of HR: Graduating from Metrics to Analytics
The Datafication of HR: Graduating from Metrics to AnalyticsThe Datafication of HR: Graduating from Metrics to Analytics
The Datafication of HR: Graduating from Metrics to Analytics
 
Cloud: Risk profiles, Preference and Decisions in Cloud
Cloud: Risk profiles, Preference and Decisions in CloudCloud: Risk profiles, Preference and Decisions in Cloud
Cloud: Risk profiles, Preference and Decisions in Cloud
 
Scalable agile metrics to optimize flow and reduce delivery risks
Scalable agile metrics to optimize flow and reduce delivery risksScalable agile metrics to optimize flow and reduce delivery risks
Scalable agile metrics to optimize flow and reduce delivery risks
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
 
DFSS short
DFSS shortDFSS short
DFSS short
 
Siemens Resource Planning Case Study
Siemens Resource Planning Case StudySiemens Resource Planning Case Study
Siemens Resource Planning Case Study
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
project managment.ppt
project managment.pptproject managment.ppt
project managment.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
9. Risk.ppt
9. Risk.ppt9. Risk.ppt
9. Risk.ppt
 
Dear Dad
Dear DadDear Dad
Dear Dad
 
Project Management 08
Project Management 08Project Management 08
Project Management 08
 
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...Stop Talking About What You Do and Start Talking About What You Deliver - Pro...
Stop Talking About What You Do and Start Talking About What You Deliver - Pro...
 
Asking When, Not If in Predictive Modeling
Asking When, Not If in Predictive ModelingAsking When, Not If in Predictive Modeling
Asking When, Not If in Predictive Modeling
 

Más de Rising Media, Inc.

1415 track 1 wu_using his laptop
1415 track 1 wu_using his laptop1415 track 1 wu_using his laptop
1415 track 1 wu_using his laptopRising Media, Inc.
 
1620 keynote olson_using our laptop
1620 keynote olson_using our laptop1620 keynote olson_using our laptop
1620 keynote olson_using our laptopRising Media, Inc.
 
1530 track 2 stuart_using our laptop
1530 track 2 stuart_using our laptop1530 track 2 stuart_using our laptop
1530 track 2 stuart_using our laptopRising Media, Inc.
 
1530 track 1 fader_using our laptop
1530 track 1 fader_using our laptop1530 track 1 fader_using our laptop
1530 track 1 fader_using our laptopRising Media, Inc.
 
1215 daa lunch owusu_using our laptop
1215 daa lunch owusu_using our laptop1215 daa lunch owusu_using our laptop
1215 daa lunch owusu_using our laptopRising Media, Inc.
 
1215 daa lunch a bos intro slides_using our laptop
1215 daa lunch a bos intro slides_using our laptop1215 daa lunch a bos intro slides_using our laptop
1215 daa lunch a bos intro slides_using our laptopRising Media, Inc.
 
855 sponsor movassate_using our laptop
855 sponsor movassate_using our laptop855 sponsor movassate_using our laptop
855 sponsor movassate_using our laptopRising Media, Inc.
 
1325 keynote yale_pdf shareable
1325 keynote yale_pdf shareable1325 keynote yale_pdf shareable
1325 keynote yale_pdf shareableRising Media, Inc.
 
905 keynote peele_using our laptop
905 keynote peele_using our laptop905 keynote peele_using our laptop
905 keynote peele_using our laptopRising Media, Inc.
 

Más de Rising Media, Inc. (20)

1415 track 1 wu_using his laptop
1415 track 1 wu_using his laptop1415 track 1 wu_using his laptop
1415 track 1 wu_using his laptop
 
Matt gershoff
Matt gershoffMatt gershoff
Matt gershoff
 
Keynote adam greco
Keynote adam grecoKeynote adam greco
Keynote adam greco
 
1620 keynote olson_using our laptop
1620 keynote olson_using our laptop1620 keynote olson_using our laptop
1620 keynote olson_using our laptop
 
1530 track 2 stuart_using our laptop
1530 track 2 stuart_using our laptop1530 track 2 stuart_using our laptop
1530 track 2 stuart_using our laptop
 
1530 track 1 fader_using our laptop
1530 track 1 fader_using our laptop1530 track 1 fader_using our laptop
1530 track 1 fader_using our laptop
 
1415 track 2 richardson
1415 track 2 richardson1415 track 2 richardson
1415 track 2 richardson
 
1215 daa lunch owusu_using our laptop
1215 daa lunch owusu_using our laptop1215 daa lunch owusu_using our laptop
1215 daa lunch owusu_using our laptop
 
1215 daa lunch a bos intro slides_using our laptop
1215 daa lunch a bos intro slides_using our laptop1215 daa lunch a bos intro slides_using our laptop
1215 daa lunch a bos intro slides_using our laptop
 
915 e metrics_claudia perlich
915 e metrics_claudia perlich915 e metrics_claudia perlich
915 e metrics_claudia perlich
 
855 sponsor movassate_using our laptop
855 sponsor movassate_using our laptop855 sponsor movassate_using our laptop
855 sponsor movassate_using our laptop
 
1615 plack using our laptop
1615 plack using our laptop1615 plack using our laptop
1615 plack using our laptop
 
1530 rimmele do not share
1530 rimmele do not share1530 rimmele do not share
1530 rimmele do not share
 
1325 keynote yale_pdf shareable
1325 keynote yale_pdf shareable1325 keynote yale_pdf shareable
1325 keynote yale_pdf shareable
 
1115 fiztgerald schuchardt
1115 fiztgerald schuchardt1115 fiztgerald schuchardt
1115 fiztgerald schuchardt
 
1000 kondic do not share
1000 kondic do not share1000 kondic do not share
1000 kondic do not share
 
905 keynote peele_using our laptop
905 keynote peele_using our laptop905 keynote peele_using our laptop
905 keynote peele_using our laptop
 
Stephen morse sharable
Stephen morse sharableStephen morse sharable
Stephen morse sharable
 
Elder shareable
Elder shareableElder shareable
Elder shareable
 
1115 ramirez using our laptop
1115 ramirez using our laptop1115 ramirez using our laptop
1115 ramirez using our laptop
 

Último

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Último (20)

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

1440 track2 roberts

  • 1. How to Use Survival Analytics to Predict Churn (Or, Why You Shouldn't Use Logistic Regression to Predict Attrition) Talent Analytics, Corp. Pasha Roberts, Chief Scientist June 20, 2017 Chicago USA Demo Code: https://github.com/talentanalytics/class_survival_101/ www.talentanalytics.com © 2017 Talent Analytics, Corp 1 / 41
  • 2. ▸ ▸ ▹ ▹ ▸ ▸ Who is Talent Analytics? Predictive Modeling Platform – Advisor™ We predict employee attrition and performance pre-hire. Much like credit risk modeling: Predict likelihood to pay / default on mortgage, before extending credit Predict likelihood to perform / leave role or company early, before extending job offer PAAS (Prediction As A Service) Seamless deployment of predictions into talent acquisition process www.talentanalytics.com © 2017 Talent Analytics, Corp 2 / 41
  • 3. Turnover and Tenure: A Timey-Wimey Relationship www.talentanalytics.com © 2017 Talent Analytics, Corp 3 / 41
  • 4. ▸ ▸ Two Different (But Related) Things Turnover: Percent of employees (customers) that terminate within a period of time Tenure: Employees’ (customers’) length of time with company Is it a Wave or a Particle? www.talentanalytics.com © 2017 Talent Analytics, Corp 4 / 41
  • 5. Low Tenure, High Turnover www.talentanalytics.com © 2017 Talent Analytics, Corp 5 / 41
  • 6. High Tenure, Low Turnover www.talentanalytics.com © 2017 Talent Analytics, Corp 6 / 41
  • 7. High Tenure, High Turnover (in 2015) www.talentanalytics.com © 2017 Talent Analytics, Corp 7 / 41
  • 8. Low Tenure, Low Turnover (in Nov, Dec) www.talentanalytics.com © 2017 Talent Analytics, Corp 8 / 41
  • 9. ▸ ▸ ▸ Main Problem: We Can't See the Future Everyone will terminate. Someday, somehow... But when? Technical Lingo: Right Censoring www.talentanalytics.com © 2017 Talent Analytics, Corp 9 / 41
  • 10. ▸ ▸ ▸ Each Side Has Missing Information Turnover Has No Information About Tenure Treats a temp the same as a seasoned veteran Ignores patterns - such as people quitting right after a bonus Tenure Has No Information About Turnover High Tenure = staying power or retirement risk? Each Side Varies Depending on When You Look www.talentanalytics.com © 2017 Talent Analytics, Corp 10 / 41
  • 11. ▸ ▸ Annual Attrition Timing is Arbitrary What if you could see attrition at every point in time? “One Year” only matters to accountants and astronomers Cumulative Breakeven matters more to the business www.talentanalytics.com © 2017 Talent Analytics, Corp 11 / 41
  • 12. Survival 101 www.talentanalytics.com © 2017 Talent Analytics, Corp 12 / 41
  • 13. ▸ ▸ ▸ ▸ ▸ ▸ Survival Analytics Medical Roots - How long will a patient survive a disease? Engineering Roots - How long until a machine fails? Social Science - How long do people live? Employment - How long until an employee terminates? or promotes? In General - “Expected duration of time until one or more events happen” Artfully combines turnover and tenure Lots of competing jargon and syntax from many application domains: Failure Rate, Hazard Rate, Force of Mortality, ... www.talentanalytics.com © 2017 Talent Analytics, Corp 13 / 41
  • 14. ▸ ▹ ▹ ▹ ▸ ▸ ▸ Hazard Rate: Conditional Daily Risk of Termination Every day in tenure there is a (small) probability that the employee will terminate or the customer will drop... or the patient will die... or the event will occur Conditional on survival to the prior day Like an actuarial life table Can rise at different times: e.g. training, review or bonus time www.talentanalytics.com © 2017 Talent Analytics, Corp 14 / 41
  • 15. ▸ ▸ ▸ ▸ Survival Rate: Daily Probability of Being Employed at Time t How likely are you to be working here, t days after being hired? Near 100% on Day 1 (some people actually never show up) Downhill from there - usually not linear One role at a time www.talentanalytics.com © 2017 Talent Analytics, Corp 15 / 41
  • 16. ▸ ▸ ▸ ▸ ▸ ▸ ▹ ▹ Survival Rate: Daily Probability of Being Employed at Time t How likely are you to be working here, t days after being hired? Near 100% on Day 1 (some people actually never show up) Downhill from there - usually not linear One role at a time This is the full picture Simple transformation of the Hazard Curve Best to use statistical tool (R, SAS) Apparently possible in Excel www.talentanalytics.com © 2017 Talent Analytics, Corp 15 / 41
  • 17. Attrition is Built Into Survival Curves www.talentanalytics.com © 2017 Talent Analytics, Corp 16 / 41
  • 18. Shelf Pattern Sparse Pattern Many Shapes www.talentanalytics.com © 2017 Talent Analytics, Corp 17 / 41
  • 19. Heavy Terms During Training Heavy Terms After 1-Year Bonus Cliffs or Kinks in the Survival Curve www.talentanalytics.com © 2017 Talent Analytics, Corp 18 / 41
  • 20. Everyone Has a Survival Curve www.talentanalytics.com © 2017 Talent Analytics, Corp 19 / 41
  • 21. Predicting Survival To the Individual www.talentanalytics.com © 2017 Talent Analytics, Corp 20 / 41
  • 22. ▸ ▸ ▸ ▹ ▸ ▸ ▸ Don't Use Logistic Methods to Predict Attrition! Commonly Done: Predict one-year attrition with logistic Use tenure as a variable along with others Common Result: “The biggest cause of termination is tenure” Not an actionable coef cient Ignores nuances available to survival methods Mishandles current employee tenure Less accurate www.talentanalytics.com © 2017 Talent Analytics, Corp 21 / 41
  • 23. ▸ ▸ ▸ ▸ Proportional Hazard Most Survival models are “Proportional Hazard” Start with Base Survival rate Convert to Base Cumulative Hazard rate Multiply Cumulative Hazard by a single factor (hence “Proportional Hazard") Linear changes to Hazard lead to a rotation of Survival Curve Predictive Goal: Predict the amount of Survival Curve rotation for each candidate www.talentanalytics.com © 2017 Talent Analytics, Corp 22 / 41
  • 24. ▸ ▹ ▹ ▹ ▹ ▸ ▸ ▸ ▸ Proportional Hazard Prediction Methods Predicting a Single Number Cox regression Vanilla Cox Stepwise ElasticNet or LASSO Variable selection with Random Forest Decision Trees Random Forests Neural Networks SVM, etc There are also other survival algorithms, including parametric methods www.talentanalytics.com © 2017 Talent Analytics, Corp 23 / 41
  • 25. ▸ ▸ ▸ ▸ Build/Validate a Survival Model With Thresholds The Model Predicts: “Dark Blue” hires will survive the longest “Light Blue” hires will survive above-median “Grey” survival curve is current median survival. “Light Red” and “Dark Red” candidates are not likely to survive. www.talentanalytics.com © 2017 Talent Analytics, Corp 24 / 41
  • 26. Deeper Dive: Predicting Survival with Proportional Hazard Follow Demo Code on GitHub: https://github.com/talentanalytics/class_survival_101/ www.talentanalytics.com © 2017 Talent Analytics, Corp 25 / 41
  • 27. ▸ ▸ ▸ ▸ ▸ ▸ Formulas and Jargon Hazard Function:    Probability of termination at time t, conditional on survival up to time t Ranges from 0 to 1 (typically very small) Cumulative Hazard Function:   Cumulative conditional probability of termination up to time t Ranges from 0 to 4ish (theoretically to In nity) Survival Function:    Probability that termination will be later than time t Ranges from 1 to 0 Theoretically the formulas above are continuous measures, but in practice are discrete daily. www.talentanalytics.com © 2017 Talent Analytics, Corp h(t) H(t) = h(x) dx = −log(S(t))∫ t 0 S(t) = e −H(t) 26 / 41
  • 28. Calculating Survival in R Four Simple Steps with R survival package: 1. Gather and Prepare the Data 2. Calculate Baseline Survival Function 3. Model Proportional Hazard with Cox Regression 4. Validate Model www.talentanalytics.com © 2017 Talent Analytics, Corp (t)S0 27 / 41
  • 29. ▸ ▸ ▸ Calculating Survival in R Four Simple Steps with R survival package: 1. Gather and Prepare the Data 2. Calculate Baseline Survival Function 3. Model Proportional Hazard with Cox Regression 4. Validate Model 5. Deploy Model into hiring 6. Hire people likely to stay on the job longer 7. Save and earn money Avoid attributing savings to soft factors like ripple effect, morale, engagement Hard savings from reduced Replacement Cost Hard earnings from increased Employee Lifetime Value www.talentanalytics.com © 2017 Talent Analytics, Corp (t)S0 27 / 41
  • 30. ▸ ▹ ▸ ▹ ▹ ▹ 1) Gather Employee Data > attr.data emp.id hire.date term.date factor.x factor.y factor.z <int> <dttm> <dttm> <dbl> <dbl> <dbl> 1 52 2016-02-23 <NA> 0.09555369 1.515028 102.3811 2 261 2014-10-30 <NA> 2.40144024 9.065855 100.5727 3 193 2013-12-25 2015-03-28 1.88463382 4.331642 105.6887 4 150 2013-03-27 2013-09-19 3.81625316 7.363454 110.6842 5 31 2015-03-25 <NA> 5.17392728 11.824601 110.0590 Simple Requirements Just hire.date and term.date is all you need from HRIS Leave term.date as NA if the employee haven't terminated yet Merge with predictive input (independent) variables e.g. assessment results, experience, CV detail, social media, semantic Here we use factor.x, factor.y and factor.z Are these variables available pre-hire? www.talentanalytics.com © 2017 Talent Analytics, Corp 28 / 41
  • 31. ▸ ▸ ▹ ▹ ▸ ▹ Calculate Fields for Survival > attr.data %>% dplyr::select(-dplyr::contains("factor")) emp.id hire.date term.date is.term end.date tenure.years <int> <dttm> <dttm> <lgl> <dttm> <dbl> 1 52 2016-02-23 <NA> FALSE 2017-05-01 1.1854894 2 261 2014-10-30 <NA> FALSE 2017-05-01 2.5023956 3 193 2013-12-25 2015-03-28 TRUE 2015-03-28 1.2539357 4 150 2013-03-27 2013-09-19 TRUE 2013-09-19 0.4818617 5 31 2015-03-25 <NA> FALSE 2017-05-01 2.1026694 Simple Derivations is.term: Have they terminated? Is term.date NA? end.date: Either term.date, or censor date, depending on is.term Could put transfer.date here with no is.term The last date we know the employee was at work tenure.years: What is employee tenure from hire.date to end.date? All we know about the employee's tenure, as of censor date www.talentanalytics.com © 2017 Talent Analytics, Corp 29 / 41
  • 32. ▸ ▸ ▸ Dataset is Ready > glimpse(attr.data) Observations: 400 Variables: 14 $ label <fctr> a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,... $ hire.date <dttm> 2013-04-20, 2016-01-14, 2013-09-20, 2014-05-17, 2016-... $ term.date <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2016-... $ is.term <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE... $ end.date <dttm> 2017-05-01, 2017-05-01, 2017-05-01, 2017-05-01, 2017-... $ tenure.years <dbl> 4.02975576, 1.29469635, 3.61029191, 2.95626651, 1.1463... $ factor.x <dbl> 7.44846858, 1.08376058, 2.81983987, 0.56880027, 0.7868... $ factor.y <dbl> 16.838313, 9.335225, 9.519951, 12.697147, 12.912898, 1... $ factor.z <dbl> 103.10732, 100.37171, 99.76382, 95.88645, 97.61509, 10... $ emp.id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,... $ scale.x <dbl> 2.41351313, -0.24754913, 0.47829958, -0.46285224, -0.3... $ scale.y <dbl> 1.31773198, -0.57718839, -0.53053544, 0.27187182, 0.32... $ scale.z <dbl> 0.3407242, -0.1849069, -0.3017096, -1.0467221, -0.7145... $ is.training <lgl> TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRU... Example is 400-row simulated dataset Scaled the input factors to consistent z-scale Separated into 80% training and 20% validation datasets www.talentanalytics.com © 2017 Talent Analytics, Corp 30 / 41
  • 33. ▸ 2) Calculate Baseline Survival ( ) survival::Surv is a single object to hold Tenure and Turnover at once > surv.obj <- survival::Surv(training.data$tenure.years, training.data$is.term) Needs “time variable” tenure.years and “event variable” is.term survival::survfit is the Kaplan-Meier Estimator of > surv.fit <- survival::survfit(surv.obj ~ 1) > summary(surv.fit) Call: survfit(formula = surv.obj ~ 1) time n.risk n.event survival std.err lower 95% CI upper 95% CI 0.000 320 1 0.997 0.00312 0.991 1.000 0.249 308 1 0.994 0.00448 0.985 1.000 0.298 303 1 0.990 0.00554 0.980 1.000 0.320 298 1 0.987 0.00644 0.974 1.000 www.talentanalytics.com © 2017 Talent Analytics, Corp (t)S0 (t)S0 31 / 41
  • 34. Native Survival Plot Crafted ggplot2 Plot Plot Baseline Survival www.talentanalytics.com © 2017 Talent Analytics, Corp 32 / 41
  • 35. ▸ ▸ ▸ ▸ ▸ 3) Back to Proportional Hazard Consider a multiplier x that centers on 1 Multiply for proportional (linear) changes to Hazard Survival Curve rotates in response: Beware - inverse relationship: x ← 1.1 will increase , decrease x ← 0.9 will decrease , increase So we just need to predict x to predict Survival One number drives the entire curve, based on baseline www.talentanalytics.com © 2017 Talent Analytics, Corp (t) = (t)xHx H0 (t) =Sx e − (t)xH0 (t)Hx (t)Sx (t)Hx (t)Sx (t)H0 33 / 41
  • 36. Model Proportional Hazard with Cox Regression > cox.model <- survival::coxph(formula = surv.obj ~ scale.x + scale.y + scale.z, data = training.data) > cox.model Call: survival::coxph(formula = surv.obj ~ scale.x + scale.y + scale.z, data = training.data) coef exp(coef) se(coef) z p scale.x -0.4298 0.6506 0.0918 -4.68 0.0000028 scale.y -0.3204 0.7258 0.0957 -3.35 0.00082 scale.z -0.1967 0.8215 0.0984 -2.00 0.04558 Likelihood ratio test=38.2 on 3 df, p=0.000000025 n= 320, number of events= 125 www.talentanalytics.com © 2017 Talent Analytics, Corp 34 / 41
  • 37. ▸ ▹ ▹ ▸ ▹ ▹ 4) Predict validation.data with New Cox Model > cox.pred <- predict(cox.model, newdata = validation.data, type = "lp") Note: we built model with training.data, now we predict against validation.data Our model has never seen anything in validation.data Great test to see how closely our predictions match known outcomes The predict.coxph() function returns 5 different types of predictions We want type = "lp" which is log(multiplier) to baseline Other types won't work with survivalROC() www.talentanalytics.com © 2017 Talent Analytics, Corp (t)H0 35 / 41
  • 38. ▸ ▸ ▸ ▸ How does ROC Work with Survival? > roc.obj <- survivalROC::survivalROC(Stime = validation.data$tenure.years, status = validation.data$is.term, marker = cox.pred, predict.time = 1, lambda = 0.003) > roc.obj$AUC [1] 0.7102442 ROC typically evaluates classification methods So, we make this a classi cation problem Classify: Was the employee terminated at t = 1 or not? Compare actual vs. predicted classi cations at all cutpoints Our AUC is 0.71 Not bad for a censored attrition prediction! www.talentanalytics.com © 2017 Talent Analytics, Corp 36 / 41
  • 39. Plot AUC with ggplot2 www.talentanalytics.com © 2017 Talent Analytics, Corp 37 / 41
  • 40. ▸ ▸ ▸ ▸ Don't Get Fooled by Randomness We are predicting AUC, too Let's try predicting 1000 random samples of the same underlying pattern > test.repl <- replicate(1000, demoPrediction(verbose = FALSE)) > summary(test.repl) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.3470 0.6191 0.6833 0.6771 0.7356 0.9446 AUC generally fell between 0.619 and 0.735, but 50% were outside this range The likely AUC is closer to 0.68 - still good What if you got a lucky (or unlucky) 80/20 validation split? www.talentanalytics.com © 2017 Talent Analytics, Corp 38 / 41
  • 41. ▸ ▸ Multiple Cross-Validation Raises the Bar Against Getting Fooled Typically run 20-100 CV iterations Can consume lots of core-hours if nested for hyperparameters www.talentanalytics.com © 2017 Talent Analytics, Corp 39 / 41
  • 42. ▸ ▸ ▸ ▸ ▸ Download and Experiment With this Code These slides will be on PAW website Download demo code and doc from https://github.com/talentanalytics/class_survival_101/ Clone it, fork it, run it Send bug xes Let me know how it goes www.talentanalytics.com © 2017 Talent Analytics, Corp 40 / 41
  • 43. Questions and Discussion Pasha Roberts pasha@talentanalytics.com Follow Demo Code on GitHub: https://github.com/talentanalytics/class_survival_101/ www.talentanalytics.com © 2017 Talent Analytics, Corp 41 / 41