As the first session in this four-part series, this discussion aims to get everyone on the same page for later sessions.
We will look at mathematical notation, probability, expectation, and variance, and end this session with common probability distributions and their use cases.
Slides created by and workshop taught by:
Josh Bernhard, Associate Data Science Instructor at Galvanize
3. Introduction to Probability
We are often interested in modeling the world using mathematics. The
mathematical functions we use to do so are called probability
distributions. Before working with probability distributions, it is
important to understand the fundamentals of probability.
The probability of any event must be between 0 and 1, inclusive.
The sum of all probabilities across all events (or integral of the area
under a curve if continuous) must be 1.
Josh Bernhard Meetup 1: Colorado Data Science August 2015 3 / 66
4. Introduction to Probability
When we model outcomes of the world, the quantity we want to model is
called a random variable. A random variable might be
Gender
Sales
Whether an individual buys our product or not
How much a stock price will change
Depending on the variable we would like to model, we choose a different
mathematical model (probability distribution). The random variable is
usually denoted by a letter; X and Y are quite common.
5. Introduction to Probability
There are many probability distributions, which are generally written in
the following way:
f(x) = P(X = x)
Each probability distribution has parameters, which are generally denoted
with Greek symbols: λ, α, β, µ, σ, etc. We generally estimate these
parameters using one of two methods: maximum likelihood or method
of moments estimation.
6. Random Variables
Note: Random variables are usually denoted by uppercase letters near the
end of the alphabet, such as X, Y, and Z. We will use lowercase letters to
represent the values of the random variables, such as x, y, and z.
The term random variable is often abbreviated as r.v.
7. Random Variables
Two Types of Random Variables
We distinguish two types of random variables:
Discrete r.v.: values can be counted or listed.
Examples:
number of credit card uses during the last month per customer
number of patients having side effects from taking a certain drug
number of defectives in a shipment
Continuous r.v.: values can assume any numerical value in one or more
intervals on the real number line.
Examples:
actual filling amount of 11oz cans of soda
percent increase or decrease of a financial stock compared to the previous month
actual selling price of property
interest rate (in percent) for a mortgage loan
8. Random Variables
Mean of a Discrete Random Variable
Suppose that X is a discrete random variable whose distribution is
value of X    x_1  x_2  x_3  ...  x_k
probability   p_1  p_2  p_3  ...  p_k
The mean (expected value) of X can be found by multiplying each
possible outcome x_i of X by its respective probability p_i and then
adding all the products:
$$\mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i$$
9. Random Variables
Variance of a Discrete Random Variable
Suppose that X is a discrete random variable whose distribution is
value of X    x_1  x_2  x_3  ...  x_k
probability   p_1  p_2  p_3  ...  p_k
and µX is the mean of X. The variance of X is
$$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$
The standard deviation σX of X is the square root of the variance.
10. Random Variables
Example: number of approved mortgages
approved mortgages X   0     1     2     3     4     5     6
probability            0.10  0.10  0.20  0.30  0.15  0.10  0.05
For this small example, we could:
Find the mean
Find the variance and standard deviation
Show that this is a valid probability distribution
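The three tasks for the mortgage example can be worked through numerically. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and do not come from the slides.

```python
import math

# Distribution of approved mortgages from the example.
values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.10, 0.10, 0.20, 0.30, 0.15, 0.10, 0.05]

# Validity check: every probability lies in [0, 1] and they sum to 1.
is_valid = all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

# Mean: sum of x_i * p_i.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: sum of (x_i - mean)^2 * p_i; the standard deviation is its root.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
std_dev = math.sqrt(variance)

print(f"valid distribution: {is_valid}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.3f}")
```

This gives a mean of 2.8 approved mortgages and a variance of 2.46.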
11. Types of Distributions
Probability Density Functions
Probability Density Functions
Cumulative Distribution Functions (Risk Profiles)
Discrete Distributions
Binomial Distribution
Geometric Distribution
Negative Binomial Distribution
Poisson Distribution
Continuous Distributions
Exponential Distribution
Beta Distribution
Gamma Distribution
Uniform Distribution
Normal Distribution
12. Types of Distributions
Probability Density Functions
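Probability mass functions (for discrete random variables) give P(X = x).
Probability density functions (for continuous random variables) give a
density f(x) that is integrated over an interval to obtain a probability.
All of these functions must meet the following two criteria:
The function is never negative, and no probability exceeds 1.
The probabilities sum to 1 (or the density integrates to 1 if continuous).
For all distributions we also assume that each trial is independent of
another trial. There are methods for dealing with non-independent data
that we may discuss later this semester.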
13. Types of Distributions
Cumulative Distribution Functions (Risk Profiles)
Cumulative distribution functions show P(X ≤ x). Therefore, all
cumulative distribution functions:
Start at 0 and end at 1.
At any point where the slope is zero within the cumulative
distribution, the probability of the corresponding values is zero.
For continuous distributions the cumulative distribution function is
smooth (hill).
For discrete distributions the cumulative distribution function takes
steps (stairs).
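The stair-step shape of a discrete cumulative distribution can be seen by accumulating a probability mass function. The sketch below reuses the approved-mortgages distribution from the earlier example; the helper name `cdf` is hypothetical and chosen only for illustration.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.10, 0.10, 0.20, 0.30, 0.15, 0.10, 0.05]

# Running totals of the mass function give P(X <= x) at each listed value.
cumulative = list(accumulate(probs))

def cdf(x):
    """Return P(X <= x); the function is flat between listed values."""
    total = 0.0
    for value, p in zip(values, probs):
        if value <= x:
            total += p
    return total

for value, c in zip(values, cumulative):
    print(f"P(X <= {value}) = {c:.2f}")
```

Evaluating `cdf(2)` and `cdf(2.5)` returns the same number, because no probability sits between 2 and 3: that flat stretch is one "stair."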
14. Types of Distributions
Computing
In many cases of mathematical modeling, we use a computer to check whether
a mathematical model is appropriate for a specific situation by simulating
from the model. We then compare the simulated values to our observed
data and test whether the two are (or are not) statistically
different. A common non-parametric approach for completing this process
is known as a Kolmogorov-Smirnov test.
Python (SciPy), R (Base) and Excel all have built-in functions for different
probability density functions to my knowledge, so this process is not
difficult to complete.
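As a rough illustration of the simulate-then-compare idea, the sketch below computes the one-sample Kolmogorov-Smirnov statistic by hand with the Python standard library. The exponential model with mean 10, the sample size, and the seed are assumptions made for this illustration, and `ks_statistic` is a hypothetical helper, not a library function. It only computes the statistic D and compares it with the usual large-sample 5% critical value of about 1.36 / sqrt(n).

```python
import math
import random

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

random.seed(0)  # assumed seed, only for reproducibility
mean = 10.0     # assumed model: exponential with mean 10
simulated = [random.expovariate(1 / mean) for _ in range(1000)]

def exponential_cdf(x):
    return 1 - math.exp(-x / mean)

d_stat = ks_statistic(simulated, exponential_cdf)
critical = 1.36 / math.sqrt(len(simulated))  # approximate 5% critical value
print(f"D = {d_stat:.4f}, approximate 5% critical value = {critical:.4f}")
```

A value of D below the critical value means the data give no evidence against the model at the 5% level. In practice, SciPy's `scipy.stats.kstest` runs the full test, including a p-value.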
16. Conditional Probability
Another important concept of probability (especially in the Bayesian
world) is that of conditional probability.
Conditional probability is defined mathematically in terms of events A
and B (with P(B) > 0) as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
which is equivalent to:
$$P(A \cap B) = P(A \mid B) P(B).$$
We can further break down the conditional statement as:
$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A^c) P(A^c)}$$
where A^c denotes the complement of A. This breakdown of conditional
probability is known as Bayes' Theorem.
17. Conditional Probability
Simple Discrete Example
Example: Consider that a business has 4 departments. We are interested
in using the following table to find conditional probabilities within each
department for male and female employees.
Table: Conditional Distributions
Department  Male  Female  Total
A            342     230    572
B             25      31     56
C            135     144    279
D            487     312    799
Total        989     717   1706
18. Conditional Probability
Simple Discrete Example
What is the probability that an employee is male? female?
What is the probability that an employee is male given that the
employee is in department A?
What is the probability that an employee is in department A given
that the employee is female?
What is the probability that the individual is in department A or
department B given that we know they are female?
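The questions above reduce to ratios of counts from the department table. The following is a minimal sketch in Python; the dictionary layout and variable names are illustrative choices. The last step checks Bayes' Theorem by recovering P(A | female) from P(female | A).

```python
# Employee counts by department and gender, taken from the table.
counts = {
    "A": {"male": 342, "female": 230},
    "B": {"male": 25, "female": 31},
    "C": {"male": 135, "female": 144},
    "D": {"male": 487, "female": 312},
}

total = sum(sum(row.values()) for row in counts.values())
males = sum(row["male"] for row in counts.values())
females = sum(row["female"] for row in counts.values())
dept_a = sum(counts["A"].values())

p_male = males / total
p_female = females / total

# P(male | A): restrict attention to department A.
p_male_given_a = counts["A"]["male"] / dept_a

# P(A | female) and P(A or B | female): restrict attention to women.
p_a_given_female = counts["A"]["female"] / females
p_a_or_b_given_female = (counts["A"]["female"] + counts["B"]["female"]) / females

# Bayes' Theorem check: rebuild P(A | female) from P(female | A).
p_a = dept_a / total
p_female_given_a = counts["A"]["female"] / dept_a
p_female_given_not_a = (females - counts["A"]["female"]) / (total - dept_a)
bayes = (p_female_given_a * p_a) / (
    p_female_given_a * p_a + p_female_given_not_a * (1 - p_a)
)

print(f"P(male) = {p_male:.4f}, P(female) = {p_female:.4f}")
print(f"P(male | A) = {p_male_given_a:.4f}")
print(f"P(A | female) = {p_a_given_female:.4f} (via Bayes: {bayes:.4f})")
print(f"P(A or B | female) = {p_a_or_b_given_female:.4f}")
```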
22. Types of Distributions
Binomial Distribution
The binomial distribution can be used anytime we count successes across a
fixed number of independent trials that each have two outcomes: "success"
and "failure." We record a 1 for outcomes observed as a success and a 0
for outcomes observed as a failure. The probability of success is denoted
as p and the probability of failure is denoted as 1 − p. The number of
trials that occur is denoted as n. Some instances in which the binomial
distribution is used:
We have a machine that creates parts with some probability of defect.
In baseball, each batter has a probability of successfully hitting the
ball.
In any sport, there is a probability of winning.
There is some probability that a stock will increase in price.
Coin flips are binomial trials.
23. Types of Distributions
Binomial Distribution
The probability mass function for a binomial distribution is:
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}; \quad x = 0, 1, 2, \ldots, n$$
The expected value for a binomial distribution is:
$$E[X] = np$$
The variance for a binomial distribution is:
$$\mathrm{Var}[X] = np(1 - p)$$
24. Types of Distributions
Binomial Distribution
Example: Consider that a machine creates 100 parts per hour. The
machine has a failure rate of 3%. You are interested in finding the answer
to the following questions for the upcoming hour:
What is the expected number of defective parts?
What is the probability that the machine creates exactly 4 defective
parts?
What is the probability that the machine creates 4 or more defective
parts?
What is the standard deviation of defective parts?
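The four questions follow directly from the binomial formulas with n = 100 and p = 0.03. The following is a minimal sketch using only the Python standard library; `binomial_pmf` is a hypothetical helper written for this illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 100, 0.03  # 100 parts per hour, 3% chance each part is defective

expected = n * p
p_exactly_4 = binomial_pmf(4, n, p)
# P(X >= 4) = 1 - P(X <= 3)
p_at_least_4 = 1 - sum(binomial_pmf(x, n, p) for x in range(4))
std_dev = math.sqrt(n * p * (1 - p))

print(f"expected defects = {expected:.1f}")
print(f"P(X = 4) = {p_exactly_4:.4f}")
print(f"P(X >= 4) = {p_at_least_4:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

Computing P(X ≥ 4) through the complement avoids summing 97 terms.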
26. Types of Distributions
Geometric Distribution
The geometric distribution is used when we are interested in the number
of trials needed to obtain the first "success", that is, the "failures"
that occur before it plus the success itself. It is identified by a
single parameter, p, the probability of success. The larger p is, the
sooner you can expect to obtain your first success. The probability of
failure on each trial is (1 − p).
The number of successful parts produced until the first defective part.
The number of successful baskets made until first missed basket.
The number of wins until first loss.
27. Types of Distributions
Geometric Distribution
The probability that the xth trial results in the first success is provided
by the probability mass function for a geometric distribution as:
$$P(X = x) = (1 - p)^{x - 1} p; \quad x = 1, 2, 3, \ldots$$
The expected value for a geometric distribution is:
$$E[X] = \frac{1}{p}$$
The variance for a geometric distribution is:
$$\mathrm{Var}[X] = \frac{1 - p}{p^2}$$
28. Types of Distributions
Geometric Distribution
Example: Consider a basketball player with a free throw shooting
percentage of 93%. We are interested in answering the following
questions:
What is the expected number of baskets he will make before his first
miss?
What is the probability that the first miss will occur on the 3rd shot?
What is the probability the first miss occurs before the 5th shot?
What is the standard deviation of the number of baskets he will make?
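In this example the event we are waiting for is a miss, so the sketch below treats a miss as the "success" with p = 1 − 0.93 = 0.07. That framing is an assumption of this illustration, as is the helper name `geometric_pmf`. The number of made baskets before the first miss is the trial number of the first miss minus one.

```python
import math

p = 0.07  # probability of a miss, treated as the "success" we wait for

def geometric_pmf(x, p):
    """P(first success occurs on trial x), for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

# The first miss is expected on shot 1/p, so the makes before it are 1/p - 1.
expected_shot_of_first_miss = 1 / p
expected_makes_before_miss = expected_shot_of_first_miss - 1

p_first_miss_on_3rd = geometric_pmf(3, p)
# "Before the 5th shot" means the first miss is on shot 1, 2, 3, or 4.
p_miss_before_5th = sum(geometric_pmf(x, p) for x in range(1, 5))

# Subtracting one shot shifts the count but does not change its spread.
std_dev = math.sqrt(1 - p) / p

print(f"expected makes before first miss = {expected_makes_before_miss:.2f}")
print(f"P(first miss on 3rd shot) = {p_first_miss_on_3rd:.4f}")
print(f"P(first miss before 5th shot) = {p_miss_before_5th:.4f}")
print(f"standard deviation = {std_dev:.2f}")
```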
30. Types of Distributions
Negative Binomial Distribution
The negative binomial distribution is used when we are interested in the
number of "failures" that must occur until the rth "success" occurs. It is
identified by two parameters: r, the number of successes required, and p,
the probability of a success. It is essentially a string of r geometric
distributions.
The number of successful parts produced until the 7th defective part.
The number of successful baskets made until the 4th missed basket.
The number of wins until the 15th loss.
31. Types of Distributions
Negative Binomial Distribution
If X is the number of trials required to produce r successes and p is
the probability of success, this provides the probability mass function
for a negative binomial distribution as:
$$P(X = x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}; \quad x = r, r + 1, r + 2, \ldots$$
The expected value (number of trials) for a negative binomial distribution is:
$$E[X] = \frac{r}{p}$$
The variance for a negative binomial distribution is:
$$\mathrm{Var}[X] = \frac{r(1 - p)}{p^2}$$
The number of failures before the rth success is X − r, which has expected
value r(1 − p)/p and the same variance.
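The negative binomial formulas can be checked numerically by summing the mass function over a long stretch of its support. The sketch below uses r = 4 and p = 0.07 (waiting for the 4th missed basket at a 7% miss rate); those values, the truncation point of 2000 trials, and the helper name are assumptions for illustration.

```python
import math

def negative_binomial_pmf(x, r, p):
    """P(the r-th success occurs on trial x), for x = r, r + 1, ..."""
    return math.comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 4, 0.07  # assumed illustration values

# Truncate the infinite support where the remaining tail is negligible.
support = range(r, 2000)
total = sum(negative_binomial_pmf(x, r, p) for x in support)
mean = sum(x * negative_binomial_pmf(x, r, p) for x in support)
variance = sum((x - mean) ** 2 * negative_binomial_pmf(x, r, p) for x in support)

print(f"total probability = {total:.6f}")
print(f"mean trials = {mean:.3f} (formula r/p = {r / p:.3f})")
print(f"variance = {variance:.3f} (formula = {r * (1 - p) / p**2:.3f})")
```

With r = 1 the same function reduces to the geometric mass function.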
33. Types of Distributions
Poisson Distribution
Poisson distribution pdf and cdf examples
34. Types of Distributions
Poisson Distribution
The Poisson distribution can be used anytime we are interested in
modeling count data. There is one parameter, λ, for the Poisson
distribution. For this model, the mean and variance are equal. However,
there are methods to change this relationship that we may discuss later
this semester. Methods include a zero-inflated Poisson and including an
overdispersion or underdispersion parameter in modeling. Some instances
in which a Poisson can be used include:
The number of defective parts per shipment.
The number of hits per hundred at bats.
The number of chocolate chips in a box of cookies.
35. Types of Distributions
Poisson Distribution
The probability mass function for a Poisson distribution is:
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}; \quad x = 0, 1, 2, \ldots$$
The expected value for a Poisson distribution is:
$$E[X] = \lambda$$
The variance for a Poisson distribution is:
$$\mathrm{Var}[X] = \lambda$$
36. Types of Distributions
Poisson Distribution
Example: Consider that the number of accidents that occur at a
particular intersection can be considered to follow a Poisson distribution
and that the average number of accidents per year at this intersection is
2. You are interested in finding the answer to the following questions for
the upcoming year:
What is the expected number of accidents?
What is the probability that the number of accidents is exactly 1?
What is the probability that the number of accidents is greater than
3?
What is the standard deviation of the number of accidents?
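With λ = 2, each question is a direct application of the Poisson formulas. The following is a minimal standard-library sketch; `poisson_pmf` is a hypothetical helper written for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with rate lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 2  # average number of accidents per year

expected = lam
p_exactly_1 = poisson_pmf(1, lam)
# P(X > 3) = 1 - P(X <= 3)
p_more_than_3 = 1 - sum(poisson_pmf(x, lam) for x in range(4))
std_dev = math.sqrt(lam)

print(f"expected accidents = {expected}")
print(f"P(X = 1) = {p_exactly_1:.4f}")
print(f"P(X > 3) = {p_more_than_3:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```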
39. Types of Distributions
Exponential Distribution
Exponential distribution pdf and cdf examples
40. Types of Distributions
Exponential Distribution
The exponential distribution is a continuous version of the geometric
distribution. The distribution only takes on values greater than 0 and is
characterized by a single parameter, β. Both the geometric and
exponential distributions share a property called the "memoryless"
property. This means that the probability of an event occurring in the
future is not contingent on how much time has already passed. Some
instances in which an exponential can be used include:
The amount of time until a machine fails.
The amount of time until a death occurs.
The amount of time until an insurance claim occurs.
41. Types of Distributions
Exponential Distribution
The probability density function for an exponential distribution is:
$$f(x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right); \quad x \in (0, \infty)$$
The expected value for an exponential distribution is:
$$E[X] = \beta$$
The variance for an exponential distribution is:
$$\mathrm{Var}[X] = \beta^2$$
42. Types of Distributions
Exponential Distribution
Example: The amount of time that a union stays on strike follows an
exponential distribution with a mean of 10 days. You are interested in
finding the answer to the following questions:
What is the probability the strike lasts less than one day?
What is the probability the strike lasts less than six days?
What is the probability the strike lasts between six and seven days?
Find the conditional probability that a strike lasts less than 7 days,
given that the strike has already lasted six days.
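For the exponential distribution, integrating the density gives the cumulative distribution P(X ≤ x) = 1 − exp(−x/β), which answers each question. The following is a minimal sketch with β = 10 days; the helper name is illustrative.

```python
import math

beta = 10  # mean strike length in days

def exponential_cdf(x, beta):
    """P(X <= x) for an exponential distribution with mean beta."""
    return 1 - math.exp(-x / beta) if x > 0 else 0.0

p_less_than_1 = exponential_cdf(1, beta)
p_less_than_6 = exponential_cdf(6, beta)
p_between_6_and_7 = exponential_cdf(7, beta) - exponential_cdf(6, beta)
# Conditional probability: P(X < 7 | X > 6) = P(6 < X < 7) / P(X > 6)
p_less_than_7_given_6 = p_between_6_and_7 / (1 - p_less_than_6)

print(f"P(X < 1) = {p_less_than_1:.4f}")
print(f"P(X < 6) = {p_less_than_6:.4f}")
print(f"P(6 < X < 7) = {p_between_6_and_7:.4f}")
print(f"P(X < 7 | X > 6) = {p_less_than_7_given_6:.4f}")
```

The conditional probability equals P(X < 1): having already lasted six days does not change the chance that the strike ends within the next day, which is the memoryless property in action.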
44. Types of Distributions
Beta Distribution
Beta distribution pdf and cdf examples
45. Types of Distributions
Beta Distribution
The beta distribution can be used anytime we are interested in modeling a
quantity that is a proportion or probability. For the beta distribution, the
outcome must be between 0 and 1. The beta distribution has two positive
parameters, generally denoted α and β, that combine to control both the
center and the spread of the distribution. Some instances in which a beta
can be used include:
The probability for a binomial or geometric.
The percentage correct on an exam.
The probability of successfully completing any task.
The beta distribution is a conjugate prior for the binomial distribution.
46. Types of Distributions
Beta Distribution
The probability density function for a beta distribution is:
$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}; \quad x \in (0, 1)$$
The expected value for a beta distribution is:
$$E[X] = \frac{\alpha}{\alpha + \beta}$$
The variance for a beta distribution is:
$$\mathrm{Var}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
The Γ(x) function is defined for positive integer values as:
$$\Gamma(x) = (x - 1)!$$
47. Types of Distributions
Beta Distribution
Example: Consider that we believe exam scores for a particular class
follow a beta distribution with α = 5 and β = 2. We are interested in
finding the answer to the following questions:
What is the expected score on the exam?
If 90% is considered an A, what proportion of students received an A?
If 60% is considered failing, what percentage of students failed?
What percentage passed?
What is the standard deviation of exam scores?
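The Python standard library has no beta cumulative distribution, so the sketch below integrates the density numerically with composite Simpson's rule. This is one possible approach, not the method used in the workshop; the helper names and the step count are assumptions, and the integration as written assumes α ≥ 1 and β ≥ 1 so the density stays finite at the endpoints.

```python
import math

def beta_pdf(x, a, b):
    """Beta density with parameters a and b, evaluated at x in [0, 1]."""
    coeff = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coeff * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_cdf(x, a, b, steps=1000):
    """Approximate P(X <= x) with composite Simpson's rule (steps must be even)."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(i * h, a, b)
    return total * h / 3

a, b = 5, 2  # alpha and beta from the exam example

expected = a / (a + b)
p_a_grade = 1 - beta_cdf(0.9, a, b)   # score of 90% or higher
p_failed = beta_cdf(0.6, a, b)        # score below 60%
p_passed = 1 - p_failed
std_dev = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"expected score = {expected:.4f}")
print(f"P(A) = {p_a_grade:.4f}, P(fail) = {p_failed:.4f}, P(pass) = {p_passed:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

For these integer parameters the density is 30x^4(1 − x), so the cumulative distribution has the closed form 6x^5 − 5x^6, which can be used to check the numerical result.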
48. Gamma and Chi-Squared Distributions
49. Types of Distributions
Gamma Distribution
Gamma distribution pdf and cdf examples
50. Types of Distributions
Gamma Distribution
Remember back to the relationship between the geometric distribution
and the negative binomial distribution. A similar relationship holds for the
exponential distribution and the gamma distribution. Therefore, the
gamma distribution only takes on values greater than zero, and it has two
positive parameters again denoted in most cases by α and β. Some
instances in which a gamma distribution can be used include:
Modeling load on web servers.
Modeling rainfall.
Modeling the probability of ruin, insurance claims, and value at risk
calculations.
A χ² distribution with df degrees of freedom is equivalent to a gamma
distribution where α = df/2 and β = 2.
51. Types of Distributions
Gamma Distribution
The probability density function for a gamma distribution is:
$$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha - 1} \exp\left(-\frac{x}{\beta}\right); \quad x \in (0, \infty)$$
The expected value for a gamma distribution is:
$$E[X] = \alpha\beta$$
The variance for a gamma distribution is:
$$\mathrm{Var}[X] = \alpha\beta^2$$
The Γ(x) function is defined for positive integer values as:
$$\Gamma(x) = (x - 1)!$$
52. Types of Distributions
Gamma Distribution
Example: Consider that we believe car insurance claims can be expected
to follow a gamma distribution with α = 1 and β = 3000. We are
interested in answering the following questions:
Which exponential distribution is the gamma distribution described above
the same as? Show this.
Given that a claim occurs, what do we expect the amount of the claim to
be?
What percentage of claims are less than $1,000?
What percentage of claims are greater than $3,500?
If the distribution actually has α = 2, what are the answers to the two
parts listed above?
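When α is a positive integer, the gamma cumulative distribution has a closed form: P(X ≤ x) = 1 − exp(−x/β) Σ (x/β)^k / k!, summed over k = 0, ..., α − 1. The sketch below relies on that special case, so it covers α = 1 and α = 2 from this example but not non-integer shapes; the helper name is hypothetical.

```python
import math

def gamma_cdf_integer_shape(x, alpha, beta):
    """P(X <= x) for a gamma distribution whose shape alpha is a positive integer."""
    if x <= 0:
        return 0.0
    t = x / beta
    tail = math.exp(-t) * sum(t**k / math.factorial(k) for k in range(alpha))
    return 1 - tail

beta = 3000
results = {}
for alpha in (1, 2):
    results[alpha] = {
        "expected": alpha * beta,
        "below_1000": gamma_cdf_integer_shape(1000, alpha, beta),
        "above_3500": 1 - gamma_cdf_integer_shape(3500, alpha, beta),
    }

for alpha, r in results.items():
    print(
        f"alpha = {alpha}: E[X] = {r['expected']}, "
        f"P(X < 1000) = {r['below_1000']:.4f}, P(X > 3500) = {r['above_3500']:.4f}"
    )
```

With α = 1 the sum has a single term and the formula collapses to 1 − exp(−x/β), the exponential cumulative distribution with mean β = 3000.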
54. Types of Distributions
Uniform Distribution
Uniform distribution pdf and cdf examples
55. Types of Distributions
Uniform Distribution
The uniform distribution is used to model situations in which the
probability is equal for all possible outcomes. There are two parameters for
the uniform distribution, the lower bound and upper bound of possibilities,
often denoted as a and b. Some instances in which a uniform distribution
can be used include:
Modeling probabilities where the true probability is completely
unknown.
Modeling the percentage of a gas in a mixture.
A beta distribution where both α = 1 and β = 1 is equivalent to a
uniform distribution where a = 0 and b = 1.
56. Types of Distributions
Uniform Distribution
The probability density function for a uniform distribution is:
$$f(x) = \frac{1}{b - a}; \quad x \in [a, b]$$
The expected value for a uniform distribution is:
$$E[X] = \frac{1}{2}(a + b)$$
The variance for a uniform distribution is:
$$\mathrm{Var}[X] = \frac{1}{12}(b - a)^2$$
57. Types of Distributions
Uniform Distribution
Example: Consider that your company is having a strike, and at any point
the strike is just as likely to end as at any other point. Therefore, it can be
modeled using a uniform distribution over the next month of 30 days. We
are interested in answering the following questions:
What is the chance the strike is over in the first week (7 days)?
What is the chance the strike lasts longer than the first day and a half
(36 hours)?
What is the amount of time the strike is expected to last?
What is the standard deviation of the number of days the strike is
supposed to last?
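With a = 0 and b = 30 days, every answer comes from the ratio of interval lengths. The following is a minimal sketch; `uniform_cdf` is a hypothetical helper written for this illustration.

```python
import math

a, b = 0, 30  # the strike ends at a uniformly random time in the next 30 days

def uniform_cdf(x, a, b):
    """P(X <= x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

p_over_in_first_week = uniform_cdf(7, a, b)
p_longer_than_36_hours = 1 - uniform_cdf(1.5, a, b)
expected = (a + b) / 2
std_dev = math.sqrt((b - a) ** 2 / 12)

print(f"P(over within 7 days) = {p_over_in_first_week:.4f}")
print(f"P(lasts longer than 1.5 days) = {p_longer_than_36_hours:.4f}")
print(f"expected length = {expected} days, sd = {std_dev:.2f} days")
```

The first two answers are 7/30 and 28.5/30 = 0.95, and the expected length is 15 days.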
59. Types of Distributions
Normal Distribution
Normal distribution pdf and cdf examples
60. Types of Distributions
Normal Distribution
The normal distribution is likely the most important distribution in all of
statistics. The perfectly symmetric, bell-shaped distribution is used to
model many situations and is one of the most important distributions in
terms of inference due to the Central Limit Theorem (CLT). The
distribution has two parameters: µ, the mean of the distribution, and σ,
the standard deviation. Some instances in which a normal distribution can
be used include:
Modeling test scores.
Modeling the specifications of some manufactured part.
Modeling where the variability can be considered only random.
61. Types of Distributions
Normal Distribution
Central Limit Theorem
The central limit theorem states that the mean of independent and
identically distributed random variables (with finite variance) will be
approximately normally distributed as long as the sample size is
"sufficiently large." It also provides approximate normality for the
sum of independent and identically distributed random variables. This
allows us to build confidence intervals and provide inference for the
population mean (which is the basis of most introductory statistics
courses), as well as many other parameters. Many statistics (combinations
of the raw data) are approximately normally distributed by the central
limit theorem.
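The central limit theorem can be illustrated by simulation. The sketch below draws many samples from a skewed exponential distribution and looks at the resulting sample means; the exponential mean of 10, the sample size of 50, the number of samples, and the seed are all assumptions chosen for this illustration.

```python
import random
import statistics

random.seed(1)  # assumed seed, only for reproducibility

population_mean = 10.0  # an exponential has mean and standard deviation equal
sample_size = 50
num_samples = 2000

# Each entry is the mean of one simulated sample of size `sample_size`.
sample_means = [
    statistics.fmean(
        random.expovariate(1 / population_mean) for _ in range(sample_size)
    )
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = population_mean / sample_size**0.5  # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (population mean 10)")
print(f"sd of sample means = {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
```

Even though individual exponential draws are strongly skewed, the sample means cluster around the population mean with a spread close to σ/√n.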
62. Types of Distributions
Normal Distribution
Central Limit Theorem
If the sample size is sufficiently large, the sample mean will follow
$$\bar{x} \sim N\left(\mu_x, \left(\frac{\sigma_x}{\sqrt{n}}\right)^2\right)$$
This allows us to build confidence intervals and conduct hypothesis
tests using T-distributions for the population mean in the following
way:
$$\left(\bar{x} - T^*_{n-1} \frac{s}{\sqrt{n}},\ \bar{x} + T^*_{n-1} \frac{s}{\sqrt{n}}\right)$$
$$T_{n-1} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
63. Types of Distributions
Normal Distribution
The probability density function for a normal distribution is:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right); \quad x \in (-\infty, \infty)$$
The expected value for the normal distribution is:
$$E[X] = \mu$$
The variance for a normal distribution is:
$$\mathrm{Var}[X] = \sigma^2$$
64. Types of Distributions
Normal Distribution
Example: Consider that the amount paid out on any homeowner's insurance
policy (given that there is a claim) is normally distributed with a mean,
µ = $5,000, and a variance, σ² = $16,000. We are interested in answering
the following questions:
Given that a claim occurs, what is the probability the claim is greater
than $10,000?
Given that a claim occurs, what is the probability the claim is less
than $2,500?
Given that a claim occurs, what is the probability the claim is
between $7,500 and $9,000?
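These probabilities can be computed with `statistics.NormalDist` from the Python standard library. The sketch below takes the parameters exactly as stated, so the standard deviation is the square root of the variance 16,000, roughly $126.49. With that spread, all three thresholds sit many standard deviations from the mean, and the computed probabilities are effectively zero.

```python
import math
from statistics import NormalDist

mu = 5000
variance = 16000  # as stated in the example
claims = NormalDist(mu=mu, sigma=math.sqrt(variance))

p_above_10000 = 1 - claims.cdf(10000)
p_below_2500 = claims.cdf(2500)
p_between_7500_and_9000 = claims.cdf(9000) - claims.cdf(7500)

print(f"sigma = {claims.stdev:.2f}")
print(f"P(X > 10000) = {p_above_10000:.6f}")
print(f"P(X < 2500) = {p_below_2500:.6f}")
print(f"P(7500 < X < 9000) = {p_between_7500_and_9000:.6f}")
```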
65. Types of Distributions
Normal Distribution
Example: A packaging company uses a machine to fill bags of chips.
These bags are advertised to contain 8.5 ounces. In fact, the manufacturer
believes the contents vary according to a gamma distribution with
α = 900 and β = 1/100. An independent researcher wants to conduct a
study to determine if the manufacturer's advertisements are false. This
researcher collects data on 49 randomly selected bags of chips.
Based on the assumption of the manufacturer, can you determine the
probability that a randomly selected bag will have at least the
advertised amount?
What is the sampling distribution for the sample means of 49 bags of
chips?
Briefly explain whether or not the Central Limit Theorem was needed
in specifying the sampling distribution.
What is the probability that for random samples of size 49, the
average weight for bags of chips is greater than 8.5 ounces?
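The gamma parameters give a mean of αβ = 9 ounces and a standard deviation of √(αβ²) = 0.3 ounces. The sketch below uses normal approximations throughout, which is an assumption of this illustration: the Python standard library has no gamma cumulative distribution, and a gamma with shape 900 is close to normal. The variable names are illustrative.

```python
from statistics import NormalDist

alpha, beta = 900, 1 / 100
n = 49

mean = alpha * beta            # 9 ounces
sd = (alpha * beta**2) ** 0.5  # 0.3 ounces

# Normal approximation to a single bag (not the exact gamma probability).
single_bag = NormalDist(mean, sd)
p_bag_at_least_advertised = 1 - single_bag.cdf(8.5)

# Sampling distribution of the mean of 49 bags: N(mean, (sd / sqrt(n))^2).
sample_mean = NormalDist(mean, sd / n**0.5)
p_mean_above_advertised = 1 - sample_mean.cdf(8.5)

print(f"mean = {mean:.2f} oz, sd = {sd:.2f} oz")
print(f"P(single bag >= 8.5 oz) is about {p_bag_at_least_advertised:.4f}")
print(f"sd of the sample mean = {sample_mean.stdev:.4f} oz")
print(f"P(sample mean > 8.5 oz) is about {p_mean_above_advertised:.6f}")
```

Because a sum of independent gamma variables with a common β is itself gamma, the sample mean here has an exact gamma distribution, which is relevant when deciding whether the Central Limit Theorem is strictly needed.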