Introduction to Biostatistics
Introduction
• Key words :
– Statistics , data , Biostatistics,
– Variable ,Population ,Sample
Mekele University: Biostatistics 2
Introduction
Some Basic concepts
Statistics is a field of study concerned with
1- collection, organization, summarization and
analysis of data.
2- drawing of inferences about a body of data
when only a part of the data is observed.
• Statisticians try to interpret and
communicate the results to others.
* Biostatistics:
The tools of statistics are employed in many fields:
business, education, psychology, agriculture,
economics, … etc.
When the data analyzed are derived from the
biological sciences and medicine,
we use the term biostatistics to distinguish this
particular application of statistical tools and
concepts.
Data:
The raw material of Statistics is data.
We may define data as figures. Figures result
from the process of counting or from taking a
measurement.
For example:
- When a hospital administrator counts the
number of patients (counting).
- When a nurse weighs a patient (measurement)
Sources of data:
- Records
- Surveys (comprehensive or sample)
- Experiments
We search for suitable data to serve as the raw
material for our investigation.
Such data are available from one or more of the
following sources:
1- Routinely kept records.
For example:
- Hospital medical records contain immense
amounts of information on patients.
- Hospital accounting records contain a wealth of
data on the facility’s business activities.
* Sources of Data:
2- Surveys:
The source may be a survey when the data are needed
to answer specific questions.
For example:
If the administrator of a clinic wishes to obtain
information regarding the mode of transportation
used by patients to visit the clinic, then a survey
may be conducted among patients to obtain this
information.
3- Experiments.
Frequently the data needed to answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of several strategies is
best for maximizing patient compliance,
she might conduct an experiment in which the
different strategies of motivating compliance
are tried with different patients.
* A variable:
It is a characteristic that takes on different values
in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Types of variables
Quantitative Qualitative
Quantitative Variables
A quantitative variable can be measured in the
usual sense.
For example:
- the heights of adult
males,
- the weights of
preschool children,
- the ages of patients
seen in a dental clinic.
Qualitative Variables
Many characteristics are not
capable of being measured.
Some of them can be ordered
or ranked.
For example:
- classification of people into socio-
economic groups,
- social classes based on income,
education, etc.
Types of quantitative
variables
Discrete Continuous
A discrete variable
is characterized by gaps or
interruptions in the
values that it can
assume.
For example:
- The number of daily
admissions to a general
hospital,
- The number of decayed,
missing or filled teeth per
child in an elementary
school.
A continuous variable
can assume any value within a specified
relevant interval of values assumed
by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the
observed heights of two people, we
can find another person whose
height falls somewhere in between.
Types of variables &
scale of measurement
Quantitative variables (numerical): Interval, Ratio
Qualitative variables (categorical): Nominal, Ordinal
Nominal
unordered categories
numbers used to represent categories
averages are meaningless; look at
frequency/proportion in each category
dichotomous e.g. gender: male = 1, female = 0
polytomous e.g. blood type: O = 1, A = 2, B = 3,
AB = 4
Ordinal
ordered categories
numbers used to represent categories
order matters; magnitude does not
differences between categories are
meaningless
Example:- severity of injury:
fatal = 1,
severe = 2,
moderate = 3,
minor = 4
Interval
The differences between observational units are
equal
The zero point is arbitrary and does not imply the
absence of the property being measured
Examples:
Degrees Fahrenheit:
The difference between 30 and 40 degrees is the same as that
between 70 and 80 degrees, but 80 is not twice as hot as 40.
Years:
The difference between 1993 and 1994 is the same as that between
1995 and 1996, but year 0 was not the beginning of time.
Ratio
The most detailed and objectively
interpretable of the measurement scales.
Interval scale with an absolute zero: it has a
true zero point (absence of the property being
measured) as well as equal intervals
E.g. Height, weight, money, age, time, speed,
class size, the Kelvin scale of temperature
Cont…
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a
study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
* A population:
It is the largest collection of values of a random
variable for which we have an interest at a
particular time.
For example:
• headache patients in a chiropractic office;
automobile crash victims in an emergency room
• In research, it is not practical to include all members
of a population
• Thus, a sample (a subset of a population) is taken
• Populations may be finite or infinite.
A Sample
It is a part of a population,
e.g. a fraction of these patients
Random sample
Subjects are selected from a population so that each
individual has an equal chance of being selected
Random samples tend to be representative of the source
population
Non-random samples may not be representative
They may be biased regarding age, severity of the
condition, socioeconomic status, etc.
Types of statistical methods
Descriptive statistics
Describe the data by summarizing them
Inferential statistics
Techniques by which inferences about the
population parameters are drawn from the sample statistics
OR
the observed sample statistics are used to infer the
corresponding population parameters
Cont…
Parameter
Summary data from a population
Statistic
Summary data from a sample
Examples of Scales of
Measurements
• Low income: ordinal
• CD4 count: ratio
• Year of birth: interval
• IQ scores: interval
• Severe injury: ordinal
• Raw score on a statistics exam: interval
• Room temperature in Kelvin: ratio
• Nationality of MU students: nominal
Descriptive statistics
Strategies for understanding
the meanings of Data
• Key words
Frequency table, bar chart ,range
width of interval , mid-interval
Histogram , Polygon
Descriptive statistics
Before performing any analyses, you must
first get to know your data
Descriptive statistics are used to
summarize data in the form of tables,
graphs and numerical measures
The summary technique used depends on
the data type under consideration
Presentation techniques for
qualitative/categorical data
statistics
Frequency
Relative frequency
Cumulative frequency
Figure/chart
Pie chart
Bar chart
Frequency Distribution
for Discrete Random Variables
Example:
Suppose that we take a sample of size 16 from children in
a primary school and get the following data about the
number of their decayed teeth,
3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1
To construct a frequency table:
1- Order the values from the smallest to the largest.
0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5
2- Count how many numbers are the same.
No. of decayed teeth   Frequency   Relative Frequency
0                      1           0.0625
1                      2           0.125
2                      4           0.25
3                      5           0.3125
4                      2           0.125
5                      2           0.125
Total                  16          1
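The two construction steps above (order the values, then count repeats) can be sketched in a few lines of Python. This is only an illustration: the helper name `frequency_table` is hypothetical, and the data are the 16 observations from the example.

```python
from collections import Counter

# Number of decayed teeth for the 16 sampled children (from the example above)
decayed_teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]


def frequency_table(values):
    """Return (value, frequency, relative frequency) rows, sorted by value."""
    counts = Counter(values)      # step 2: count how many values are the same
    n = len(values)
    # step 1: iterate over the distinct values from smallest to largest
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]


table = frequency_table(decayed_teeth)
for value, freq, rel_freq in table:
    print(f"{value:>5} {freq:>9} {rel_freq:>9.4f}")
```

The relative frequencies sum to 1 and the frequencies sum to the sample size, which is a quick check on any frequency table.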
Representing the simple frequency table
using the bar chart
We can represent the above simple frequency table using the bar chart.
• Used for ordinal or nominal data
• Height of each bar is the frequency of that category
[Figure: bar chart; x-axis: number of decayed teeth (0 to 5); y-axis: frequency; bar heights 1, 2, 4, 5, 2, 2]
Cont…
[Figure: bar chart of marital status (Single, Married, Divorced, Widowed) by sex (Male, Female); y-axis: %, 0 to 50]
Cont…
Instead of bars rising from the
horizontal axis (bar chart), we could plot
the shares as slices of a pie
Recalling that a circle has 360 degrees:
50% means 180 degrees
25% means 90 degrees
Frequency Distribution
for Continuous Random Variables
For large samples, we can’t use the simple frequency table to
represent the data.
We need to divide the data into groups or intervals or
classes.
So, we need to determine:
1- The number of intervals (k).
Too few intervals are not good because information will be
lost.
Too many intervals are not helpful to summarize the data.
A commonly followed rule is that 6 ≤ k ≤ 15,
or the following formula (Sturges' rule) may be used,
k = 1 + 3.322 (log10 n)
2- The range (R).
It is the difference between the largest and the
smallest observation in the data set.
3- The Width of the interval (w).
Class intervals generally should be of the same
width. Thus, if we want k intervals, then w is
chosen such that
w ≥ R / k.
Example:
Assume that the number of observations
equals 100, then
k = 1 + 3.322 (log 100)
= 1 + 3.322 (2) = 7.6 ≈ 8.
Assume that the smallest value = 5 and the largest one of
the data = 61, then
R = 61 – 5 = 56 and
w = 56 / 8 = 7.
To make the summarization more
comprehensible, the class width may be 5
or 10 or the multiples of 10.
Table 1.4.1
Example 2.3.1
• We wish to know how many class intervals to have in the
frequency distribution of the data in Table 1.4.1 of ages of 189
subjects who participated in a study on smoking cessation
Solution:
• Since the number of observations
equals 189, then
• k = 1 + 3.322 (log 189)
• = 1 + 3.322 (2.276) = 8.56 ≈ 9,
• R = 82 – 30 = 52 and
• w = 52 / 9 = 5.778
• It is better to let w = 10, then the intervals will be in the
form:
Class interval   Frequency
30 – 39          11
40 – 49          46
50 – 59          70
60 – 69          45
70 – 79          16
80 – 89          1
Total            189
Sum of frequency = sample size = n
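As a rough sketch of the calculation in Example 2.3.1, the number of intervals, the range and the minimum width can be computed as below. The function names are hypothetical, and rounding k to the nearest integer is an assumption chosen to mirror the rounding used on the slides.

```python
import math


def number_of_intervals(n):
    """k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


def range_and_width(smallest, largest, k):
    """Return the range R and the minimum class width w = R / k."""
    r = largest - smallest
    return r, r / k


k = number_of_intervals(189)          # 189 subjects in the smoking cessation study
r, w = range_and_width(30, 82, k)     # smallest and largest ages in the example
print(k, r, round(w, 3))
```

The computed width is only a lower bound; as the slides note, a rounder width such as 10 is usually easier to read.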
The Cumulative Frequency:
It can be computed by adding successive frequencies.
The Cumulative Relative Frequency:
It can be computed by adding successive relative
frequencies.
The Mid-interval:
It can be computed by adding the lower bound of the
interval to its upper bound and then dividing
by 2.
For the above example, the following table represents the cumulative
frequency, the relative frequency, the cumulative relative frequency and the
mid-interval.
Class interval   Mid-interval   Frequency (f)   Cumulative Frequency   Relative Frequency (R.f)   Cumulative Relative Frequency
30 – 39          34.5           11              11                     0.0582                     0.0582
40 – 49          44.5           46              57                     0.2434                     -
50 – 59          54.5           -               127                    -                          0.6720
60 – 69          -              45              -                      0.2381                     0.9101
70 – 79          74.5           16              188                    0.0847                     0.9948
80 – 89          84.5           1               189                    0.0053                     1
Total                           189                                    1
R.f = freq/n
Example :
• From the above frequency table, complete the table, then
answer the following questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40 and 69 years?
3- Relative frequency of subjects with age between 70 and 79 years?
4- Relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40 and 49 years?
6- The percentage of subjects with age less than 60 years?
7- The range (R)?
8- Number of intervals (k)?
9- The width of the interval (w)?
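One possible way to check a completed table is to recompute every column from the class intervals and frequencies alone. The sketch below does this for the age data; the variable names are illustrative and not part of the original slides.

```python
from itertools import accumulate

# Class intervals and frequencies from the age table (n = 189)
intervals = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [11, 46, 70, 45, 16, 1]

n = sum(freqs)
mid = [(lo + hi) / 2 for lo, hi in intervals]     # mid-interval
cum_freq = list(accumulate(freqs))                # cumulative frequency
rel_freq = [f / n for f in freqs]                 # relative frequency = freq / n
cum_rel_freq = [c / n for c in cum_freq]          # cumulative relative frequency

for (lo, hi), m, f, cf, rf, crf in zip(intervals, mid, freqs, cum_freq,
                                       rel_freq, cum_rel_freq):
    print(f"{lo}-{hi}  {m:5.1f} {f:4d} {cf:4d} {rf:7.4f} {crf:7.4f}")

# Question 1: subjects younger than 50 = cumulative frequency of the 40-49 class
under_50 = cum_freq[1]
# Question 6: percentage of subjects younger than 60
pct_under_60 = 100 * cum_rel_freq[2]
```

Questions about a span of classes (for example ages 40 to 69) follow the same pattern: subtract the cumulative frequency just below the span from the cumulative frequency at its upper end.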
Representing the grouped frequency table using the
histogram
To draw the histogram, the true class limits should be used. They can be
computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper
limit of each interval.
True class limits   Frequency
29.5 – < 39.5       11
39.5 – < 49.5       46
49.5 – < 59.5       70
59.5 – < 69.5       45
69.5 – < 79.5       16
79.5 – < 89.5       1
Total               189
[Figure: histogram of the grouped ages; x-axis: class mid-points 34.5 to 84.5; y-axis: frequency, 0 to 80]
Representing the grouped frequency table using
the Polygon
[Figure: frequency polygon of the grouped ages; x-axis: class mid-points 34.5 to 84.5; y-axis: frequency, 0 to 80]
Histogram
• continuous data divided into categories
• graphical representation of frequency distribution
• height of each bar is the frequency of that category
• used to assess skewness and modality of the data
Frequency polygon
- is an alternative to the histogram
- in a histogram:
  X-axis shows intervals of values
  Y-axis shows bars of frequencies
- in a frequency polygon:
  X-axis shows midpoints of intervals of values
  Y-axis shows dots (joined by lines) instead of bars
Box plots
- discrete or continuous data
- displays the 25th, 50th and 75th percentiles of
the data, also known as the first, second and third
quartiles respectively
- whiskers extend to adjacent values which are not
outliers
- outliers indicated as circles
- box shows the interquartile range of the data
- can be used to assess skewness
Two-way scatter plots
• used to assess the relationship between
two discrete or continuous measures
• nature of the relationship described as
positive, negative or no relationship
Line graph
• two continuous measures
• each x value has only one corresponding y value
• useful for looking at patterns over time
• can be used to compare 2 or more groups
Line Graph
Year   MMR/1000
1960   50
1970   45
1980   26
1990   15
2000   12
Figure (1): Maternal mortality rate of (country), 1960-2000
[Figure: line graph; x-axis: Year; y-axis: MMR/1000]
Chapter-3
Measures of Central
Tendency
key words:
Descriptive statistic, measure of central
tendency, statistic, parameter, mean (μ),
median, mode.
The Statistic and The Parameter
• A Statistic:
It is a descriptive measure computed from the data
of a sample.
• A Parameter:
It is a descriptive measure computed from the data
of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose values are
$x_1, x_2, \ldots, x_n$. From these data, we compute the statistic.
Measures of Central Tendency
A measure of central tendency is a measure which indicates
where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
The Mean:
It is the average of the data.
The Population Mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, which is usually unknown, so we use the
sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$\bar{x}$ = (42 + 28 + … + 37) / 10 = 36.6
Properties of the Mean:
• Uniqueness. For a given set of data there is one and
only one mean.
• Simplicity. It is easy to understand and to compute.
• Affected by extreme values. Since all values
enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126.
The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280.
The mean = 118, a value that is not representative of the set of data
as a whole.
The Median:
When the data are ordered, it is the observation that divides
the set of observations into two equal parts, such that half
of the data are before it and the other half are after it.
• If n is odd, the median will be the middle observation.
It will be the ((n+1)/2)th ordered observation.
When n = 11, then the median is the 6th observation.
• If n is even, there are two middle observations.
• The median will be the mean of these two middle
observations.
It will be the mean of the (n/2)th and ((n/2)+1)th ordered observations.
When n = 12, then the median is halfway
between the 6th and 7th ordered observations.
Example:
For the same random sample, the ordered observations will
be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th observation, i.e. =
(32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is one and
only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as is the
mean.
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative data.
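The three measures can be checked against the worked examples with Python's standard `statistics` module. This is a small sketch using the sample of ten ages and the two data sets from the properties of the mean; the variable names are illustrative.

```python
import statistics

# Random sample of 10 ages used in the examples above
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

mean_age = statistics.mean(ages)        # sum of the values divided by n
median_age = statistics.median(ages)    # mean of the 5th and 6th ordered values
modes = statistics.multimode(ages)      # list of the most frequent value(s)
print(mean_age, median_age, modes)

# Effect of an extreme value: both data sets have the same mean
typical = [115, 110, 119, 117, 121, 126]
skewed = [75, 75, 80, 80, 280]
print(statistics.mean(typical), statistics.median(typical))
print(statistics.mean(skewed), statistics.median(skewed))
```

For the second data set the median stays at 80 while the mean is pulled up to 118 by the single value 280, which illustrates why the median is preferred when extreme values are present.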
Quantiles
• Dividing the distribution of ordered values into
equal-sized parts
– Quartiles: 4 equal parts
– Deciles: 10 equal parts
– Percentiles: 100 equal parts
First 25% Second 25% Third 25% Fourth 25%
Q1 Q2 Q3
Q1:first quartile
Q2:second quartile = median
Q3:third quartile
Example:
Given the following data set (age of patients):-
18,59,24,42,21,23,24,32 find the third quartile
Solution:
sort the data from lowest to highest
18 21 23 24 24 32 42 59
3rd quartile = {3/4 (n+1)}th observation = (6.75)th
observation
= 32 + (42-32)x .75 = 39.5
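The (n+1) positioning rule with linear interpolation used in this solution can be sketched as follows. The function name `quantile_n_plus_1` is hypothetical; the comparison with `statistics.quantiles` relies on its default "exclusive" method, which uses the same positioning rule.

```python
import statistics


def quantile_n_plus_1(values, p):
    """Value at the p*(n+1)-th ordered observation, interpolating linearly."""
    data = sorted(values)
    n = len(data)
    pos = p * (n + 1)             # 1-based position, may be fractional
    pos = min(max(pos, 1), n)     # keep the position inside the data
    lower = int(pos)              # ordered observation at or below pos
    frac = pos - lower            # fractional part used for interpolation
    if lower == n:
        return float(data[-1])
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


ages = [18, 59, 24, 42, 21, 23, 24, 32]
q3 = quantile_n_plus_1(ages, 0.75)    # 6.75th ordered observation
print(q3)
print(statistics.quantiles(ages, n=4))  # [Q1, Q2, Q3] for comparison
```

Other software may use a different positioning rule, so quartiles reported by different packages can differ slightly for small samples.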
Measures of Dispersion
key words:
Descriptive Statistic, measure of dispersion ,
range ,variance, coefficient of variation.
Measures of Dispersion:
• A measure of dispersion conveys information regarding the
amount of variability present in a set of data.
• Note:
1. If all the values are the same
→ There is no dispersion.
2. If the values differ
→ There is dispersion:
a) If the values are close to each other
→ The amount of dispersion is small.
b) If the values are widely scattered
→ The dispersion is greater.
Example
• Measures of dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
1. The Range (R):
• Range = Largest value − Smallest value
= $x_L - x_S$
• Note:
– The range depends on only two values
– Highly sensitive to outliers
– Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the range. Range = 66 − 38 = 28
• Inter-quartile range
– 3rd quartile – 1st quartile (75th – 25th percentile)
– Robust to outliers
– Covers the middle 50% of observations
2. The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance ($S^2$):
• $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean
• Find the sample variance of the ages 43, 66, 61, 64, 65, 38, 59, 57, 57, 50, where $\bar{x}$ = 56
• Solution:
• $S^2$ = [(43−56)² + (66−56)² + … + (50−56)²] / 9
• = 810/9 = 90
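The variance example can be verified step by step. The sketch below computes the sum of squared deviations by hand (dividing by n − 1, as in the sample variance formula) and compares the result with the `statistics` module; the variable names are illustrative.

```python
import math
import statistics

# Ages from the range and variance examples
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

n = len(ages)
mean_age = sum(ages) / n
sum_sq_dev = sum((x - mean_age) ** 2 for x in ages)   # sum of squared deviations
sample_var = sum_sq_dev / (n - 1)                     # divide by n - 1, not n
sample_sd = math.sqrt(sample_var)
data_range = max(ages) - min(ages)

print(mean_age, sum_sq_dev, sample_var, round(sample_sd, 2), data_range)
print(statistics.variance(ages), statistics.stdev(ages))
```

`statistics.variance` and `statistics.stdev` use the n − 1 divisor; the population versions with divisor N are `statistics.pvariance` and `statistics.pstdev`.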
b) Population Variance ($\sigma^2$):
• $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, where $\mu$ is the population mean
3. The Standard Deviation:
• It is the square root of the variance = $\sqrt{\text{Variance}}$
a) Sample Standard Deviation = S = $\sqrt{S^2}$
b) Population Standard Deviation = σ = $\sqrt{\sigma^2}$
STANDARD DEVIATION (SD)
Data: 7, 7, 7, 7, 7, 7 → Mean = 7, SD = 0
Data: 7, 8, 7, 7, 7, 6 → Mean = 7, SD = 0.63
Data: 3, 2, 7, 8, 13, 9 → Mean = 7, SD = 4.05
Standard deviation
Caution must be exercised when using standard
deviation as a comparative index of dispersion
Weights of newborn elephants (kg):
929, 853, 878, 939, 895, 972, 937, 841, 801, 826
n = 10, $\bar{X}$ = 887.1, sd = 56.50
Weights of newborn mice (kg):
0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89
n = 10, $\bar{X}$ = 0.68, sd = 0.255
It is incorrect to say that elephants show greater variation in birth
weights than mice merely because of their higher standard deviation
4. The Coefficient of Variation (C.V):
• It is a measure used to compare the dispersion in
two sets of data, and it is independent of the
unit of measurement.
• $C.V = \frac{S}{\bar{X}}(100)$
where S: sample standard deviation.
$\bar{X}$: sample mean.
Coefficient of variation
The coefficient of variation expresses the standard deviation
relative to its mean
$cv = \frac{s}{\bar{X}}$
Newborn elephants (kg): n = 10, $\bar{X}$ = 887.1, s = 56.50, cv = 0.0637
Newborn mice (kg): n = 10, $\bar{X}$ = 0.68, s = 0.255, cv = 0.375
Mice show greater birth-weight variation
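The elephant and mouse comparison can be reproduced with a short sketch. The helper name `coefficient_of_variation` is hypothetical, and the data are the birth weights listed on the slides.

```python
import statistics

# Birth weights as listed on the slides
elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (unit free)."""
    return statistics.stdev(values) / statistics.mean(values)


cv_elephants = coefficient_of_variation(elephants)
cv_mice = coefficient_of_variation(mice)
print(round(statistics.stdev(elephants), 2), round(cv_elephants, 4))
print(round(statistics.stdev(mice), 3), round(cv_mice, 3))
```

With the unrounded mean the mouse value comes out near 0.378; the figure of 0.375 on the slide corresponds to using the mean rounded to 0.68. Either way the ordering is unchanged: the mice vary more relative to their mean.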
Example:
• Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds
• We wish to know which is more variable.
Solution:
• C.V (Sample 1) = (10/145) × 100 = 6.9
• C.V (Sample 2) = (10/80) × 100 = 12.5
• Thus the weights of the 11-year-olds (Sample 2) show more relative variation.
When to use the coefficient of variation
• When comparison groups have very different means
(CV is suitable as it expresses the standard deviation
relative to its corresponding mean)
• When different units of measurement are involved,
e.g. group 1 unit is mm, and group 2 unit is gm (CV is
suitable for comparison as it is unit free)
• In such cases, sd should not be used for comparison
Chapter-4
Elementary Probability and
probability distribution
• Key words:
• Probability, objective Probability, subjective
probability, equally likely Mutually exclusive,
multiplicative rule , Conditional Probability,
independent events
Introduction
• The concept of probability is frequently encountered in
everyday communication. For example, a physician may
say that a patient has a 50-50 chance of surviving a certain
operation.
Another physician may say that she is 95 percent certain
that a patient has a particular disease.
• Most people express probabilities in terms of percentages.
• But, it is more convenient to express probabilities as
fractions. Thus, we may measure the probability of the
occurrence of some event by a number between 0 and 1.
• The more likely the event, the closer the number is to one.
An event that can't occur has a probability of zero, and an
event that is certain to occur has a probability of one.
Two views of Probability objective and
subjective:
• *** Objective Probability
• ** Classical and Relative
• Some definitions:
1.Equally likely outcomes:
Are the outcomes that have the same chance of
occurring.
2.Mutually exclusive:
Two events are said to be mutually exclusive if they
cannot occur simultaneously, such that A ∩ B = Φ.
• The universal set (S): the set of all possible outcomes.
• The empty set Φ: contains no elements.
• The event, E: a set of outcomes in S which has a
certain characteristic.
• Classical Probability: If an event can occur in N
mutually exclusive and equally likely ways, and if m
of these possess a trait, E, the probability of the
occurrence of event E is equal to m/N.
• For example: in the rolling of a die, each of the six
sides is equally likely to be observed. So, the
probability that a 4 will be observed is equal to 1/6.
• Relative Frequency Probability:
• Def: If some process is repeated a large number of
times, n, and if some resulting event E occurs m
times, the relative frequency of occurrence of E,
m/n, will be approximately equal to the probability of E:
P(E) = m/n.
• *** Subjective Probability :
• Probability measures the confidence that a particular
individual has in the truth of a particular proposition.
• For Example : the probability that a cure for cancer
will be discovered within the next 10 years.
Elementary Properties of Probability:
• Given some process (or experiment ) with n
mutually exclusive events E1, E2, E3,…………,
En, then
1. P(Ei ) ≥ 0, i= 1,2,3,……n
2. P(E1 )+ P(E2) +……+P(En )=1
3. P(Ei + Ej) = P(Ei) + P(Ej), where Ei and Ej are mutually
exclusive
Rules of Probability
1-Addition Rule
P(A U B)= P(A) + P(B) – P (A∩B )
2- If A and B are mutually exclusive (disjoint) ,then
P (A∩B ) = 0
Then , addition rule is
P(A U B)= P(A) + P(B) .
3- Complementary Rule
P(A' )= 1 – P(A)
where, A' = complement event
Example
Family history of Mood Disorders   Early ≤ 18 (E)   Later > 18 (L)   Total
Negative (A)                       28               35               63
Bipolar Disorder (B)               19               38               57
Unipolar (C)                       41               44               85
Unipolar and Bipolar (D)           53               60               113
Total                              141              177              318
**Answer the following questions:
Suppose we pick a person at random from this sample.
1- The probability that this person will be 18 years old or younger?
2- The probability that this person has a family history of mood disorders
Unipolar (C)?
3- The probability that this person has no family history of mood
disorders Unipolar ($\bar{C}$)?
4- The probability that this person is 18 years old or younger or has
no family history of mood disorders, Negative (A)?
5- The probability that this person is more than 18 years old and has
a family history of mood disorders Unipolar and Bipolar (D)?
Solution:
1. P(E) = 141/318
2. P(C) = 85/318
3. P($\bar{C}$) = 1 − P(C) = 1 − 85/318 = 233/318
4. P(E ∪ A) = P(E) + P(A) − P(E ∩ A)
= (141/318) + (63/318) − (28/318)
= 176/318
5. P(L ∩ D) = 60/318
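The five answers can be recomputed directly from the cell counts of the table. The sketch below uses exact fractions; the dictionary layout and the helper names (`p_history`, `p_onset`, `p_joint`) are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# Cell counts: family history (A, B, C, D) by age at onset (E = early, L = later)
counts = {
    "A": {"E": 28, "L": 35},
    "B": {"E": 19, "L": 38},
    "C": {"E": 41, "L": 44},
    "D": {"E": 53, "L": 60},
}
total = sum(sum(row.values()) for row in counts.values())


def p_history(h):
    """Marginal probability of a family-history category."""
    return Fraction(sum(counts[h].values()), total)


def p_onset(o):
    """Marginal probability of an age-at-onset category."""
    return Fraction(sum(row[o] for row in counts.values()), total)


def p_joint(h, o):
    """Joint probability of a history category and an onset category."""
    return Fraction(counts[h][o], total)


p_e = p_onset("E")                                       # question 1
p_c = p_history("C")                                     # question 2
p_not_c = 1 - p_c                                        # question 3: complement rule
p_e_or_a = p_e + p_history("A") - p_joint("A", "E")      # question 4: addition rule
p_l_and_d = p_joint("D", "L")                            # question 5

# Fraction reduces automatically, so show each numerator over the common total
for p in (p_e, p_c, p_not_c, p_e_or_a, p_l_and_d):
    print(f"{p.numerator * (total // p.denominator)}/{total} = {float(p):.4f}")
```

Marginal probabilities use row or column totals, while joint probabilities use a single cell; mixing the two is the most common slip in this kind of exercise.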
Conditional Probability:
P(A|B) is the probability of A assuming that B has
happened.
• $P(A|B) = \frac{P(A \cap B)}{P(B)}$, P(B) ≠ 0
• $P(B|A) = \frac{P(A \cap B)}{P(A)}$, P(A) ≠ 0
Example
From the previous example, answer:
• Suppose we pick a person at random and find he is 18
years or younger (E). What is the probability that this
person will be one who has no family history of mood
disorders (A)?
• Solution:
• P(A∩E) = 28/318 and P(E) = 141/318, so P(A|E) = P(A∩E)/P(E) = 28/141
Exercise
• Suppose we pick a person at random and
find he has a family history of unipolar and bipolar
mood disorders (D). What is the probability that this
person will be 18 years or younger (E)?
Multiplicative Rule:
• P(A∩B) = P(A|B)P(B)
• P(A∩B) = P(B|A)P(A)
Where,
• P(A): marginal probability of A.
• P(B): marginal probability of B.
• P(B|A): the conditional probability of B given A.
Independent Events:
• If the occurrence of A has no effect on the probability of B,
we say that A and B are independent events.
• Then,
1- P(A∩B) = P(B)P(A)
2- P(A|B) = P(A)
3- P(B|A) = P(B)
Example
• In a certain high school class consisting of 60 girls
and 40 boys, it is observed that 24 girls and 16 boys
wear eyeglasses . If a student is picked at random
from this class ,the probability that the student
wears eyeglasses , P(E), is 40/100 or 0.4 .
• What is the probability that a student picked at
random wears eyeglasses given that the student is a
boy?
• What is the probability of the joint occurrence of the
events of wearing eye glasses and being a boy?
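A minimal sketch of how the two questions could be answered from the class counts, using the conditional probability and multiplicative rules above. The variable names are illustrative; B denotes the event "the student is a boy".

```python
from fractions import Fraction

# Class composition from the example
girls, boys = 60, 40
girls_glasses, boys_glasses = 24, 16
total = girls + boys

p_e = Fraction(girls_glasses + boys_glasses, total)   # P(E): wears eyeglasses
p_b = Fraction(boys, total)                           # P(B): is a boy
p_e_and_b = Fraction(boys_glasses, total)             # P(E ∩ B): joint probability
p_e_given_b = p_e_and_b / p_b                         # P(E|B) = P(E ∩ B) / P(B)

# E and B are independent when P(E ∩ B) = P(E) P(B)
independent = (p_e_and_b == p_e * p_b)
print(p_e, p_e_given_b, p_e_and_b, independent)
```

Here P(E|B) equals P(E), so knowing that the student is a boy does not change the probability of wearing eyeglasses; the two events are independent in this class.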
Example
• Suppose that of 1200 admissions to a general
hospital during a certain period of time, 750 are
private admissions. If we designate these as a set A,
then compute P(A) and P($\bar{A}$).
The Random Variable (X):
• When the values of a variable (height, weight, or
age) can’t be predicted in advance, the variable is
called a random variable.
• An example is the adult height.
• When a child is born, we can’t predict exactly his
or her height at maturity.
4.2 Probability Distributions for Discrete
Random Variables
• Definition:
• The probability distribution of a discrete
random variable is a table, graph,
formula, or other device used to specify
all possible values of a discrete random
variable along with their respective
probabilities.
The Cumulative Probability Distribution of X,
F(x):
• It shows the probability that the variable X is
less than or equal to a certain value, P(X ≤ x).
Example:
Number of Programs   Frequency   P(X = x)   F(x) = P(X ≤ x)
1                    62          0.2088     0.2088
2                    47          0.1582     0.3670
3                    39          0.1313     0.4983
4                    39          0.1313     0.6296
5                    58          0.1953     0.8249
6                    37          0.1246     0.9495
7                    4           0.0135     0.9630
8                    11          0.0370     1.0000
Total                297         1.0000
• Properties of the probability distribution of a discrete
random variable:
1. 0 ≤ P(X = x) ≤ 1
2. ∑ P(X = x) = 1
3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1)
4. P(X < b) = P(X ≤ b-1)
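The probability and cumulative columns of the table, and property 3, can be checked with a short sketch. Exact fractions are used to avoid rounding; the helper name `prob_between` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Frequencies of the number of programs (from the table above)
freq = {1: 62, 2: 47, 3: 39, 4: 39, 5: 58, 6: 37, 7: 4, 8: 11}
n = sum(freq.values())

pmf = {x: Fraction(f, n) for x, f in freq.items()}    # P(X = x)
cdf = dict(zip(pmf, accumulate(pmf.values())))        # F(x) = P(X <= x)


def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a - 1); F is taken as 0 below the smallest value."""
    return cdf[b] - cdf.get(a - 1, Fraction(0))


for x in pmf:
    print(x, freq[x], f"{float(pmf[x]):.4f}", f"{float(cdf[x]):.4f}")

print(float(prob_between(3, 5)))   # P(3 <= X <= 5)
```

Subtracting F(a − 1) rather than F(a) matters for discrete variables, because the value a itself must stay inside the interval.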
4.3 The Binomial Distribution:
• It is derived from a process known as a Bernoulli
trial.
• Bernoulli trial is :
When a random process or experiment called a
trial can result in only one of two mutually
exclusive outcomes, such as dead or alive, sick or
well, the trial is called a Bernoulli trial.
The Bernoulli Process
• A sequence of Bernoulli trials forms a Bernoulli
process under the following conditions
1- Each trial results in one of two possible, mutually
exclusive, outcomes. One of the possible outcomes
is denoted (arbitrarily) as a success, and the other is
denoted a failure.
2- The probability of a success, denoted by p, remains
constant from trial to trial. The probability of a
failure, 1-p, is denoted by q.
3- The trials are independent, that is the outcome of
any particular trial is not affected by the outcome of
any other trial
• The probability distribution of the binomial
random variable X, the number of successes in
n independent trials, is:
$f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2, …, n
• Where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of
n distinct objects taken x of them at a time,
and x! = x(x−1)(x−2)…(1).
* Note: 0! = 1
Properties of the binomial distribution
• 1. f(x) ≥ 0
• 2. ∑ f(x) = 1
• 3. The parameters of the binomial distribution
are n and p
• 4. E(X) = μ = np
• 5. var(X) = σ² = np(1 − p)
Example
• If we examine all birth records from the North
Carolina State Center for Health statistics for year
2001, we find that 85.8 percent of the pregnancies
had delivery in week 37 or later (full- term birth).
If we randomly selected five birth records from this
population what is the probability that exactly three
of the records will be for full-term births?
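A sketch of how this question could be answered with the binomial formula, taking n = 5 records, p = 0.858 and x = 3 full-term births. The function name `binomial_pmf` is hypothetical; `math.comb` supplies the number of combinations.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.858                      # five records, 85.8% full-term births
p_three = binomial_pmf(3, n, p)      # exactly three full-term births
print(round(p_three, 4))

# Sanity checks against the properties of the binomial distribution
total_prob = sum(binomial_pmf(x, n, p) for x in range(n + 1))
expected = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(round(total_prob, 6), round(expected, 3), n * p)
```

The last line compares the mean computed from the distribution with np, which should agree up to floating-point rounding.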
Example
• Suppose it is known that in a certain
population 10 percent of the population is
color blind. If a random sample of 25 people is
drawn from this population, find the
probability that
a) Five or fewer will be color blind.
b) Six or more will be color blind
c) Between six and nine inclusive will be color
blind.
d) Two, three, or four will be color blind.
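The two binomial examples above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative and do not come from the slides.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Full-term births: n = 5 records, p = 0.858, exactly three full-term
p_three_full_term = binom_pmf(3, 5, 0.858)

# Color blindness: n = 25 people, p = 0.10
n, p = 25, 0.10
p_a = binom_cdf(5, n, p)                        # a) five or fewer
p_b = 1 - p_a                                   # b) six or more
p_c = binom_cdf(9, n, p) - binom_cdf(5, n, p)   # c) six to nine inclusive
p_d = binom_cdf(4, n, p) - binom_cdf(1, n, p)   # d) two, three, or four

print(round(p_three_full_term, 4), round(p_a, 4), round(p_b, 4),
      round(p_c, 4), round(p_d, 4))
```

Parts b) to d) are all written as differences of cumulative probabilities, which is the same way a cumulative binomial table would be used.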
Mekele University: Biostatistics 111
Properties of continuous probability Distributions:
*continuous variable is one that can assume any
value within a specified interval of values
assumed by the variable.
1- Area under the curve = 1.
2- P(X = a) = 0, where a is a constant.
3- Area between two points a , b = P(a<x<b) .
Mekele University: Biostatistics 112
4.6 The normal distribution:
• It is one of the most important probability
distributions in statistics.
• The normal density is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty, \; -\infty < \mu < \infty, \; \sigma > 0$
• π, e: constants
• µ: population mean.
• σ: population standard deviation.
Characteristics of the normal distribution
• The following are some important characteristics
of the normal distribution:
1- It is symmetrical about its mean, µ.
2- The mean, the median, and the mode are all
equal.
3- The total area under the curve above the x-axis is
one.
4-The normal distribution is completely determined
by the parameters µ and σ.
Mekele University: Biostatistics 114
5- The normal distribution depends on the two parameters µ and σ.
µ determines the location of the curve, while σ determines the scale of the
curve, i.e. the degree of flatness or peakedness of the curve.
[Figure: normal curves with different means, µ1 < µ2 < µ3, and with different
standard deviations, σ1 < σ2 < σ3]
Note that :
1. P( µ- σ < x < µ+ σ) = 0.68
2. P( µ- 2σ< x < µ+ 2σ)= 0.95
3. P( µ-3σ < x < µ+ 3σ) = 0.997
Mekele University: Biostatistics 116
The Standard normal distribution:
• Is a special case of normal distribution with mean
equal 0 and a standard deviation of 1.
• The equation for the standard normal distribution is
written as
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$
Characteristics of the standard normal
distribution
1- It is symmetrical about 0.
2- The total area under the curve above the x-
axis is one.
3- We can use table D to find the probabilities
and areas.
Mekele University: Biostatistics 118
“How to use tables of Z”
Note that
The cumulative probabilities P(Z ≤ z) are given in
tables for -3.49 < z < 3.49. Thus,
P(-3.49 < Z < 3.49) ≈ 1.
For standard normal distribution,
P (Z > 0) = P (Z < 0) = 0.5
Example 4.6.1:
If Z has a standard normal distribution, then
1) P(Z < 2) = 0.9772
is the area to the left of 2,
and it equals 0.9772.
Example 4.6.2:
P(-2.55 < Z < 2.55) is the area between
-2.55 and 2.55, Then it equals
P(-2.55 < Z < 2.55) =0.9946 – 0.0054
= 0.9892.
Example 4.6.3:
P(-2.74 < Z < 1.53) is the area between
-2.74 and 1.53.
P(-2.74 < Z < 1.53) =0.9370 – 0.0031
= 0.9339.
Example:
P(Z > 2.71) is the area to the right of 2.71.
So,
P(Z > 2.71) = 1 – 0.9966 = 0.0034.
Example:
P(Z = 0.84) is the area at the single point z = 0.84.
So,
P(Z = 0.84) = 0, because P(X = a) = 0 for any constant a when the variable is continuous.
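The table lookups in the standard normal examples above can be reproduced in code. This is a minimal sketch that uses `statistics.NormalDist` from the Python standard library in place of table D; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

p_left_of_2 = Z.cdf(2.0)                     # P(Z < 2)
p_sym = Z.cdf(2.55) - Z.cdf(-2.55)           # P(-2.55 < Z < 2.55)
p_asym = Z.cdf(1.53) - Z.cdf(-2.74)          # P(-2.74 < Z < 1.53)
p_right = 1 - Z.cdf(2.71)                    # P(Z > 2.71)

# Areas within 1, 2 and 3 standard deviations of the mean
within = [Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)]

print(round(p_left_of_2, 4), round(p_sym, 4), round(p_asym, 4), round(p_right, 4))
print([round(w, 4) for w in within])
```

The last list shows that the 0.68, 0.95 and 0.997 figures quoted for the normal curve are rounded values.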
How to transform normal distribution (X) to
standard normal distribution (Z)?
• This is done by the following formula:
$z = \frac{x - \mu}{\sigma}$
• Example:
• If X is normal with µ = 3, σ = 2, find the value of the
standard normal Z if X = 6.
• Answer:
$z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
Normal Distribution Applications
The normal distribution can be used to model the distribution of
many variables that are of interest. This allows us to answer
probability questions about these random variables.
Example 4.7.1:
The "Uptime" is a custom-made, lightweight, battery-operated
activity monitor that records the amount of time an individual
spends in the upright position. In a study of children ages 8 to 15
years, the researchers found that the amount of time children
spend in the upright position followed a normal distribution with a
mean of 5.4 hours and a standard deviation of 1.3 hours. Find:
Mekele University: Biostatistics 124
If a child is selected at random, then:
1- The probability that the child spends less than 3
hours in the upright position in a 24-hour period:
$P(X < 3) = P\left(\frac{X - \mu}{\sigma} < \frac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$
-------------------------------------------------------------------------
2- The probability that the child spends more than 5
hours in the upright position in a 24-hour period:
$P(X > 5) = P\left(\frac{X - \mu}{\sigma} > \frac{5 - 5.4}{1.3}\right) = P(Z > -0.31)$
= 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217
-----------------------------------------------------------------------
3- The probability that the child spends exactly 6.2
hours in the upright position in a 24-hour period:
P(X = 6.2) = 0
-----------------------------------------------------------------------
4- The probability that the child spends from 4.5 to
7.3 hours in the upright position in a 24-hour period:
$P(4.5 < X < 7.3) = P\left(\frac{4.5 - 5.4}{1.3} < \frac{X - \mu}{\sigma} < \frac{7.3 - 5.4}{1.3}\right)$
= P(-0.69 < Z < 1.46) = P(Z < 1.46) – P(Z < -0.69)
= 0.9279 – 0.2451 = 0.6828
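The upright-time probabilities can also be computed directly from the normal distribution with mean 5.4 and standard deviation 1.3, without standardizing by hand. The sketch below assumes `statistics.NormalDist` as a stand-in for the table; because z is not rounded to two decimals, the results differ from table-based answers in the third decimal place.

```python
from statistics import NormalDist

uptime = NormalDist(mu=5.4, sigma=1.3)   # hours upright per 24-hour period

p_less_3 = uptime.cdf(3)                       # P(X < 3)
p_more_5 = 1 - uptime.cdf(5)                   # P(X > 5)
p_exactly_6_2 = 0.0                            # a single point has zero probability
p_between = uptime.cdf(7.3) - uptime.cdf(4.5)  # P(4.5 < X < 7.3)

# Standardized values used for the table lookups
z_values = [round((x - 5.4) / 1.3, 2) for x in (3, 5, 4.5, 7.3)]

print(z_values)
print(round(p_less_3, 4), round(p_more_5, 4), round(p_between, 4))
```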
Estimation
• Key words:
• Point estimate, interval estimate, estimator,
confidence level, α, confidence interval for the mean μ,
confidence interval for two means,
confidence interval for a population proportion P,
confidence interval for two proportions
• 6.1 Introduction:
• Statistical inference is the procedure by which we reach
a conclusion about a population on the basis of the
information contained in a sample drawn from that
population.
• Suppose that:
• an administrator of a large hospital is interested in the
mean age of patients admitted to his hospital during a
given year.
1. It will be too expensive to go through the records of all
patients admitted during that particular year.
2. He consequently elects to examine a sample of the records
from which he can compute an estimate of the mean age
of patients admitted to his hospital that year.
• For any parameter, we can compute two types of estimate: a
point estimate and an interval estimate.
• A point estimate is a single numerical value used to estimate
the corresponding population parameter.
• An interval estimate consists of two numerical values defining
a range of values that, with a specified degree of confidence,
we feel includes the parameter being estimated.
• The Estimate and The Estimator:
• The estimate is a single computed value, but the estimator is
the rule that tells us how to compute this value, or estimate.
• For example,
• $\bar{x} = \frac{\sum_i x_i}{n}$ is an estimator of the population mean, µ. The single
numerical value that results from evaluating this
formula is called an estimate of the parameter µ.
Confidence Interval for a Population
Mean: (C.I)
Suppose researchers wish to estimate the mean of
some normally distributed population.
• They draw a random sample of size n from the
population and compute $\bar{x}$, which they use as a point
estimate of µ.
• Because random sampling involves chance, $\bar{x}$
can't be expected to be equal to µ.
• The value of $\bar{x}$ may be greater than or less than µ.
• It would be much more meaningful to estimate µ by
an interval.
The 100(1-α)% confidence interval (C.I.)
for µ:
• We want to find two values L and U between which µ lies with
high probability, i.e.
P(L ≤ µ ≤ U) = 1 - α
For example:
• When
• α = 0.01,
then 1 - α = 0.99
• α = 0.05,
then 1 - α = 0.95
We have the following cases
a) When the population is normal
1) When the variance is known and the sample size is large or small,
the C.I. has the form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
2) When the variance is unknown and the sample size is small, the C.I. has
the form:
$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
b) When the population is not normal and n is
large (n > 30)
1) When the variance is known, the C.I. has the
form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
2) When the variance is unknown, the C.I. has the
form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
Example:
• Suppose a researcher, interested in obtaining an estimate
of the average level of some enzyme in a certain human
population, takes a sample of 10 individuals, determines
the level of the enzyme in each, and computes a sample
mean of approximately $\bar{x} = 22$.
Suppose further it is known that the variable of
interest is approximately normally distributed with a
variance of 45. We wish to estimate µ (α = 0.05).
Solution:
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• variance = σ² = 45 → σ = √45, n = 10, $\bar{x} = 22$
• The 95% confidence interval for µ is given by:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
• $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ (refer to table D)
• $Z_{0.975}\,(\sigma/\sqrt{n}) = 1.96\,(\sqrt{45}/\sqrt{10}) = 4.1578$
• $22 \pm 1.96\,(\sqrt{45}/\sqrt{10})$ →
(22 - 4.1578, 22 + 4.1578) → (17.84, 26.16)
• Exercise example 6.2.2 page 169
Example
The activity values of a certain enzyme measured in normal
gastric tissue of 35 patients with gastric carcinoma have a
mean of 0.718 and a standard deviation of 0.511. We want
to construct a 90% confidence interval for the population
mean.
• Solution:
• Note that the population is not assumed to be normal,
• n = 35 (n > 30), so n is large; σ is unknown, s = 0.511
• 1 - α = 0.90 → α = 0.1
• → α/2 = 0.05 → 1 - α/2 = 0.95
Then the 90% confidence interval for µ is given by:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
• $Z_{1-\alpha/2} = Z_{0.95} = 1.645$ (refer to table D)
• $Z_{0.95}\,(s/\sqrt{n}) = 1.645\,(0.511/\sqrt{35}) = 0.1421$
$0.718 \pm 1.645\,(0.511/\sqrt{35})$ →
(0.718 - 0.1421, 0.718 + 0.1421) →
(0.576, 0.860).
• Exercise example 6.2.3 page 164:
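Both z-based intervals above follow the same recipe: point estimate ± reliability coefficient × standard error. A minimal sketch, assuming a hypothetical helper `z_interval` and using `NormalDist().inv_cdf` in place of table D:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Two-sided z confidence interval for a population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Enzyme level: normal population, known variance 45, n = 10, mean 22
enzyme_ci = z_interval(22, sqrt(45), 10, 0.95)

# Gastric tissue enzyme: n = 35 (large), s = 0.511 used in place of sigma
gastric_ci = z_interval(0.718, 0.511, 35, 0.90)

print([round(v, 2) for v in enzyme_ci])
print([round(v, 3) for v in gastric_ci])
```

The same function serves both cases because the only change is whether σ or s is passed as `sd`.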
Example6.3.1 Page 174:
• Suppose a researcher studied the effectiveness of early
weight bearing and ankle therapies following acute repair
of a ruptured Achilles tendon. One of the variables
measured following treatment was muscle strength. In 19
subjects, the mean strength was 250.8 with a
standard deviation of 130.9.
We assume that the sample was taken from an
approximately normally distributed population. Calculate the
95% confidence interval for the mean strength.
Solution:
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• Standard deviation = s = 130.9, n = 19, $\bar{x} = 250.8$
• The 95% confidence interval for µ is given by:
$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
• $t_{1-\alpha/2,\,n-1} = t_{0.975,\,18} = 2.1009$ (refer to table E)
• $t_{0.975,\,18}\,(s/\sqrt{n}) = 2.1009\,(130.9/\sqrt{19}) = 63.1$
• $250.8 \pm 2.1009\,(130.9/\sqrt{19})$ →
• (250.8 - 63.1, 250.8 + 63.1) → (187.7, 313.9)
• Exercise 6.2.1, 6.2.2
• 6.3.2 page 171
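The t-based interval for the muscle-strength example can be sketched the same way. The Python standard library has no t quantile function, so this sketch takes the tabulated value t(0.975, 18) = 2.1009 as an input; `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """Confidence interval for a mean using a t critical value from a table."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

# n = 19 subjects, mean 250.8, s = 130.9, t(0.975, df = 18) = 2.1009 from table E
strength_ci = t_interval(250.8, 130.9, 19, 2.1009)

print([round(v, 1) for v in strength_ci])
```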
6.3 Confidence Interval for the difference
between two Population Means: (C.I)
If we draw two samples from two independent populations
and we want the confidence interval for the
difference between the two population means, then we have
the following cases:
a) When the population is normal
1) When the variances are known and the sample sizes are large
or small, the C.I. has the form:
$(\bar{x}_1 - \bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
2) When the variances are unknown but equal, and the sample sizes are
small, the C.I. has the form:
$(\bar{x}_1 - \bar{x}_2) - t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
Example 6.4.1 P174:
A research team is interested in the difference between serum uric
acid levels in patients with and without Down's syndrome. In a
large hospital for the treatment of the mentally retarded, a sample of
12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$
mg/100 ml. In a general hospital a sample of 15 normal individuals of
the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml.
If it is reasonable to assume that the two populations of values are
normally distributed with variances equal to 1 and 1.5, find the 95%
C.I. for μ1 - μ2.
Solution:
1 - α = 0.95 → α = 0.05 → α/2 = 0.025 → $Z_{1-\alpha/2} = Z_{0.975} = 1.96$
$(\bar{x}_1 - \bar{x}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (4.5 - 3.4) \pm 1.96\sqrt{\frac{1}{12} + \frac{1.5}{15}}$
• = 1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94)
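The serum uric acid interval can be reproduced with a short script. This is a sketch under the example's assumptions (normal populations, known variances 1 and 1.5); the function name `z_interval_two_means` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_two_means(x1, x2, var1, var2, n1, n2, confidence):
    """C.I. for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)          # standard error of x1 - x2
    diff = x1 - x2
    return diff - z * se, diff + z * se, se

lower, upper, se = z_interval_two_means(4.5, 3.4, 1.0, 1.5, 12, 15, 0.95)

print(round(se, 4), round(lower, 2), round(upper, 2))
```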
Example 6.4.2 P178:
The purpose of the study was to determine the effectiveness of an
integrated outpatient dual-diagnosis treatment program for
mentally ill subjects. The authors were addressing the problem of substance abuse
issues among people with severe mental disorders. A retrospective chart review was
carried out on 50 patients; the researchers were interested in the number of inpatient
treatment days for psychiatric disorder during the year following the end of the program.
Among 18 patients with schizophrenia, the mean number of treatment days was 4.7
with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean
number of treatment days was 8.8 with a standard deviation of 11.5. We wish to
construct a 99% C.I. for the difference between the means of the populations
represented by the two samples.
Solution:
• 1 - α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 - α/2 = 0.995
• n1 + n2 - 2 = 18 + 10 - 2 = 26
• $t_{1-\alpha/2,\,(n_1+n_2-2)} = t_{0.995,\,26} = 2.7787$, then the 99% C.I. for μ1 – μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• where
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{17(9.3)^2 + 9(11.5)^2}{18 + 10 - 2} = 102.33$
• then
$(4.7 - 8.8) \pm 2.7787\,\sqrt{102.33}\,\sqrt{\frac{1}{18} + \frac{1}{10}}$
= -4.1 ± 11.086 = (-15.186, 6.986)
Exercises: 6.4.2, 6.4.6, 6.4.7, 6.4.8 Page 180
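The pooled-variance interval for the treatment-days example can be checked step by step. The sketch below takes the tabulated value t(0.995, 26) = 2.7787 as given, since the standard library has no t quantile function; the function names are illustrative.

```python
from math import sqrt

def pooled_variance(s1, s2, n1, n2):
    """Pooled estimate of a common variance from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """C.I. for mu1 - mu2 assuming equal but unknown variances."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    margin = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Schizophrenia: n = 18, mean 4.7, s = 9.3; bipolar: n = 10, mean 8.8, s = 11.5
sp2 = pooled_variance(9.3, 11.5, 18, 10)
ci = pooled_t_interval(4.7, 8.8, 9.3, 11.5, 18, 10, t_crit=2.7787)

print(round(sp2, 2), [round(v, 3) for v in ci])
```

Because the interval contains zero, the data are compatible with no difference between the two population means at the 99% level.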
6.5 Confidence Interval for a Population
proportion (P):
A sample is drawn from the population of interest, and the
sample proportion is computed as
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
This sample proportion is used as the point estimator of the
population proportion. A confidence interval is obtained by
the following formula:
$\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$
Example 6.5.1
The Pew Internet and American Life Project reported in 2003 that 18%
of internet users have used the internet to search for
information regarding experimental treatments or
medicine. The sample consisted of 1220 adult internet
users, and information was collected from telephone
interviews. We wish to construct a 98% C.I. for the
proportion of internet users who have searched for
information about experimental treatments or medicine.
Solution:
1 - α = 0.98 → α = 0.02 → α/2 = 0.01 → 1 - α/2 = 0.99
$Z_{1-\alpha/2} = Z_{0.99} = 2.33$, n = 1220, $\hat{p} = \frac{18}{100} = 0.18$
The 98% C.I. is
$\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}} = 0.18 \pm 2.33\sqrt{\frac{0.18(1 - 0.18)}{1220}}$
= 0.18 ± 0.0256 = (0.1544, 0.2056)
Exercises: 6.5.1, 6.5.3 Page 187
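A short sketch of the single-proportion interval, using the example's figures. `proportion_interval` is an illustrative name, and `NormalDist().inv_cdf` replaces the table lookup, so the reliability coefficient is about 2.326 rather than the rounded 2.33.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Large-sample (normal approximation) C.I. for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 18% of 1220 internet users, 98% confidence
internet_ci = proportion_interval(0.18, 1220, 0.98)

print([round(v, 4) for v in internet_ci])
```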
Confidence Interval for the difference
between two Population proportions :
Two samples are drawn from two independent populations
of interest, and the sample proportion of the characteristic
of interest is computed for each sample. An unbiased
point estimator for the difference between two population
proportions is $\hat{P}_1 - \hat{P}_2$.
A 100(1-α)% confidence interval for P1 - P2 is given by
$(\hat{P}_1 - \hat{P}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$
Example
Connor investigated gender differences in proactive and
reactive aggression in a sample of 323 adults (68 females
and 255 males). In the sample, 31 of the females and 53
of the males were using the internet in an internet café. We
wish to construct a 99% confidence interval for the
difference between the proportions of adults who go to an
internet café in the two sampled populations.
Solution:
1 - α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 - α/2 = 0.995
$Z_{1-\alpha/2} = Z_{0.995} = 2.58$, nF = 68, nM = 255,
$\hat{p}_F = \frac{a_F}{n_F} = \frac{31}{68} = 0.4559, \quad \hat{p}_M = \frac{a_M}{n_M} = \frac{53}{255} = 0.2078$
The 99% C.I. is
$(\hat{P}_F - \hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_F(1 - \hat{P}_F)}{n_F} + \frac{\hat{P}_M(1 - \hat{P}_M)}{n_M}}$
$= (0.4559 - 0.2078) \pm 2.58\sqrt{\frac{0.4559(1 - 0.4559)}{68} + \frac{0.2078(1 - 0.2078)}{255}}$
= 0.2481 ± 2.58(0.0655) = (0.07914, 0.4171)
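The two-proportion interval can be sketched as follows, with the counts from the example (31 of 68 females, 53 of 255 males). The function name is hypothetical, and the unrounded normal quantile (about 2.576) is used instead of the tabulated 2.58, so the limits differ slightly in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(a1, n1, a2, n2, confidence):
    """Large-sample C.I. for P1 - P2 from two independent samples."""
    p1, p2 = a1 / n1, a2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
    diff = p1 - p2
    return diff - z * se, diff + z * se, se

lower, upper, se = two_proportion_interval(31, 68, 53, 255, 0.99)

print(round(se, 4), round(lower, 4), round(upper, 4))
```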




Chapter-8
Hypothesis Testing
Mekele University: Biostatistics 156
• Key words :
• Null hypothesis H0, Alternative hypothesis HA , testing
hypothesis , test statistic , P-value
Mekele University: Biostatistics 157
Hypothesis Testing
• One type of statistical inference, estimation,
was discussed previously.
• The other type ,hypothesis testing ,is
discussed in this session.
Mekele University: Biostatistics 158
Definition of a hypothesis
• It is a statement about one or more populations
.
It is usually concerned with the parameters of
the population. e.g. the hospital administrator
may want to test the hypothesis that the
average length of stay of patients admitted to
the hospital is 5 days
Mekele University: Biostatistics 159
Definition of Statistical hypotheses
• They are hypotheses that are stated in such a way
that they may be evaluated by appropriate statistical
techniques.
• There are two hypotheses involved in hypothesis
testing
• Null hypothesis H0: It is the hypothesis to be tested .
• Alternative hypothesis HA : It is a statement of what
we believe is true if our sample data cause us to
reject the null hypothesis
Mekele University: Biostatistics 160
Testing a hypothesis about the mean of a
population:
• We have the following steps:
1. Data: determine the variable, sample size (n), sample
mean ($\bar{x}$), population standard deviation (σ), or
sample standard deviation (s) if σ is unknown.
2. Assumptions: We have two cases:
• Case 1: Population is normally or approximately
normally distributed with known or unknown
variance (sample size n may be small or large),
• Case 2: Population is not normal with known or
unknown variance (n is large, i.e. n ≥ 30).
• 3. Hypotheses:
• we have three cases
• Case I : H0: μ = μ0
HA: μ ≠ μ0
• e.g. we want to test that the population mean is
different from 50
• Case II : H0: μ = μ0
HA: μ > μ0
• e.g. we want to test that the population mean is greater
than 50
• Case III : H0: μ = μ0
HA: μ < μ0
• e.g. we want to test that the population mean is less
than 50
4. Test Statistic:
• Case 1: population is normal or approximately normal
- σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
- σ² is unknown, n large: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
- σ² is unknown, n small: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• Case 2: If population is not normally distributed and n is
large
• i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
• ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
5. Decision Rule:
i) If HA: μ ≠ μ0
• Reject H0 if Z > Z1-α/2 or Z < - Z1-α/2
(when using the Z-test)
Or reject H0 if T > t1-α/2,n-1 or T < - t1-α/2,n-1
(when using the T-test)
• ii) If HA: μ > μ0
• Reject H0 if Z > Z1-α (when using the Z-test)
Or reject H0 if T > t1-α,n-1 (when using the T-test)
• iii) If HA: μ < μ0
Reject H0 if Z < - Z1-α (when using the Z-test)
• Or
Reject H0 if T < - t1-α,n-1 (when using the T-test)
Note:
Z1-α/2 , Z1-α , Zα are tabulated values obtained
from table D
t1-α/2 , t1-α , tα are tabulated values obtained from
table E with (n-1) degrees of freedom (df)
• 6. Decision:
• If we reject H0, we can conclude that HA is
true.
• If, however, we do not reject H0, we may
conclude only that H0 may be true: the data do not
provide sufficient evidence against it.
An Alternative Decision Rule using the
p - value Definition
• The p-value is defined as the smallest value of
α for which the null hypothesis can be
rejected.
• If the p-value is less than or equal to α ,we
reject the null hypothesis (p ≤ α)
• If the p-value is greater than α ,we do not
reject the null hypothesis (p > α)
Mekele University: Biostatistics 167
Example
• Researchers are interested in the mean age of a
certain population.
• A random sample of 10 individuals drawn from the
population of interest has a mean of 27.
• Assuming that the population is approximately
normally distributed with variance 20,can we
conclude that the mean is different from 30 years ?
(α=0.05) .
• If the p - value is 0.0340 how can we use it in making
a decision?
Mekele University: Biostatistics 168
Solution
1-Data: variable is age, n = 10, $\bar{x}$ = 27, σ² = 20, α = 0.05
2-Assumptions: the population is approximately
normally distributed with variance 20
3-Hypotheses:
• H0 : μ = 30
• HA: μ ≠ 30
4-Test Statistic:
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
5. Decision Rule
• The alternative hypothesis is
• HA: μ ≠ 30
• Hence we reject H0 if Z > Z1-0.05/2 = Z0.975
• or Z < - Z1-0.05/2 = - Z0.975
• Z0.975 = 1.96 (from table D)
• 6.Decision:
• We reject H0 ,since -2.12 is in the rejection
region .
• We can conclude that μ is not equal to 30
• Using the p value ,we note that p-value
=0.0340< 0.05,therefore we reject H0
Mekele University: Biostatistics 171
Example
• Referring to the previous example, suppose that
the researchers have asked: Can we
conclude that μ < 30?
1. Data: see previous example
2. Assumptions: see previous example
3. Hypotheses:
• H0: μ = 30
• HA: μ < 30
4. Test Statistic:
• $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
5. Decision Rule: Reject H0 if Z < Zα, where
• Zα = -1.645 (from table D)
6. Decision: Reject H0 since -2.12 < -1.645; thus we can conclude that the
population mean is smaller than 30.
Example
• Among 157 African-American men ,the mean
systolic blood pressure was 146 mm Hg with a
standard deviation of 27. We wish to know if
on the basis of these data, we may conclude
that the mean systolic blood pressure for a
population of African-American is greater than
140. Use α=0.01.
Mekele University: Biostatistics 174
Solution
1. Data: Variable is systolic blood pressure,
n = 157, $\bar{x}$ = 146, s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is
unknown
3. Hypotheses: H0: μ = 140
HA: μ > 140
4. Test Statistic:
$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$
5. Decision Rule:
we reject H0 if Z > Z1-α
= Z0.99 = 2.33
(from table D)
6. Decision: We reject H0, since 2.78 > 2.33.
Hence we may conclude that the mean systolic
blood pressure for a population of African-American
men is greater than 140.
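The one-sample z tests worked above (mean age, then systolic blood pressure) can be sketched in a few lines. The helper name `z_statistic` is illustrative, and `statistics.NormalDist` stands in for table D when computing critical values and p-values.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal

def z_statistic(xbar, mu0, sd, n):
    """Test statistic for H0: mu = mu0, using sigma (or s when n is large)."""
    return (xbar - mu0) / (sd / sqrt(n))

# Mean age: n = 10, xbar = 27, sigma^2 = 20, H0: mu = 30, alpha = 0.05
z_age = z_statistic(27, 30, sqrt(20), 10)
p_two_sided = 2 * Z.cdf(-abs(z_age))                 # HA: mu != 30
reject_two_sided = abs(z_age) > Z.inv_cdf(1 - 0.05 / 2)
reject_lower = z_age < -Z.inv_cdf(1 - 0.05)          # HA: mu < 30

# Blood pressure: n = 157, xbar = 146, s = 27, H0: mu = 140, alpha = 0.01
z_bp = z_statistic(146, 140, 27, 157)
reject_upper = z_bp > Z.inv_cdf(1 - 0.01)            # HA: mu > 140

print(round(z_age, 2), round(p_two_sided, 4), reject_two_sided, reject_lower)
print(round(z_bp, 2), reject_upper)
```

Comparing the p-value with α leads to the same decision as comparing the test statistic with the critical value.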
Hypothesis Testing :The Difference between
two population mean :
• We have the following steps:
1. Data: determine the variable, sample sizes (n1, n2), sample means,
and the population standard deviations, or the sample standard
deviations (s1, s2) if these are unknown, for the two populations.
2. Assumptions: We have two cases:
• Case 1: Populations are normally or approximately normally
distributed with known or unknown variances (sample sizes
may be small or large),
• Case 2: Populations are not normal with known variances (n is
large, i.e. n ≥ 30).
• 3.Hypotheses:
• we have three cases
• Case I : H0: μ 1 = μ2 → μ 1 - μ2 = 0
• HA: μ 1 ≠ μ 2 → μ 1 - μ 2 ≠ 0
• e.g. we want to test that the mean for first population is
different from second population mean.
• Case II : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 > μ 2 →μ 1 - μ 2 > 0
• e.g. we want to test that the mean for first population is
greater than second population mean.
• Case III : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 < μ 2 → μ 1 - μ 2 < 0
• e.g. we want to test that the mean for first population
is less than second population mean.
Mekele University: Biostatistics 178
4. Test Statistic:
• Case 1: Two populations are normal or approximately
normal
- σ² is known (n1, n2 large or small):
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
- σ² is unknown (n1, n2 small), population variances equal:
$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
- σ² is unknown (n1, n2 small), population variances not equal:
$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
• Case 2: If the populations are not normally distributed
• and n1, n2 are large (n1 ≥ 30, n2 ≥ 30)
• and the population variances are known,
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
5.Decision Rule:
i) If HA: μ 1 ≠ μ 2 → μ 1 - μ 2 ≠ 0
• Reject H 0 if Z >Z1-α/2 or Z< - Z1-α/2
(when use Z - test)
Or Reject H 0 if T >t1-α/2 ,(n1+n2 -2) or T< - t1-α/2,,(n1+n2 -2)
(when use T- test)
• __________________________
• ii) HA: μ 1 > μ 2 → μ 1 - μ 2 > 0
• Reject H0 if Z>Z1-α (when use Z - test)
Or Reject H0 if T>t1-α,(n1+n2 -2) (when use T - test)
Mekele University: Biostatistics 181
• iii) If HA: μ 1 < μ 2 → μ 1 - μ 2 < 0 Reject H0
if Z< - Z1-α (when use Z - test)
• Or
Reject H0 if T<- t1-α, ,(n1+n2 -2) (when use T - test)
Note:
Z1-α/2 , Z1-α , Zα are tabulated values obtained
from table D
t1-α/2 , t1-α , tα are tabulated values obtained from
table E with (n1+n2 -2) degree of freedom (df)
6. Conclusion: reject or fail to reject H0
Mekele University: Biostatistics 182
Example
• Researchers wish to know if the data they have collected provide
sufficient evidence to indicate a difference in mean serum
uric acid levels between normal individuals and individuals
with Down's syndrome. The data consist of serum uric acid
readings on 12 individuals with Down's syndrome from a
normal distribution with variance 1 and 15 normal individuals
from a normal distribution with variance 1.5. The means are
$\bar{X}_1 = 4.5$ mg/100 and $\bar{X}_2 = 3.4$ mg/100, and α = 0.05.
Solution:
1. Data: Variable is serum uric acid levels, n1 = 12, n2 = 15,
σ1² = 1, σ2² = 1.5, α = 0.05.
2. Assumption: Two populations are normal, σ1², σ2²
are known
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
• HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0
4. Test Statistic:
• $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{1}{12} + \frac{1.5}{15}}} = 2.57$
5. Decision Rule:
Reject H0 if Z > Z1-α/2 or Z < - Z1-α/2
Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 (from table D)
6-Conclusion: Reject H0 since 2.57 > 1.96
Or, using the p-value = 0.0102: reject H0 if p < α; since 0.0102 < 0.05, reject H0
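The two-sample z test for the serum uric acid data can be sketched as below. This assumes known variances, as in the example; `two_sample_z` is an illustrative name, and the two-sided p-value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, x2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 with known population variances."""
    return ((x1 - x2) - delta0) / sqrt(var1 / n1 + var2 / n2)

alpha = 0.05
z = two_sample_z(4.5, 3.4, 1.0, 1.5, 12, 15)
critical = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided alternative
reject = abs(z) > critical

print(round(z, 2), round(p_value, 4), reject)
```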
Example
The purpose of a study by Tam was to investigate wheelchair
maneuvering in individuals with lower-level spinal cord injury (SCI)
and healthy controls (C). Subjects used a modified wheelchair to
incorporate a rigid seat surface to facilitate the specified
experimental measurements. The data for measurements of the
left ischial tuberosity for SCI and control C are shown below:
C:   131 115 124 131 122 117  88 114 150 169
SCI:  60 150 130 180 163 130 121 119 130 143
We wish to know if we can conclude, on the
basis of the above data, that the mean of
left ischial tuberosity for control C is lower
than the mean of left ischial tuberosity for SCI.
Assume normal populations with equal
variances. α = 0.05, p-value > 0.10
Solution:
1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05.
• $\bar{X}_C = 126.1$, $\bar{X}_{SCI} = 133.1$ (calculated from data)
2. Assumption: Two populations are normal, σ1², σ2² are
unknown but equal
3. Hypotheses: H0: μC = μSCI → μC - μSCI = 0
HA: μC < μSCI → μC - μSCI < 0
4. Test Statistic:
• $T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(126.1 - 133.1) - 0}{\sqrt{756.04}\,\sqrt{\frac{1}{10} + \frac{1}{10}}} = -0.569$
Where,
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(21.8)^2 + 9(32.2)^2}{10 + 10 - 2} = 756.04$
5. Decision Rule:
Reject H0 if T < - t1-α,(n1+n2-2)
t1-α,(n1+n2-2) = t0.95,18 = 1.7341 (from table E)
6-Conclusion: Fail to reject H0 since -0.569 > -1.7341
Or
Fail to reject H0 since p > 0.10 > α = 0.05
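The pooled t test for the wheelchair example can be sketched from the summary statistics. Two assumptions are made here and flagged in the comments: the SCI standard deviation is taken as 32.2, the value consistent with the pooled variance of 756.04 shown in the solution, and the critical value 1.7341 is the tabulated t(0.95, 18) rather than something computed in code.

```python
from math import sqrt

def pooled_t(x1, x2, s1, s2, n1, n2):
    """T statistic for H0: mu1 = mu2 assuming equal but unknown variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, sp2

# Control: mean 126.1, s = 21.8; SCI: mean 133.1, s = 32.2 (assumed, see lead-in)
t_stat, sp2 = pooled_t(126.1, 133.1, 21.8, 32.2, 10, 10)

t_crit = 1.7341                 # t(0.95, df = 18), taken from the table
reject = t_stat < -t_crit       # lower-tailed test, HA: mu_C < mu_SCI

print(round(sp2, 2), round(t_stat, 3), reject)
```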
Example
Dernellis and Panaretou examined subjects with hypertension
and healthy control subjects. One of the variables of interest was
the aortic stiffness index. Measures of this variable were
calculated from the aortic diameter evaluated by M-mode and
blood pressure measured by a sphygmomanometer. Physicians wish
to reduce aortic stiffness. In the 15 patients with hypertension
(Group 1), the mean aortic stiffness index was 19.16 with a
standard deviation of 5.29. In the 30 control subjects (Group 2), the
mean aortic stiffness index was 9.53 with a standard deviation of
2.69. We wish to determine if the two populations represented by
these samples differ with respect to mean stiffness index. We wish
to know if we can conclude that in general persons with
thrombosis have on the average higher IgG levels than persons
without thrombosis at α = 0.01, p-value = 0.0559
Solution:
1. Data: n1 = 53, n2 = 54, S1 = 44.89, S2 = 34.85, α = 0.01.
Group            Mean IgG level   Sample size   Sample standard deviation
Thrombosis       59.01            53            44.89
No thrombosis    46.61            54            34.85
2. Assumption: Two populations are not normal, σ1², σ2²
are unknown and the sample sizes are large
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 > μ2 → μ1 - μ2 > 0
4. Test Statistic:
• $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{(59.01 - 46.61) - 0}{\sqrt{\frac{44.89^2}{53} + \frac{34.85^2}{54}}} = 1.59$
5. Decision Rule:
Reject H0 if Z > Z1-α
Z1-α = Z0.99 = 2.33 (from table D)
6-Conclusion: Fail to reject H0 since 1.59 < 2.33
Or
Fail to reject H0 since p = 0.0559 > α = 0.01
Hypothesis Testing A single population
proportion:
• Testing a hypothesis about a population proportion (P) is carried out
in much the same way as for a mean when the conditions necessary for
using the normal curve are met
• We have the following steps:
1. Data: sample size (n), sample proportion ($\hat{p}$), P0
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
2. Assumptions: $\hat{p}$ is approximately normally distributed
• 3.Hypotheses:
• we have three cases
• Case I : H0: P = P0
HA: P ≠ P0
• Case II : H0: P = P0
HA: P > P0
• Case III : H0: P = P0
HA: P < P0
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$
which, when H0 is true, is distributed approximately as the standard
normal
5.Decision Rule:
i) If HA: P ≠ P0
• Reject H 0 if Z >Z1-α/2 or Z< - Z1-α/2
• _______________________
• ii) If HA: P> P0
• Reject H0 if Z>Z1-α
• _____________________________
• iii) If HA: P< P0
Reject H0 if Z< - Z1-α
Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from
table D
6. Conclusion: reject or fail to reject H0
Mekele University: Biostatistics 194
Example
Wagen collected data on a sample of 301 Hispanic women
living in Texas. One variable of interest was the percentage
of subjects with impaired fasting glucose (IFG). In the
study, 24 women were classified in the IFG stage. The article
cites population estimates for IFG among Hispanic women
in Texas as 6.3 percent. Is there sufficient evidence to
indicate that the population of Hispanic women in Texas has a
prevalence of IFG higher than 6.3 percent? Let α = 0.05.
Solution:
1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24,
q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$
2. Assumptions: $\hat{p}$ is approximately normally distributed
3. Hypotheses:
• H0: P = 0.063
HA: P > 0.063
• 4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.08 - 0.063}{\sqrt{\frac{0.063(0.937)}{301}}} = 1.21$
5. Decision Rule: Reject H0 if Z > Z1-α
Where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0
Since
Z = 1.21 < Z1-α = 1.645
Or,
If P-value = 0.1131,
fail to reject H0 → P-value > α
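The impaired fasting glucose test can be sketched as follows. `one_proportion_z` is an illustrative name. The slides round the sample proportion to 0.08 before computing Z; this sketch keeps 24/301 unrounded, so its Z and p-value are close to, but not identical with, the hand calculation. The decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(a, n, p0):
    """Z statistic for H0: P = p0 using the normal approximation."""
    p_hat = a / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
z = one_proportion_z(24, 301, 0.063)
critical = NormalDist().inv_cdf(1 - alpha)   # upper-tailed test, about 1.645
p_value = 1 - NormalDist().cdf(z)
reject = z > critical

print(round(z, 2), round(p_value, 4), reject)
```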
Hypothesis Testing :The Difference
between two population proportion:
• Testing a hypothesis about two population proportions (P1, P2) is
carried out in much the same way as for the difference between two
means when the conditions necessary for using the normal curve are met
• We have the following steps:
1. Data: sample sizes (n1, n2), sample proportions ($\hat{P}_1, \hat{P}_2$),
number with the characteristic in the two samples (x1, x2), and the pooled proportion
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
2- Assumption: Two populations are independent.
• 3.Hypotheses:
• we have three cases
• Case I : H0: P1 = P2 → P1 - P2 = 0
HA: P1 ≠ P2 → P1 - P2 ≠ 0
• Case II : H0: P1 = P2 → P1 - P2 = 0
HA: P1 > P2 → P1 - P2 > 0
• Case III : H0: P1 = P2 → P1 - P2 = 0
HA: P1 < P2 → P1 - P2 < 0
4. Test Statistic:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$
which, when H0 is true, is distributed approximately as the standard
normal
5. Decision Rule:
i) If HA: P1 ≠ P2
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P1 > P2
• Reject H0 if Z > Z1-α
iii) If HA: P1 < P2
• Reject H0 if Z < -Z1-α
Note: Z1-α/2, Z1-α, and Zα are tabulated values obtained from
Table D.
6. Conclusion: reject or fail to reject H0
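The three-case decision rule can be written as one small function. This is a sketch under stated assumptions, not something from the slides: the name `reject_h0` and the labels "two-sided", "greater", and "less" are hypothetical, and the critical values are computed from the standard normal distribution rather than read from Table D.

```python
from statistics import NormalDist


def reject_h0(z, alpha, alternative):
    """Apply the Z-test decision rule for the three alternative hypotheses.

    alternative: "two-sided" (HA: P1 != P2), "greater" (HA: P1 > P2),
                 or "less" (HA: P1 < P2). Labels are illustrative only.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)   # Z(1 - alpha/2)
        return z > crit or z < -crit
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)   # Z(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)  # -Z(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Critical values for alpha = 0.05, as they would be read from the table
print(round(NormalDist().inv_cdf(0.975), 3))  # two-sided critical value
print(round(NormalDist().inv_cdf(0.95), 3))   # one-sided critical value
```

For α = 0.05 the two printed values correspond to the familiar table entries 1.96 and 1.645.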
Example
Noonan syndrome is a genetic condition that can affect the heart, growth,
blood clotting, and mental and physical development. Noonan examined
the stature of men and women with Noonan syndrome. The study contained 29
male and 44 female adults. One of the cut-off values used to assess
stature was the third percentile of adult height. Eleven of the males fell
below the third percentile of adult male height, while 24 of the females
fell below the third percentile of adult female height. Does this study
provide sufficient evidence for us to conclude that among subjects with
Noonan syndrome, females are more likely than males to fall below the
respective third percentile of adult height? Let α = 0.05.
Solution:
1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05
$$\hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379, \qquad \hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$$
$$\bar{p} = \frac{x_M + x_F}{n_M + n_F} = \frac{11 + 24}{29 + 44} = 0.479$$
2. Assumption: The two populations are independent.
3. Hypotheses:
• Case II : H0: PF = PM → PF - PM = 0
HA: PF > PM → PF - PM > 0
4. Test Statistic:
$$Z = \frac{(\hat{p}_F - \hat{p}_M) - (P_F - P_M)}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_F} + \dfrac{\bar{p}(1-\bar{p})}{n_M}}} = \frac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$$
5. Decision Rule:
Reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0,
since Z = 1.39 < Z1-α = 1.645.
Or, since the P-value = 0.0823 > α = 0.05, fail to reject H0.
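The two-proportion example can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the slides: the function name `two_proportion_z` is an assumption, and the upper-tailed P-value matches Case II (HA: PF > PM) only. It uses unrounded proportions, so the results may differ from the slide values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Upper-tailed Z test for H0: P1 = P2 against HA: P1 > P2.

    Hypothetical helper for illustration; returns (z, p_value).
    """
    p1_hat = x1 / n1                      # sample proportion, group 1
    p2_hat = x2 / n2                      # sample proportion, group 2
    p_bar = (x1 + x2) / (n1 + n2)         # pooled proportion under H0
    se = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = ((p1_hat - p2_hat) - 0) / se      # hypothesized difference is 0
    p_value = 1 - NormalDist().cdf(z)     # upper-tail area
    return z, p_value


# Group 1 = females (24 of 44), group 2 = males (11 of 29)
z, p_value = two_proportion_z(x1=24, n1=44, x2=11, n2=29)
z_crit = NormalDist().inv_cdf(0.95)       # Z(1-alpha) for alpha = 0.05

print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```

Putting females first keeps the sign of Z aligned with the alternative PF > PM.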
An Introduction to the Chi-Square
Distribution
TESTS OF INDEPENDENCE
• The test of independence is used to determine whether two criteria
of classification are independent. For example, we may ask whether
socioeconomic status and area of residence of people in a city are
independent.
• We divide our sample according to socioeconomic status (low, medium,
and high income, etc.), and the same sample is also categorized
according to area of residence (urban, rural, suburban, slums, etc.).
• Put the first criterion in columns, with the number of columns equal
to the number of categories of the first criterion (socioeconomic
status), and the second criterion in rows, with the number of rows
equal to the number of categories of the second criterion (area of
residence).
The Contingency Table
• Table: Two-way classification of a sample
(columns: first criterion of classification →; rows: second criterion ↓)

Second criterion | 1   | 2   | 3   | … | c   | Total
1                | N11 | N12 | N13 | … | N1c | N1.
2                | N21 | N22 | N23 | … | N2c | N2.
3                | N31 | N32 | N33 | … | N3c | N3.
.                | .   | .   | .   | … | .   | .
.                | .   | .   | .   | … | .   | .
r                | Nr1 | Nr2 | Nr3 | … | Nrc | Nr.
Total            | N.1 | N.2 | N.3 | … | N.c | N
Observed versus Expected Frequencies
• Oij: The frequencies in the ith row and jth column of a
contingency table are called observed frequencies; they result
from the cross-classification of the sample according to the two
criteria.
• eij: Expected frequencies, on the assumption that the two criteria
are independent, are calculated by multiplying the row and column
marginal totals of a cell and then dividing by the total frequency.
• Formula:
$$e_{ij} = \frac{(N_{i.})(N_{.j})}{N}$$
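The expected-frequency formula translates directly into code. The sketch below is illustrative only: the function name `expected_frequencies` is an assumption, and the 2 × 2 table of counts is made up for the demonstration and does not come from the slides.

```python
def expected_frequencies(observed):
    """Return the table of e_ij = (row total i)(column total j) / N.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]        # N_i.
    col_totals = [sum(col) for col in zip(*observed)]  # N_.j
    n = sum(row_totals)                                # N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (not data from the slides)
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)

for row in expected:
    print(row)
```

Each expected table has the same row and column totals as the observed table, which is a quick check on the arithmetic.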
Chi-square Test
• After calculating the expected frequencies,
prepare a table of expected frequencies and compute the chi-square statistic
$$\chi^2 = \sum_{i=1}^{k} \left[ \frac{(o_i - e_i)^2}{e_i} \right]$$
where the summation is over all r × c = k cells.
• D.F.: the degrees of freedom for using the table are (r-1)(c-1),
at the α level of significance.
• Note that the test is always one-sided.
Example
The researchers are interested in determining whether preconception
use of folic acid and race are independent. The data are:

Observed Frequencies Table (use of folic acid)
Race  | Yes | No  | Total
White | 260 | 299 | 559
Black | 15  | 41  | 56
Other | 7   | 14  | 21
Total | 282 | 354 | 636

Expected Frequencies Table
Race  | Yes                     | No                      | Total
White | (282)(559)/636 = 247.86 | (354)(559)/636 = 311.14 | 559
Black | (282)(56)/636 = 24.83   | (354)(56)/636 = 31.17   | 56
Other | (282)(21)/636 = 9.31    | (354)(21)/636 = 11.69   | 21
Calculations and Testing
• Data: See the given table
• Assumption: Simple random sample
• Hypotheses: H0: race and use of folic acid are independent
HA: the two variables are not independent. Let α = 0.05
• The test statistic is the chi-square statistic given earlier
• Distribution: when H0 is true, the test statistic follows a chi-square
distribution with (r-1)(c-1) = (3-1)(2-1) = 2 d.f.
• Decision Rule: Reject H0 if the value of $\chi^2$ is greater than
$\chi^2_{\alpha,(r-1)(c-1)}$ = 5.991
• Calculations:
$$\chi^2 = \frac{(260 - 247.86)^2}{247.86} + \frac{(299 - 311.14)^2}{311.14} + \cdots + \frac{(14 - 11.69)^2}{11.69} = 9.091$$
Conclusion
• Statistical decision: We reject H0 since 9.091 > 5.991.
• Conclusion: We conclude that H0 is false, and that there is a
relationship between race and preconception use of folic
acid.
• P-value: Since 7.378 < 9.091 < 9.210, 0.01 < p < 0.025.
• We would therefore also reject H0 at the 0.025 level of significance,
but not at the 0.01 level.
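The whole folic acid example can be reproduced in a short script. This is a minimal sketch, not the lecturer's method: the variable names are assumptions, and the P-value and critical value rely on the closed form of the chi-square upper tail for exactly 2 degrees of freedom, exp(-x/2), so the shortcut does not carry over to tables of other sizes.

```python
from math import exp, log

# Observed counts from the example: rows = White, Black, Other;
# columns = folic acid use Yes, No
observed = [[260, 299],
            [15, 41],
            [7, 14]]

row_totals = [sum(row) for row in observed]        # 559, 56, 21
col_totals = [sum(col) for col in zip(*observed)]  # 282, 354
n = sum(row_totals)                                # 636

# Sum of (o - e)^2 / e over all r x c cells, with e = row total * col total / N
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1)
assert df == 2, "the closed forms below hold only for 2 degrees of freedom"

alpha = 0.05
critical = -2 * log(alpha)        # chi-square critical value for 2 d.f.
p_value = exp(-chi_square / 2)    # upper-tail area for 2 d.f.

print(f"chi-square = {chi_square:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if chi_square > critical else "Fail to reject H0")
```

The computed P-value falls between 0.01 and 0.025, in line with the bracketing read from the chi-square table.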

INTRODUCTION TO BIO STATISTICS

  • 2. Introduction • Key words : – Statistics , data , Biostatistics, – Variable ,Population ,Sample Mekele University: Biostatistics 2
  • 3. Introduction Some Basic concepts Statistics is a field of study concerned with 1- collection, organization, summarization and analysis of data. 2- drawing of inferences about a body of data when only a part of the data is observed.  Statisticians try to interpret and communicate the results to others. Mekele University: Biostatistics 3
  • 4. * Biostatistics: The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, … etc. When the data analyzed are derived from the biological science and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts. Mekele University: Biostatistics 4
  • 5. Data: The raw material of Statistics is data. We may define data as figures. Figures result from the process of counting or from taking a measurement. For example: - When a hospital administrator counts the number of patients (counting). - When a nurse weighs a patient (measurement) Mekele University: Biostatistics 5
  • 6. Sources of data Records Surveys Experiments Comprehensive Sample Mekele University: Biostatistics 6
  • 7. We search for suitable data to serve as the raw material for our investigation. Such data are available from one or more of the following sources: 1- Routinely kept records. For example: - Hospital medical records contain immense amounts of information on patients. - Hospital accounting records contain a wealth of data on the facility’s business activities. Mekele University: Biostatistics 7 * Sources of Data:
  • 8. 2- Surveys: The source may be a survey, if the data needed is about answering certain questions. For example: If the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information. Mekele University: Biostatistics 8
  • 9. 3- Experiments. Frequently the data needed to answer a question are available only as the result of an experiment. For example: If a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients. Mekele University: Biostatistics 9
  • 10. * A variable: It is a characteristic that takes on different values in different persons, places, or things. For example: - heart rate, - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Mekele University: Biostatistics 10
  • 11. Types of variables: Quantitative and Qualitative. Quantitative variables: they can be measured in the usual sense. For example: - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Qualitative variables: many characteristics are not capable of being measured; they can only be categorized, and some of them can be ordered or ranked. For example: - classification of people into socio-economic groups, - social classes based on income, education, etc.
  • 12. Types of quantitative variables: Discrete and Continuous. A discrete variable is characterized by gaps or interruptions in the values that it can assume. For example: - the number of daily admissions to a general hospital, - the number of decayed, missing or filled teeth per child in an elementary school. A continuous variable can assume any value within a specified relevant interval of values. For example: - height, - weight, - skull circumference. No matter how close together the observed heights of two people are, we can find another person whose height falls somewhere in between.
  • 13. Types of variables & scale of measurement. Quantitative (numerical) variables: interval, ratio. Qualitative (categorical) variables: nominal, ordinal.
  • 14. Nominal: unordered categories; numbers are used only to represent categories; averages are meaningless, so look at the frequency/proportion in each category. Dichotomous, e.g. gender: male = 1, female = 0. Polytomous, e.g. blood type: O = 1, A = 2, B = 3, AB = 4.
  • 15. Ordinal: ordered categories; numbers are used to represent categories; order matters, magnitude does not; differences between categories are meaningless. Example: severity of injury: fatal = 1, severe = 2, moderate = 3, minor = 4.
  • 16. Interval: The differences between successive units are equal. The zero point is arbitrary and does not imply the absence of the property being measured. Examples: Degrees Fahrenheit: the difference between 30 and 40 is the same as that between 70 and 80 degrees, but 80 is not twice as hot as 40. Years: the difference between 1993 and 1994 is the same as that between 1995 and 1996, but year 0 was not the beginning of time.
  • 17. Ratio: The most detailed and objectively interpretable of the measurement scales. It is an interval scale with an absolute zero: it has a true zero point (absence of the property being measured) as well as equal intervals. E.g. height, weight, money, age, time, speed, class size, the Kelvin scale of temperature.
  • 18. Cont… Independent variables: precede dependent variables in time; are often manipulated by the researcher; are the treatment or intervention that is used in a study. Dependent variables: what is measured as an outcome in a study; their values depend on the independent variable.
  • 19. * A population: It is the largest collection of values of a random variable for which we have an interest at a particular time. For example: • headache patients in a chiropractic office; automobile crash victims in an emergency room • In research, it is not practical to include all members of a population • Thus, a sample (a subset of a population) is taken • Populations may be finite or infinite. Mekele University: Biostatistics 19
  • 20. A Sample: it is a part of a population, e.g. a fraction of these patients. Random sample: subjects are selected from a population so that each individual has an equal chance of being selected. Random samples tend to be representative of the source population. Non-random samples may not be representative; they may be biased regarding age, severity of the condition, socioeconomic status, etc.
  • 22. Types of statistical methods. Descriptive statistics: describe the data by summarizing them. Inferential statistics: techniques by which inferences are drawn about the population parameters from the sample statistics; that is, the observed sample statistics are used to infer the corresponding population parameters.
  • 23. Cont… Parameter: summary data from a population. Statistic: summary data from a sample.
  • 24. Examples of Scales of Measurements • Low income: ordinal • CD4 count: ratio • Year of birth: interval • IQ scores: interval • Severe injury: ordinal • Raw score on a statistics exam: interval • Room temperature in Kelvin: ratio • Nationality of MU students: nominal
  • 25. Descriptive statistics Strategies for understanding the meanings of Data
  • 26. Key words: frequency table, bar chart, range, width of interval, mid-interval, histogram, polygon
  • 27. Descriptive statistics. Before performing any analyses, you must first get to know your data. Descriptive statistics are used to summarize data in the form of tables, graphs and numerical measures. The summary technique used depends on the data type under consideration.
  • 28. Presentation techniques for qualitative/categorical data. Statistics: frequency, relative frequency, cumulative frequency. Figure/chart: pie chart, bar chart.
  • 29. Frequency Distribution for Discrete Random Variables Example: Suppose that we take a sample of size 16 from children in a primary school and get the following data about the number of their decayed teeth, 3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1 To construct a frequency table: 1- Order the values from the smallest to the largest. 0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5 2- Count how many numbers are the same.
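The ordering and counting steps above can be sketched in a few lines of Python. This is only an illustration using the 16 values from the slide and `collections.Counter` from the standard library; the variable names are assumptions made for the example.

```python
from collections import Counter

# Number of decayed teeth for the 16 sampled children (values from the slide)
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]

# Steps 1 and 2: order the distinct values, then count how often each occurs
freq = dict(sorted(Counter(teeth).items()))

for value, count in freq.items():
    print(f"{value} decayed teeth: frequency {count}")
```

The frequencies add up to the sample size, which is a quick check that no observation was dropped.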
  • 31. Representing the simple frequency table using the bar chart. We can represent the above simple frequency table using a bar chart. Ordinal or nominal data; the height of each bar is the frequency of that category. [Figure: bar chart with number of decayed teeth (0 to 5) on the horizontal axis and frequency on the vertical axis; bar heights 1, 2, 4, 5, 2, 2.]
  • 33. Cont… [Figure: bar chart of marital status (single, married, divorced, widowed) for males and females; vertical axis in percent, 0 to 50.]
  • 34. Cont… Instead of “stacks” rising up from the horizontal axis (bar chart), we could plot the shares of a pie. Recalling that a circle has 360 degrees: 50% means 180 degrees, 25% means 90 degrees.
  • 36. Frequency Distribution for Continuous Random Variables. For large samples, we can’t use the simple frequency table to represent the data. We need to divide the data into groups or intervals or classes. So, we need to determine: 1- The number of intervals (k). Too few intervals are not good because information will be lost. Too many intervals are not helpful to summarize the data. A commonly followed rule is that 6 ≤ k ≤ 15, or the following formula may be used: k = 1 + 3.322 (log n), where log is the base-10 logarithm.
  • 37. 2- The range (R). It is the difference between the largest and the smallest observation in the data set. 3- The width of the interval (w). Class intervals generally should be of the same width. Thus, if we want k intervals, then w is chosen such that w ≥ R / k.
  • 38. Example: Assume that the number of observations equals 100; then k = 1 + 3.322 (log 100) = 1 + 3.322 (2) = 7.6 ≈ 8. Assume that the smallest value is 5 and the largest is 61; then R = 61 – 5 = 56 and w = 56 / 8 = 7. To make the summary easier to read, the class width may be 5 or 10 or a multiple of 10.
  • 40. Example 2.3.1 • We wish to know how many class intervals to have in the frequency distribution of the data in Table 1.4.1, the ages of 189 subjects who participated in a study on smoking cessation. Solution: • Since the number of observations equals 189, • k = 1 + 3.322 (log 189) • = 1 + 3.322 (2.276) ≈ 9, • R = 82 – 30 = 52 and • w = 52 / 9 = 5.778. It is better to let w = 10; then the intervals will be in the form:
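The three quantities k, R and w can be checked with a short Python sketch. The function name `sturges_intervals` is a label chosen here for the formula k = 1 + 3.322 log10(n); the inputs (n = 189, smallest age 30, largest age 82) are taken from the example.

```python
import math


def sturges_intervals(n):
    """Suggested number of class intervals: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


n = 189                      # number of subjects in the example
smallest, largest = 30, 82   # smallest and largest observed ages

k_raw = sturges_intervals(n)   # about 8.56 before rounding
k = round(k_raw)               # rounded to a whole number of intervals
R = largest - smallest         # range
w = R / k                      # minimum width so that k intervals cover R

print(f"k = {k_raw:.2f} -> {k}, R = {R}, w = {w:.3f}")
```

The formula is only a guide; as the slide notes, a rounder width such as 10 is usually easier to read than 5.778.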
  • 41. Class interval | Frequency
      30 – 39 | 11
      40 – 49 | 46
      50 – 59 | 70
      60 – 69 | 45
      70 – 79 | 16
      80 – 89 | 1
      Total | 189
      Sum of frequencies = sample size = n
  • 42. The Cumulative Frequency: It can be computed by adding successive frequencies. The Cumulative Relative Frequency: It can be computed by adding successive relative frequencies. The Mid-interval: It can be computed by adding the lower bound of the interval to its upper bound and then dividing by 2.
  • 43. For the above example, the following table represents the cumulative frequency, the relative frequency, the cumulative relative frequency and the mid-interval (entries shown as "-" are left to be completed). R.f = freq / n.
      Class interval | Mid-interval | Frequency (f) | Cumulative frequency | Relative frequency (R.f) | Cumulative relative frequency
      30 – 39 | 34.5 | 11 | 11 | 0.0582 | 0.0582
      40 – 49 | 44.5 | 46 | 57 | 0.2434 | -
      50 – 59 | 54.5 | - | 127 | - | 0.6720
      60 – 69 | - | 45 | - | 0.2381 | 0.9101
      70 – 79 | 74.5 | 16 | 188 | 0.0847 | 0.9948
      80 – 89 | 84.5 | 1 | 189 | 0.0053 | 1
      Total | | 189 | | 1 |
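As a sketch of how the derived columns are obtained, the following Python snippet starts from the six class frequencies of the smoking-cessation example and computes the mid-interval, cumulative frequency, relative frequency and cumulative relative frequency. The variable names are illustrative only.

```python
from itertools import accumulate

# Class bounds and frequencies from the grouped frequency table
bounds = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freq = [11, 46, 70, 45, 16, 1]
n = sum(freq)  # sample size

midpoints = [(lo + hi) / 2 for lo, hi in bounds]
cum_freq = list(accumulate(freq))          # running total of frequencies
rel_freq = [f / n for f in freq]           # R.f = freq / n
cum_rel_freq = [c / n for c in cum_freq]   # cumulative frequency / n

for (lo, hi), m, f, cf, rf, crf in zip(
    bounds, midpoints, freq, cum_freq, rel_freq, cum_rel_freq
):
    print(f"{lo}-{hi}  mid={m}  f={f}  cum={cf}  rf={rf:.4f}  crf={crf:.4f}")
```

Computed directly from the counts, the cumulative relative frequency for the 70 – 79 class is 188/189 ≈ 0.9947; adding the already rounded relative frequencies gives 0.9948, so small differences in the last digit are only rounding.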
  • 44. Example: • From the above frequency table, complete the table, then answer the following questions: 1- The number of subjects with age less than 50 years? 2- The number of subjects with age between 40 and 69 years? 3- The relative frequency of subjects with age between 70 and 79 years? 4- The relative frequency of subjects with age more than 69 years? 5- The percentage of subjects with age between 40 and 49 years? 6- The percentage of subjects with age less than 60 years? 7- The range (R)? 8- The number of intervals (k)? 9- The width of the interval (w)?
  • 45. Representing the grouped frequency table using the histogram. To draw the histogram, the true class limits should be used. They can be computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit for each interval.
      True class limits | Frequency
      29.5 – < 39.5 | 11
      39.5 – < 49.5 | 46
      49.5 – < 59.5 | 70
      59.5 – < 69.5 | 45
      69.5 – < 79.5 | 16
      79.5 – < 89.5 | 1
      Total | 189
      [Figure: histogram with class midpoints 34.5 to 84.5 on the horizontal axis and frequency (0 to 80) on the vertical axis.]
  • 46. Representing the grouped frequency table using the polygon. [Figure: frequency polygon with class midpoints 34.5 to 84.5 on the horizontal axis and frequency (0 to 80) on the vertical axis.]
  • 47. Histogram • continuous data divided into categories • graphical representation of a frequency distribution • height of each bar is the frequency of that category • used to assess skewness and modality of the data
  • 49. Frequency polygon - is an alternative to the histogram. In a histogram: - the X-axis shows intervals of values, - the Y-axis shows bars of frequencies. In a frequency polygon: - the X-axis shows the midpoints of the intervals of values, - the Y-axis shows dots (joined by lines) instead of bars.
  • 51. Box plots - discrete or continuous data - display the 25th, 50th and 75th percentiles of the data, also known as the first, second and third quartiles respectively - whiskers extend to adjacent values which are not outliers - outliers are indicated as circles - the box shows the interquartile range of the data - can be used to assess skewness
  • 53. Two-way scatter plots • used to assess the relationship between two discrete or continuous measures nature of the relationship described as positive, negative or no relationship 53 Mekele University: Biostatistics
  • 54. Line graph • two continuous measures each x value has only one corresponding y value useful for looking at patterns over time can be used to compare 2 or more groups 54 Mekele University: Biostatistics
  • 56. Line Graph
      Year | MMR/1000
      1960 | 50
      1970 | 45
      1980 | 26
      1990 | 15
      2000 | 12
      Figure (1): Maternal mortality rate of (country), 1960-2000 [line graph of MMR/1000 against year].
  • 57. GRAPHS & CHARTS – LINE GRAPH 57 Mekele University: Biostatistics
  • 59. Key words: descriptive statistic, measure of central tendency, statistic, parameter, mean (μ), median, mode.
  • 60. The Statistic and The Parameter • A Statistic: It is a descriptive measure computed from the data of a sample. • A Parameter: It is a descriptive measure computed from the data of a population. Since it is difficult to measure a parameter from the population, a sample of size n is drawn, whose values are $x_1, x_2, \ldots, x_n$. From these data, we measure the statistic.
  • 61. Mekele University: Biostatistics 61 Measures of Central Tendency A measure of central tendency is a measure which indicates where the middle of the data is. The three most commonly used measures of central tendency are: The Mean, the Median, and the Mode. The Mean: It is the average of the data.
  • 62. The Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, which is usually unknown; we then use the sample mean to estimate or approximate it. The Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Example: Here is a random sample of size 10 of ages, where $x_1$ = 42, $x_2$ = 28, $x_3$ = 28, $x_4$ = 61, $x_5$ = 31, $x_6$ = 23, $x_7$ = 50, $x_8$ = 34, $x_9$ = 32, $x_{10}$ = 37. $\bar{x}$ = (42 + 28 + … + 37) / 10 = 36.6
  • 63. Mekele University: Biostatistics 63 Properties of the Mean: • Uniqueness. For a given set of data there is one and only one mean. • Simplicity. It is easy to understand and to compute. • Affected by extreme values. Since all values enter into the computation. Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118. But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.
  • 64. The Median: When the data are ordered, it is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it. • If n is odd, the median will be the middle observation, i.e. the ((n+1)/2)th ordered observation. When n = 11, the median is the 6th observation. • If n is even, there are two middle observations, and the median will be the mean of these two: the average of the (n/2)th and the ((n/2)+1)th ordered observations. When n = 12, the median is the value halfway between the 6th and 7th ordered observations.
  • 65. Mekele University: Biostatistics 65 Example: For the same random sample, the ordered observations will be as: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61. Since n = 10, then the median is the 5.5th observation, i.e. = (32+34)/2 = 33. Properties of the Median: • Uniqueness. For a given set of data there is one and only one median. • Simplicity. It is easy to calculate. • It is not affected by extreme values as is the mean.
  • 66. The Mode: It is the value which occurs most frequently. If all values are different there is no mode. Sometimes, there is more than one mode. Example: For the same random sample, the value 28 is repeated two times, so it is the mode. Properties of the Mode: • Sometimes, it is not unique. • It may be used for describing qualitative data.
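The three measures of central tendency for the age sample can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `multimode` is used rather than `mode` because, as noted above, the mode need not be unique.

```python
import statistics

# Random sample of 10 ages from the mean example
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

mean = statistics.mean(ages)        # sum of the values divided by n
median = statistics.median(ages)    # average of the 5th and 6th ordered values
modes = statistics.multimode(ages)  # list of all most frequent values

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
```

Replacing the largest age, 61, with a much larger value would change the mean but leave the median and mode untouched, which illustrates the sensitivity of the mean to extreme values.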
  • 67. Quantiles • Dividing the distribution of ordered values into equal-sized parts – Quartiles: 4 equal parts – Deciles: 10 equal parts – Percentiles: 100 equal parts. The quartiles Q1, Q2 and Q3 separate the first 25%, second 25%, third 25% and fourth 25% of the ordered values. Q1: first quartile; Q2: second quartile = median; Q3: third quartile.
  • 68. Example: Given the following data set (age of patients): 18, 59, 24, 42, 21, 23, 24, 32, find the third quartile. Solution: sort the data from lowest to highest: 18, 21, 23, 24, 24, 32, 42, 59. 3rd quartile = {3/4 (n+1)}th observation = (6.75)th observation = 32 + (42 - 32) × 0.75 = 39.5
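The (n+1)-position rule used in this example can be written as a small Python function. The helper name `quantile_n_plus_1` is made up for this sketch; it is one of several quantile conventions, and it happens to agree with the default "exclusive" method of `statistics.quantiles`.

```python
import statistics


def quantile_n_plus_1(values, p):
    """Quantile at proportion p using the p*(n+1)-th ordered observation,
    with linear interpolation between neighbouring observations."""
    data = sorted(values)
    n = len(data)
    pos = p * (n + 1)        # 1-based position, possibly fractional
    lower = int(pos)         # whole part of the position
    frac = pos - lower       # fractional part used for interpolation
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


ages = [18, 59, 24, 42, 21, 23, 24, 32]

q3 = quantile_n_plus_1(ages, 0.75)           # 6.75th ordered observation
q3_stdlib = statistics.quantiles(ages, n=4)[2]

print(q3, q3_stdlib)
```

Other software may use a different interpolation rule and report a slightly different third quartile for the same data.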
  • 70. Key words: descriptive statistic, measure of dispersion, range, variance, coefficient of variation.
  • 71. Measures of Dispersion: • A measure of dispersion conveys information regarding the amount of variability present in a set of data. • Note: 1. If all the values are the same → there is no dispersion. 2. If the values are different → there is dispersion: a) if the values are close to each other → the amount of dispersion is small; b) if the values are widely scattered → the dispersion is greater.
  • 72. Measures of dispersion are: 1. Range (R). 2. Variance. 3. Standard deviation. 4. Coefficient of variation (C.V).
  • 73. 1. The Range (R): • Range = Largest value – Smallest value = $x_L - x_S$ • Note: – The range depends on only two values – Highly sensitive to outliers – Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50. • Find the range: Range = 66 – 38 = 28 • Inter-quartile range – 3rd quartile – 1st quartile (75th – 25th percentile) – Robust to outliers – Covers the middle 50% of observations
  • 74. 2. The Variance: • It measures dispersion relative to the scatter of the values about their mean. a) Sample Variance ($S^2$): • $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean. • Find the sample variance of the ages 43, 66, 61, 64, 65, 38, 59, 57, 57, 50, for which $\bar{x}$ = 56. • Solution: • $S^2$ = [(43-56)^2 + (66-56)^2 + … + (50-56)^2] / (10 - 1) • = 810 / 9 = 90
  • 75. b) Population Variance ($\sigma^2$): $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean. 3. The Standard Deviation: • is the square root of the variance. a) Sample standard deviation = S = $\sqrt{S^2}$. b) Population standard deviation = σ = $\sqrt{\sigma^2}$.
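A short Python sketch of the variance and standard deviation formulas, applied to the ten ages used for the range example. The sum of squared deviations about the mean of 56 works out to 810, so dividing by n − 1 = 9 gives a sample variance of 90. The population figures are shown only to contrast the two divisors; treating these ten ages as a whole population is an assumption made for illustration.

```python
import math

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(ages)

mean = sum(ages) / n                                 # sample mean
sum_sq_dev = sum((x - mean) ** 2 for x in ages)      # sum of squared deviations

sample_var = sum_sq_dev / (n - 1)    # divisor n - 1 for a sample
sample_sd = math.sqrt(sample_var)

pop_var = sum_sq_dev / n             # divisor N if the data were a population
pop_sd = math.sqrt(pop_var)

print(f"mean={mean}, SS={sum_sq_dev}, S^2={sample_var}, S={sample_sd:.4f}")
print(f"sigma^2={pop_var}, sigma={pop_sd}")
```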
  • 76. STANDARD DEVIATION (SD). Data set 1: 7, 7, 7, 7, 7, 7; mean = 7, SD = 0. Data set 2: 7, 8, 7, 7, 7, 6; mean = 7, SD = 0.63. Data set 3: 3, 2, 7, 8, 13, 9; mean = 7, SD = 4.05.
  • 77. Standard deviation. Caution must be exercised when using the standard deviation as a comparative index of dispersion. Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, $\bar{X}$ = 887.1, sd = 56.50. Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, $\bar{X}$ = 0.68, sd = 0.255. It is incorrect to say that elephants show greater variation in birth weights than mice merely because of their higher standard deviation.
  • 78. 4. The Coefficient of Variation (C.V): • It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement. • $C.V = \frac{S}{\bar{X}} (100)$, where S is the sample standard deviation and $\bar{X}$ is the sample mean.
  • 79. Coefficient of variation. The coefficient of variation expresses the standard deviation relative to its mean: $cv = \frac{s}{\bar{X}}$. Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, $\bar{X}$ = 887.1, s = 56.50, cv = 0.0637. Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, $\bar{X}$ = 0.68, s = 0.255, cv = 0.375. Mice show greater birth-weight variation.
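The elephant and mouse comparison can be reproduced with the `statistics` module. The helper `coefficient_of_variation` is a name chosen for this sketch; it returns the ratio s / mean (multiply by 100 for a percentage).

```python
import statistics

elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv_elephants = coefficient_of_variation(elephants)
cv_mice = coefficient_of_variation(mice)

print(f"elephants: sd={statistics.stdev(elephants):.2f}, cv={cv_elephants:.4f}")
print(f"mice:      sd={statistics.stdev(mice):.3f}, cv={cv_mice:.3f}")
```

With the unrounded mouse mean of 0.675 the ratio comes out near 0.378; the slide's 0.375 uses the mean rounded to 0.68. Either way the mice have the larger relative variation, even though their standard deviation is far smaller.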
  • 80. Example: • Suppose two samples of human males yield the following data:
      | Sample 1 | Sample 2
      Age | 25-year-olds | 11-year-olds
      Mean weight | 145 pounds | 80 pounds
      Standard deviation | 10 pounds | 10 pounds
  • 81. We wish to know which sample is more variable. Solution: • C.V (Sample 1) = (10/145) × 100 = 6.9% • C.V (Sample 2) = (10/80) × 100 = 12.5% • The weights of the 11-year-olds (Sample 2) therefore show more relative variation.
  • 82. When to use the coefficient of variation • When comparison groups have very different means (CV is suitable as it expresses the standard deviation relative to its corresponding mean) • When different units of measurement are involved, e.g. group 1 unit is mm, and group 2 unit is gm (CV is suitable for comparison as it is unit free) • In such cases, sd should not be used for comparison
  • 84. • Key words: • Probability, objective Probability, subjective probability, equally likely Mutually exclusive, multiplicative rule , Conditional Probability, independent events Mekele University: Biostatistics 84
  • 85. Introduction • The concept of probability is frequently encountered in everyday communication. For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95 percent certain that a patient has a particular disease. • Most people express probabilities in terms of percentages. • But, it is more convenient to express probabilities as fractions. Thus, we may measure the probability of the occurrence of some event by a number between 0 and 1. • The more likely the event, the closer the number is to one. An event that can't occur has a probability of zero, and an event that is certain to occur has a probability of one. Mekele University: Biostatistics 85
  • 86. Two views of probability: objective and subjective. • *** Objective Probability • ** Classical and Relative Frequency • Some definitions: 1. Equally likely outcomes: the outcomes that have the same chance of occurring. 2. Mutually exclusive: two events are said to be mutually exclusive if they cannot occur simultaneously, i.e. A ∩ B = Φ.
  • 87. • The universal set (S): the set of all possible outcomes. • The empty set Φ: contains no elements. • The event E: a set of outcomes in S which has a certain characteristic. • Classical Probability: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait E, the probability of the occurrence of event E is equal to m/N. • For example: in the rolling of a die, each of the six sides is equally likely to be observed. So, the probability that a 4 will be observed is equal to 1/6.
  • 88. • Relative Frequency Probability: • Def: If some process is repeated a large number of times, n, and if some resulting event E occurs m times, the relative frequency of occurrence of E, m/n, will be approximately equal to the probability of E: P(E) = m/n. • *** Subjective Probability: • Probability measures the confidence that a particular individual has in the truth of a particular proposition. • For example: the probability that a cure for cancer will be discovered within the next 10 years.
  • 89. Elementary Properties of Probability: • Given some process (or experiment) with n mutually exclusive and exhaustive events E1, E2, E3, …, En, then 1. P(Ei) ≥ 0, i = 1, 2, 3, …, n 2. P(E1) + P(E2) + … + P(En) = 1 3. P(Ei ∪ Ej) = P(Ei) + P(Ej), since Ei and Ej are mutually exclusive
  • 90. Rules of Probability 1-Addition Rule P(A U B)= P(A) + P(B) – P (A∩B ) 2- If A and B are mutually exclusive (disjoint) ,then P (A∩B ) = 0 Then , addition rule is P(A U B)= P(A) + P(B) . 3- Complementary Rule P(A' )= 1 – P(A) where, A' = complement event Mekele University: Biostatistics 90
  • 91. Example
      Family history of mood disorders | Early ≤ 18 (E) | Later > 18 (L) | Total
      Negative (A) | 28 | 35 | 63
      Bipolar disorder (B) | 19 | 38 | 57
      Unipolar (C) | 41 | 44 | 85
      Unipolar and bipolar (D) | 53 | 60 | 113
      Total | 141 | 177 | 318
  • 92. **Answer the following questions: Suppose we pick a person at random from this sample. 1- The probability that this person will be 18 years old or younger (E)? 2- The probability that this person has a family history of unipolar mood disorder (C)? 3- The probability that this person has no family history of unipolar mood disorder ($\bar{C}$)? 4- The probability that this person is 18 years old or younger, or has a negative family history of mood disorders (A)? 5- The probability that this person is more than 18 years old and has a family history of both unipolar and bipolar mood disorders (D)?
  • 93. Solution: 1. P(E) = 141/318 2. P(C) = 85/318 3. P($\bar{C}$) = 1 - P(C) = 1 - 85/318 = 233/318 4. P(E ∪ A) = P(E) + P(A) - P(E ∩ A) = (141/318) + (63/318) - (28/318) = 176/318 5. P(L ∩ D) = 60/318
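The five answers can be checked directly from the table counts. The following Python sketch stores each family-history row as an (early, later) pair and uses exact fractions; the dictionary layout and variable names are assumptions made for the illustration.

```python
from fractions import Fraction

# Counts from the table: row -> (Early <= 18 (E), Later > 18 (L))
table = {"A": (28, 35), "B": (19, 38), "C": (41, 44), "D": (53, 60)}

total = sum(e + l for e, l in table.values())               # 318 people

p_E = Fraction(sum(e for e, _ in table.values()), total)    # column total / n
p_C = Fraction(sum(table["C"]), total)                      # row total / n
p_not_C = 1 - p_C                                           # complementary rule
p_A = Fraction(sum(table["A"]), total)
p_E_and_A = Fraction(table["A"][0], total)                  # single cell / n
p_E_or_A = p_E + p_A - p_E_and_A                            # addition rule
p_L_and_D = Fraction(table["D"][1], total)

for name, p in [("P(E)", p_E), ("P(C)", p_C), ("P(not C)", p_not_C),
                ("P(E or A)", p_E_or_A), ("P(L and D)", p_L_and_D)]:
    print(f"{name} = {p} = {float(p):.4f}")
```

Note that marginal probabilities such as P(C) use the row total (85), whereas joint probabilities such as P(E ∩ A) use a single cell (28). `Fraction` prints values in lowest terms, so 141/318 appears as 47/106.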
  • 94. Conditional Probability: P(A|B) is the probability of A given that B has happened. • $P(A|B) = \frac{P(A \cap B)}{P(B)}$, P(B) ≠ 0 • $P(B|A) = \frac{P(A \cap B)}{P(A)}$, P(A) ≠ 0
  • 95. Example: From the previous example, answer: • Suppose we pick a person at random and find that he is 18 years old or younger (E). What is the probability that this person will be one who has no family history of mood disorders (A)? • Solution: • P(E) = 141/318 and P(A ∩ E) = 28/318, so P(A|E) = P(A ∩ E) / P(E) = (28/318) / (141/318) = 28/141
  • 96. Exercise • Suppose we pick a person at random and find that he has a family history of both unipolar and bipolar mood disorders (D). What is the probability that this person will be 18 years old or younger (E)?
  • 97. Multiplicative Rule: • P(A ∩ B) = P(A|B) P(B) • P(A ∩ B) = P(B|A) P(A) where • P(A): marginal probability of A. • P(B): marginal probability of B. • P(B|A): the conditional probability of B given A.
  • 98. Independent Events: • If the occurrence of A has no effect on the probability of B, we say that A and B are independent events. • Then, 1- P(A ∩ B) = P(A) P(B) 2- P(A|B) = P(A) 3- P(B|A) = P(B)
  • 99. Example • In a certain high school class consisting of 60 girls and 40 boys, it is observed that 24 girls and 16 boys wear eyeglasses . If a student is picked at random from this class ,the probability that the student wears eyeglasses , P(E), is 40/100 or 0.4 . • What is the probability that a student picked at random wears eyeglasses given that the student is a boy? • What is the probability of the joint occurrence of the events of wearing eye glasses and being a boy? Mekele University: Biostatistics 99
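The two questions in this example follow from the conditional probability and multiplicative rules. The sketch below works them out with exact fractions and also checks whether wearing eyeglasses and being a boy are independent in this class; the variable names are illustrative.

```python
from fractions import Fraction

girls, boys = 60, 40
girls_with_glasses, boys_with_glasses = 24, 16
total = girls + boys

p_E = Fraction(girls_with_glasses + boys_with_glasses, total)  # P(eyeglasses)
p_B = Fraction(boys, total)                                    # P(boy)
p_E_and_B = Fraction(boys_with_glasses, total)                 # joint probability

# Conditional probability: P(E|B) = P(E and B) / P(B)
p_E_given_B = p_E_and_B / p_B

# Multiplicative rule recovers the joint probability
assert p_E_given_B * p_B == p_E_and_B

# Independence holds when P(E and B) = P(E) P(B), i.e. P(E|B) = P(E)
independent = p_E_and_B == p_E * p_B

print(f"P(E|B) = {float(p_E_given_B)}, P(E and B) = {float(p_E_and_B)}")
print(f"independent: {independent}")
```

Here P(E|B) equals P(E) = 0.4, so knowing that the student is a boy does not change the probability of wearing eyeglasses.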
  • 100. Example • Suppose that of 1200 admissions to a general hospital during a certain period of time, 750 are private admissions. If we designate these as a set A, then compute P(A) and P($\bar{A}$).
  • 101. The Random Variable (X): • When the values of a variable (height, weight, or age) can’t be predicted in advance, the variable is called a random variable. • An example is the adult height. • When a child is born, we can’t predict exactly his or her height at maturity. Mekele University: Biostatistics 101
  • 102. 4.2 Probability Distributions for Discrete Random Variables • Definition: • The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities. Mekele University: Biostatistics 102
  • 103. The Cumulative Probability Distribution of X, F(x): • It shows the probability that the variable X is less than or equal to a certain value, P(X ≤ x).
  • 104. Example:
      Number of programs (x) | Frequency | P(X = x) | F(x) = P(X ≤ x)
      1 | 62 | 0.2088 | 0.2088
      2 | 47 | 0.1582 | 0.3670
      3 | 39 | 0.1313 | 0.4983
      4 | 39 | 0.1313 | 0.6296
      5 | 58 | 0.1953 | 0.8249
      6 | 37 | 0.1246 | 0.9495
      7 | 4 | 0.0135 | 0.9630
      8 | 11 | 0.0370 | 1.0000
      Total | 297 | 1.0000 |
  • 105. • Properties of the probability distribution of a discrete random variable (taking integer values): 1. $0 \le P(X = x) \le 1$ 2. $\sum P(X = x) = 1$ 3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1) 4. P(X < b) = P(X ≤ b-1)
  • 106. 4.3 The Binomial Distribution: • It is derived from a process known as a Bernoulli trial. • Bernoulli trial is : When a random process or experiment called a trial can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial. Mekele University: Biostatistics 106
  • 107. The Bernoulli Process • A sequence of Bernoulli trials forms a Bernoulli process under the following conditions 1- Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure. 2- The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1-p, is denoted by q. 3- The trials are independent, that is the outcome of any particular trial is not affected by the outcome of any other trial Mekele University: Biostatistics 107
  • 108. • The probability distribution of the binomial random variable X, the number of successes in n independent trials, is: $f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2, …, n • where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of n distinct objects taken x at a time, and $x! = x(x-1)(x-2)\cdots(1)$. * Note: 0! = 1
  • 109. Properties of the binomial distribution • 1. $f(x) \ge 0$ • 2. $\sum f(x) = 1$ • 3. The parameters of the binomial distribution are n and p • 4. $\mu = E(X) = np$ • 5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
  • 110. Example • If we examine all birth records from the North Carolina State Center for Health statistics for year 2001, we find that 85.8 percent of the pregnancies had delivery in week 37 or later (full- term birth). If we randomly selected five birth records from this population what is the probability that exactly three of the records will be for full-term births? Mekele University: Biostatistics 110
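One way to work this example is to evaluate the binomial formula with n = 5, x = 3 and p = 0.858. The sketch below uses `math.comb` for the number of combinations; the function name `binomial_pmf` is chosen for the illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.858   # five records, probability of a full-term birth

prob_exactly_3 = binomial_pmf(3, n, p)
print(f"P(X = 3) = {prob_exactly_3:.4f}")

# The probabilities over x = 0, ..., n add up to 1
total_probability = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(f"sum of f(x) = {total_probability:.4f}")
```

The result is about 0.127, i.e. roughly a 13% chance that exactly three of the five records are full-term births.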
  • 111. Example • Suppose it is known that in a certain population 10 percent of the population is color blind. If a random sample of 25 people is drawn from this population, find the probability that a) Five or fewer will be color blind. b) Six or more will be color blind c) Between six and nine inclusive will be color blind. d) Two, three, or four will be color blind. Mekele University: Biostatistics 111
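The four parts of this example are all differences of cumulative binomial probabilities with n = 25 and p = 0.10. A minimal Python sketch follows, with `binomial_pmf` and `binomial_cdf` as helper names chosen for the illustration, standing in for the binomial table.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x): sum of the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 25, 0.10

a = binomial_cdf(5, n, p)                           # a) five or fewer
b = 1 - binomial_cdf(5, n, p)                       # b) six or more
c = binomial_cdf(9, n, p) - binomial_cdf(5, n, p)   # c) between 6 and 9 inclusive
d = binomial_cdf(4, n, p) - binomial_cdf(1, n, p)   # d) two, three, or four

print(f"a) {a:.4f}  b) {b:.4f}  c) {c:.4f}  d) {d:.4f}")
```

Parts c) and d) use the rule P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a−1) for an integer-valued variable.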
  • 112. Properties of continuous probability distributions: * A continuous variable is one that can assume any value within a specified interval of values. 1- Area under the curve = 1. 2- P(X = a) = 0, where a is a constant. 3- Area between two points a and b = P(a < X < b).
  • 113. 4.6 The normal distribution: • It is one of the most important probability distributions in statistics. • The normal density is given by • $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, - ∞ < x < ∞, - ∞ < µ < ∞, σ > 0 • π, e: constants • µ: population mean. • σ: population standard deviation.
  • 114. Characteristics of the normal distribution • The following are some important characteristics of the normal distribution: 1- It is symmetrical about its mean, µ. 2- The mean, the median, and the mode are all equal. 3- The total area under the curve above the x-axis is one. 4-The normal distribution is completely determined by the parameters µ and σ. Mekele University: Biostatistics 114
  • 115. 5- The normal distribution depends on the two parameters µ and σ. µ determines the location of the curve, while σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve. [Figures: three normal curves with means µ1 < µ2 < µ3; three normal curves with standard deviations σ1 < σ2 < σ3.]
  • 116. Note that : 1. P( µ- σ < x < µ+ σ) = 0.68 2. P( µ- 2σ< x < µ+ 2σ)= 0.95 3. P( µ-3σ < x < µ+ 3σ) = 0.997 Mekele University: Biostatistics 116
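These three areas can be checked numerically with `statistics.NormalDist` from the Python standard library. Because the rule holds for any µ and σ, the sketch uses the standard normal curve; the helper name `area_within` is an assumption made for the illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"P(mu - {k} sigma < X < mu + {k} sigma) = {area_within(k):.4f}")
```

The computed areas are approximately 0.6827, 0.9545 and 0.9973, which the slide rounds to 0.68, 0.95 and 0.997.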
  • 117. The Standard normal distribution: • It is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. • The equation for the standard normal distribution is written as • $f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$, - ∞ < z < ∞
  • 118. Characteristics of the standard normal distribution 1- It is symmetrical about 0. 2- The total area under the curve above the x-axis is one. 3- We can use table D to find the probabilities and areas.
  • 119. How to use tables of Z: Note that the cumulative probabilities P(Z ≤ z) are given in tables for −3.49 < z < 3.49. Thus, P(−3.49 < Z < 3.49) ≈ 1. For the standard normal distribution, P(Z > 0) = P(Z < 0) = 0.5. Example 4.6.1: If Z has a standard normal distribution, then P(Z < 2) is the area to the left of 2, and it equals 0.9772.
  • 120. Example 4.6.2: P(−2.55 < Z < 2.55) is the area between −2.55 and 2.55. It equals P(−2.55 < Z < 2.55) = 0.9946 − 0.0054 = 0.9892. Example 4.6.3: P(−2.74 < Z < 1.53) is the area between −2.74 and 1.53. P(−2.74 < Z < 1.53) = 0.9370 − 0.0031 = 0.9339.
  • 122. Example: P(Z > 2.71) is the area to the right of 2.71. So, P(Z > 2.71) = 1 − 0.9966 = 0.0034. Example: P(Z = 0.84) is the area at the single point z = 0.84. Because Z is continuous, P(Z = 0.84) = 0.
  • 123. How to transform a normal distribution (X) to the standard normal distribution (Z)? • This is done by the following formula: $z = \frac{x - \mu}{\sigma}$ • Example: • If X is normal with µ = 3 and σ = 2, find the value of the standard normal Z when X = 6. • Answer: $z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
  • 124. Normal Distribution Applications The normal distribution can be used to model the distribution of many variables of interest. This allows us to answer probability questions about these random variables. Example 4.7.1: The "Uptime" is a custom-made, lightweight, battery-operated activity monitor that records the amount of time an individual spends in the upright position. In a study of children aged 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3 hours. Find the following:
  • 125. If a child is selected at random, then: 1- The probability that the child spends less than 3 hours in the upright position in a 24-hour period: $P(X < 3) = P\left(\frac{X-\mu}{\sigma} < \frac{3-5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$ 2- The probability that the child spends more than 5 hours in the upright position in a 24-hour period: $P(X > 5) = P\left(\frac{X-\mu}{\sigma} > \frac{5-5.4}{1.3}\right) = P(Z > -0.31) = 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217$ 3- The probability that the child spends exactly 6.2 hours in the upright position in a 24-hour period: P(X = 6.2) = 0
  • 126. 4- The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period: $P(4.5 < X < 7.3) = P\left(\frac{4.5-5.4}{1.3} < \frac{X-\mu}{\sigma} < \frac{7.3-5.4}{1.3}\right) = P(-0.69 < Z < 1.46) = P(Z < 1.46) - P(Z < -0.69) = 0.9279 - 0.2451 = 0.6828$
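The Uptime probabilities can also be computed directly, without rounding z to two decimals first. A minimal sketch, assuming the stated model X ~ N(5.4, 1.3); the variable names are illustrative. Because the table lookups round z, results from this code may differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Hours spent upright per 24-hour period, as modeled in the example
upright = NormalDist(mu=5.4, sigma=1.3)

p_less_3 = upright.cdf(3.0)                      # P(X < 3)
p_more_5 = 1 - upright.cdf(5.0)                  # P(X > 5)
p_between = upright.cdf(7.3) - upright.cdf(4.5)  # P(4.5 < X < 7.3)

print(round(p_less_3, 4), round(p_more_5, 4), round(p_between, 4))
```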
  • 128. • Key words: • Point estimate, interval estimate, estimator, confidence level, α, confidence interval for the mean μ, confidence interval for two means, confidence interval for the population proportion P, confidence interval for two proportions
  • 129. • 6.1 Introduction: • Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population. • Suppose that: • an administrator of a large hospital is interested in the mean age of patients admitted to his hospital during a given year. 1. It would be too expensive to go through the records of all patients admitted during that particular year. 2. He consequently elects to examine a sample of the records, from which he can compute an estimate of the mean age of patients admitted to his hospital that year.
  • 130. • For any parameter, we can compute two types of estimate: a point estimate and an interval estimate. • A point estimate is a single numerical value used to estimate the corresponding population parameter. • An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, we feel includes the parameter being estimated. • The Estimate and the Estimator: • The estimate is a single computed value, but the estimator is the rule that tells us how to compute this value, or estimate. • For example, $\bar{x} = \frac{\sum_i x_i}{n}$ is an estimator of the population mean, µ. The single numerical value that results from evaluating this formula is called an estimate of the parameter µ.
  • 131. Confidence Interval for a Population Mean (C.I.): Suppose researchers wish to estimate the mean of some normally distributed population. • They draw a random sample of size n from the population and compute $\bar{x}$, which they use as a point estimate of µ. • Because random sampling involves chance, $\bar{x}$ can't be expected to be equal to µ. • The value of $\bar{x}$ may be greater than or less than µ. • It would be much more meaningful to estimate µ by an interval.
  • 132. The 100(1 − α) percent confidence interval (C.I.) for µ: • We want to find two values L and U between which µ lies with high probability, i.e. P(L ≤ µ ≤ U) = 1 − α
  • 133. For example, when: • α = 0.01, then 1 − α = 0.99 • α = 0.05, then 1 − α = 0.95 • α = 0.10, then 1 − α = 0.90
  • 136. We have the following cases: a) When the population is normal: 1) When the variance is known and the sample size is large or small, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$ 2) When the variance is unknown and the sample size is small, the C.I. has the form: $P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
  • 137. b) When the population is not normal and n is large (n > 30): 1) When the variance is known, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$ 2) When the variance is unknown, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
  • 138. Example: • Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately $\bar{x} = 22$. Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate µ (α = 0.05).
  • 139. Solution: • 1 − α = 0.95 → α = 0.05 → α/2 = 0.025 • $\bar{x} = 22$, variance = σ² = 45 → σ = √45, n = 10 • The 95% confidence interval for µ is given by: $P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$ • $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ (refer to table D) • $Z_{0.975}\,(\sigma/\sqrt{n}) = 1.96\,(\sqrt{45}/\sqrt{10}) = 4.1578$ • $22 \pm 1.96\,(\sqrt{45}/\sqrt{10})$ → (22 − 4.1578, 22 + 4.1578) → (17.84, 26.16) • Exercise example 6.2.2 page 169
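The interval for the enzyme example can be reproduced in a few lines. A minimal sketch for the known-variance case; `z_interval` is a hypothetical helper name, and the critical value is taken from `statistics.NormalDist` rather than from table D, so it is 1.95996… instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided C.I. for a mean when the population variance is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = z_interval(xbar=22, sigma=sqrt(45), n=10, conf=0.95)
print(round(low, 2), round(high, 2))
```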
  • 140. Example: The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean. • Solution: • Note that the population is not normal, • n = 35 (n > 30), so n is large; σ is unknown, s = 0.511 • 1 − α = 0.90 → α = 0.1 → α/2 = 0.05 → 1 − α/2 = 0.95
  • 141. Then the 90% confidence interval for µ is given by: $P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$ • $Z_{1-\alpha/2} = Z_{0.95} = 1.645$ (refer to table D) • $Z_{0.95}\,(s/\sqrt{n}) = 1.645\,(0.511/\sqrt{35}) = 0.1421$ • $0.718 \pm 1.645\,(0.511/\sqrt{35})$ → (0.718 − 0.1421, 0.718 + 0.1421) → (0.576, 0.860) • Exercise example 6.2.3 page 164
  • 142. Example 6.3.1, page 174: • Suppose researchers studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables they measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9. We assume that the sample was taken from an approximately normally distributed population. Calculate the 95% confidence interval for the mean strength.
  • 143. Solution: • 1 − α = 0.95 → α = 0.05 → α/2 = 0.025 • $\bar{x} = 250.8$, standard deviation = s = 130.9, n = 19 • The 95% confidence interval for µ is given by: $P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$ • $t_{1-\alpha/2,\,n-1} = t_{0.975,\,18} = 2.1009$ (refer to table E) • $t_{0.975,\,18}\,(s/\sqrt{n}) = 2.1009\,(130.9/\sqrt{19}) = 63.1$ • $250.8 \pm 2.1009\,(130.9/\sqrt{19})$ → (250.8 − 63.1, 250.8 + 63.1) → (187.7, 313.9) • Exercises 6.2.1, 6.2.2 • 6.3.2 page 171
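The small-sample interval follows the same pattern with a t critical value. In this sketch the critical value is passed in as read from table E (2.1009 for 18 degrees of freedom), because the Python standard library has no t quantile function; `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Two-sided C.I. for a mean, sigma unknown; t_crit = t_{1-alpha/2, n-1} from a table."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Muscle-strength example: n = 19, so df = 18 and t_{0.975,18} = 2.1009
low, high = t_interval(xbar=250.8, s=130.9, n=19, t_crit=2.1009)
print(round(low, 1), round(high, 1))
```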
  • 144. 6.4 Confidence Interval for the Difference between Two Population Means (C.I.): If we draw two samples from two independent populations and we want the confidence interval for the difference between the two population means, then we have the following cases: a) When the populations are normal: 1) When the variances are known and the sample sizes are large or small, the C.I. has the form: $(\bar{x}_1-\bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} < \mu_1-\mu_2 < (\bar{x}_1-\bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$
  • 145. 2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form: $(\bar{x}_1-\bar{x}_2) - t_{1-\alpha/2,\,(n_1+n_2-2)}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} < \mu_1-\mu_2 < (\bar{x}_1-\bar{x}_2) + t_{1-\alpha/2,\,(n_1+n_2-2)}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$
  • 146. Example 6.4.1, page 174: A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex was found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I. for μ1 − μ2. Solution: 1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ • $(\bar{x}_1-\bar{x}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = (4.5-3.4) \pm 1.96\sqrt{\frac{1}{12}+\frac{1.5}{15}}$ = 1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94)
  • 147. Example 6.4.1, page 178: The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse among people with severe mental disorders. A retrospective chart review was carried out on 50 patients. The researchers were interested in the number of inpatient treatment days for psychiatric disorder during the year following the end of the program. Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I. for the difference between the means of the populations represented by the two samples.
  • 148. Solution: • 1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995 • n1 + n2 − 2 = 18 + 10 − 2 = 26 • $t_{1-\alpha/2,\,(n_1+n_2-2)} = t_{0.995,\,26} = 2.7787$, then the 99% C.I. for μ1 − μ2 is $(\bar{x}_1-\bar{x}_2) \pm t_{1-\alpha/2,\,(n_1+n_2-2)}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$ • where $S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2} = \frac{(17 \times 9.3^2)+(9 \times 11.5^2)}{18+10-2} = 102.33$ • then $(4.7-8.8) \pm 2.7787\,\sqrt{102.33}\,\sqrt{\frac{1}{18}+\frac{1}{10}}$ = −4.1 ± 11.086 = (−15.186, 6.986) Exercises: 6.4.2, 6.4.6, 6.4.7, 6.4.8, page 180
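The pooled-variance calculation is easy to get wrong by hand, so a short check may help. This is a sketch under the slide's assumptions (normal populations, equal unknown variances); the function names are illustrative, and the t critical value 2.7787 is taken from table E as in the solution.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """S_p^2 = ((n1-1)S1^2 + (n2-1)S2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """C.I. for mu1 - mu2 with equal, unknown variances; t_crit from a t table."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    half_width = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

sp2 = pooled_variance(9.3, 18, 11.5, 10)
low, high = pooled_t_interval(4.7, 9.3, 18, 8.8, 11.5, 10, t_crit=2.7787)
print(round(sp2, 2), round(low, 3), round(high, 3))
```

Since the interval contains zero, these data do not rule out equal population means at the 99% level.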
  • 149. 6.5 Confidence Interval for a Population Proportion (P): A sample is drawn from the population of interest, and the sample proportion is computed as $\hat{p} = \frac{a}{n}$ = (no. of elements in the sample with some characteristic) / (total no. of elements in the sample). This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the following formula: $\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  • 150. Example 6.5.1: The Pew Internet and American Life Project reported in 2003 that 18% of internet users have used the internet to search for information regarding experimental treatments or medicine. The sample consisted of 1220 adult internet users, and information was collected from telephone interviews. We wish to construct a 98% C.I. for the proportion of internet users who have searched for information about experimental treatments or medicine.
  • 151. Solution: 1 − α = 0.98 → α = 0.02 → α/2 = 0.01 → 1 − α/2 = 0.99, $Z_{1-\alpha/2} = Z_{0.99} = 2.33$, n = 1220, $\hat{p} = \frac{18}{100} = 0.18$. The 98% C.I. is $\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.18 \pm 2.33\sqrt{\frac{0.18(1-0.18)}{1220}}$ = 0.18 ± 0.0256 = (0.1544, 0.2056) Exercises: 6.5.1, 6.5.3, page 187
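The same arithmetic can be written as a small function. A minimal sketch, assuming the normal approximation used on the slide; `proportion_interval` is a hypothetical name, and the critical value 2.33 is the rounded table D value used in the solution.

```python
from math import sqrt

def proportion_interval(p_hat, n, z_crit):
    """Large-sample C.I. for a population proportion."""
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_interval(p_hat=0.18, n=1220, z_crit=2.33)
print(round(low, 4), round(high, 4))
```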
  • 152. Confidence Interval for the Difference between Two Population Proportions: Two samples are drawn from two independent populations of interest, and the sample proportion of the characteristic of interest is computed for each sample. An unbiased point estimator for the difference between the two population proportions is $\hat{p}_1 - \hat{p}_2$. A 100(1 − α)% confidence interval for P1 − P2 is given by $(\hat{p}_1-\hat{p}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
  • 153. Example: Connor investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet in an internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to internet cafés in the two sampled populations.
  • 154. Solution: 1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995, $Z_{1-\alpha/2} = Z_{0.995} = 2.58$, nF = 68, nM = 255, $\hat{p}_F = \frac{a_F}{n_F} = \frac{31}{68} = 0.4559$, $\hat{p}_M = \frac{a_M}{n_M} = \frac{53}{255} = 0.2078$. The 99% C.I. is $(\hat{p}_F-\hat{p}_M) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}_F(1-\hat{p}_F)}{n_F}+\frac{\hat{p}_M(1-\hat{p}_M)}{n_M}} = (0.4559-0.2078) \pm 2.58\sqrt{\frac{0.4559(1-0.4559)}{68}+\frac{0.2078(1-0.2078)}{255}}$ = 0.2481 ± 2.58(0.0655) = (0.0791, 0.4171)
  • 156. • Key words: • Null hypothesis H0, alternative hypothesis HA, testing hypothesis, test statistic, P-value
  • 157. Hypothesis Testing • One type of statistical inference, estimation, was discussed previously. • The other type, hypothesis testing, is discussed in this session.
  • 158. Definition of a hypothesis • It is a statement about one or more populations. It is usually concerned with the parameters of the population, e.g. the hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days.
  • 159. Definition of statistical hypotheses • They are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques. • There are two hypotheses involved in hypothesis testing: • Null hypothesis H0: It is the hypothesis to be tested. • Alternative hypothesis HA: It is a statement of what we believe is true if our sample data cause us to reject the null hypothesis.
  • 160. Testing a hypothesis about the mean of a population: • We have the following steps: 1. Data: determine the variable, sample size (n), sample mean ($\bar{x}$), population standard deviation (σ), or sample standard deviation (s) if σ is unknown. 2. Assumptions: We have two cases: • Case 1: The population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large). • Case 2: The population is not normal, with known or unknown variance (n is large, i.e. n ≥ 30).
  • 161. • 3. Hypotheses: • We have three cases: • Case I: H0: μ = μ0, HA: μ ≠ μ0 • e.g. we want to test that the population mean is different from 50 • Case II: H0: μ = μ0, HA: μ > μ0 • e.g. we want to test that the population mean is greater than 50 • Case III: H0: μ = μ0, HA: μ < μ0 • e.g. we want to test that the population mean is less than 50
  • 162. 4. Test Statistic: • Case 1: The population is normal or approximately normal. If σ² is known (n large or small): $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$. If σ² is unknown and n is large: $Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$. If σ² is unknown and n is small: $T = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$. • Case 2: The population is not normally distributed and n is large. i) If σ² is known: $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ ii) If σ² is unknown: $Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$
  • 163. 5. Decision Rule: i) If HA: μ ≠ μ0 • Reject H0 if Z > Z1−α/2 or Z < −Z1−α/2 (when using the Z-test), or reject H0 if T > t1−α/2,n−1 or T < −t1−α/2,n−1 (when using the T-test) • ii) If HA: μ > μ0 • Reject H0 if Z > Z1−α (when using the Z-test), or reject H0 if T > t1−α,n−1 (when using the T-test)
  • 164. • iii) If HA: μ < μ0 • Reject H0 if Z < −Z1−α (when using the Z-test), or reject H0 if T < −t1−α,n−1 (when using the T-test) Note: Z1−α/2, Z1−α, Zα are tabulated values obtained from table D; t1−α/2, t1−α, tα are tabulated values obtained from table E with (n − 1) degrees of freedom (df)
  • 165. • 6. Decision: • If we reject H0, we can conclude that HA is true. • If, however, we do not reject H0, we conclude only that the data do not provide sufficient evidence against H0, i.e. H0 may be true.
  • 166. An Alternative Decision Rule Using the p-value: Definition • The p-value is defined as the smallest value of α for which the null hypothesis can be rejected. • If the p-value is less than or equal to α, we reject the null hypothesis (p ≤ α). • If the p-value is greater than α, we do not reject the null hypothesis (p > α).
  • 167. Example • Researchers are interested in the mean age of a certain population. • A random sample of 10 individuals drawn from the population of interest has a mean of 27. • Assuming that the population is approximately normally distributed with variance 20, can we conclude that the mean is different from 30 years? (α = 0.05) • If the p-value is 0.0340, how can we use it in making a decision?
  • 168. Solution 1- Data: the variable is age, n = 10, $\bar{x} = 27$, σ² = 20, α = 0.05 2- Assumptions: the population is approximately normally distributed with variance 20 3- Hypotheses: • H0: μ = 30 • HA: μ ≠ 30
  • 169. 4- Test Statistic: $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{27-30}{\sqrt{20/10}} = -2.12$ 5- Decision Rule: • The alternative hypothesis is • HA: μ ≠ 30 • Hence we reject H0 if Z > Z1−0.05/2 = Z0.975 • or Z < −Z1−0.05/2 = −Z0.975 • Z0.975 = 1.96 (from table D)
  • 170. • 6. Decision: • We reject H0, since −2.12 is in the rejection region. • We can conclude that μ is not equal to 30. • Using the p-value, we note that p-value = 0.0340 < 0.05; therefore we reject H0.
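Both the test statistic and the two-sided p-value in this example can be recomputed from the summary data. A minimal sketch, assuming a normal population with known variance as stated in the example; `z_test_mean` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n):
    """Return (Z, two-sided p-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p_value = z_test_mean(xbar=27, mu0=30, sigma=sqrt(20), n=10)
alpha = 0.05
print(round(z, 2), round(p_value, 4), "reject H0" if p_value <= alpha else "do not reject H0")
```

The two decision rules agree: |Z| exceeds 1.96 exactly when the two-sided p-value falls below 0.05.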
  • 171. Example • Referring to the previous example, suppose that the researchers have asked: Can we conclude that μ < 30? 1. Data: see previous example 2. Assumptions: see previous example 3. Hypotheses: • H0: μ = 30 • HA: μ < 30
  • 172. 4. Test Statistic: • $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{27-30}{\sqrt{20/10}} = -2.12$ 5. Decision Rule: Reject H0 if Z < Zα, where • Zα = −1.645 (from table D) 6. Decision: Reject H0, since −2.12 < −1.645; thus we can conclude that the population mean is smaller than 30.
  • 173. Example • Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140. Use α = 0.01.
  • 174. Solution 1. Data: the variable is systolic blood pressure, n = 157, $\bar{x} = 146$, s = 27, α = 0.01. 2. Assumption: the population is not normal, σ² is unknown 3. Hypotheses: H0: μ = 140, HA: μ > 140 4. Test Statistic: $Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} = \frac{146-140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$
  • 175. 5. Decision Rule: we reject H0 if Z > Z1−α = Z0.99 = 2.33 (from table D) 6. Decision: We reject H0, since 2.78 > 2.33. Hence we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140.
  • 176. Hypothesis Testing: The Difference between Two Population Means: • We have the following steps: 1. Data: determine the variable, sample sizes (n1, n2), sample means, and the population standard deviations, or the sample standard deviations (s1, s2) if the population values are unknown, for the two populations. 2. Assumptions: We have two cases: • Case 1: The populations are normally or approximately normally distributed with known or unknown variances (sample sizes may be small or large). • Case 2: The populations are not normal, with known variances (n is large, i.e. n ≥ 30).
  • 177. • 3. Hypotheses: • We have three cases: • Case I: H0: μ1 = μ2 → μ1 − μ2 = 0, HA: μ1 ≠ μ2 → μ1 − μ2 ≠ 0 • e.g. we want to test that the mean of the first population is different from the mean of the second population. • Case II: H0: μ1 = μ2 → μ1 − μ2 = 0, HA: μ1 > μ2 → μ1 − μ2 > 0 • e.g. we want to test that the mean of the first population is greater than the mean of the second population. • Case III: H0: μ1 = μ2 → μ1 − μ2 = 0, HA: μ1 < μ2 → μ1 − μ2 < 0 • e.g. we want to test that the mean of the first population is less than the mean of the second population.
  • 178. 4. Test Statistic: • Case 1: The two populations are normal or approximately normal. If σ² is known (n1, n2 large or small): $Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$. If σ² is unknown (n1, n2 small) and the population variances are equal: $T = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$. If σ² is unknown (n1, n2 small) and the population variances are not equal: $T = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}}$, where $S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$
  • 179. • Case 2: If the populations are not normally distributed, • n1 and n2 are large (n1 ≥ 30, n2 ≥ 30), • and the population variances are known: $Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
  • 180. 5. Decision Rule: i) If HA: μ1 ≠ μ2 → μ1 − μ2 ≠ 0 • Reject H0 if Z > Z1−α/2 or Z < −Z1−α/2 (when using the Z-test), or reject H0 if T > t1−α/2,(n1+n2−2) or T < −t1−α/2,(n1+n2−2) (when using the T-test) • ii) If HA: μ1 > μ2 → μ1 − μ2 > 0 • Reject H0 if Z > Z1−α (when using the Z-test), or reject H0 if T > t1−α,(n1+n2−2) (when using the T-test)
  • 181. • iii) If HA: μ1 < μ2 → μ1 − μ2 < 0 • Reject H0 if Z < −Z1−α (when using the Z-test), or reject H0 if T < −t1−α,(n1+n2−2) (when using the T-test) Note: Z1−α/2, Z1−α, Zα are tabulated values obtained from table D; t1−α/2, t1−α, tα are tabulated values obtained from table E with (n1 + n2 − 2) degrees of freedom (df) 6. Conclusion: reject or fail to reject H0
  • 182. Example • Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome, from a normal distribution with variance 1, and 15 normal individuals, from a normal distribution with variance 1.5. The means are $\bar{X}_1 = 4.5$ mg/100 ml and $\bar{X}_2 = 3.4$ mg/100 ml, and α = 0.05. Solution: 1. Data: the variable is serum uric acid level, n1 = 12, n2 = 15, σ1² = 1, σ2² = 1.5, α = 0.05.
  • 183. 2. Assumption: The two populations are normal; σ1² and σ2² are known 3. Hypotheses: H0: μ1 = μ2 → μ1 − μ2 = 0 • HA: μ1 ≠ μ2 → μ1 − μ2 ≠ 0 4. Test Statistic: • $Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} = \frac{(4.5-3.4)-0}{\sqrt{\frac{1}{12}+\frac{1.5}{15}}} = 2.57$ 5. Decision Rule: Reject H0 if Z > Z1−α/2 or Z < −Z1−α/2, where Z1−α/2 = Z1−0.05/2 = Z0.975 = 1.96 (from table D) 6- Conclusion: Reject H0 since 2.57 > 1.96. Alternatively, using the p-value: p-value = 0.0102 < α = 0.05, so we reject H0.
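The two-sample Z statistic and its two-sided p-value can be verified as follows. This is a sketch for the known-variance case only; `two_sample_z` is a hypothetical helper name and takes the population variances, not the standard deviations.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, var1, n1, x2, var2, n2, delta0=0.0):
    """Z statistic and two-sided p-value for H0: mu1 - mu2 = delta0, variances known."""
    se = sqrt(var1 / n1 + var2 / n2)
    z = ((x1 - x2) - delta0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Serum uric acid example: Down's syndrome group vs. normal group
z, p_value = two_sample_z(x1=4.5, var1=1.0, n1=12, x2=3.4, var2=1.5, n2=15)
print(round(z, 2), round(p_value, 4))
```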
  • 184. Example The purpose of a study by Tam was to investigate wheelchair maneuvering in individuals with lower-level spinal cord injury (SCI) and healthy controls (C). Subjects used a wheelchair modified to incorporate a rigid seat surface to facilitate the specified experimental measurements. The data for measurements of the left ischial tuberosity for the SCI and control groups are shown below:
      C:    131  115  124  131  122  117   88  114  150  169
      SCI:   60  150  130  180  163  130  121  119  130  148
  • 185. We wish to know if we can conclude, on the basis of the above data, that the mean of the left ischial tuberosity measurement for the control group (C) is lower than the mean for the SCI group. Assume normal populations with equal variances. α = 0.05, p-value > 0.10
  • 186. Solution: 1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05 • $\bar{X}_C = 126.1$, $\bar{X}_{SCI} = 133.1$ (calculated from the data) 2. Assumption: The two populations are normal; σ1² and σ2² are unknown but equal 3. Hypotheses: H0: μC = μSCI → μC − μSCI = 0, HA: μC < μSCI → μC − μSCI < 0 4. Test Statistic: • $T = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{(126.1-133.1)-0}{\sqrt{756.04}\sqrt{\frac{1}{10}+\frac{1}{10}}} = -0.569$ • where $S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2} = \frac{9(21.8)^2+9(32.2)^2}{10+10-2} = 756.04$
  • 187. 5. Decision Rule: Reject H0 if T < −t1−α,(n1+n2−2), where t1−α,(n1+n2−2) = t0.95,18 = 1.7341 (from table E) 6- Conclusion: Fail to reject H0 since −0.569 > −1.7341. Alternatively, fail to reject H0 since p-value > 0.10 > α = 0.05
  • 188. Example Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode echocardiography and blood pressure measured by a sphygmomanometer. Physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (Group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (Group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean stiffness index. • We also wish to know if we can conclude that, in general, persons with thrombosis have on average higher IgG levels than persons without thrombosis, at α = 0.01 (p-value = 0.0559).
  • 189. Solution: 1. Data: n1 = 53, n2 = 54, S1 = 44.89, S2 = 34.85, α = 0.01.
      Group            Mean IgG level   Sample size   Standard deviation
      Thrombosis       59.01            53            44.89
      No thrombosis    46.61            54            34.85
    2. Assumption: The two populations are not normal; σ1² and σ2² are unknown and the sample sizes are large 3. Hypotheses: H0: μ1 = μ2 → μ1 − μ2 = 0, HA: μ1 > μ2 → μ1 − μ2 > 0 4. Test Statistic: • $Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}} = \frac{(59.01-46.61)-0}{\sqrt{\frac{44.89^2}{53}+\frac{34.85^2}{54}}} = 1.59$
  • 190. 5. Decision Rule: Reject H0 if Z > Z1−α, where Z1−α = Z0.99 = 2.33 (from table D) 6- Conclusion: Fail to reject H0 since 1.59 < 2.33. Alternatively, fail to reject H0 since p = 0.0559 > α = 0.01
  • 191. Hypothesis Testing: A Single Population Proportion: • Testing a hypothesis about a population proportion (P) is carried out in much the same way as for a mean, when the conditions necessary for using the normal curve are met. • We have the following steps: 1. Data: sample size (n), sample proportion ($\hat{p}$), P0, where $\hat{p} = \frac{a}{n}$ = (no. of elements in the sample with some characteristic) / (total no. of elements in the sample) 2. Assumptions: $\hat{p}$ is approximately normally distributed
  • 192. • 3. Hypotheses: • We have three cases: • Case I: H0: P = P0, HA: P ≠ P0 • Case II: H0: P = P0, HA: P > P0 • Case III: H0: P = P0, HA: P < P0 4. Test Statistic: $Z = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0 q_0}{n}}}$, which, when H0 is true, is distributed approximately as the standard normal
  • 193. 5. Decision Rule: i) If HA: P ≠ P0 • Reject H0 if Z > Z1−α/2 or Z < −Z1−α/2 • ii) If HA: P > P0 • Reject H0 if Z > Z1−α • iii) If HA: P < P0 • Reject H0 if Z < −Z1−α Note: Z1−α/2, Z1−α, Zα are tabulated values obtained from table D 6. Conclusion: reject or fail to reject H0
  • 194. 2. Assumptions: $\hat{p}$ is approximately normally distributed 3. Hypotheses: • H0: P = 0.063, HA: P > 0.063 • 4. Test Statistic: $Z = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.08-0.063}{\sqrt{\frac{0.063(0.937)}{301}}} = 1.21$ 5. Decision Rule: Reject H0 if Z > Z1−α, where Z1−α = Z1−0.05 = Z0.95 = 1.645
  • 195. 6. Conclusion: Fail to reject H0, since Z = 1.21 < Z1−α = 1.645. Alternatively, since the p-value = 0.1131 > α = 0.05, we fail to reject H0.
  • 196. Example Wagen collected data on a sample of 301 Hispanic women living in Texas. One variable of interest was the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The article cites population estimates for IFG among Hispanic women in Texas as 6.3 percent. Is there sufficient evidence to indicate that the population of Hispanic women in Texas has a prevalence of IFG higher than 6.3 percent? Let α = 0.05. Solution: 1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24, $\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$, q0 = 1 − p0 = 1 − 0.063 = 0.937, α = 0.05
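The IFG example can be checked numerically with a one-sample test for a proportion. In this sketch `proportion_z_test` is an illustrative name and the upper-tailed p-value matches HA: P > P0. The code uses the unrounded p̂ = 24/301, so its Z is slightly smaller than the value obtained by hand with p̂ rounded to 0.08; the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(a, n, p0):
    """Z statistic and upper-tailed p-value for H0: P = p0 vs HA: P > p0."""
    p_hat = a / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper

z, p_value = proportion_z_test(a=24, n=301, p0=0.063)
z_crit = 1.645  # Z_{0.95} from table D
print(round(z, 2), round(p_value, 4), "reject H0" if z > z_crit else "fail to reject H0")
```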
  • 197. Hypothesis Testing: The Difference between Two Population Proportions: • Testing a hypothesis about two population proportions (P1, P2) is carried out in much the same way as for the difference between two means, when the conditions necessary for using the normal curve are met. • We have the following steps: 1. Data: sample sizes (n1, n2), sample proportions ($\hat{p}_1, \hat{p}_2$), the numbers with the characteristic in the two samples (x1, x2), and the pooled proportion $\bar{p} = \frac{x_1+x_2}{n_1+n_2}$ 2- Assumption: The two populations are independent.
  • 198. • 3. Hypotheses: • We have three cases: • Case I: H0: P1 = P2 → P1 − P2 = 0, HA: P1 ≠ P2 → P1 − P2 ≠ 0 • Case II: H0: P1 = P2 → P1 − P2 = 0, HA: P1 > P2 → P1 − P2 > 0 • Case III: H0: P1 = P2 → P1 − P2 = 0, HA: P1 < P2 → P1 − P2 < 0 4. Test Statistic: $Z = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1}+\frac{\bar{p}(1-\bar{p})}{n_2}}}$, which, when H0 is true, is distributed approximately as the standard normal
  • 199. 5. Decision Rule: i) If HA: P1 ≠ P2 • Reject H0 if Z > Z1−α/2 or Z < −Z1−α/2 • ii) If HA: P1 > P2 • Reject H0 if Z > Z1−α • iii) If HA: P1 < P2 • Reject H0 if Z < −Z1−α Note: Z1−α/2, Z1−α, Zα are tabulated values obtained from table D 6. Conclusion: reject or fail to reject H0
  • 200. Example Noonan syndrome is a genetic condition that can affect the heart, growth, blood clotting, and mental and physical development. Noonan examined the stature of men and women with Noonan syndrome. The study contained 29 male and 44 female adults. One of the cut-off values used to assess stature was the third percentile of adult height. Eleven of the males fell below the third percentile of adult male height, while 24 of the females fell below the third percentile of adult female height. Does this study provide sufficient evidence for us to conclude that, among subjects with Noonan syndrome, females are more likely than males to fall below the respective third percentile of adult height? Let α = 0.05. Solution: 1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05, $\bar{p} = \frac{x_M+x_F}{n_M+n_F} = \frac{11+24}{29+44} = 0.479$, $\hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379$, $\hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$
  • 201. 2- Assumption: The two populations are independent. 3. Hypotheses: • Case II: H0: PF = PM → PF − PM = 0, HA: PF > PM → PF − PM > 0 • 4. Test Statistic: $Z = \frac{(\hat{p}_F-\hat{p}_M)-(p_F-p_M)}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_F}+\frac{\bar{p}(1-\bar{p})}{n_M}}} = \frac{(0.545-0.379)-0}{\sqrt{\frac{(0.479)(0.521)}{44}+\frac{(0.479)(0.521)}{29}}} = 1.39$ 5. Decision Rule: Reject H0 if Z > Z1−α, where Z1−α = Z1−0.05 = Z0.95 = 1.645 6. Conclusion: Fail to reject H0, since Z = 1.39 < Z1−α = 1.645. Alternatively, since the p-value = 0.0823 > α = 0.05, we fail to reject H0.
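The pooled two-proportion test for the Noonan example can be sketched as follows. The function name `two_proportion_z_test` is illustrative; group 1 is taken to be the females and group 2 the males, to match HA: PF > PM.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled Z statistic and upper-tailed p-value for H0: P1 = P2 vs HA: P1 > P2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper

# Females (24 of 44) vs. males (11 of 29) below the third percentile
z, p_value = two_proportion_z_test(x1=24, n1=44, x2=11, n2=29)
print(round(z, 2), round(p_value, 4))
```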
  • 202. An Introduction to the Chi-Square Distribution
  • 203. TESTS OF INDEPENDENCE
  • The aim is to test whether two criteria of classification are independent, for example whether socioeconomic status and area of residence of people in a city are independent.
  • We divide the sample according to the first criterion (socioeconomic status: low, medium, high income, etc.), and the same sample is also categorized according to the second criterion (area of residence: urban, rural, suburban, slum, etc.).
  • Place the first criterion in the columns, with one column per category of the first criterion (socioeconomic status), and the second criterion in the rows, with one row per category of the second criterion (area of residence).
  • 204. The Contingency Table
  • Table: Two-Way Classification of a Sample

  Second Criterion ↓      First Criterion of Classification →
                          1      2      3      …      c      Total
  1                       N11    N12    N13    …      N1c    N1.
  2                       N21    N22    N23    …      N2c    N2.
  3                       N31    N32    N33    …      N3c    N3.
  .                       .      .      .             .      .
  r                       Nr1    Nr2    Nr3    …      Nrc    Nr.
  Total                   N.1    N.2    N.3    …      N.c    N
  • 205. Observed versus Expected Frequencies
  • O_ij: the frequencies in the ith row and jth column of a contingency table are called observed frequencies; they result from cross-classifying the sample according to the two criteria.
  • e_ij: the expected frequencies, on the assumption that the two criteria are independent, are calculated by multiplying the row total and the column total for the cell and then dividing by the total frequency.
  • Formula:
  $$ e_{ij} = \frac{(N_{i\cdot})(N_{\cdot j})}{N} $$
  • 206. Chi-square Test
  • After calculating the expected frequencies, prepare a table of expected frequencies and compute the chi-square statistic
  $$ \chi^2 = \sum_{i=1}^{k} \left[ \frac{(o_i - e_i)^2}{e_i} \right] $$
  where the summation is over all r × c = k cells.
  • D.F.: the degrees of freedom for using the table are (r−1)(c−1), at the α level of significance.
  • Note that the test is always one-sided.
  • 207. Example
  The researchers want to determine whether preconception use of folic acid and race are independent. The data are:

  Observed Frequencies Table
               Use of Folic Acid
               Yes      No       Total
  White        260      299      559
  Black        15       41       56
  Other        7        14       21
  Total        282      354      636

  Expected Frequencies Table
               Yes                          No                           Total
  White        (282)(559)/636 = 247.86      (354)(559)/636 = 311.14      559
  Black        (282)(56)/636 = 24.83        (354)(56)/636 = 31.17        56
  Other        (282)(21)/636 = 9.31         (354)(21)/636 = 11.69        21
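  The expected frequencies in this example can be reproduced with a few lines of code. This is a sketch for checking the arithmetic, not part of the original slides; the function name `expected_frequencies` is hypothetical, and the table is entered with races as rows and folic acid use (yes, no) as columns.

```python
# Observed counts from the folic acid example: rows = race, columns = (yes, no).
observed = [
    [260, 299],  # White
    [15, 41],    # Black
    [7, 14],     # Other
]

def expected_frequencies(table):
    """Return e_ij = (row total i) * (column total j) / N for every cell.

    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(["White", "Black", "Other"], expected):
    print(label, [round(e, 2) for e in row])
```

  Each row of expected frequencies sums to the corresponding observed row total, which is a quick check that the marginal totals were used correctly.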
  • 208. Calculations and Testing
  • Data: see the given table.
  • Assumption: simple random sample.
  • Hypotheses: H0: race and use of folic acid are independent. HA: the two variables are not independent. Let α = 0.05.
  • Test statistic: the chi-square statistic given earlier.
  • Distribution: when H0 is true, the test statistic follows approximately a chi-square distribution with (r−1)(c−1) = (3−1)(2−1) = 2 d.f.
  • Decision Rule: reject H0 if the computed value of $\chi^2$ is greater than $\chi^2_{\alpha,(r-1)(c-1)}$ = 5.991.
  • Calculations:
  $$ \chi^2 = \frac{(260 - 247.86)^2}{247.86} + \frac{(299 - 311.14)^2}{311.14} + \cdots + \frac{(14 - 11.69)^2}{11.69} = 9.091 $$
  • 209. Conclusion
  • Statistical decision: we reject H0, since 9.091 > 5.991.
  • Conclusion: we conclude that H0 is false and that there is a relationship between race and preconception use of folic acid.
  • P value: since 7.378 < 9.091 < 9.210, 0.01 < p < 0.025.
  • We would therefore also reject the hypothesis at the 0.025 level of significance, but not at the 0.01 level.
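  The test of independence above can be sketched in code as follows. This is an illustration under stated assumptions, not part of the original slides: the function name `chi_square_independence` is hypothetical, and the P-value uses the closed form exp(−x/2), which holds only for a chi-square distribution with exactly 2 degrees of freedom (the case in this example).

```python
from math import exp

# Observed counts from the folic acid example: rows = race, columns = (yes, no).
observed = [[260, 299], [15, 41], [7, 14]]

def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom, (r-1)(c-1).

    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square_independence(observed)
# Assumption: the closed-form upper-tail probability below is valid only for df == 2.
p_value = exp(-stat / 2) if df == 2 else None
print(f"chi-square = {stat:.3f}, d.f. = {df}")
print("Reject H0" if stat > 5.991 else "Fail to reject H0")
```

  The resulting P-value falls between 0.01 and 0.025, in agreement with the bracketing obtained from the chi-square table. For other degrees of freedom a statistical library or a table would be needed.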