1. Medical Statistics
Prof. Amany Rashad Aboel-Seoud
Community Medicine Department
Zagazig University, Egypt
2. Objectives:
• Define statistics and describe its uses in community medicine
• Summarize data in tables and graphs
• Calculate measures of central tendency and measures of dispersion
• Describe the normal distribution curve and its uses
3. Definition of statistics
• It is the science and art of dealing with numbers.
• It is used for the collection, summarization, presentation and analysis of data to obtain information on an objective (unbiased) basis.
4. Uses of statistics
• Descriptive information for any population
• Prioritization of problems
• Prove association between variables
• Prove relation between risk and disease
• Compare new rates with old ones
• Compare local results with foreign ones
• Evaluate health programs & services
7. Types of data
Any collected observation will be either:
Quantitative (numbers), which is either:
1) Discrete (no fractions): e.g. number of patients, hospital beds, RBCs
2) Continuous (with fractions): e.g. hormone level, temperature, blood pressure, age
Qualitative (names or order), which is either:
* Categorical: black/white, male/female, yes/no
* Ordinal: grade of tumor, socio-economic (SE) standard
8. Presentation of data
In tables
In graphs
Tables and graphs must have a title and be self-explanatory, clear, fully labeled and not complicated.
Data can also be summarized by a few numbers, such as the average, percentiles and variance.
9. Table (1): Percentage distribution of the studied group in relation to sex

Sex (observation, variable, parameter)   Number examined (No.)   Frequency (%)
Male                                     20                      40
Female                                   30                      60
Total                                    50                      100
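The percentage column of Table (1) can be reproduced with a minimal sketch like the one below. The variable names are illustrative and not part of the original slides; only the counts (20 males, 30 females) come from the table.

```python
# Counts taken from Table (1); the names used here are illustrative.
counts = {"Male": 20, "Female": 30}

total = sum(counts.values())

# Percentage = (number in the category / total examined) x 100
percentages = {sex: 100 * n / total for sex, n in counts.items()}

for sex, n in counts.items():
    print(f"{sex:<8}{n:>5}{percentages[sex]:>8.0f}")
print(f"{'Total':<8}{total:>5}{sum(percentages.values()):>8.0f}")
```

The percentages of all categories should always add up to 100, which is a quick check on any frequency table.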
10. Table ( ): Percentage distribution of the studied group in relation to sex and age

Age/year   Males          Females
           No.    %       No.    %
<10        8      16      6      24
10 –       2      4       5      20
20 –       5      10      4      16
30 –       12     24      3      12
40 –       13     26      4      16
50 +       10     20      3      12
Total      50     100     25     100
11. Figure ( ): Percentage of the studied group in relation to age & sex
[Figure: x-axis: age/year (<10, 10, 20, 30, 40, 50); y-axis: % (0 to 30); series: males, females]
13. Fig. ( ): Body temperature of 3 patients 4 hours after polio vaccination
[Figure: x-axis: hours (1 to 4); y-axis: temperature (33 to 42); series: Mohamed, Ahmed, Mostafa]
14. Fig. (4): Relation between age and height for the studied group
[Figure: x-axis: age/year (0 to 10); y-axis: height/cm (0 to 120)]
15. Fig. (8): Frequency distribution of cases in relation to weight
[Figure: x-axis: weight/kg (30 to 60); y-axis: frequency of cases (0 to 90)]
16. Fig. (5): Comparison between countries in relation to socio-economic standards
[Figure: x-axis: England, Germany, Egypt, Pakistan; y-axis: % (0% to 100%); series: low, high]
17. Data summarization
• Percentile
• Quartile
• Measures of central tendency
** mean ** median ** mode
• Measures of dispersion
** range ** variance ** SD
** SE ** coefficient of variation
18. Percentile
• In ordered data:
• Values: 4, 6, 7, 8, 9, 10, 12, 13, 14, 20, 22
• Percentiles: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• 10 is the 50th percentile of these data
• 6 is the 10th percentile
• 9 is the 40th percentile
• We usually use the 25th, 50th and 75th percentiles (the quartiles) for data summarization.
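The ranking on this slide can be sketched in code as follows. The function `percentile` is a hypothetical helper written for illustration; it assumes the convention used on the slide (the smallest value is the 0th percentile, the largest is the 100th) and interpolates linearly between neighbouring values. Other percentile conventions exist and give slightly different answers.

```python
def percentile(values, p):
    """Value at the p-th percentile (0-100), assuming the smallest value is
    the 0th percentile and the largest is the 100th, with linear
    interpolation between neighbouring ranked observations."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1) / 100  # 0-based fractional rank
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(ordered):
        return ordered[-1]
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


data = [4, 6, 7, 8, 9, 10, 12, 13, 14, 20, 22]

for p in (10, 40, 50):
    print(p, percentile(data, p))

# The three quartiles (25th, 50th and 75th percentiles)
quartiles = [percentile(data, p) for p in (25, 50, 75)]
print(quartiles)
```

With eleven ordered values, each value sits exactly on a multiple of 10, which is why 6, 9 and 10 come out as the 10th, 40th and 50th percentiles.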
19. 1- The arithmetic mean:
• It is the sum of the observations divided by the number of observations:
• $\bar{x} = \dfrac{\sum x}{n}$
• Where: $\bar{x}$ = the mean
• ∑ denotes "the sum of"
• x = the values of the observations
• n = the number of observations
• Example: In a study, the ages of 5 students were: 12, 15, 10, 17, 13
• Mean = sum of observations / number of observations
• Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years
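As a quick check of the worked example, the same calculation can be written as a short sketch (the variable names are illustrative):

```python
# Ages of the 5 students from the example
ages = [12, 15, 10, 17, 13]

# Mean = sum of observations / number of observations
mean_age = sum(ages) / len(ages)

print(f"Mean age = {mean_age:.1f} years")
```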
20. 2- Median:
It is the middle observation in a series of observations after arranging them in ascending or descending order.
• The rank of the median is (n + 1)/2 if the number of observations is odd; if the number is even, the median is the mean of the observations ranked n/2 and n/2 + 1 (n = number of observations).
If the number of observations is odd, the median is calculated as follows:
Calculate the median of the following data: 5, 6, 8, 9, 11 (n = 5)
The rank of the median = (n + 1)/2 = (5 + 1)/2 = 3. The median is the third value when the data are arranged in ascending (or descending) order.
So the median is 8 (the third value).
• If the number of observations is even, the median is calculated as follows:
e.g. 5, 6, 8, 9 (n = 4)
• The rank n/2 = 4/2 = 2 points to the second value, which is 6 if the data are arranged in ascending order and 8 if they are arranged in descending order. The median is therefore the mean of these two middle observations, i.e. (6 + 8)/2 = 7. For simplicity, we can apply the same equation used for odd numbers, i.e. (n + 1)/2. The median rank will be (4 + 1)/2 = 2½, i.e. the median lies between the second and third values (6 and 8); take their mean = 7.
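The two cases (odd and even n) can be sketched as a small function. The name `median` is chosen here for illustration; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; for an even number of
    observations, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: rank (n + 1)/2, i.e. 0-based index n // 2
        return ordered[middle]
    # Even n: mean of the values ranked n/2 and n/2 + 1
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 6, 8, 9, 11]))  # odd number of observations
print(median([5, 6, 8, 9]))      # even number of observations
```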
21. 3- Mode
• The mode is the most frequently occurring value in the data. It is found as follows:
• Example: 5, 6, 7, 5, 10. The mode of these data is 5, since 5 occurs twice. Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations. Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.
• Example: 300, 280, 130, 125, 240, 270 has no mode.
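The three situations above (one mode, two modes, no mode) can be sketched with a frequency count. The helper `modes` is hypothetical and assumes, as the slide does, that a data set in which every value occurs only once has no mode; it also assumes the list is not empty.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) in ascending order, or an empty
    list when no value is repeated (treated here as 'no mode')."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 6, 7, 5, 10]))
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))
print(modes([300, 280, 130, 125, 240, 270]))
```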
22. 2- Measures of dispersion
• A measure of dispersion describes the degree of variation (scatter) of the data around its central value (dispersion = variation = spread = scatter).
• Range:
• It is the difference between the largest and smallest values. It is the simplest measure of variation. Its disadvantage is that it is based on only two of the observations and gives no idea of how the other observations are arranged between these two. Also, it tends to become larger as the size of the sample increases.
23. Variance:
• If we want the average of the differences between the mean and each observation in the data, we have to subtract each value from the mean, sum these differences and divide by the number of observations.
• i.e. average difference = $\dfrac{\sum (\text{mean} - x)}{n}$
The value of this expression will always be zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.
• Therefore, to overcome this zero, we square the difference between the mean and each value so that the sign is always positive. For a sample, the sum of squared differences is divided by n − 1 rather than n. Thus we get:
• $V = \dfrac{\sum (\text{mean} - x)^2}{n - 1}$
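Both steps of this argument can be checked numerically with a minimal sketch. The data set 5, 7, 10, 12, 16 is borrowed from the coefficient of variation example later in these slides; the variable names are illustrative.

```python
data = [5, 7, 10, 12, 16]
n = len(data)
mean = sum(data) / n

# Step 1: the plain differences from the mean cancel out
deviations = [mean - x for x in data]
print(sum(deviations))

# Step 2: squaring removes the signs; divide by n - 1 for a sample
squared_deviations = [d ** 2 for d in deviations]
variance = sum(squared_deviations) / (n - 1)
print(variance)
```

The first printed value is zero, which is why the unsquared differences cannot be used as a measure of spread.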
24. 3- Standard deviation:
• The main disadvantage of the variance is that it is expressed in the square of the units used. So, it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD). Therefore $SD = \sqrt{V}$
• i.e. $SD = \sqrt{\dfrac{\sum (\text{mean} - x)^2}{n - 1}}$
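For comparison with hand calculation, Python's standard `statistics` module offers `variance` and `stdev`, both of which divide by n − 1 as in the formula above. The sketch below assumes the data set 5, 7, 10, 12, 16 from the coefficient of variation example in these slides.

```python
import math
import statistics

data = [5, 7, 10, 12, 16]

# Sample variance (divides by n - 1), in squared units
variance = statistics.variance(data)

# Standard deviation = square root of the variance, in the original units
sd = math.sqrt(variance)

print(f"Variance = {variance:.1f}")
print(f"SD = {sd:.1f}")
```

`statistics.stdev(data)` returns the same SD in one call.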
25. 4- Coefficient of variation:
• The coefficient of variation expresses the standard deviation as a percentage of the sample mean.
• $C.V. = \dfrac{SD}{\text{mean}} \times 100$
• C.V. is useful when we are interested in the relative size of the variability in the data.
• Example: if we have the observations 5, 7, 10, 12 and 16, their mean is 50/5 = 10. SD = √[(25 + 9 + 0 + 4 + 36) / (5 − 1)] = √(74 / 4) = 4.3
• C.V. = 4.3 / 10 × 100 = 43%
• Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30 / 5 = 6
• SD = √[(16 + 16 + 1 + 16 + 25) / (5 − 1)] = √(74 / 4) = 4.3
• C.V. = 4.3 / 6 × 100 = 71.7%
• Both sets have the same SD but different C.V., because the SD is small relative to the mean in the first set (more homogeneous, so C.V. is lower), while it is large relative to the mean in the second set (more heterogeneous, so C.V. is higher).
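The comparison of the two data sets can be reproduced with a short sketch. The function name `coefficient_of_variation` is illustrative; `statistics.stdev` and `statistics.mean` are standard-library functions, and `stdev` uses the n − 1 divisor as in these slides.

```python
import statistics


def coefficient_of_variation(values):
    """SD expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


first = [5, 7, 10, 12, 16]
second = [2, 2, 5, 10, 11]

for name, values in (("first", first), ("second", second)):
    sd = statistics.stdev(values)
    cv = coefficient_of_variation(values)
    print(f"{name}: SD = {sd:.1f}, C.V. = {cv:.1f}%")
```

Both sets print the same SD, while the C.V. differs because the means differ.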
26. Example
• Summarize the following hemoglobin readings: 9, 8, 10, 9, 7, 11, 12.
• Mean = (9 + 8 + 10 + 9 + 7 + 11 + 12) / 7 = 66 / 7 = 9.43
• Median rank = (n + 1)/2 = (7 + 1)/2 = 4th
• Ordered data: 7, 8, 9, 9, 10, 11, 12, so median = 9
• Mode = 9
• Range = 12 − 7 = 5
• Variance = [(9.43 − 9)² + (9.43 − 8)² + (9.43 − 10)² + ...] / (7 − 1) = 17.71 / 6 = 2.95
• SD = square root of the variance = √2.95 = 1.72
• SE = SD / square root of n = 1.72 / √7 = 1.72 / 2.65 = 0.65
• CV = SD / mean × 100 = 1.72 / 9.43 × 100 = 18.2%
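The whole summary can be cross-checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative; the standard error is computed as SD divided by the square root of n, as on the slide.

```python
import math
import statistics

hb = [9, 8, 10, 9, 7, 11, 12]
n = len(hb)

mean = statistics.mean(hb)
median = statistics.median(hb)
mode = statistics.mode(hb)
value_range = max(hb) - min(hb)
variance = statistics.variance(hb)  # divides by n - 1
sd = math.sqrt(variance)
se = sd / math.sqrt(n)              # standard error of the mean
cv = sd / mean * 100                # coefficient of variation, in %

print(f"Mean     = {mean:.2f}")
print(f"Median   = {median}")
print(f"Mode     = {mode}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.2f}")
print(f"SD       = {sd:.2f}")
print(f"SE       = {se:.2f}")
print(f"CV       = {cv:.1f}%")
```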
28. Normal distribution curve
• Mean, median and mode coincide
• Bell-shaped, symmetrical
• The tails never touch the horizontal axis (the curve never ends)
• About 68% of the population lies within mean ± 1 SD
• About 95% of the population lies within mean ± 2 SD
• About 99.7% of the population lies within mean ± 3 SD
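These proportions can be verified with a short sketch using `statistics.NormalDist` from Python's standard library. A standard normal curve (mean 0, SD 1) is assumed here; the same proportions hold for any mean and SD.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1 (an assumption made for illustration)
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the population lying within k SDs on either side of the mean
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"mean ± {k} SD: {share:.1%}")
```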