MCQ Test Item Analysis

Presented by:
Dr. Soha Rashed
Prof. of Community Medicine
Executive Director of Medical Education Department
Alexandria Faculty of Medicine, Egypt

10 March 2013
Content outlines
 Why are we here (Purpose of this session)?
 What’s next (Needed future tasks)?
 Key Features of Student Assessment Methods:
  ⁻ Content and construct Validity
  ⁻ Reliability
  ⁻ Objectivity
 MCQ Test Item Analysis:
  ⁻ Difficulty index (p-value)
  ⁻ Discrimination index (DI)=Point-Biserial correlation (PBS)
  ⁻ Distractor efficiency (DE)
  ⁻ Internal Consistency Reliability
  ⁻ Writing a technical report (including remedial actions &
   recommendations)
 MCQs evaluation checklist
 Why are we here (Purpose of this session)?
 What’s next (Needed future tasks)?
What do we assess?
Achievement of course ILOs:
   Knowledge
   Skills
   Attitudes
ILOs: 5 DOMAINS
1.   Knowledge (Recall) and Understanding
2.   Intellectual Skills
3.   Professional Skills (Practical, Procedural and
     Clinical)
4.   General and Transferable Skills
5.   Professional Attitudes and Ethics
[Figure: diagram; only the label "Problem solving" is recoverable]
Written exams
Objective written exams: MCQ, Matching, Extended matching, TF, and
 Short answer Qs.

Essay Qs. (Long, short, modified essay Qs)
Key Features of Student Assessment Methods
Quality standards
 Validity: The ability of the test to measure what it is supposed to
    measure.
 Reliability: The consistency of the test scores over time, under
    different testing conditions, and with different raters.
 Objectivity: The degree to which examiners agree on the correct
    answer (Q is scored accurately and fairly, free of examiners’ bias).
 Practicability/Feasibility: Overall ease of construction,
    administration, scoring, and reporting of an assessment instrument.
 Acceptability: The responsiveness of faculty and students to the
    assessment.
 Value/Educational impact: The utility of the test results in producing
    meaningful conclusions (usable information) about the educational
    process.
Validity
Validity refers to the extent to which an
assessment instrument or a test measures what
it intends to measure.


 Content validity
 Construct validity
I. Content validity
Content validity ensures that knowledge and skills
covered by the test items are representative of
the larger domain of knowledge and skills covered
in the course.
Test blueprint
Learning objectives to be tested (number of items)

Content/       Recall     Under-     Appli-    Problem solving         Total   %
subject area   of facts   standing   cation    (analysis, synthesis,           weight
                                               evaluation)
….             3          3          --        --                      6       6%
….             2          4          2         2                       10      10%
….             4          3          4         4                       15      15%
….             5          4          4         4                       17      17%
….             4          10         8         8                       30      30%
….             3          7          5         7                       22      22%
Total          21         31         23        25                      100
% weight       21%        31%        23%       25%                             100%
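The totals and percentage weights in the blueprint can be checked mechanically. The following is a minimal sketch, assuming the item counts are transcribed from the table above; the area names (`area_1` to `area_6`) and the function name `blueprint_weights` are placeholders, since the slide elides the actual content areas.

```python
# Cognitive levels, in the column order of the blueprint table.
LEVELS = ["recall", "understanding", "application", "problem_solving"]

# Item counts per content area. Area names are hypothetical placeholders:
# the slide shows them only as "....".
blueprint = {
    "area_1": [3, 3, 0, 0],
    "area_2": [2, 4, 2, 2],
    "area_3": [4, 3, 4, 4],
    "area_4": [5, 4, 4, 4],
    "area_5": [4, 10, 8, 8],
    "area_6": [3, 7, 5, 7],
}


def blueprint_weights(bp):
    """Return (total items, % weight per content area, % weight per level)."""
    total = sum(sum(row) for row in bp.values())
    area_pct = {area: 100 * sum(row) / total for area, row in bp.items()}
    level_pct = {
        level: 100 * sum(row[i] for row in bp.values()) / total
        for i, level in enumerate(LEVELS)
    }
    return total, area_pct, level_pct


total, area_pct, level_pct = blueprint_weights(blueprint)
print(total)      # total number of items in the exam
print(area_pct)   # share of the exam given to each content area
print(level_pct)  # share of the exam given to each cognitive level
```

Running the same check on a draft exam shows quickly whether any content area or cognitive level is over- or under-sampled relative to the intended weights.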
II. Construct validity

This refers to the COMPATIBILITY/
CONGRUENCE between the learning
objective (LO) to be assessed and the type
of assessment.

In other words, construct validity emphasizes
that assessment techniques should be based
on the nature of the LOs that they are
supposed to measure.
Construct validity
Learning objective to be assessed   Assessment instrument
Knowledge & understanding           MCQ, TF, Matching, SAQ, Complete
                                    Short essay Q
                                    Long essay Q
                                    Oral exam
Application & problem solving       Clinical scenario-based MCQ
                                    Extended matching Q
                                    Modified essay Q
                                    Case study (Patient management problem)
                                    Oral exam
Practical skills                    OSPE
Clinical skills                     OSCE (real or simulated patients)
                                    Short Case
                                    Long Case
Procedural skills                   OSCE (Anatomical models)
To increase test validity:
 Use the test blueprint
 Focus on the important content areas
 Sample widely across the domains and across the
  content area (% wt)
 To increase construct validity: Use items that have
  high discriminative value (those testing higher
  cognitive/thinking abilities such as comprehension,
  application and problem solving),
  e.g., applied Qs and clinical scenario-based Qs
 Use multiple methods to have a valid comprehensive
  assessment
Reliability
 Refers to the consistency or repeatability of test
  scores.
 In practice, a reliable assessment should
  yield the same result:
  - When given to the same student at two
    different times (test-retest reliability), or
  - When scored by different examiners (inter-rater reliability),
  - While keeping all the other variables (timing,
    length, content or other contextual features) as
    consistent as possible.
 Internal consistency (intra-exam, inter-item
  reliability): Coherence of the test items, or the
  extent to which the test questions are interrelated.
  Measured by Cronbach’s alpha.
MCQs are highly reliable
The results of the test are unlikely to be
influenced by:
   when the test is administered,
   when the test is scored, or by
   who does the scoring.


Hence the term “objective” is often used
when referring to these kinds of
assessments.
On the other hand, reliability is an important
concern when grading essay questions, rating
clinical skills or scoring other assessments
requiring judgment or interpretation.

In these situations, clear scoring criteria are
needed to attain a high level of reliability, regardless
of whether one or multiple people will be involved in
grading the responses.
How to improve reliability of the test items?
 Writing clear, unambiguous questions and test
 instructions improves reliability by generating
 consistent patterns of response from the students.

 Use of a structured, predefined marking scheme: An
 answer key for MCQs and essay Qs, standardized
 checklists (in OSCEs/OSPEs) with clear scoring
 criteria.

 A longer test with multiple items is more likely to
 have better reliability than a shorter test with a limited
 number of items, as the former 'evens out' possible
 inconsistencies of individual items.
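The effect of test length on reliability can be estimated with the Spearman-Brown prediction formula. The slides do not name this formula, so it is brought in here only as a standard illustration. It assumes that the added items are of comparable quality to the existing ones. The function name `spearman_brown` and the example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predict reliability after changing test length.

    reliability   -- reliability of the current test (e.g. Cronbach's alpha)
    length_factor -- new length divided by current length (2 = doubled)

    Assumes the added or removed items behave like the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
doubled = spearman_brown(0.60, 2)
# The same test cut to half its length.
halved = spearman_brown(0.60, 0.5)
print(round(doubled, 2), round(halved, 2))
```

Under these assumptions, doubling the test raises the predicted reliability, while halving it lowers it, which matches the point made on the slide.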
Desirable Features of Valid and Reliable Assessments
 There is a clearly specified set of learning outcomes.
 Assessment tasks are matched to the stated learning
  outcomes.
 Assessment tasks are a representative sample of the
  stated learning outcomes.
 Assessment tasks are at the appropriate level of
  difficulty.
 Assessment tasks effectively distinguish
  (discriminate) between achievers and non-achievers.
 Clear instructions are given for the
  administration, scoring, and interpretation of the
  assessment results.
MCQ test item analysis
Remark Classic OMR (Optical Mark Recognition) software
Parameters commonly assessed in MCQ test item analysis
 Item analysis:
    Difficulty index (p-value)
    Discrimination index (DI) = Point-Biserial correlation (PBS)
    Distractor efficiency (DE)
 Internal Consistency Reliability
Do final grades attained by students actually reflect their competences?
Do they produce meaningful conclusions about their performance?
Difficulty and Discrimination Indices
Difficulty Index (p-value)
 Calculated as the percentage of students who answered the item correctly.
 The range is from 0% to 100%, more typically written as a proportion from
  0.00 to 1.00 (p-value).
 The higher the value, the easier the item.

 Difficulty level (d):
      d ≥ 75% = very easy
      d > 70% = easy
      d 30-70% = moderately difficult to moderately easy (recommended)
      d < 30% = difficult
      d < 25% = very difficult

 P-values above 0.90 indicate very easy items, which should not be reused in
  subsequent tests. If almost all of the students get the item correct, the
  concept is probably not worth testing.

 P-values below 0.20 indicate very difficult items, which should be reviewed
  for possibly confusing language, removed from subsequent tests, and/or
  highlighted as an area for re-instruction. If almost all of the students get
  the item wrong, either there is a problem with the item or the students did
  not grasp the concept.
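The difficulty index is simple enough to compute by hand from a scored response matrix. The following is a minimal sketch, assuming each student's answers have already been scored as 1 (correct) or 0 (incorrect). The function names, the flag wording and the sample data are illustrative and are not output from the Remark software.

```python
def difficulty_index(responses):
    """Proportion of students answering each item correctly.

    responses -- list of rows, one per student, each a list of 0/1 item scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def flag_item(p, easy_cutoff=0.90, hard_cutoff=0.20):
    """Apply the review rules from the slide to a single p-value."""
    if p > easy_cutoff:
        return "very easy: consider not reusing"
    if p < hard_cutoff:
        return "very difficult: review wording or re-teach"
    if 0.30 <= p <= 0.70:
        return "recommended range"
    return "outside recommended range"


# Hypothetical scored answers: 5 students (rows) x 4 items (columns).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

p_values = difficulty_index(scores)
for item_number, p in enumerate(p_values, start=1):
    print(item_number, p, flag_item(p))
```

In this sample, the first item is answered correctly by everyone and the last by no one, so both are flagged for review, while the second item falls in the recommended 30-70% band.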
Discrimination index (DI) = Point-Biserial correlation (PBS)
 It describes the ability of an item to distinguish between high and
  low scorers (the upper and lower 27% of students after ranking by
  total score in descending order).

 The range is from -1.00 to +1.00; most usable items fall between 0.0
  and 1.00.

 The higher the value, the more discriminating the item. A highly
  discriminating item indicates that the students who had high test
  scores got the item correct whereas students who had low test
  scores got the item incorrect.

 Items with discrimination values near or below zero should be
  removed from the test. A negative value indicates that students who
  did poorly on the test overall did better on that item than students
  who did well overall. The item may be confusing for your better
  scoring students in some way.
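Strictly speaking, the upper-lower discrimination index and the point-biserial correlation are two different statistics, though both range from -1 to +1 and are read the same way. The sketch below computes both, assuming 0/1 item scores and one total score per student. The function names and the sample data are hypothetical, and the Remark software may compute its figures differently.

```python
import math


def upper_lower_di(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of the upper and lower groups
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when everyone has the same score
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 10 students, total scores and one item's 0/1 scores.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
good_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # mostly high scorers get it right
bad_item = [1 - x for x in good_item]       # mostly low scorers get it right

print(upper_lower_di(good_item, totals), point_biserial(good_item, totals))
print(upper_lower_di(bad_item, totals), point_biserial(bad_item, totals))
```

The second item produces negative values on both statistics, which is the pattern the slide says should lead to the item being removed.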
Interpreting discrimination index
 0.40 or higher = very good discrimination

 0.30 to 0.39 = reasonably good discrimination but possibly subject
  to improvement

 0.20 to 0.29 = marginal/acceptable discrimination (subject to
  improvement)

 0.00 to 0.19 = poor discrimination (to be rejected or improved by
  revision)

 Negative DI = low-performing students selected the correct
  answer more often than high scorers (to be rejected)
    Use items that have high discrimination values in the test (those
     testing higher cognitive/thinking abilities such as comprehension,
     application and problem solving).

    Link questions to case scenarios: ask the question in the
     context of a clinical situation, diagram, graph, image, radiologic
     image, histo-pathological section, laboratory findings, etc.
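The discrimination bands listed on this slide can be applied to a whole test with a small lookup. This is a minimal sketch: the function name `interpret_di` is hypothetical, and the bands are treated as half-open intervals so that a value such as 0.195 still falls into a category.

```python
def interpret_di(di):
    """Map a discrimination index to the interpretation bands on the slide."""
    if di < 0:
        return "negative: reject"
    if di < 0.20:
        return "poor: reject or revise"
    if di < 0.30:
        return "marginal: subject to improvement"
    if di < 0.40:
        return "reasonably good: possibly subject to improvement"
    return "very good"


# Hypothetical discrimination values for five items.
for item_number, di in enumerate([0.45, 0.33, 0.21, 0.05, -0.12], start=1):
    print(item_number, di, interpret_di(di))
```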
Distractor efficiency
 The distractors are important components of an item, as they show a
  relationship between the total test score and the distractor chosen by
  the student.

 Distractor efficiency is a tool that tells whether the item was
  well constructed or failed to perform its purpose.

 The quality of the distractors influences student performance on a
  test item. Ideally, low-scoring students, who have not mastered the
  subject, should choose the distractors more often, whereas high
  scorers should discard them more frequently while choosing the
  correct option.

 Any distractor that has been selected by less than 5% of the students
  is considered to be a non-functioning distractor (NF-D).

 Reviewing the options can reveal potential errors of judgment and
  inadequate performance of distractors. These poor distractors can be
  revised, replaced, or removed.
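Non-functioning distractors can be found by counting how often each option was chosen. The sketch below applies the 5% rule from the slide to a single item. Expressing distractor efficiency as the percentage of distractors that are functioning is an assumption, since the slides do not give a formula, and the function name and sample answers are hypothetical.

```python
from collections import Counter


def distractor_report(choices, key, options="ABCD", threshold=0.05):
    """Flag non-functioning distractors for one item.

    choices   -- list of the option each student selected, e.g. ["A", "B", ...]
    key       -- the correct option
    threshold -- a distractor chosen by less than this share is non-functioning
    """
    n = len(choices)
    counts = Counter(choices)
    report = {}
    for option in options:
        if option == key:
            continue  # the correct answer is not a distractor
        share = counts.get(option, 0) / n
        report[option] = {"share": share, "functioning": share >= threshold}
    functioning = sum(entry["functioning"] for entry in report.values())
    # Assumed definition: percentage of distractors that are functioning.
    efficiency = 100 * functioning / len(report)
    return report, efficiency


# Hypothetical item answered by 40 students; the key is "B".
answers = list("B" * 28 + "A" * 7 + "C" * 4 + "D" * 1)
report, efficiency = distractor_report(answers, key="B")
print(report)
print(round(efficiency, 1))
```

In this sample, option D is chosen by only 1 of 40 students (2.5%), so it is flagged as non-functioning and is a candidate for revision or replacement.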
Internal Consistency Reliability

 Internal consistency reliability indicates how well the
  items are correlated with one another. It measures
  whether multiple items within an instrument reveal
  similar results.

 Cronbach's Alpha is used as a coefficient of internal
  consistency.

Interpreting Cronbach's Alpha:
 The range is normally from 0.0 to 1.0, with 0.7 generally accepted
  as a sign of acceptable reliability. (Negative values are possible
  and signal a problem with the items or the scoring.)
 High reliability indicates that the items are all measuring
  the same thing, or general construct.
 The higher the value, the more reliable the overall test
  score.
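The slides do not state the formula, so the standard definition of Cronbach's alpha is given here for reference. The symbols are introduced for this note only: $k$ is the number of items, $\sigma_i^2$ is the variance of scores on item $i$, and $\sigma_X^2$ is the variance of students' total scores.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
```

For MCQ items scored 0 or 1, the item variance is $\sigma_i^2 = p_i(1 - p_i)$, where $p_i$ is the item's difficulty index. In that case the expression is the same as the Kuder-Richardson formula 20 (KR-20).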
Interpreting Cronbach's Alpha
Cronbach's alpha   Internal consistency
α ≥ 0.9            Excellent
0.8 ≤ α < 0.9      Very good
0.7 ≤ α < 0.8      Good (there are probably a few items which could be improved).
0.6 ≤ α < 0.7      Somewhat low (there are probably some items which could be improved).
0.5 ≤ α < 0.6      Poor (suggests need for revision of test).
α < 0.5            Questionable/Unacceptable (this test should not contribute
                   heavily to the course grade, and it needs revision).
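Cronbach's alpha can be computed directly from a scored response matrix and then read against the table above. This is a minimal sketch using the standard alpha formula, which the slides do not spell out. The function names and the sample data are hypothetical, and the code assumes that total scores are not all identical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix of item scores.

    responses -- list of rows, one per student, each a list of item scores.
    Assumes at least two items and non-zero variance in the total scores.
    """
    k = len(responses[0])
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Map alpha to the internal-consistency bands in the table."""
    bands = [
        (0.9, "Excellent"),
        (0.8, "Very good"),
        (0.7, "Good"),
        (0.6, "Somewhat low"),
        (0.5, "Poor"),
    ]
    for cutoff, label in bands:
        if alpha >= cutoff:
            return label
    return "Questionable/Unacceptable"


# Hypothetical scored answers: 4 students (rows) x 3 items (columns).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2), interpret_alpha(alpha))
```

A real exam would have far more students and items than this toy matrix. The small example is only meant to make the arithmetic easy to follow by hand.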
Practice exercises
 Interpreting Remark Classic OMR (Optical Mark Recognition)
  software outputs

 Writing a technical report on MCQ test item analysis (including
  remedial actions & recommendations)

 Use of MCQs evaluation checklist
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 

MCQ test item analysis

  • 1. MCQ Test Item Analysis Presented by: Dr. Soha Rashed Prof. of Community Medicine Executive Director of Medical Education Department Alexandria Faculty of Medicine, Egypt 10 March 2013
  • 2. Content outlines  Why are we here (Purpose of this session)?  What’s next (Needed future tasks)?  Key Features of Student Assessment Methods: ⁻ Content and construct Validity ⁻ Reliability ⁻ Objectivity  MCQ Test Item Analysis: ⁻ Difficulty index (p-value) ⁻ Discrimination index (DI)=Point-Biserial correlation (PBS) ⁻ Distractor efficiency (DE) ⁻ Internal Consistency Reliability ⁻ Writing a technical report (including remedial actions & recommendations)  MCQs evaluation checklist
  • 3.  Why are we here (Purpose of this session)?  What’s next (Needed future tasks)?
  • 4.
  • 5. What do we assess? Achievement of course ILOs: Knowledge Skills Attitudes
  • 6. ILOs: 5 DOMAINS 1. Knowledge (Recall) and Understanding 2. Intellectual Skills 3. Professional Skills (Practical, Procedural and Clinical) 4. General and Transferable Skills 5. Professional Attitudes and Ethics
  • 7. Problem solving
  • 8. Written exams Objective written exams: MCQ, Matching, Extended matching, TF, and Short answer Qs. Essay Qs.(Long, short, modified essay Qs)
  • 9. Key Features of Student Assessment Methods Quality standards  Validity: The ability of the test to measure what it is supposed to measure.  Reliability: The consistency of the test scores over time, under different testing conditions, and with different raters.  Objectivity: The degree to which examiners agree on the correct answer (the question is scored accurately and fairly, free of examiners’ bias).  Practicability/Feasibility: Overall ease of construction, administration, scoring, and reporting of an assessment instrument.  Acceptability: The responsiveness of faculty and students to the assessment.  Value/Educational impact: The utility of the test results in producing meaningful conclusions (usable information) about the educational process.
  • 10. Validity Validity refers to the extent to which an assessment instrument or a test measures what it intends to measure.  Content validity  Construct validity
  • 11. I. Content validity Content validity ensures that knowledge and skills covered by the test items are representative of the larger domain of knowledge and skills covered in the course.
  • 12. Test blueprint: learning objectives to be tested, by content/subject area
      Content/subject area | Recall of facts | Understanding | Application | Problem solving (analysis, synthesis, evaluation) | Total | % weight
      ….                   | 3 items         | 3 items       | --          | --                                               | 6     | 6%
      ….                   | 2 items         | 4 items       | 2 items     | 2 items                                          | 10    | 10%
      ….                   | 4 items         | 3 items       | 4 items     | 4 items                                          | 15    | 15%
      ….                   | 5 items         | 4 items       | 4 items     | 4 items                                          | 17    | 17%
      ….                   | 4 items         | 10 items      | 8 items     | 8 items                                          | 30    | 30%
      ….                   | 3 items         | 7 items       | 5 items     | 7 items                                          | 22    | 22%
      Total                | 21              | 31            | 23          | 25                                               | 100   |
      % weight             | 21%             | 31%           | 23%         | 25%                                              |       | 100%
  • 13. II. Construct validity This refers to the COMPATIBILITY/ CONGRUENCE between the learning objective (LO) to be assessed and the type of assessment. In other words, construct validity emphasizes that assessment techniques should be based on the nature of the LOs that they are supposed to measure.
  • 14. Construct validity
      Learning objective to be assessed | Assessment instrument
      Knowledge & understanding         | MCQ, TF, Matching, SAQ, Complete; Short essay Q; Long essay Q; Oral exam
      Application & problem solving     | Clinical scenario-based MCQ; Extended matching Q; Modified essay Q; Case study (Patient management problem); Oral exam
      Practical skills                  | OSPE
      Clinical skills                   | OSCE (real or simulated patients); Short Case; Long Case
      Procedural skills                 | OSCE (Anatomical models)
  • 15. To increase test validity:  Use the test blueprint  Focus on the important content areas  Sample widely across the domains and across the content areas (% weight)  To increase construct validity: Use items that have high discriminative value (those testing higher cognitive/thinking abilities such as comprehension, application and problem solving, e.g., applied Qs and clinical scenario-based Qs)  Use multiple methods to obtain a valid, comprehensive assessment
  • 16. Reliability  Refers to consistency or repeatability of test scores.  In practice, a reliable assessment should yield the same result: - When given to the same student at two different times (Test-retest reliability ), or - By different examiners (Inter-rater reliability), - While keeping all the other variables (timing, length, content or other contextual features) as consistent as possible.
  • 17. - Internal consistency (intra-exam, inter-item reliability): Coherence of the test items, or the extent to which the test questions are interrelated. Cronbach’s alpha
  • 18. MCQs are highly reliable The results of the test are unlikely to be influenced by:  when the test is administered,  when the test is scored, or by  who does the scoring. Hence the term “objective” is often used when referring to these kinds of assessments.
  • 19. On the other hand, reliability is an important concern when grading essay questions, rating clinical skills or scoring other assessments requiring judgment or interpretation. In these situations, clear scoring criteria are needed to attain a high level of reliability, regardless of whether one or multiple people will be involved in grading the responses.
  • 20. How to improve reliability of the test items?  Writing clear, unambiguous questions and test instructions improves reliability by generating consistent patterns of response from the students.  Use of a structured, predefined marking scheme: An answer key for MCQs and essay Qs, standardized checklists (in OSCEs/OSPEs) with clear scoring criteria.  A longer test with multiple items is more likely to have better reliability than a shorter test with a limited number of items, as the former 'evens out' possible inconsistencies of individual items.
  • 21. Desirable Features of Valid and Reliable Assessments  There is a clearly specified set of learning outcomes.  Assessment tasks are matched to the stated learning outcomes.  Assessment tasks are a representative sample of the stated learning outcomes.  Assessment tasks are the appropriate level of difficulty.
  • 22.  Assessment tasks effectively distinguish (discriminate) between achievers and non-achievers.  Clear instructions are given for the administration, scoring, and interpretation of the assessment results.
  • 23. MCQ test item analysis
  • 24. Remark Classic OMR (Optical Mark Recognition) software
  • 25. Parameters commonly assessed in MCQ test item analysis  Item analysis:  Difficulty index (p-value)  Discrimination index (DI)=Point-Biserial correlation (PBS)  Distractor efficiency (DE)  Internal Consistency Reliability
  • 26.
  • 27.
  • 28.
  • 29. Do final grades attained by students actually reflect their competences?? Do they produce meaningful conclusions about their performance??
  • 30.
  • 31.
  • 33. Difficulty Index (p-value)  Calculated as the percentage of students who answered the item correctly.  The range is from 0% to 100%, more typically written as a proportion from 0.0 to 1.00 (p-value).  The higher the value, the easier the item:  Difficulty level  d ≥ 75% = very easy  d ≥ 70% = easy  d 30-70% = moderately difficult to moderately easy (Recommended)  d < 30% = difficult  d < 25% = very difficult  P-values above 0.90 indicate very easy items that should not be reused in subsequent tests. If almost all of the students can get the item correct, the concept is probably not worth testing.  P-values below 0.20 indicate very difficult items that should be reviewed for possibly confusing language, removed from subsequent tests, and/or flagged as an area for re-instruction. If almost all of the students get the item wrong, there is either a problem with the item or the students did not grasp the concept.
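
The difficulty index can be computed directly from the raw answers to one item. The following is a minimal sketch, not the output of the Remark Classic OMR software: the function names, the sample answers, and the wording of the flags are assumptions made for illustration, while the 0.90 and 0.20 cutoffs and the 30-70% recommended range come from the slide.

```python
def difficulty_index(responses, key):
    """Return the proportion of students who chose the keyed (correct) option."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(1 for r in responses if r == key) / len(responses)


def review_flag(p):
    """Apply the 0.90 / 0.20 review cutoffs and the 30-70% recommended range."""
    if p > 0.90:
        return "very easy: consider not reusing"
    if p < 0.20:
        return "very difficult: review wording or re-teach"
    if 0.30 <= p <= 0.70:
        return "within recommended range"
    return "usable, outside recommended range"


# Hypothetical answers from ten students to one item whose key is "A".
responses = ["A", "A", "B", "A", "C", "A", "D", "A", "B", "A"]
p = difficulty_index(responses, "A")
print(f"p = {p:.2f} -> {review_flag(p)}")
```

Here six of the ten hypothetical students chose the key, so the item has p = 0.60 and falls inside the recommended range.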
  • 34. Discrimination index (DI)= Point-Biserial correlation (PBS)  It describes the ability of an item to distinguish between high and low scorers (the upper and lower 27% of students after ranking them by total score in descending order).  The range is from -1.00 to 1.00; well-functioning items fall between 0.0 and 1.00.  The higher the value, the more discriminating the item. A highly discriminating item indicates that the students who had high test scores got the item correct whereas students who had low test scores got the item incorrect.  Items with discrimination values near or below zero should be removed from the test. A negative value indicates that students who did poorly on the test overall did better on that item than students who did well overall. The item may be confusing for the better-scoring students in some way.
  • 35. Interpreting discrimination index  0.40 or higher = very good discrimination  0.30 to 0.39 = reasonably good discrimination but possibly subject to improvement  0.20 to 0.29 = Marginal/acceptable discrimination (subject to improvement)  0.00 to 0.19 = poor discrimination (to be rejected or improved by revision)  Negative DI = Low performing students selected the correct answer more often than high scorers (to be rejected)
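
Two ways of quantifying discrimination are mentioned above: comparing the upper and lower 27% of students, and the point-biserial correlation between the item score and the total score. The sketch below computes both under stated assumptions: the function names and the sample data are hypothetical, the total score is taken to include the item itself, and rounding the group size to the nearest whole student is a choice made here rather than something the slides specify.

```python
import math


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # group size; rounding rule is an assumption
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


def point_biserial(item_correct, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(total_scores)
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_correct, total_scores))
    ss_x = sum((x - mean_x) ** 2 for x in item_correct)
    ss_y = sum((y - mean_y) ** 2 for y in total_scores)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # undefined when everyone (or no one) got the item right
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: ten students, already listed from highest to lowest total score.
totals = [95, 90, 85, 80, 70, 60, 55, 50, 40, 30]
item = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]            # mostly answered correctly by high scorers
item_reversed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]   # mostly answered correctly by low scorers

print(discrimination_index(item, totals), point_biserial(item, totals))
print(discrimination_index(item_reversed, totals), point_biserial(item_reversed, totals))
```

The second item illustrates the negative case described in the interpretation table: low scorers answer it correctly more often than high scorers, so both statistics fall below zero.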
  • 36. Use items that have high discrimination values in the test (those testing higher cognitive/thinking abilities such as comprehension, application and problem solving)  Linking questions to case scenarios. Asking the question in the context of a clinical situation, diagram, graph, image, radiologic image, histo-pathological section, laboratory findings, etc.
  • 37.
  • 38. Distractor efficiency  The distractors are important components of an item, as they show a relationship between the total test score and the distractor chosen by the student.  Distractor efficiency is one such tool that tells whether the item was well constructed or failed to perform its purpose.  The quality of the distractors influences student performance on a test item. Ideally, low-scoring students, who have not mastered the subject, should choose the distractors more often, whereas, high scorers should discard them more frequently while choosing the correct option.  Any distractor that has been selected by less than 5% of the students is considered to be a non-functioning distractor (NF-D).  Reviewing the options can reveal potential errors of judgment and inadequate performance of distractors. These poor distractors can be revised, replaced, or removed.
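
The 5% rule for non-functioning distractors can be checked by tallying how often each option was chosen. This is a rough sketch with hypothetical names and data. The slide does not give a formula for distractor efficiency, so the ratio used here (functioning distractors divided by all distractors) is an assumption labelled as such in the code.

```python
from collections import Counter


def nonfunctioning_distractors(responses, options, key, threshold=0.05):
    """List the distractors chosen by fewer than `threshold` of the students."""
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n < threshold]


def distractor_efficiency(responses, options, key, threshold=0.05):
    """ASSUMPTION: efficiency = functioning distractors / total distractors."""
    distractors = [opt for opt in options if opt != key]
    nfd = nonfunctioning_distractors(responses, options, key, threshold)
    return (len(distractors) - len(nfd)) / len(distractors)


# Hypothetical item with four options, key "A", answered by 100 students.
options = ["A", "B", "C", "D"]
responses = ["A"] * 60 + ["B"] * 25 + ["C"] * 13 + ["D"] * 2

print(nonfunctioning_distractors(responses, options, "A"))
print(distractor_efficiency(responses, options, "A"))
```

Option D attracts only 2% of the hypothetical students, so it is flagged as a candidate for revision, replacement, or removal.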
  • 39. Internal Consistency Reliability  Internal consistency reliability indicates how well the items are correlated with one another. It measures whether multiple items within an instrument reveal similar results.  Cronbach's Alpha is used as a coefficient of internal consistency. Interpreting Cronbach's Alpha:  The range is from 0.0 to 1.0, with 0.7 generally accepted as a sign of acceptable reliability.  High reliability indicates that the items are all measuring the same thing, or general construct  The higher the value, the more reliable the overall test score.
  • 40. Interpreting Cronbach's Alpha
      Cronbach's alpha | Internal consistency
      α ≥ 0.9          | Excellent
      0.8 ≤ α < 0.9    | Very good
      0.7 ≤ α < 0.8    | Good (There are probably a few items which could be improved).
      0.6 ≤ α < 0.7    | Somewhat low (There are probably some items which could be improved).
      0.5 ≤ α < 0.6    | Poor (Suggests need for revision of test).
      α < 0.5          | Questionable/Unacceptable (This test should not contribute heavily to the course grade, and it needs revision).
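
Cronbach's alpha can be reproduced from a students-by-items score matrix using the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The sketch below is illustrative only: the function names and the tiny data set are hypothetical, and sample variances (n-1 denominator) are used as a choice made here. The interpretation bands are those of the table above.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per student, one column per item (e.g. 0/1 for MCQs)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Map alpha onto the interpretation bands from the slide."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Very good"
    if alpha >= 0.7:
        return "Good"
    if alpha >= 0.6:
        return "Somewhat low"
    if alpha >= 0.5:
        return "Poor"
    return "Questionable/Unacceptable"


# Hypothetical 0/1 scores for five students on four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

With these made-up scores the item variances sum to 1.0 and the total-score variance is 2.5, so alpha works out to about 0.8. A real test would have far more students and items than this toy matrix.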
  • 41. Practice exercises  Interpreting Remark Classic OMR (Optical Mark Recognition) software outputs  Writing a technical report on MCQ test item analysis (including remedial actions & recommendations)  Use of MCQs evaluation checklist
