MESSICK’S
FRAMEWORK
What Do Evaluators Need to
Know?
Outline
What this report will
cover
1. Concepts of Validity
2. Messick’s
Contributions
3. Messick’s Framework
Validity Concept
The concept of validity has
historically seen a variety
of iterations that involved
“packing” different aspects
into the concept and
subsequently “unpacking” some
of them.
 Points of broad consensus
 Validity is the most fundamental
consideration in the evaluation
of the appropriateness of claims
about, and uses and interpretations
of, assessment results.
 Validity is a matter of degree
rather than all or none.
SICI Conference 2010
North Rhine-Westphalia
Quality Assurance in the Work of “Inspectors”
 Main controversial aspect
…empirical evidence and
theoretical rationales…
Validity is “an integrated evaluative
judgment of the degree to which
empirical evidence and theoretical
rationales support the adequacy and
appropriateness of inferences and
actions based on test scores or other
modes of assessment.”
Messick, S. (1989). Validity. In R. Linn (Ed.),
Educational Measurement (3rd ed., pp. 13-103).
Washington, DC: American Council on
Education/Macmillan.
 Broad, but not universal agreement
(for a dissenting viewpoint, see Lissitz &
Samuelsen, 2007)
Karen Samuelsen,
Assistant Professor in the Department of
Educational Psychology and Instructional
Technology.
Robert W. Lissitz
Professor of Education in the College of Education
at the University of Maryland and Director of the
Maryland Assessment Research Center for
Education Success (MARCES).
 It is the uses and interpretations of an
assessment result, i.e. the inferences,
rather than the assessment result itself,
that are validated.
 Validity may be relatively high for one
use of assessment results but quite low for
another use or interpretation
Messick’s contributions
According to Angoff (1988),
theoretical conceptions of
validity and validation
practices have changed
appreciably over the last 60
years, largely because of
Messick’s many contributions
to our contemporary
conception of validity.
Ruhe V. and Zumbo B.
Evaluation in Distance Education and E-
Learning pp. 73-91
 1951, Cureton: the essential
feature of validity was “how
well a test does the job it
was employed to do”
(p. 621)
 1954 American
Psychological Association
(APA) listed four distinct
types of validity
Ruhe V. and Zumbo B.
Evaluation in Distance Education and E-
Learning pp. 73-91
Types of Validity
1. Construct Validity refers to
how well a particular test
can be shown to assess the
construct that it is said to
measure.
2. Content Validity refers to
how well test scores
adequately represent the
content domain that
these scores are said to
measure.
3. Predictive Validity is the
degree to which the
predictions made by a test
are confirmed by the later
behavior of the tested
individuals.
4. Concurrent Validity is the
extent to which individuals’
scores on a new test
correspond to their scores on
an established test of the
same construct, administered
shortly before or after the
new test.
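Both predictive and concurrent validity are typically quantified as a correlation between paired scores. A minimal sketch in Python; the score lists are hypothetical, invented purely for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of five examinees on a new test and on an
# established test of the same construct (a concurrent validity check)
new_test = [52, 61, 70, 75, 83]
established = [55, 60, 72, 74, 80]

r = pearson_r(new_test, established)
```

An r near 1 would suggest strong agreement between the two measures; in practice such a coefficient is interpreted alongside sample size and the reliability of both tests.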
 1966, APA, Standards for
Educational and
Psychological Tests and
Manuals: predictive validity
and concurrent validity were
collapsed into
criterion-related validity.
 1980, Guion: the three aspects
of validity came to be referred
to as the “Holy Trinity.”
 1996, Hubley & Zumbo: the
Holy Trinity referred to by
Guion means that at least
one type of validity is
needed, but one has three
chances to get it.
 1957, Loevinger argued
that construct validity was
the whole of validity,
anticipating a shift away
from multiple types to a
single type of validity.
 1988, Angoff: validity had
been viewed as a property of
tests, but the focus later
shifted to the validity of a
test in a specific context or
application, such as the
workplace.
 1974, Standards for
Educational and
Psychological Tests (APA,
American Educational
Research Association, and
National Council on
Measurement in Education):
the focus of content
validity shifted from a
representative sample of
content knowledge to a
representative sample of
behaviors in a specific
context.
 1989, Messick: professional
standards were established
for a number of applied
testing areas, such as
“counseling, licensure,
certification and program
evaluation.”
 1985, Standards (APA,
American Educational
Research Association, and
National Council on
Measurement in Education):
validity was redefined as
the “appropriateness,
meaningfulness, and
usefulness of the specific
inferences made from test
scores.”
 1985: the unintended social
consequences of the use of
tests (for example, bias
and adverse impact) were
also included in the
Standards (Messick, 1989).
Validation Practice
 is “disciplined inquiry” (Hubley & Zumbo, 1996) that started
out historically with calculation of measures of a single
aspect of validity (content validity or predictive validity)
 Building an argument based on multiple sources of
evidence (e.g. statistical calculations, qualitative data,
reflections on one’s own values and those of others, and
an analysis of unintended consequences)
 These calculations are based on logical or mathematical
models that date from the early 20th century (Crocker &
Algina, 1986)
 Messick (1989) describes these procedures as
fragmented, rather than unitary, approaches to validation
 Hubley and Zumbo (1996) describe them as
“scanty, disconnected bits of evidence…to
make a two-point decision about the validity of
a test”
 Cronbach (1982) recommended a more
comprehensive, argument-based approach to
validation that considered multiple and diverse
sources of evidence
 Validation practice has also evolved from a
fragmented approach to a comprehensive,
unified approach in which multiple sources of
data are used to support an argument
Messick’s framework
What is Validity?
 Validity is “an integrated evaluative judgment
of the degree to which empirical evidence and
theoretical rationales support the adequacy
and appropriateness of inferences and actions
based on test scores or other modes of
assessment” (Messick, 1989)
 Validity is a unified concept, and validation is a
scientific activity based on the collection of
multiple and diverse types of evidence (Messick,
1989; Zumbo, 1998, 2007)
Messick’s Conception of Validity

                      Test Interpretation       Test Use
Evidential basis      Construct Validity (CV)   CV + Relevance/Utility (RU)
Consequential basis   Value Implications        Social Consequences
                      (CV + RU + VI)            (CV + RU + VI + UC)
The matrix crosses the function of the testing
outcome (test interpretation vs. test use) with
the basis for justifying validity (the evidential
basis vs. the consequential basis).
In the evidential row, construct validity refers
to traditional scientific evidence (traditional
psychometrics); the test-use cell adds relevance
to learners and to society, and cost-benefit.
The consequential basis is not about poor test
practice; rather, the consequences of testing
refer to the unanticipated or unintended
consequences of legitimate test interpretation
and use.
The value implications cell refers to underlying
values, including language or rhetoric, theory,
and ideology.
The four facets
The evidential basis of Messick’s
framework contains two facets
1. Traditional psychometric evidence
2. The evidence for relevance in applied settings such
as the workplace, as well as utility or cost-benefit.
Evidential Basis for Test
Inferences and Use
 The evidential basis for test interpretation is an
appraisal of the scientific evidence for construct
validity.
 A construct is a “definition of skills and
knowledge included in the domain to be
measured by a tool such as a test” (Reckase,
1998b)
 The four traditional types of validity are included
in this first facet.
Evidential Basis for Test
Inferences and Use
 The evidential basis for test use includes measures of
predictive validity (e.g., correlations with other tests
or behaviors) as well as utility (i.e., a cost-benefit
analysis)
 Predictive validity coefficients are measures of the
behavior to be predicted from the test (e.g., a
correlation between scores on a road test and a
written driver qualification test)
 Cost-benefit refers to an analysis of costs compared
with benefits, which in education are often difficult
to quantify.
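The cost-benefit component reduces to simple arithmetic once costs and benefits are expressed in common units. A deliberately simplified sketch; the function, figures, and cost categories below are hypothetical placeholders, since real educational benefits are rarely this easy to monetize:

```python
def cost_benefit(costs, benefits):
    """Return (net benefit, benefit-cost ratio) for an assessment program."""
    total_cost = sum(costs)
    total_benefit = sum(benefits)
    return total_benefit - total_cost, total_benefit / total_cost

# Hypothetical annual figures for a testing program, in the same currency units
costs = [40_000, 10_000]   # e.g., test development, test administration
benefits = [75_000]        # e.g., reduced training failures

net, ratio = cost_benefit(costs, benefits)  # net = 25000, ratio = 1.5
```

A ratio above 1 would indicate that the program's quantified benefits exceed its costs; the harder evaluative work lies in deciding what counts as a benefit and how to price it.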
The consequential basis of
Messick’s framework contains
two facets
1. Value Implications (VI): CV + RU + VI
2. Social Consequences: CV + RU + VI + UC
Value Implications
Rhetoric Theories Ideologies
Value Implications: The Dimensions
Value implications require an
investigation of three components
 Rhetoric, or value-laden language and
terminology
 Value-laden language that conveys both a
concept and an opinion of the concept
 Underlying theories
 Underlying assumptions or logic of how a program
is supposed to work (Chen, 1990)
 Underlying ideologies
 A complex mix of shared values and beliefs that
provides a framework for interpreting the world
(Messick, 1989)
Rhetoric
Includes language that is discriminatory,
exaggerated, or overblown, such as derogatory
language used to refer to the homeless.
In validation practice, the rhetoric surrounding
standardized tests should be critically evaluated
to determine whether these terms are accurate
descriptions of the knowledge and skills said to be
assessed by a test (Messick, 1989)
Theory
The second component of the value
implications category is an appraisal of the
theory underlying the test. A theory connotes
a body of knowledge that organizes,
categorizes, describes, predicts, explains, and
otherwise aids in understanding phenomena
and in organizing and directing thoughts,
observations, and actions (Sidani & Sechrest,
1999)
Ideology
The third component of value implications is an
appraisal of the “broader ideologies that give
theories their perspective and purpose” (Messick,
1989)
An ideology is a “complex configuration of shared
values, affects and beliefs that provides, among
other things, an existential framework for
interpreting the world” (Messick, 1989)
Value implications challenge
us to reflect upon:
a. The personal or social values suggested by our
interest in the construct and the name/label
selected to represent that construct
b. The personal or social values reflected by the
theory underlying the construct and its
measurement
c. The values reflected by the broader social
ideologies that impacted the development of the
identified theory
(Messick, 1980, 1989)
Social Consequences
Social consequences refer to
consequences for society
stemming from the use of a
measure
Remember that construct
validity, relevance and utility,
value implications and social
consequences all work
together and impact one
another in test interpretation
and use.
mzgalleno@yahoo.com

Más contenido relacionado

La actualidad más candente

A" Research Methods Reliability and validity
A" Research Methods Reliability and validityA" Research Methods Reliability and validity
A" Research Methods Reliability and validityJill Jan
 
BASIC OF MEASUREMENT & EVALUATION
BASIC OF MEASUREMENT & EVALUATION BASIC OF MEASUREMENT & EVALUATION
BASIC OF MEASUREMENT & EVALUATION suresh kumar
 
Classical Test Theory and Item Response Theory
Classical Test Theory and Item Response TheoryClassical Test Theory and Item Response Theory
Classical Test Theory and Item Response Theorysaira kazim
 
Reliability in Language Testing
Reliability in Language Testing Reliability in Language Testing
Reliability in Language Testing Seray Tanyer
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good testBoyet Aluan
 
Test specifications and designs
Test specifications and designs  Test specifications and designs
Test specifications and designs ahfameri
 
Principles of assessment
Principles of assessmentPrinciples of assessment
Principles of assessmentmunsif123
 
VALIDITY
VALIDITYVALIDITY
VALIDITYANCYBS
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good testcyrilcoscos
 
Validity of test
Validity of testValidity of test
Validity of testSarat Rout
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validityshobhitsaxena67
 
Standardized testing.pptx 2
Standardized testing.pptx 2Standardized testing.pptx 2
Standardized testing.pptx 2Jesullyna Manuel
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTMusfera Nara Vadia
 
Validity and Reliability
Validity and ReliabilityValidity and Reliability
Validity and ReliabilityMaury Martinez
 
Subjective vs Objective test
Subjective vs Objective testSubjective vs Objective test
Subjective vs Objective testSùng A Tô
 

La actualidad más candente (20)

Validity
ValidityValidity
Validity
 
A" Research Methods Reliability and validity
A" Research Methods Reliability and validityA" Research Methods Reliability and validity
A" Research Methods Reliability and validity
 
Assessment purposes and approaches
Assessment purposes and approachesAssessment purposes and approaches
Assessment purposes and approaches
 
BASIC OF MEASUREMENT & EVALUATION
BASIC OF MEASUREMENT & EVALUATION BASIC OF MEASUREMENT & EVALUATION
BASIC OF MEASUREMENT & EVALUATION
 
Classical Test Theory and Item Response Theory
Classical Test Theory and Item Response TheoryClassical Test Theory and Item Response Theory
Classical Test Theory and Item Response Theory
 
Types of tests in measurement and evaluation
Types of tests in measurement and evaluationTypes of tests in measurement and evaluation
Types of tests in measurement and evaluation
 
Reliability in Language Testing
Reliability in Language Testing Reliability in Language Testing
Reliability in Language Testing
 
Test construction
Test constructionTest construction
Test construction
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
Test specifications and designs
Test specifications and designs  Test specifications and designs
Test specifications and designs
 
Principles of assessment
Principles of assessmentPrinciples of assessment
Principles of assessment
 
VALIDITY
VALIDITYVALIDITY
VALIDITY
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
Validity of test
Validity of testValidity of test
Validity of test
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Standardized testing.pptx 2
Standardized testing.pptx 2Standardized testing.pptx 2
Standardized testing.pptx 2
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENT
 
Validity and Reliability
Validity and ReliabilityValidity and Reliability
Validity and Reliability
 
classroom assessment
classroom assessment classroom assessment
classroom assessment
 
Subjective vs Objective test
Subjective vs Objective testSubjective vs Objective test
Subjective vs Objective test
 

Destacado

Methodologie validite et_fiabilite
Methodologie validite et_fiabiliteMethodologie validite et_fiabilite
Methodologie validite et_fiabiliteRémi Bachelet
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validityAnju Kumawat
 
Fundamentals of music
Fundamentals of musicFundamentals of music
Fundamentals of musiclerise
 
Reading test specifications assignment-01-ppt
Reading test specifications assignment-01-pptReading test specifications assignment-01-ppt
Reading test specifications assignment-01-pptBilal Yaseen
 
Valiadity and reliability- Language testing
Valiadity and reliability- Language testingValiadity and reliability- Language testing
Valiadity and reliability- Language testingPhuong Tran
 
Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Linejan
 

Destacado (8)

Methodologie validite et_fiabilite
Methodologie validite et_fiabiliteMethodologie validite et_fiabilite
Methodologie validite et_fiabilite
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Teaching Diverse Adult Learners
Teaching Diverse Adult LearnersTeaching Diverse Adult Learners
Teaching Diverse Adult Learners
 
Fundamentals of music
Fundamentals of musicFundamentals of music
Fundamentals of music
 
Andragogy
AndragogyAndragogy
Andragogy
 
Reading test specifications assignment-01-ppt
Reading test specifications assignment-01-pptReading test specifications assignment-01-ppt
Reading test specifications assignment-01-ppt
 
Valiadity and reliability- Language testing
Valiadity and reliability- Language testingValiadity and reliability- Language testing
Valiadity and reliability- Language testing
 
Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Report - Reliability & validity
Louzel Report - Reliability & validity
 

Similar a Messick’s framework

Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of AssessmentYee Bee Choo
 
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...Validation of Score Meaning and Justification of a Score Use: A Comprehensive...
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...Castle Worldwide, Inc.
 
Presentation validity
Presentation validityPresentation validity
Presentation validityAshMusavi
 
reliability and validity psychology 1234
reliability and validity psychology 1234reliability and validity psychology 1234
reliability and validity psychology 1234MajaAiraBumatay
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimensionahfameri
 
validity and reliability
validity and reliabilityvalidity and reliability
validity and reliabilityaffera mujahid
 
Validity in Research
Validity in ResearchValidity in Research
Validity in ResearchEcem Ekinci
 
Fb11001 reliability and_validity_in_qualitative_research_summary
Fb11001 reliability and_validity_in_qualitative_research_summaryFb11001 reliability and_validity_in_qualitative_research_summary
Fb11001 reliability and_validity_in_qualitative_research_summaryDr. Akshay S. Bhat
 
8. brown & hudson 1998 the alternatives in language assessment
8. brown & hudson 1998 the alternatives in language assessment8. brown & hudson 1998 the alternatives in language assessment
8. brown & hudson 1998 the alternatives in language assessmentCate Atehortua
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good testALMA HERMOGINO
 
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)Joaquin Hamad
 
Validity in psychological testing
Validity in psychological testingValidity in psychological testing
Validity in psychological testingMilen Ramos
 
Test characteristics
Test characteristicsTest characteristics
Test characteristicsSamcruz5
 

Similar a Messick’s framework (20)

Validity & reliability
Validity & reliabilityValidity & reliability
Validity & reliability
 
Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of Assessment
 
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...Validation of Score Meaning and Justification of a Score Use: A Comprehensive...
Validation of Score Meaning and Justification of a Score Use: A Comprehensive...
 
Presentation validity
Presentation validityPresentation validity
Presentation validity
 
reliability and validity psychology 1234
reliability and validity psychology 1234reliability and validity psychology 1234
reliability and validity psychology 1234
 
Maryam Bolouri
Maryam BolouriMaryam Bolouri
Maryam Bolouri
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimension
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimension
 
validity and reliability
validity and reliabilityvalidity and reliability
validity and reliability
 
01 validity and its type
01 validity and its type01 validity and its type
01 validity and its type
 
Validity in Research
Validity in ResearchValidity in Research
Validity in Research
 
Fb11001 reliability and_validity_in_qualitative_research_summary
Fb11001 reliability and_validity_in_qualitative_research_summaryFb11001 reliability and_validity_in_qualitative_research_summary
Fb11001 reliability and_validity_in_qualitative_research_summary
 
8. brown & hudson 1998 the alternatives in language assessment
8. brown & hudson 1998 the alternatives in language assessment8. brown & hudson 1998 the alternatives in language assessment
8. brown & hudson 1998 the alternatives in language assessment
 
Week 8 & 9 - Validity and Reliability
Week 8 & 9 - Validity and ReliabilityWeek 8 & 9 - Validity and Reliability
Week 8 & 9 - Validity and Reliability
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)
Assessment Of The Analytic Scale Of Argumentative Writing (ASAW)
 
Chapter 6: Validity
Chapter 6: ValidityChapter 6: Validity
Chapter 6: Validity
 
Validity.docx
Validity.docxValidity.docx
Validity.docx
 
Validity in psychological testing
Validity in psychological testingValidity in psychological testing
Validity in psychological testing
 
Test characteristics
Test characteristicsTest characteristics
Test characteristics
 

Messick’s framework

  • 2. Outline What this report will cover 1. Concepts of Validity 2. Messick’s Contributions 3. Messick’s Framework
  • 4. The concept of validity has historically seen a variety of iterations that involved “packing” different aspects into the concept and subsequently “unpacking” some of them.
  • 5.  Points of broad consensus  Validity if the most fundamental consideration in the evaluation of the appropriateness of claims about, and uses and interpretations of assessment results.  Validity is a matter of degree rather than all or none. SICI Conference 2010 North Rhine-Westphalia Quality Assurance in the Work of “Inspectors”
  • 6.  Main controversial aspect …empirical evidence and theoretical rationales… Validity is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.” Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement (3rd ed., pp.13-103). Washington, DC: American Council on Education/Macmillan.
  • 7.  Broad, but not universal agreement (for a dissenting viewpoint, Lissitz & Samuelsen, 2007) Karen Samuelsen, Assistant Professor in the Department of Educational Psychology and Instructional Technology. Robert W. Lissitz Professor of Education in the College of Education at the University of Maryland and Director of the Maryland Assessment Research Center for Education Success (MARCES).
  • 8.  Broad, but not universal agreement (for a dissenting viewpoint, Lissitz & Samuelsen, 2007)  It is the uses and interpretations of an assessment result, i.e. the inferences, rather than the assessment result itself that is validated.  Validity may be relatively high for one use of assessment results by quite low for another use or interpretation
  • 10. According to Angoff (1988), theoretical conceptions of validity and validation practices have change appreciably over the last 60 years largely because of Messick’s many contributions to our contemporary conception of validity. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 11.  1951 Cureton , the essential feature of validity was “how well a test does the job it was employed to do” (p.621)  1954 American Psychological Association (APA) listed four distinct types of validity Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 12. Types of Validity 1. Construct Validity refers to how well a particular test can be show to assess the construct that it is said to measure. 2. Content Validity refers to how well test scores adequately represent the content domain that these scores are said to measure. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 13. 3. Predictive Validity is the degree to which the predictions made by a test are confirmed by the later behavior of the tested individuals. 4. Concurrent Validity is the extent to which individuals scores on a new test correspond to their scores on an established test of the same construct that is determined shortly before of after the new test. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 14.  1966 APA, Standards for Educational and Psychological Tests and Manuals, criterion-related validity and predictive validity were collapsed into criterion-related validity.  1980 Guion, three aspects of validity referred to as “Holy Trinity.” Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 15.  1996 Hubley & Zumbo, the Holy Trinity referred by Guion, means that at least one type of validity is needed but one has three chances to get it.  1957 Loevinger, argued that construct validity was the whole of validity, anticipating a shift away from multiple types to a single type of validity. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 16.  1988 Angoff, validity was viewed as a property of tests, but the focus later shifted to the validity of a test in a specific context or application, such as the workplace. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 17.  1974 Standards for Educational and Psychological Tests (APA, American Educational Research Association and National Council on Measurement in Education) shifted the focus of content validity from a representative sample of content knowledge to a representative sample of behaviors in a specific context. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 18.  1989 Messick professional standard s were established for a number of applied testing areas such as “counseling, licensure, certification and program evaluation Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 19.  1985 Standards (APA, American Educational Research Association and National Council on Measurement in Education validity was redefined as the “appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 20.  1985 the unintended social consequences of the use of tests – for example, bias and adverse impact---were also included in the Standards (Messick 1989). Ruhe V. and Zumbo B. Evaluation in Distance Education and E- Learning pp. 73-91
  • 21. Validation Practice  is “disciplined inquiry” (Hubley & Zumbo, 1996) that started out historically with calculation of measures of a single aspect of validity (content validity or predictive validity)  Building an argument based on multiple sources of evidence (e.g. statistical calculations, qualitative data, reflections on one’s own values and those of others, and an analysis of unintended consequences)  These calculations are based on logical or mathematical models that date from the early 20th century (Crocker & Algina, 1986)  Messick (1989) describes these procedures as fragmented, unitary approaches to validation
  • 22.  Hubley and Zumbo (1996) describe them as “scanty, disconnected bits of evidence…to make a two-point decision about the validity of a test”  Cronbach (1982) recommended a more comprehensive, argument-based approach to validation that considered multiple and diverse sources of evidence  Validation practice has also evolved from a fragmented approach to a comprehensive, unified approach in which multiple sources of data are used to support an argument
  • 24. What is Validity?  Validity is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989)  Validity is a unified concept, and validation is a scientific activity based on the collection of multiple and diverse type of evidence (Messick, 1989; Zumbo, 1998, 2007)
  • 25. Messick’s Conception of Validity Justification Outcomes Test Interpretation Test Use Evidential basis Construct Validity (CV) CV + Relevance/+ Utility (RU) Consequential basis Value Implications (CV+RU+VI) Social Consequences (CV+RU+VI+UC)
  • 26. Justification Outcomes Test Interpretation Test Use Evidential basis Construct Validity (CV) CV + Relevance/+ Utility (RU) Consequential basis Value Implications (CV+RU+VI) Social Consequences (CV+RU+VI+UC) In terms of functions (interpretation vs. use) Basis for justifying validity (evidential basis vs. consequential basis)
  • 27. Justification Outcomes Test Interpretation Test Use Evidential basis Construct Validity (CV) CV + Relevance/+ Utility (RU) Consequential basis Value Implications (CV+RU+VI) Social Consequences (CV+RU+VI+UC) refer to traditional scientific evidence traditional psychometrics relevance to learners and to society, and to cost benefit
  • 28. Justification Outcomes Test Interpretation Test Use Evidential basis Construct Validity (CV) CV + Relevance/+ Utility (RU) Consequential basis Value Implications (CV+RU+VI) Social Consequences (CV+RU+VI+UC) Consequential basis is not about poor test practice rather, the consequences of testing refer to the unanticipated or unintended consequences of legitimate test interpretation and use
  • 29. Value implications refer to underlying values, including language or rhetoric, theory, and ideology.
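The progressive, cumulative structure of Messick’s matrix (each cell subsumes the facets of the cells before it) can be sketched as a small lookup. This is purely an illustrative encoding of the slide labels, not part of Messick’s own work; the dictionary name and function are hypothetical.

```python
# Messick's 2x2 progressive matrix, sketched as a dict.
# Keys: (basis, function); values: the cumulative facets in that cell.
# Labels follow the slides: CV = construct validity, RU = relevance/utility,
# VI = value implications, UC = unintended (social) consequences.
MESSICK_MATRIX = {
    ("evidential", "interpretation"): ["CV"],
    ("evidential", "use"): ["CV", "RU"],
    ("consequential", "interpretation"): ["CV", "RU", "VI"],
    ("consequential", "use"): ["CV", "RU", "VI", "UC"],
}

def facets(basis: str, function: str) -> list[str]:
    """Return the cumulative validity facets for one cell of the matrix."""
    return MESSICK_MATRIX[(basis, function)]
```

Reading the encoding back confirms the cumulative design: every later cell contains all the facets of the earlier ones, so social consequences cannot be appraised without first appraising construct validity, relevance/utility, and value implications.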
  • 32. The evidential basis of Messick’s framework contains two facets: 1. Traditional psychometric evidence. 2. The evidence for relevance in applied settings such as the workplace, as well as utility, or cost-benefit.
  • 33. Evidential Basis for Test Inferences and Use  The evidential basis for test interpretation is an appraisal of the scientific evidence for construct validity.  A construct is a “definition of skills and knowledge included in the domain to be measured by a tool such as a test” (Reckase, 1998b).  The four traditional types of validity are included in this first facet.
  • 34. Evidential Basis for Test Inferences and Use  The evidential basis for test use includes measures of predictive validity (e.g., correlations with other tests or behaviors) as well as utility (i.e., a cost-benefit analysis).  Predictive validity coefficients are correlations with measures of the behavior to be predicted from the test (e.g., a correlation between scores on a road test and a written driver qualification test).  Cost-benefit refers to an analysis of costs compared with benefits, which in education are often difficult to quantify.
  • 35. The consequential basis of Messick’s framework contains two facets: 1. Value Implications (VI): CV + RU + VI. 2. Social Consequences: CV + RU + VI + UC.
  • 36. Value Implications: The Dimensions (Rhetoric, Theories, Ideologies)
  • 37. Value implications require an investigation of three components:  Rhetoric, or value-laden language and terminology: value-laden language conveys both a concept and an opinion of that concept.  Underlying theories: the underlying assumptions or logic of how a program is supposed to work (Chen, 1990).  Underlying ideologies: a complex mix of shared values and beliefs that provides a framework for interpreting the world (Messick, 1989).
  • 38. Rhetoric includes language that is discriminatory, exaggerated, or overblown, such as derogatory language used to refer to the homeless. In validation practice, the rhetoric surrounding standardized tests should be critically evaluated to determine whether these terms are accurate descriptions of the knowledge and skills said to be assessed by a test (Messick, 1989).
  • 39. Theory: the second component of the value implications category is an appraisal of the theory underlying the test. A theory connotes a body of knowledge that organizes, categorizes, describes, predicts, explains, and otherwise aids in understanding phenomena and in organizing and directing thoughts, observations, and actions (Sidani & Sechrest, 1999).
  • 40. Ideology: the third component of value implications is an appraisal of the “broader ideologies that give theories their perspective and purpose” (Messick, 1989). An ideology is a “complex configuration of shared values, affects, and beliefs that provides, among other things, an existential framework for interpreting the world” (Messick, 1989).
  • 41. Value implications challenge us to reflect upon: a. The personal or social values suggested by our interest in the construct and the name or label selected to represent that construct. b. The personal or social values reflected by the theory underlying the construct and its measurement. c. The values reflected by the broader social ideologies that shaped the development of that theory (Messick, 1980, 1989).
  • 43. Social consequences refer to consequences for society stemming from the use of a measure.
  • 44. Remember that construct validity, relevance and utility, value implications, and social consequences all work together and impact one another in test interpretation and use.