Chapter 7 – Validation and Use of Individual Differences Measures
Wayne F. Cascio holds the Robert H. Reynolds Chair in Global Leadership and Management at The Business School, University of Colorado Denver. He received his Ph.D. in industrial and organizational psychology from the University of Rochester and is a Fellow of the APA's Division of Industrial/Organizational Psychology, the Academy of Management, and the National Academy of Human Resources, among other organizations.
Herman Aguinis holds the Mehalchin Chair in Management at The Business School, University of Colorado Denver, and has served as chair of the Research Methods Division of the Academy of Management.
Overview
Chapter 7 deals with issues surrounding measurement validity and presents procedures for determining the validity of individual differences measurements. Individual difference measurements derive meaning only to the extent that they can be related to other psychologically meaningful behaviors; measuring motivation, for example, is meaningful only if the measurement predicts future performance. Validity therefore concerns both whether a test measures what it is designed to measure and how well it does so. Test validity can be evaluated by examining the test's content, its relationship to criterion measures collected at the same time (concurrent validity), its accuracy in predicting future behavior (predictive validity), and its ability to measure the psychological characteristic of interest (construct validity). The chapter also presents different strategies for conducting empirical validation studies. Concluding topics focus on the relevance of validation studies to the practice of HRM and on the necessity of validation studies for compliance with the laws governing HRM practice.
Annotated Outline
I. At a Glance
A. Meanings of individual difference scores
1. Follow from relationships to other psychological & observable behaviors
2. Result from intentional validation studies
3. Influenced by reliability measures
B. Validation issues
1. Does test measure what it intends to measure?
2. How well does test measure its intended aspect?
C. Types of validity approaches
1. Content
2. Construct
3. Concurrent
4. Predictive
D. Validation strategy considerations
1. Group differences
2. Range restrictions
3. Position in employment process
4. Test-predictor relationship
E. Other validity approaches (may converge to yield better validation)
1. Validity generalization
2. Synthetic validity
3. Test transportability
F. Validation efforts reflect legal adherence
II. Relationship between Reliability and Validity
A. Simple definitions
1. Reliability is consistency, dependability
2. Validity is accuracy, truthfulness
B. Why does HRM care?
1. Because, theoretically, a measure can be reliable but have no validity
2. With known reliability, validity is enhanced and its limits clarified
C. Examples
1. You could reliably measure surgical vocabulary, but be unable to accurately measure surgical skills.
2. You could reliably disassemble a firearm, but be unable to shoot a target.
D. Reliability is therefore a necessary measurement property but not the only important property for measures.
E. Reliability must be combined with validity.
1. When the reliability coefficient of a criterion is known, the validity coefficient can be statistically adjusted for unreliability (a computational sketch follows this list).
2. This is called “correction for attenuation in the criterion.”
3. This process helps HRM account for more variance in measurements and improve decisions based on the criteria measured by tests.
4. In other words, quality HRM decisions require knowledge of measures’ reliability and validity.
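A minimal computational sketch of the correction for attenuation in the criterion, assuming the standard classical-test-theory formula r_corrected = r_xy / sqrt(r_yy); the function name and example values are illustrative, not from the chapter:

```python
import math

def correct_for_attenuation(r_xy: float, r_yy: float) -> float:
    """Adjust an observed validity coefficient r_xy for unreliability
    in the criterion, whose reliability is r_yy."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / math.sqrt(r_yy)

# Observed validity of .30 with criterion reliability of .60:
print(round(correct_for_attenuation(0.30, 0.60), 3))  # 0.387
```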
F. Caution with reliability coefficients:
1. Choose reliability statistic carefully.
2. Reliability statistic will influence magnitude of validity.
III. Evidence of Validity
A. Validation is a process of evaluating relationships between measures and criteria; it is not a single measure.
B. Validation processes tell researchers
1. what is measured and
2. how well “it” (the criterion) is measured
C. Validity is a unitary concept
1. validity exists or not
2. evidence for it varies
3. exists as matter of degree
D. Of interest & necessity is the accuracy of the inferences made about a person based on the reliability and validity of test measures.
E. To evaluate validity, there are three primary strategies
1. content-related evidence
2. criterion-related evidence
3. construct-related evidence
IV. Content-related Evidence
A. Content-related evidence asks whether the measurement procedure represents a fair sample of the universe of situations in which the behavior of interest occurs.
B. For HRM, content-related validation is concerned with how well the measurement represents the job.
C. An assumption then is that similar jobs involve similar performance domains.
D. Valid content measures reflect job performance.
E. Content-related measurement becomes more difficult as jobs become more complex and abstract. Hence, it is more appropriate to speak of content-oriented test development than of content validity per se.
F. Examples
1. Simple content (construct) - a typing test for administrative assistants
2. Complex content (construct) - predicting success of a film based on a screenplay or a novel.
G. Content-related legal precedent
1. Guardians Assn. of N.Y. City Police Dept. v. Civil Service Comm. of City of N.Y., 1980
2. True question is not if work content (construct) is measured but what class of content (construct) is measured
H. To determine content-related validity, researchers must focus on test construction and not the inferences about the test scores.
1. Establish content evaluation panel
2. Calculate content validity index
3. Calculate substantive validity index
4. Conduct content adequacy procedure
5. Conduct analysis of variance
I. For content-related validity, the primary goal is predicting future performance by describing existing scores. This means criterion behavior must be considered, too.
V. Criterion-related Validity Evidence
A. Criterion-related validity evidence asks whether a measurement procedure indicates that a behavior is present and may predict that a future behavior will occur.
B. In other words, criterion-related validity evidence tests the hypothesis that
measurement scores are related to criterion performance.
C. There are two types of Criterion-related validity procedures.
1. Concurrent validity exists when a criterion measure exists at the same time as a predictor measure.
2. Predictive validity exists when a criterion measure becomes available after the point in time when the predictor measure is taken.
3. Concurrent & predictive validity differ by timing, contexta. Concurrent – can you do the job right now
b. Predictive – can you do the job in the future
D. Criterion-related predictive studies are the foundation of individual differences measurement and are conducted by the following steps (a brief computational sketch follows this list).
1. Measure candidates for job
2. Select candidates without using results of measurement procedure
3. Obtain measurements of criterion performance at later date
4. Assess the strength of the relationship between predictor and criterion
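As an illustration of step 4, a minimal sketch assuming SciPy is available, with invented hiring-stage test scores and later performance ratings:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: test scores at hiring and supervisor
# ratings collected one year later for the same ten hires
predictor = np.array([52, 61, 48, 70, 65, 55, 59, 73, 44, 68])
criterion = np.array([3.1, 3.8, 2.9, 4.2, 3.9, 3.3, 3.5, 4.5, 2.7, 4.0])

# The validity coefficient is the predictor-criterion correlation
r, p_value = pearsonr(predictor, criterion)
print(f"validity coefficient r = {r:.2f}, p = {p_value:.4f}")
```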
E. Criterion-related predictive study issues include (see the sample-size sketch after this list)
1. Sample size – the larger the sample, the greater the statistical power (smaller samples can be accommodated statistically)
2. Statistical power – the probability of rejecting the null hypothesis when it is false (i.e., of avoiding a Type II error); power increases with larger samples and with a larger region for rejecting the null
3. Significance level (Type I error) – the probability of rejecting the null hypothesis when it is true, usually set at .05 or .01
4. Magnitude of the effect (statistical significance vs. meaningfulness) – can be specified in advance as small, medium, or large based on the predictor-criterion correlation coefficient, and is easier to detect with a large sample
a. When power, significance level, and effect size are specified, calculators such as www.StatPages.net can determine the needed sample size
b. Larger samples are needed for power with small effects; smaller samples suffice with larger effects
c. With fixed samples & effects, changing alphas may help
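A sketch of the same sample-size logic, using the Fisher z approximation rather than the StatPages.net calculator mentioned above; a rough illustration under standard two-tailed assumptions:

```python
from math import ceil, log
from scipy.stats import norm

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect a population correlation r
    with a two-tailed test, via the Fisher z transformation."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    c = 0.5 * log((1 + r) / (1 - r))  # Fisher z of the effect size
    return ceil(((z_alpha + z_beta) / c) ** 2 + 3)

# Cohen's conventional small, medium, and large correlations
for r in (0.10, 0.30, 0.50):
    print(f"r = {r}: n = {n_for_correlation(r)}")
```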
F. Other Issues for Predictive Studies
1. Acknowledge small samples & continue to collect data over time
2. Consider time between initial measure and criterion performance appraisal
3. Samples must be representative
4. Should include actual applicants who are motivated to do well
G. Criterion-related Concurrent Study Validation Method – looks at predictor data and actual performance at the same time.
1. Determine criterion (or criteria)
2. Collect successful employees’ predictor (test) scores
3. Collect performance appraisals of successful employees
4. Assess the relationship between predictor scores and appraisals
5. Choose new applicants based on closeness of their scores to those of successful performers
H. Criterion-related Concurrent Study Validation Method – may be more cost-effective than a traditional predictive validity study, but it does not account for the effects of motivation & job experience.
I. Criterion-related Concurrent Study Validation Method Issues
1. Appear useful for cognitive ability test measures
2. Are not interchangeable with predictive studies
3. Must consider situations surrounding the study and uncontrolled variables
J. Issues for both Predictive and Concurrent Criterion-related Validity Studies
1. Sensitivity to random errors?
2. Dependably indicate differences?
3. Free from contamination?
4. Performance criteria must be collected independently of predictor scores.
VI. Factors Affecting the Size of Obtained Validity Coefficients
A. Factors affecting obtained validity coefficients include
1. Range enhancement
a. Validity will appear falsely high if the validation group is broader than the applicant pool
b. Example – validating with machinists, mechanics, tool crib attendants, and engineers, but predicting for engineers only
2. Range Restriction
a. Validity appears too low if either the predictor or the criterion are limited
i. Direct range restriction - measures used to select prior to validation
ii. Indirect range restriction - experimental predictors administered but not included in employment decision
b. May also occur when predictor selection occurs at hiring and the
criterion selection occurs while on the job
c. To interpret validity given range restriction, apply the appropriate statistic by determining (a correction sketch follows this section)
i. whether restriction exists for the predictor, the criterion, or a third variable
ii. whether unrestricted variances for the relevant variables are known
iii. whether the third variable is measured or unmeasured
d. To interpret validity given range restrictions where unrestricted variances are unknown, use the multivariate correction formula and/or the RANGEJ computer program
3. Position in employment process
a. Variance is greater during selection
b. Variance is restricted during later employment processes
4. Form of the predictor-criterion relationship; validity coefficients assume
a. normal distributions
b. a linear predictor-criterion relationship
c. equal criterion variance across segments (columns) of the predictor distribution (homoscedasticity)
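A sketch of one standard correction for direct range restriction on the predictor (Thorndike's Case II formula), offered as an illustration; the more complex indirect and multivariate cases call for the RANGEJ program noted above. Values are hypothetical:

```python
from math import sqrt

def correct_range_restriction(r: float, sd_unrestricted: float,
                              sd_restricted: float) -> float:
    """Thorndike Case II: correct a validity coefficient observed in a
    range-restricted group, given the unrestricted and restricted
    predictor standard deviations."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / sqrt(1 - r**2 + (r**2) * (u**2))

# Observed r = .25 among hires; applicant-pool SD is twice the incumbent SD
print(round(correct_range_restriction(0.25, 10.0, 5.0), 3))  # 0.459
```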
VII. Construct-related Evidence
A. Construct-related Validity Evidence focuses on the meaning of the trait and allows the interpretation of scores.
B. Construct-related Validity assumes that there is a nomological network of interrelated concepts, propositions, and laws with regard to the behaviors relevant to the HRM decisions. In other words, the successful candidate or employee can be assessed by more than one trait.
C. Construct-related Validity Evidence Method
1. State hypotheses
2. Define traits nomologically
3. Ask test takers about their strategies
4. Analyze internal consistency of items
5. Consult behavioral domain experts
6. Correlate procedures
7. Factor analyze group of procedures
8. Conduct structural equation modeling
9. Consider scores’ discrimination
10. Demonstrate relationships
11. Analyze convergent & discriminant validity (a simulation sketch follows this section)
a. Convergent – multiple measures of the same construct should agree
b. Discriminant – measures should differ from measures of other constructs
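A toy simulation of the convergent/discriminant logic in item 11; the constructs, measures, and noise levels are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated true scores on two unrelated constructs
consc = rng.normal(size=n)    # conscientiousness
verbal = rng.normal(size=n)   # verbal ability

# Two noisy measures of the same construct, one of a different construct
consc_self = consc + rng.normal(scale=0.5, size=n)  # self-report
consc_peer = consc + rng.normal(scale=0.7, size=n)  # peer rating

convergent = np.corrcoef(consc_self, consc_peer)[0, 1]  # expect high
discriminant = np.corrcoef(consc_self, verbal)[0, 1]    # expect near zero
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```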
VIII. Cross-validation
A. Defined – when scores from one sample accurately predict outcomes for other samples of same population or for the whole population.
B. Cross-validation is often assumed to exist & needs verification.
C. A cross-validation issue to address is the phenomenon of shrinkage.
1. Shrinkage occurs when weighted predictors from one sample are applied to another sample.
2. Shrinkage may be large
i. when the initial sample is small
ii. when test items are not relevant
iii. when there are many predictors
D. Cross-validation methods may be (a statistical-adjustment sketch follows this section; an empirical split-sample sketch appears under discussion question 5)
1. Empirical –
i. derive a regression model on one sample
ii. apply the same model to a second sample
2. Statistical – adjusts the multiple correlation coefficient as a function of sample size and number of predictors
E. Cross-validation comparisons
1. Empirical – costly and may not yield better results than statistical
2. Statistical –
i. should be recalculated annually
ii. numbers will reflect changes in values and changes to job needs
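A sketch of the statistical approach using the Wherry formula, one common adjustment of the squared multiple correlation for sample size and number of predictors; the example numbers are hypothetical:

```python
def wherry_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Wherry formula: shrink a squared multiple correlation R^2
    as a function of sample size n and number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = .36 observed with n = 60 people and k = 8 predictors
print(round(wherry_adjusted_r2(0.36, 60, 8), 3))  # about .26 after shrinkage
```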
IX. Gathering Validity Evidence When Local Validation is not Feasible
A. Small organizations with few positions cannot conduct full validation studies. Instead, they can apply
1. synthetic validity,
2. test transportability, &
3. validity generalization.
B. Synthetic validity
1. Job analysis based and infers validity for jobs
2. Offers legal acceptance
3. Offers feasibility
4. Job analysis tool examples
a. Position Analysis Questionnaire (PAQ)
b. General Aptitude Test Battery (GATB)
C. Test transportability –
1. Using a test developed and validated elsewhere
2. Requires
a. results of the other study’s criterion-related validity study
b. results of a test fairness analysis
c. documented job similarities
d. documented applicant similarities
D. Validity generalization –
1. Meta-analyses focused on testing the situational specificity hypothesis
2. A meta-analysis statistically summarizes the relationship between two variables across studies and analyzes the variability of that relationship across studies
E. Methods for Gathering Validity Generalization Evidence (a bare-bones sketch follows this list)
1. Obtain the validity coefficient for each study and compute the mean
2. Calculate the variance of the coefficients
3. Correct the mean and variance for statistical artifacts (e.g., sampling error)
4. Compare the corrected standard deviation to the mean
5. For large variation, analyze moderator variables
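A bare-bones sketch of steps 1-5 in the Hunter-Schmidt style, with invented study data; full VG analyses also correct for criterion unreliability and range restriction:

```python
import numpy as np

# Validity coefficients and sample sizes from five hypothetical studies
rs = np.array([0.20, 0.35, 0.28, 0.15, 0.30])
ns = np.array([80, 150, 60, 200, 120])

r_bar = np.average(rs, weights=ns)                   # step 1: weighted mean r
var_obs = np.average((rs - r_bar) ** 2, weights=ns)  # step 2: observed variance
var_err = (1 - r_bar**2) ** 2 / (ns.mean() - 1)      # expected sampling-error variance
var_res = max(var_obs - var_err, 0.0)                # step 3: residual variance

# Steps 4-5: a residual SD near zero suggests validity generalizes;
# a large residual SD points to moderator variables.
print(f"mean r = {r_bar:.3f}, residual SD = {var_res ** 0.5:.3f}")
```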
Discussion Questions
1. What are some of the consequences of using incorrect reliability estimates that lead to over- or underestimation of validity coefficients? Pages 139 - 142
Reliability estimates set the ceiling, or upper limit, for validity coefficients: the observed validity can be no higher than the square root of the product of the predictor and criterion reliabilities. Consequently, using an incorrect reliability estimate leads to over- or underestimation of validity coefficients corrected for attenuation. This can be explained by understanding that validity represents the accuracy and the meaningfulness of measures of related behaviors, while reliability coefficients represent the consistency of the relationship between two measurements of behavior. In this case, the behaviors may be thought of as measured individual differences on some measures, or the scores on tests. Another way to think about the behaviors is as predicted performance on a job versus actual performance on a job.
With this as the frame of reference then, two behaviors may appear to be and may actually be related but have no accuracy with regard to reality (reliable but not valid). The example offered by the text is that police officer positions appear to require higher skills than detectives but, in fact, do not. This can be explained by the measures used. The measures reliably measured police officer duties but did not reliably measure detective duties. Therefore, these reliability coefficients were very high but the corresponding validity coefficients were extremely low.
Another example to consider might be in the selection of sales representatives for technical products. A person may reliably (consistently) receive high scores for measures of product knowledge but be completely unsuccessful for closing sales contracts (very low validity). However, these same individuals might consistently score high (high reliability) for technical knowledge and be very successful with post-sales support (high validity). These two scenarios bring home the point that reliability and validity are interdependent but that the correct types of reliability coefficients must be considered when trying to interpret corresponding validity coefficients. To determine the correct coefficients to use, consider the types of measurement instruments, the procedures used to collect the data, the types of decisions to be made based on the results, and the underlying constructs of the measurement devices.
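To make the ceiling idea concrete, a minimal sketch of the classical bound r_xy <= sqrt(r_xx * r_yy); the reliability values are hypothetical:

```python
from math import sqrt

def max_validity(r_xx: float, r_yy: float) -> float:
    """Theoretical ceiling on an observed validity coefficient, given
    predictor reliability r_xx and criterion reliability r_yy."""
    return sqrt(r_xx * r_yy)

# Even a perfect underlying relationship cannot yield an observed
# validity above .65 with these reliabilities:
print(round(max_validity(0.70, 0.60), 2))  # 0.65
```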
2. Explain why validity is a unitary concept. Pages 141 - 142
First, for this consideration, unitary means that validity is a single, whole concept rather than a collection of distinct types. Validity either exists or it does not, and the degree to which it exists can be represented by a numerical measurement. Finally, that measurement may be calculated mathematically through varying procedures. The resulting measurements, therefore, represent different types of evidence for validity rather than different types of validity.
3. What are the various strategies to quantify content-related validity? Pages 142 - 145
Content-related validity is concerned with a test’s ability to accurately measure the intended domain of the test. There are at least four different strategies to quantify content-related validity. One strategy is to calculate a content validity index. This CVI requires the participation of an equal number of incumbents and supervisors. Both groups rate a set of test items as being essential, useful but not necessary, or not necessary to performance of the job being studied. Using the two sets of scores, a content validity ratio is calculated for each item. The CVI represents the overlap between the capability to do the job and performance on the test. A second strategy is to calculate the substantive validity index (SVI). The SVI considers panel members’ ratings of items’ classifications to particular constructs. The SVI is an extension of the CVI calculation.
Two other strategies to calculate content-related validity are the content adequacy procedure and the analysis of variance approach. To assess content adequacy, panelists use Likert-type scales to rate the test items by their applicability to the constructs under consideration. The analysis of variance approach is similar, except that the panelists rate the items with only one construct in mind rather than all of the constructs. For this approach, a sample size of approximately 50 people is required. (A computational sketch of the CVR and CVI follows this answer.)
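A small sketch of the CVR/CVI arithmetic using Lawshe's formula, with an invented ten-person panel; the critical cutoff values for retaining items are tabled elsewhere and omitted here:

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of
    panelists rating an item 'essential' and N is the panel size."""
    half = n_panelists / 2
    return (n_essential - half) / half

# 'Essential' counts for five test items from a 10-person panel
essential_counts = [9, 7, 10, 4, 8]
cvrs = [content_validity_ratio(c, 10) for c in essential_counts]
cvi = sum(cvrs) / len(cvrs)  # CVI = mean CVR across items
print([round(v, 2) for v in cvrs], f"CVI = {cvi:.2f}")
```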
4. Explain why construct validity is the foundation for all validity. Pages 153 - 157
Content-related validity is concerned with a test’s ability to accurately measure the intended domain of the test. Criterion-related validity (both concurrent and predictive) considers the empirical relationship between a predictor variable and a criterion behavior. As such, neither provides a framework for improving the predictions of individuals’ performance. Construct validity is specifically concerned with the improvement of these predictions and provides the basis for interpreting individual differences (measurement scores).
Construct-related validity starts with investigators stating a hypothesis with regard to the relationship between a score on a test (measurement) and the expected outcome for a criterion (performance appraisal). These hypothesized predictions assume that the desired behaviors exist within a nomological network. This network is a system of interrelated concepts, propositions, and laws that ties constructs to an observable set of behaviors. Following the formation of the hypotheses, predictive data are collected, analyzed, and compared to actual outcome data. Operational behavioral definitions must be very clear, raters and performers must understand the assumptions and expectations, and care must be taken in collecting and analyzing the data so as not to introduce additional error or procedural variance. Adequate construct-related validity will provide information with regard to the convergence of behavioral constructs and will identify areas of discrimination among the different behaviors.
5. Why is cross-validation necessary? What is the difference between shrinkage and cross-validation? Pages 157, 158
Cross-validation is necessary to account for changes in values, changes in jobs, and changes in people who are responsible for these jobs. Shrinkage is a statistical phenomenon that occurs when predictions about future performances are based on the measurements taken from a sample of individuals. For HRM, this means that when a population of interest is defined and a sample of people from the population provide initial decision-making data on multiple predictors, the corresponding multiple correlation between the predictors and the actual criterion will decrease (shrink) for another sample from the same population. Shrinkage will be greater when the initial sample size is small, miscellaneous predictors are chosen, and when there are multiple predictors.
Shrinkage can be understood, controlled, and minimized by conducting cross-validation analyses. Cross-validation means that weights from one sample can predict the behavioral outcome of additional samples from the same population. Cross-validation can be completed in two ways, empirically and statistically. Empirical cross-validation can be completed by collecting data from two independent samples of the same population or, if the initial sample is large enough, by analyzing the data as if it came from two groups. Statistically, cross-validation can be completed by adjusting the multiple correlation coefficients by a function of the sample size and the number of predictors through the use of existing formulas. As it is not always possible to collect empirical cross-validation data, statistical procedures can be very useful.
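A toy demonstration of the empirical approach and of shrinkage, assuming NumPy and scikit-learn are available; the data and regression weights are simulated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, k = 120, 3
X = rng.normal(size=(n, k))  # predictor scores for 120 people
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n)  # criterion

# Derive regression weights on the first half of the sample,
# then apply the same weights to the holdout half
train, test = slice(0, 60), slice(60, 120)
model = LinearRegression().fit(X[train], y[train])
r_train = np.corrcoef(model.predict(X[train]), y[train])[0, 1]
r_test = np.corrcoef(model.predict(X[test]), y[test])[0, 1]

# The cross-validated R is typically smaller: that drop is shrinkage
print(f"derivation R = {r_train:.2f}, cross-validated R = {r_test:.2f}")
```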
6. What factors might affect the size of a validity coefficient? What can be done to deal with each of these factors? Pages 149 – 153
The size of a validity coefficient is a function of two variables, the predictor and the criterion. When sample sizes are small, there is inherently larger sampling error, and there is also a built-in range restriction for the scores.
Some additional factors that affect the size of a validity coefficient are (1) the initial sample size, (2) a shotgun approach to test question inclusion, and (3) when the number of predictors increases. When sample sizes are larger, the probability that unique or random individual differences exist is spread more evenly over all the participants hence increasing the probability that true differences will be due to the behavior or construct of interest. The most straightforward way to deal with small sample sizes is to increase the number of participants in the sample. Another coping method is to acknowledge the inherent weakness of a small sample size and attempt to increase the available sample data over time.
When many questions or items are included in a test (measure) without consideration of their relationship to the criterion of interest, the resulting validity coefficient may be smaller. To deal with this, the test authors or developers should thoughtfully consider each item (question) before it is included, and they should check each item’s performance, each group of items’ performance, and the test’s overall performance over time. This means checking and monitoring the respective reliability and validity coefficients. While larger samples generally are better, more test items are not necessarily better.
Increasing the number of predictors may increase the error variance within the sample and decrease the cross-validated validity coefficient, because additional predictors can capitalize on chance relationships present in a particular sample. This occurrence may be minimized by making sure that all sample participants are chosen randomly, are representative of the population, and belong to an initially large sample group.
7. Provide examples of situations where it would be appropriate and inappropriate to correct a validity coefficient for the effects of range restriction. Pages 150 – 153
When range restriction does exist or when the potential for range restriction exists, it may be classified as either direct or indirect (incidental) range restriction. The type of restriction will influence the corrective actions that may be taken. For example, when only the applicants who make it past initial screening are included in a selection sample, direct range restriction exists. Whether it would be appropriate to go back to the entire pool of applicants to collect sample data will depend on circumstances surrounding the process. For example, if it is possible that legal challenges may result or have been experienced in the past, the organization’s HRM may decide to include all applicants in the validation sample. If the positions involved are not subject to legal challenge, then a sufficient sample may be collected from applicants who get through the screening.
Indirect range restriction may exist when a predictor variable is applied to applicants but then is not used for selection decisions. For example, an organization might require all applicants to have a degree in a certain major but then hire or place a person based on interview impressions. Indirect range restriction may also result in prejudicial or biased decision making. For indirect range restriction to be corrected, it must be acknowledged and understood. In the interest of fairness it should be addressed, but in practice the relative risk may not be perceived as costly enough to warrant the correction; in that circumstance, acknowledgement may be the only corrective action deemed necessary. Interpretation of the respective validity coefficients should reflect awareness of both direct and indirect range restriction.
8. What are some of the contributions of validity generalization to human resource selection? Pages 160 - 163
Validity generalization (VG) is a meta-analysis of specific HRM variables across research studies. A meta-analysis is a quantitative literature review that considers the relationship between the same two variables across multiple studies, employers, regions, time frames, and so forth. VG sheds light on whether the use of a predictor and criterion has broad applicability (generalizability) or is situation specific, and it helps HRM researchers better understand the applicability and limitations of predictor and criterion variables. With VG evidence, large organizations may be able to justify the use of test measures without conducting an additional validity study, saving personnel expenses while still following the letter of the laws governing HRM practices and making high-quality selection decisions. For small organizations, VG makes it possible to implement tests that have been validated by other organizations for similar jobs.
9. What are some challenges and unresolved issues in implementing a VG study and using VG evidence? Pages 162 - 165
One challenge to VG studies is that of job specificity within organizations both large and small; in this case, the validation need is local (within the organization). In other words, VG results may not apply to all jobs within an organization. Some legal challenges have resulted in VG evidence being considered inadequate to support selection decisions. VG evidence therefore needs to be included as part of a complete validity analysis that also includes evidence on synthetic validity and test transportability.
Synthetic validity refers to the process of validating each job situation and its corresponding job elements based on a detailed job analysis. From this analysis, a combined validity or a synthesized validity coefficient is produced and used to assess the overall validity of the test, the procedures, the decisions, etc.
Test transportability is an accepted validity analysis when local validation is not feasible. Test transportability means that a test developed and validated elsewhere is used by another organization. Test transportability is acceptable when evidence is provided (1) of a criterion-related validity study, (2) of a test fairness analysis, (3) of the degree of similarity between the jobs in the local and the validating organization, and (4) of the degree of similarity between applicants in the local and the validating organization.
10. What are some of the similarities and differences in gathering validity evidence in large, as compared to small, organizations? Pages 160, 161, 165
Whether small or large, organizations must be able to substantiate that strong valid procedures exist to support their HRM decisions. For large organizations, the data may come from within the organization. For small organizations, the data may be the result of VGs completed across other organizations. Large organizations may choose to strengthen their HRM decisions by using VG results, too.
Regardless of organizational size, to support their HRM decisions, organizations must possess a thorough understanding of the necessary statistical processes and resulting validity coefficients with supporting reliability coefficients. That said, statistics alone are not enough. HRM must consider the practicalities of their HRM policies along with evidence of thoughtful usage of statistical data. Statistical data supports the decisions of HRM but is not a substitute for thoughtful, deliberate decisions regarding the usefulness of individual difference data.