Chapter 6 – Measuring and Interpreting Individual Differences
China Economic Management University, Psychology and Human Resource Management (MBA Graduate Course)
Wayne F. Cascio is the Robert H. Reynolds Chair in Global Leadership and Management at the Business School of the University of Colorado Denver. He holds a PhD in industrial and organizational psychology from the University of Rochester and is a fellow of the APA's Division of Industrial/Organizational Psychology, the Academy of Management, the National Academy of Human Resources, and other organizations.
Herman Aguinis is the Mehalchin Chair in Management at the Business School of the University of Colorado Denver. He is a past chair of the Academy of Management's Research Methods Division.
Overview
Chapter 6 introduces concepts surrounding the measurement and interpretation of individual differences and their relevance to the practice of human resource management (HRM). The chapter opens with a discussion of the meaning of measurement and clarifies the differences between measuring physical characteristics and psychological traits. The various types of measurement scales and the appropriate uses for each are then presented, with special emphasis on nominal, ordinal, and interval scales. The next topic is the identification and development of measurement devices, or tests, including the steps in test development and the importance of test-score reliability. Special attention is paid to the different aspects of reliability and the appropriate reliability statistic for each. The chapter concludes with the role that reliability plays in the validity of test scores.
Annotated Outline
I. At a Glance
A. Heart of personnel psychology (human resource management)
B. Measurement of individual differences
C. Differences
1. Physical
2. Psychological
D. Psychological measurements
1. Nominal - frequent
2. Ordinal - frequent
3. Interval – statistical considerations
4. Ratio
E. Decisions require understanding of patterns of
1. Knowledge
2. Abilities
3. Skills
4. Other aspects
F. Understanding gained through tests
1. May use tests already developed
2. May develop specific or new tests
G. Analysis assists understanding test results
1. Item Response Theory
2. Generalizability Theory
H. Test Classification Criteria
1. Content
2. Administration
3. Scoring
I. Tests must have
1. Reliability
a. Dependable
b. Consistent
c. Free of Systematic Error
J. Test scores actually bands or ranges of scores
II. Introduction
A. To measure differences is to measure the variability.
B. To measure variability, one must know
1. how to measure accurately
2. when differences in variation are important
III. What is Measurement?
A. Definition
1. The assignment of numerals to objects or events according to rules.
2. Numeral assignment applies to both
a. physical characteristics
b. psychological characteristics
c. similar process
B. Measurement answers the questions
1. How many?
2. How much?
3. How often?
C. Psychological measurements
1. focus on individual traits &
2. are not as precise as physical measurements
D. Traits – label for groups of interrelated behaviors
1. Dominance
2. Agreeableness
3. Creativity
4. Openness
5. Extraversion
6. Others
E. Traits - measured by comparing one person to standardized samples of behaviors from other people
F. Trait standards reflect measurement by types of scales
G. Psychological Measurement Scales generally
1. Nominal or ordinal scales
2. May approximate interval scales
IV. Scales of Measurement
A. Types of Scales
1. Nominal
2. Ordinal
3. Interval
4. Ratio
B. Nominal Scales
1. lowest level measurement
2. categorize or catalogue
3. numbers have no meaning (descriptive rather than quantitative)
4. assumes equality as in a = b
5. assumes exclusivity as in a ≠ b
6. must be either equal or exclusive but not both at same time
7. Nominal Scale statistics used by HRM based on frequencies
a. Chi square statistics
b. Contingency coefficients
C. Ordinal Scales
1. next level measurement
2. categorize, catalogue, rank order (ranks magnitude)
3. assume equality (a = b), exclusivity (a ≠ b), and
4. transitivity
a. If [(a > b) & (b > c)], then (a > c)
b. If (a = b) and (b = c), then (a = c)
5. Transitivity to HR means
a. one candidate is better than …
b. one candidate is stronger than …
c. one candidate is more qualified
6. Ordinal Scale Statistics
a. medians
b. percentile ranks
c. rank-order correlations
d. rank-order analysis of variance
D. Interval Scales
1. categorize, catalogue, rank magnitude
2. assume equality, exclusivity, transitivity, and additivity
3. additivity: differences between scores can be meaningfully added
4. assume equal-size units
a. If (a > b) and (b > c), then (a > c), or
b. If (a = b) and (b = c), then (a = c), and
c. (d - a) = (c - a) + (d - c)
5. To HRM, additivity means
a. If one candidate scores 10 points higher than another on a valued trait, that 10-point difference represents the same amount of the trait anywhere on the scale
b. Value optimism, measure optimism, choose the most optimistic person
6. Interval Scale statistics
a. Central tendency
b. Variability
c. Correlation coefficient
d. Tests of significance
e. Transformations: add, subtract, multiply, or divide by a constant
E. Ratio Scales – all characteristics of interval scales plus an absolute zero
1. Equality
2. Exclusivity
3. Transitivity
4. Additivity
5. Absolute Zero Point
6. Not as commonly used for traits analyses
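The additivity and transformation properties of interval scales outlined above can be illustrated with a short Python sketch; the raw scores and the constants 5 and 50 below are invented for illustration. Multiplying and adding constants is a permissible interval-scale transformation, and equal raw-score intervals remain equal afterward.

```python
def linear_transform(scores, a, b):
    """Permissible interval-scale transformation: y = a*x + b (a > 0)."""
    return [a * x + b for x in scores]

# Hypothetical raw test scores with gaps of 4, 4, and 8 points
raw = [10, 14, 18, 26]
scaled = linear_transform(raw, 5, 50)   # rescale, e.g., onto a 100-180 band
print(scaled)                           # [100, 120, 140, 180]

# Interval relationships survive the transformation: gaps of 4, 4, 8
# become gaps of 20, 20, 40, still in the same 1:1:2 proportion.
print([scaled[i + 1] - scaled[i] for i in range(3)])  # [20, 20, 40]
```

A nominal or ordinal scale would not license this reading of differences, which is why the outline reserves means, variances, and correlations for interval-level data.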
V. Scales used in Psychological Measurement
A. Psychological Measurement Scales generally are nominal or ordinal
B. Psychological Measurement Scales may approximate interval
1. Assume equality between intervals of traits
a. intellect
b. aptitude
c. personality pattern
2. When interval equality is questioned, transform raw scores statistically into equivalent derivatives
C. To HRM, psychological intervals consider the social utility
1. selection: hire or reject
2. placement: which position
3. diagnosis: which remedial alternative
4. hypothesis testing: accuracy of theory
5. evaluation: what score
VI. Selecting and Creating the Right Measure
A. Otherwise known as the “right test”
B. In HR, tests are
1. Written, oral, or performance
2. Interviews, rating scales, and/or scorable
3. Systematic by content, administration, scoring
C. When HRM knows what to test, it must then decide where & how to test
1. Mental Measurements Yearbook
2. www.unl.edu/buros/00testscomplete.html
D. Selecting & Creating Tests
1. Determine a Measure’s Purpose
2. Define the Attribute
3. Develop a Measure Plan
4. Write Items
5. Conduct a Pilot Study
6. Conduct Traditional Item Analysis by analyzing item
a. clarity,
b. distractors,
c. difficulty,
d. discrimination
7. Conduct Item Response Theory Analysis by analyzing item
a. difficulty,
b. discrimination,
c. probability of guessing
8. Select Items
9. Determine Reliability & Validity
10. Revise & Update
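Step 6, traditional item analysis, can be sketched as follows. This is one common textbook approach (difficulty as the proportion answering correctly, discrimination as the upper-group minus lower-group proportion), and the pilot data are invented:

```python
def item_stats(responses):
    """Difficulty (p) and upper-lower discrimination index for one item.
    responses: list of (total_test_score, item_correct) pairs, item_correct 0 or 1."""
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    n = len(ranked)
    cut = n // 3                                   # compare top vs. bottom thirds
    p = sum(correct for _, correct in ranked) / n  # proportion who got the item right
    upper = sum(correct for _, correct in ranked[:cut]) / cut
    lower = sum(correct for _, correct in ranked[-cut:]) / cut
    return p, upper - lower

# Hypothetical pilot study: (total score, answered this item correctly?)
pilot = [(95, 1), (90, 1), (85, 1), (70, 1), (65, 0), (60, 1),
         (50, 0), (40, 0), (35, 0)]
p, d = item_stats(pilot)
print(round(p, 2), round(d, 2))   # 0.56 1.0
```

An item that high scorers pass and low scorers fail (discrimination near 1.0) is doing useful work; an item nearly everyone passes or fails discriminates little and is a candidate for revision in step 10.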
E. Selecting Tests by Classification Methods
1. Content
a. Task
i. Verbal
ii. Nonverbal
iii. Performance
b. Process
i. Cognitive
1). Ability
2). Achievement
ii. Affective (inventories)
1). Interests
2). Values
3). Motives
4). Traits
2. Administration
a. Efficiency
i. Individual
ii. Group
b. Time
c. Speed
d. Power
e. Standardized or non-standardized
3. Scoring
a. Objective - unbiased
b. Nonobjective
i. bias potential high
ii. risk rater variance
4. Other considerations
a. Costs
i. direct
ii. indirect
b. Administration time
c. Interpretation of results
d. Face validity
F. Test Guidance Sources
1. APA’s Guidelines for Test User Qualification
2. Sample user qualification form www.agsnet.com/site7/appform.asp
VII. Reliability as Consistency
A. Reliability
1. formally defined - freedom from unsystematic measurement errors
2. informally defined - consistency
B. Formal definition of error –
1. minimized random errors from ratee(s), rater(s), instrument
2. random error spread evenly
VIII. Estimation of Reliability
A. Reliability statistics
1. correlation coefficient, r, ranges from -1.0 to +1.0
2. coefficient of determination, r², ranges from 0 to +1.0
B. Reliability coefficients estimate
1. precision of procedures
2. performance consistencies
C. Reliability methods and measures
1. Test-retest – coefficient of stability
2. Parallel forms – coefficient of equivalence
3. Internal consistency
a. Kuder-Richardson 20
b. coefficient alpha (Cronbach)
4. Split-half also produces a coefficient of equivalence
D. Reliability considerations
1. Rater consistency or interrater reliability; error attributable to the examiner
2. Measured by
a. interrater agreement
b. interclass correlation
c. intraclass correlation
3. Interrater reliability errors may be due to
a. what is observed
b. access to nonattribute information
c. interpretation expertise
d. observation evaluation
E. Reliability & Error Sources (method – source of error)
1. Test-retest – time sampling
2. Parallel forms – content, time sampling
3. Split-half – content
4. Cronbach’s alpha – content
5. Kuder-Richardson – content
6. Interrater agreement – consensus
7. Interrater correlation – consistency
8. Intraclass correlation – consistency
F. Reliability & Error Sources, for more information
1. Standards for Educational and Psychological Tests
2. Mental Measurements Yearbook
3. www.unl.edu/buros/00testscomplete.html
IX. Interpretation of Reliability
A. Interpretation of reliability
1. depends on the use of scores
2. also places limits on validity
B. Factors affecting reliability
1. range of individual differences
2. difficulty of the measurement procedure
3. Sample size & representativeness
4. Standard error of Measurement
C. For HRM, standard error of measurement used to determine
1. Whether individual descriptions differ significantly
2. Whether individual measures differ significantly from hypothetical true score
3. Whether tests discriminate differences per group
4. Whether test scores represent ranges rather than precise points
X. Generalizability Theory
A. Generalizability Theory – the precision with which an individual’s score represents a generalized universe of scores, where different universes may exist
B. More current than standard reliability considerations
C. Scores represent
1. Samples from universe of admissible observations
2. Universe refers to
a. conditions where examinees can be observed or tested
b. conditions that produce a specific outcome or degree
3. Universe score – expected value of score over all admissible observations
D. Different universes are represented by facets or dimensions
E. Generalizability Theory studied through 2 types of research
1. generalizability (G) – development of measurement instrument
2. decision (D) – data used to reach conclusions about individuals
XI. Interpreting the Results of Measurement Procedures
A. For HR, measurements result in
1. performance predictions
2. developmental actions
3. evaluations of behaviors
B. All decisions are relative to norms
1. Norms assume normal curves (bell-shaped distributions)
2. Normal curves allow comparisons by standard deviations from means
a. Individual scores fall near (or far from) the mean or average score
b. Preferred closeness to or distance from the mean influences HR decisions
C. Quality of the HR decisions depends on
1. the reliability of the initial data, which
2. leads to the next consideration, the accuracy and validity of decisions
Discussion Questions
1. Why are psychological measures considered to be nominal or ordinal in nature? Pages 112 - 114
Nominal measures are those measures that classify or catalogue. Nominal measures indicate differences in kind. Nominal measures do not have numerical properties. In other words, you cannot add, subtract, multiply, or divide nominal measures. You can use nominal measures to describe. For example, you can say nominally that this person has brown hair, is a graduate of ABC University, and is a member of the XYZ Organization. Nominal measures may be used to describe, classify, or catalogue groups, also.
Ordinal measures are those that classify, catalogue, describe, and rank order an individual by the classification, catalogue, or description. This rank order provides a measure of magnitude but does not allow numerical operations. The rank order does offer a degree of transitivity. For example, you can say this person has the brownest hair, is the smartest graduate, or is the most intense organizational member. Ordinal measures may be used to describe, classify, catalogue, and rank groups, too.
Psychological measures focus on the interrelationships between groups of behaviors. These interrelated behaviors are called traits. Measurements of traits attempt to quantify, through the use of scales, the existence and qualities of those traits. Hence, nominal scales may measure psychological traits by describing a person’s score for the attribute or trait of interest. For example, a person may be classified as an optimistic, conscientious, and intellectual person. An ordinal scale measurement for this same person may result in a measurement score indicating that this person is more optimistic than the last applicant, exhibits less conscientiousness than the desired candidate, but is smarter than the current job incumbent.
2. Is it proper to speak of the reliability of a test? Why? Pages 121 - 129
Within the context of HRM, tests are either procedures or psychometric instruments designed to measure or to assess knowledge, skills, or abilities that are related to the performance of jobs. With this as the frame of reference, the reliability of a test refers to the dependability or consistency of the test results. Specifically, HRM needs test results to lead to dependable outcomes for decisions based on the results. That said, the formal definition of test reliability is a test’s freedom from unsystematic measurement errors. The formal definition of measurement error includes the variation in results that are not related to the test. For example, errors may occur due to the participant’s inattention, to environmental causes, to outside influences, or to the test administrator’s inattention, misunderstanding, distraction, etc. These errors are called random errors. The definition of the errors may change per test. A reliable test is either error free or performs with the error spread evenly over all participants.
3. Which methods of estimating reliability produce the highest and lowest (most conservative) estimates? Page 129
All methods for estimating reliability are based on correlation coefficients and, therefore, their values will fall in the range -1.0 to +1.0. Each method of reliability estimation considers different conditions that produce unsystematic changes in the test scores. Therefore, the reliability measures may be considered as more or less conservative, that is, more or less sensitive to unsystematic test-score variance.
When variance over time is of interest, computing a test-retest correlation is appropriate. The time should be long enough to account for any increase that may be due to practice. Generally, six months is sufficient for sensory discrimination, psychomotor tests, and some tests of knowledge. These reliability coefficients are called coefficients of stability.
For some tests, time is not available to collect test-retest data, but alternate forms of the same test or different tests for the same content domain may be available. When the same content domain is tested, the alternate-forms reliability can be checked with the coefficient of equivalence. When this method is adopted, the two forms should be administered as close together in time as possible, with half of the group receiving one form first and the other half receiving the other form first.
When neither alternate forms nor time is available, a test’s internal consistency can be examined through the use of reliability estimates known as split half, the Kuder-Richardson KR-20, or Cronbach’s coefficient alpha. The coefficient alpha is appropriate to use when items use scoring methods such as always, sometimes, occasionally, or never. Personality inventories are examples of these types of tests. KR-20 is appropriate when items are scored as right or wrong or by an all-or-nothing system such as with multiple choice tests. Split half estimates are conceptually similar to the alternate forms method with the data collected at the same time. It is best to choose the items for each half statistically (by random choice), but it may be easier to understand by dividing the items as even or odd or the first half relative to the second half.
4. Is interrater agreement the same as interrater reliability? Why? Page 128
Interrater agreement is similar to interrater reliability but does not have the same meaning. First, consider what intrarater reliability means. Intrarater reliability refers to the consistency with which one rater evaluates different ratees based on equivalent information. Interrater reliability refers to the consistency with which different raters evaluate the same ratee (or a group of ratees) when given equivalent information. This approach is similar in concept to alternate-forms reliability checks, with each rater acting as an alternate form. Interrater agreement, by contrast, captures absolute consensus: whether raters assign the same values, not merely whether they rank ratees consistently. Disagreement between raters that is unsystematic becomes a source of measurement error.
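The distinction can be made concrete with a sketch (the ratings are invented for illustration): if rater B is consistently one point harsher than rater A, the interrater correlation (a consistency measure) is perfect, while absolute agreement (a consensus measure) is zero.

```python
import math

def pearson(x, y):
    """Pearson correlation: do the two raters rank ratees consistently?"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pct_agreement(x, y):
    """Interrater agreement: share of ratees given the identical rating."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

# Hypothetical ratings of six ratees; rater B is uniformly one point harsher
rater_a = [5, 4, 3, 5, 2, 4]
rater_b = [4, 3, 2, 4, 1, 3]
print(round(pearson(rater_a, rater_b), 2))   # 1.0  (perfect consistency)
print(pct_agreement(rater_a, rater_b))       # 0.0  (no absolute consensus)
```

Which index matters depends on the decision: consensus matters when the absolute rating level drives the outcome, consistency when only the rank order does.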
5. What type of knowledge can be gathered through the application of item-response theory (IRT) and generalizability theory (GT)? Pages 117, 118, 134, 135
IRT explains the effect of individual differences on the behavioral responses to specific items. These differences per item can be graphed for easier comprehension. The graphs offer pictures of the item’s discrimination, the item’s difficulty, and the probability of guessing correctly by chance. IRT contributes to decisions surrounding the selection of items, to the reliability and validity of items, and to the revision and updating of items.
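The three quantities IRT graphs for each item (difficulty, discrimination, and the probability of guessing) appear as the b, a, and c parameters of the widely used three-parameter logistic (3PL) item response function; the parameter values below are invented for a hypothetical four-option multiple-choice item.

```python
import math

def p_correct(theta, a, b, c):
    """3PL item response function: probability that a person of ability theta
    answers the item correctly. a = discrimination (slope), b = difficulty
    (location), c = pseudo-guessing floor."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# four answer options, so guessing alone yields about 0.25
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, a=1.2, b=0.0, c=0.25), 2))
```

At theta = b the curve sits exactly halfway between the guessing floor and 1.0; a steeper a means the item separates nearby ability levels more sharply, which is what makes the graphs useful for item selection.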
GT changes the focus of reliability from the test score itself to the idea that any observed score is only one sample from a universe of admissible observations. GT allows raters to consider that there are greater ranges of scores which reliably measure potential applicants. This allows raters to take a longer-range view and to recognize that tests and test items require the collection of more data rather than more specific data.
6. What does the standard of error of measurement tell the HR specialist? Pages 131, 132
The standard error of measurement is an estimate of the standard deviation of the normal distribution of scores that one person would receive if that person took the test a large number of times (theoretically, an infinite number of times). An HR specialist finds the standard error of measurement useful in multiple ways. These uses include determining (1) whether the measures differ significantly when describing individuals, (2) whether an individual measure is significantly different from some hypothetical true score, (3) whether a test discriminates differently in different groups, and (4) the realization that test scores reflect ranges of scores rather than absolute (exact) points.
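The computation itself is straightforward: SEM = SD * sqrt(1 - r_xx), where SD is the test's standard deviation and r_xx its reliability. A sketch with invented figures (SD = 10, reliability = .91):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band within which the true score likely falls."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical test: SD = 10, reliability = .91
print(round(sem(10, 0.91), 2))      # 3.0
lo, hi = score_band(100, 10, 0.91)
print(round(lo, 2), round(hi, 2))   # 94.12 105.88
```

On this hypothetical test, applicants scoring 100 and 103 should not be treated as meaningfully different, since their bands overlap heavily; that is point (4) above in practice.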
7. What is scale coarseness? How can we address scale coarseness before and after data are collected? Pages 132 - 134
Scale coarseness may lead to errors in measurement predictions, specifically, a systematic downward bias in correlation coefficients. Scale coarseness exists when a continuous psychological construct is measured through a collapsed scale, such as Likert-type or ordinal items. For example, we tend to assume that rating an attitude along a scale from 1 to 7 yields a maximum attitude measurement of 7 and a minimum of 1. However, attitudes may be greater (more intense, stronger) than 7 or could conceivably be absent (less than 1). To address scale coarseness before collecting data, make the research design as strong as it can be given the circumstances surrounding the research; for example, if an attitude scale is to be used, consider a continuous, graphic-rating line rather than forced-choice ratings. Scale coarseness can be addressed after data are collected through the use of software designed to provide statistical corrections.
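The downward bias can be demonstrated deterministically with constructed scores (no real data): take a perfectly correlated continuous predictor and criterion, collapse the predictor onto a five-point scale, and the correlation drops below 1.0.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [i / 10 for i in range(1, 21)]             # continuous "true" attitude scores
y = list(x)                                     # criterion perfectly correlated with x
coarse = [(i + 3) // 4 for i in range(1, 21)]   # same people forced onto a 1-5 scale

print(round(pearson(x, y), 3))       # 1.0   before coarsening
print(round(pearson(x, coarse), 3))  # 0.981 after coarsening: downward bias
```

The bias here is small because the grid is even and the scale has five steps; fewer steps, or unevenly spread respondents, attenuate the observed correlation further.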
8. What do test norms tell us? What do they not tell us? Pages 135 – 138
Generally, test norms tell you where a specific score falls relative to an average score for the test. Norms also tell you how many other scores for this test fall near the average score and near the score of interest. Since scores represent performance on existing traits or attributes, norms allow you to compare one person to other people relative to average performances. However, normative comparisons are only as good as the person’s applicable membership in the normative group. In other words, there can be many normative groups and many normative comparisons, but if the correct normative group is not used, the comparison may be meaningless.
These comparisons do not make up for sampling inadequacies, either. Norms do not tell you everything you need to know to make accurate decisions about people based on reliable test scores. You can improve the quality of the decisions by improving the reliability and by improving the norms upon which comparisons are drawn. One way to do this is to determine norms relative to the group characteristics pertinent to the decisions. This is done by taking raw scores, calculating percentile rankings, and then statistically (through the use of constants) converting the percentile rankings to interval measures.
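Both ideas in this answer (locating a score relative to the norm group, and converting raw scores into comparable standard units) can be sketched as follows; the norm-group scores are invented.

```python
from statistics import mean, pstdev

def percentile_rank(score, norm_group):
    """Percent of the norm group scoring at or below the given score."""
    return 100 * sum(s <= score for s in norm_group) / len(norm_group)

def z_score(score, norm_group):
    """Distance from the norm-group mean in standard-deviation units."""
    return (score - mean(norm_group)) / pstdev(norm_group)

# Hypothetical norm group: test scores of ten current job incumbents
norms = [52, 58, 61, 64, 66, 69, 71, 74, 78, 87]

print(percentile_rank(71, norms))     # 70.0 -> at or above 70% of the group
print(round(z_score(71, norms), 2))   # 0.31 -> about a third of an SD above the mean
```

The comparison is only as meaningful as the norm group itself: the same score of 71 would earn a different percentile rank against a different, more appropriate reference group.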