Reliability, in a broad statistical sense, is synonymous with:
consistently good
consistently bad
consistency
validity
A reliability coefficient is:
an index
a proportion of the total variance attributed to true variance
unaffected by a systematic source of error
all are correct
Which of the following is true of systematic error? Systematic error:
significantly lowers the reliability of a measure
insignificantly lowers the reliability of a measure
increases the reliability of a measure
has no effect on the reliability of a measure
As the degree of reliability increases, the proportion of:
total variance attributed to true variance increases
total variance attributed to true variance decreases
total variance attributed to error variance increases
none are correct
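Study note: the items above treat a reliability coefficient as the proportion of total variance attributed to true variance. A minimal sketch of that ratio follows, assuming total variance is simply true variance plus error variance. The function name and the sample values are hypothetical and chosen only for illustration.

```python
def reliability_coefficient(true_variance, error_variance):
    """Proportion of total variance attributed to true variance.

    Assumes total variance = true variance + error variance.
    """
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical values: 8 units of true variance, 2 units of error variance.
print(reliability_coefficient(8, 2))
```

Holding total variance fixed and shifting variance from error to true (for example 9 and 1 instead of 8 and 2) changes the ratio accordingly.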
Computer-scorable items have virtually eliminated error variance caused by:
item sampling
scorer differences
content sampling
test-takers' reactions to environmental variables
Christopher Titus did not read and study Chapter 1 in the text until the night before an exam was given over Chapters 1-5. Christopher amazed himself (and others) by getting 90% of the items on the exam correct. As a budding expert in tests and measurement, you might best explain Christopher's success with reference to:
the concept of item sampling, especially as it relates to whoever wrote the exam
the concept of error, especially as it relates to whoever scored Christopher's exam
the concept of norming, especially as it relates to members of the normative sample
all are correct
Poorly worded items that cause students to differentially respond to the same questions contribute to what type of error variance?
content sampling
test administration error
test-scoring and interpretation variance
content sampling and test-scoring and interpretation variance
What type of reliability estimate is obtained by correlating pairs of scores from the same person (or people) on two different administrations of the same test?
parallel forms
split-half
test-retest
none are correct
What type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is relatively stable over time?
parallel forms
split-half
alternate forms
test-retest
Test-retest reliability estimates would be least appropriate for:
IQ tests
tests that measure moment-to-moment mood(s)
academic achievement tests on topics, such as ancient history
tests that measure art aptitude
An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than:
30 days
60 days
3 months
6 months
Which of the following is true for parallel forms of a test?
the means of the observed scores are equal for the two forms
the variances of the observed scores are equal for the two forms
the means and variances of the observed scores are equal for the two forms
none are correct
Which source of error variance affects parallel- or alternate-forms reliability estimates, but does not affect test-retest estimates?
fatigue
learning
practice
item sampling
Test-retest estimates of reliability are referred to as measures of ____________, and split-half reliability estimates are referred to as measures of ____________.
true scores; error scores
internal consistency; stability
inter-scorer reliability; consistency
stability; internal consistency
Which of the following is usually minimized when using split-half reliability estimates as compared with test-retest or parallel forms reliability estimates?
time and expense
reliability and validity
reliability only
none are correct
Which of the following factors may influence a split-half reliability estimate?
fatigue
anxiety
item difficulty
all are correct
For a heterogeneous test, measures of internal consistency reliability will tend to be ____________ compared with other methods of estimating reliability.
higher
lower
very similar or higher
not affected by heterogeneous items
Error variance for measures of inter-item consistency comes from:
fatigue
motivation
a test-taker practice effect
heterogeneity of the content
If items from a test are measuring the same trait, then estimates of reliability yielded from split-half methods will typically be ____________ compared with KR-20.
higher
lower
similar
exactly the same
Which of the following is NOT an acceptable way to divide a test when using the split-half method of reliability?
randomly assign items to each half
assign odd-numbered items to one half and even-numbered items to the other half
assign the first half of the items to one half and the second half of the items to the other half
assign easy items to one half and difficult items to the other half
Which is NOT an assumption that should be met in order to use KR-21?
items should be dichotomous
items should be of equal difficulty
items should be homogeneous
items should be scored by the same scorer
Which is a synonym for inter-scorer reliability?
inter-judge reliability
observer reliability
inter-rater reliability
all are correct
Which BEST conveys the meaning of an inter-scorer reliability estimate of .90?
90% of the scores obtained are reliable
90% of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error
10% of the variance in the scores assigned by the scorers was attributed to true differences and 90% to error
the test is stable
Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?
test-retest
alternate form
inter-rater
all are correct
If a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score, then the test is referred to as a ____________ test.
speed
power
reliable
valid
Interpretations of criterion-referenced tests are typically made with respect to:
the total number of items the examinee responded to
the material the examinee evidenced mastery of
a comparison of the examinee's performance with that of others who took the test
the total number of items the examinee passed or failed
The statement "68% of the scores for a particular test fall between 58 and 61" is:
a confidence interval
the reliability of the test
the validity of the test
all are correct
The standard error of measurement of a particular test of anxiety is 8. A student earns a score of 60. What is the confidence interval for this test score at the 95% confidence level?
52-68
40-68
44-76
36-84
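Study note: a confidence interval around an observed score can be sketched as the score plus or minus a multiple of the standard error of measurement (SEM). The sketch below assumes normally distributed measurement error; the function name and the sample values are hypothetical. The 95% multiplier is 1.96, which introductory texts often round to 2, and the 68% multiplier is 1.

```python
def score_confidence_interval(score, sem, z=1.96):
    """Return (lower, upper) bounds: score plus or minus z * SEM.

    z is the number of SEMs on each side of the observed score
    (about 1 for 68%, about 1.96 for 95%), assuming normal error.
    """
    half_width = z * sem
    return (score - half_width, score + half_width)


# Hypothetical values: observed score 100, SEM 3, multiplier rounded to 2.
low, high = score_confidence_interval(100, 3, z=2.0)
print(low, high)
```

A larger multiplier or a larger SEM widens the band around the observed score.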
As the confidence interval increases, the range of scores a single test score is likely to fall into:
decreases
increases
remains the same
first increases, then decreases
If the standard deviations of two tests are identical, but the reliability is lower for Test A as compared to Test B, then the standard error of measurement will be ____________ for Test A as compared with Test B.