The authors of the text suggest that the beginnings of any published test can be traced to:
"self-talk"
the test construction process
identifying an appropriate sample
identifying an appropriate publisher
Item analysis may include evaluation of:
item reliability
item validity
item difficulty
all are correct
The idea for a new test may come from:
social need
review of the available literature
commonsense appeal
all are correct
Scaling refers to:
a measurement process
ways in which the difficulty of achievement tests is calibrated
a method of determining test variability
ways in which numbers are assigned to a measurement instrument
Which statement is true of scales?
there is one best type of scale
ratio scales are the most frequently used by psychologists
many methods of scaling exist
scaling is important in the test standardization process
Test items that contain alternatives with five points ranging from "strongly agree" to "strongly disagree" employ:
Guttman scaling
Likert scaling
Simmons scaling (not Richard)
opinion scaling
Experts recommend that the first draft of a test include AT LEAST how many items compared with the final version?
twice the number of the final version
the same number as the final version
one-quarter the number of the final version
three times the number of the final version
Components of a multiple-choice item include:
a stem
a distractor
a foil
all are correct
Multiple-choice items primarily tap:
recognition
organization
planning
perceptual-motor skills
Which statement characterizes the test tryout phase of test construction?
test conditions should be as similar to the actual administration as possible
a large number of participants should be included to ensure accurate results
the sample should be nationally representative
the sample should not have had prior coaching
A suggested minimum number of participants for a test tryout is:
one-half of the expected standardization sample
25
50
four times the expected standardization sample
If 100 people take a test and 20 people answer a particular item correctly, then the "p" value of the item is:
.25
.20
.40
.80
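The "p" value (item-difficulty index) is the number of examinees who answer an item correctly divided by the total number of examinees. A minimal Python sketch, assuming each response is scored 1 (correct) or 0 (incorrect); the function name `item_difficulty` is hypothetical:

```python
def item_difficulty(responses: list[int]) -> float:
    """Return p: the proportion of examinees scoring 1 on a dichotomously scored item."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical item: 3 of 4 examinees answer correctly.
print(item_difficulty([1, 1, 0, 1]))  # 0.75
```

Because p is a proportion of examinees passing the item, a larger p indicates an easier item.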
An item-difficulty index can range from ____________ to ____________.
.10; .99
0; 1
.25; .75
0; 100
In what situation does an item-difficulty index of 1 occur?
all examinees answer the item incorrectly
all examinees answer the item correctly
examinees are evenly split between correct and incorrect responses to the item
none are correct
Generous time limits are typically associated with:
speeded conditions
power conditions
untimed conditions
hazardous conditions
Ability tests are typically standardized on a sample that is representative of the general population and selected on the basis of variables, such as:
age
gender
geographic region
all are correct
A test manual for commercially prepared tests should include:
a description of the test development procedures used
test-retest reliability data
internal consistency reliability data
all are correct
A professor who asks a colleague to regrade a set of essay questions is most likely attempting to address concerns about:
rater reliability
content validity
criterion-related validity
test-retest reliability
Who is associated with the development of the methodology of scaling?
Galton
Wundt
Spearman
Thurstone
Which of the following is NOT a question to be raised and answered during the test conceptualization stage of test development?
what is the objective of the test?
is there a need for the test?
how valid are the items on the test?
what types of responses will be required of the test-taker?
For a test designed for classroom use, which represents an advantage of using true-false over using multiple-choice items?
true-false items are applicable to a wider range of subject areas
true-false items are easier to write
true-false items reduce the chances of guessing by the examinee
true-false items achieve acceptable levels of reliability
Which best describes a "good" test item?
almost all test-takers respond correctly, indicating the item is well written
the item distinguishes high scorers from low scorers
almost all test-takers respond incorrectly, indicating the item is a challenging item
the correct answer cannot be guessed
The higher the item-reliability index:
the higher the internal consistency of the test
the lower the internal consistency of the test
the more likely the test-taker is to miss the item
the more likely the test developer is to eliminate the item
Factor analysis can help the test developer:
to eliminate or revise items that do not load on the predicted factor
to identify whether test items appear to be measuring the same construct
all are correct
none are correct
Expert panels may be used in the process of test development to:
provide ratings of item reliability
recommend statistical tests of validity
screen test items for possible bias
serve as expert witnesses in any subsequent litigation
Having a large number of items available in the item pool during test revision is:
a disadvantage due to the great expense of item development
often a waste of time, because many of the items are eventually deleted
an advantage, because poor items can be deleted in favor of the good items
none are correct
The following item appears on an end-of-semester course evaluation in a tests and measurement course: "This course was more work than I thought it would be." The possible responses are: (1) strongly agree; (2) agree; (3) unsure; (4) disagree; (5) strongly disagree. This is an illustration of what type of scaling?
nomothetic
Likert
Guttman
ipsative
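In Likert (summative) scaling, each response option carries a number, as in the 1 to 5 coding above, and a respondent's scale score is typically the sum of the item ratings. Items worded in the opposite direction are reverse-scored first so that all items point the same way. A minimal Python sketch, assuming a 5-point scale; the item names and the choice of which item is reverse-keyed are hypothetical:

```python
def likert_total(responses: dict[str, int], reverse_keyed: set[str], points: int = 5) -> int:
    """Sum Likert ratings, reverse-scoring the items listed in reverse_keyed."""
    total = 0
    for item, rating in responses.items():
        if not 1 <= rating <= points:
            raise ValueError(f"{item}: rating {rating} outside 1..{points}")
        # On a 1..points scale, reverse-scoring maps r to (points + 1 - r), e.g. 1 <-> 5.
        total += (points + 1 - rating) if item in reverse_keyed else rating
    return total


# Hypothetical three-item scale where "q2" is worded in the opposite direction.
print(likert_total({"q1": 1, "q2": 4, "q3": 2}, reverse_keyed={"q2"}))  # 5
```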
If 50 students were administered a classroom test, how many would be included in each group for the purpose of calculating the item-discrimination index (d)?
25
10
13
17
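The item-discrimination index compares how the highest and lowest scorers on the whole test do on a single item: d = (U - L) / n, where U and L are the numbers of examinees in the upper and lower groups who answer the item correctly and n is the number of examinees in each group. The groups are conventionally the top and bottom 27% of total scores. In the Python sketch below, rounding the group size down is an assumption (texts differ on how to round a fractional count), ties at the cutoff are broken by input order, and all function names are hypothetical.

```python
def group_size(n_examinees: int, percent: int = 27) -> int:
    """Examinees per extreme group; rounding down is an assumption."""
    return n_examinees * percent // 100  # integer arithmetic avoids float rounding surprises


def discrimination_index(item_scores: list[int], total_scores: list[float], percent: int = 27) -> float:
    """d = (U - L) / n for a 0/1-scored item, using upper and lower groups of total scorers."""
    if len(item_scores) != len(total_scores):
        raise ValueError("item_scores and total_scores must have the same length")
    n = group_size(len(total_scores), percent)
    if n == 0:
        raise ValueError("too few examinees to form upper and lower groups")
    # Rank examinees from lowest to highest total score (sorted() is stable, so ties keep input order).
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    upper_correct = sum(item_scores[i] for i in upper)  # U
    lower_correct = sum(item_scores[i] for i in lower)  # L
    return (upper_correct - lower_correct) / n


# Hypothetical class of 10: group_size(10) == 2 examinees per group.
items = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
totals = [12, 15, 20, 22, 25, 28, 30, 33, 35, 40]
print(discrimination_index(items, totals))  # 1.0
```

The index runs from -1 to +1; a negative d means low scorers pass the item more often than high scorers, which usually flags a flawed item.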
When analyzing an item's discriminative abilities, the test developer should compare:
the highest and lowest scorers on the test
the highest and middle scores on the test
the responses from various minority groups to the item
the responses of people of different ages to the item
If you were interested in developing a test of how well people adapt to a college fraternity or sorority, and you began by interviewing college graduates who had been members of a fraternity or sorority for at least two years, which stage of the test development process would this represent?