Item response theory (IRT) is the dominant psychometric paradigm for constructing, scoring, and analyzing assessments. Virtually all large-scale assessments use IRT because of its well-documented advantages. It is often referred to as if it were a single way of analyzing data, but IRT is actually a family of models, and a growing family at that.
I often hear the question: “What is the difference between dichotomous and polytomous?” Well, these terms represent two subfamilies within the IRT family.
Dichotomous IRT Models
Dichotomous IRT models are those where there are two possible item scores. Note that I say “item scores” and not “item responses” – the most common example of a dichotomous item is multiple choice, which typically has 4 to 5 options, but only two possible scores (correct/incorrect). True/False or Yes/No items are also obvious examples, and are more likely to appear in surveys or inventories, as opposed to the ubiquity of the multiple choice item in achievement/aptitude testing. Other item types that can be dichotomous are Scored Short Answer and Multiple Response (all or nothing scoring). Learn more about item types here.
What models are dichotomous?
The three most common are the 1PL/Rasch, the 2PL, and the 3PL. Which one to use depends on the type of data you have, as well as your doctrine, of course. A great example is Scored Short Answer items: there should be no effect of guessing on such an item, so the 2PL is a logical choice. Here is a broad overgeneralization:
- 1PL/Rasch: Uses only the difficulty (b) parameter and does not take into account guessing effects or the possibility that some items might be more discriminating than others; however, it can be useful with small samples and in other situations
- 2PL: Uses difficulty (b) and discrimination (a) parameters, but no guessing (c); relevant for the many types of assessment where there is no guessing
- 3PL: Uses all three parameters, typically relevant for achievement/aptitude testing.
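To make the relationship between the three models concrete, here is a minimal Python sketch of the standard logistic item response function. The function name `irt_probability`, the scaling constant `D` (set to 1.0 here; some programs use 1.7), and the parameter values are my own illustrative assumptions, not taken from any particular software.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0, D=1.0):
    """Probability of a correct (keyed) response under the 3PL model.

    With c = 0 this reduces to the 2PL; with a = 1 and c = 0 it takes
    the 1PL/Rasch form. D is an optional scaling constant (assumption).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item parameters, chosen only for illustration
theta = 0.0
p_1pl = irt_probability(theta, b=0.5)                # difficulty only
p_2pl = irt_probability(theta, b=0.5, a=1.5)         # adds discrimination
p_3pl = irt_probability(theta, b=0.5, a=1.5, c=0.2)  # adds guessing

print(round(p_1pl, 3), round(p_2pl, 3), round(p_3pl, 3))
```

Writing the simpler models as special cases of the 3PL is just one convenient way to organize the code; the guessing parameter raises the lower asymptote of the curve from 0 to c.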
What do dichotomous models look like?
Dichotomous models, graphically, will have one S-shaped curve with a positive slope, as seen here. This reflects that the probability of responding in the keyed direction increases with higher levels of the trait or ability. Technically, there is also a line for the probability of an incorrect response, which goes down, but this is simply the 1-P complement, so it is rarely drawn in graphs. It is, however, used in scoring algorithms (check out this free software download and this white paper). In the example, a student with theta = -3 has about a 0.28 chance of responding correctly, while a student with theta = 0 has about 0.72 and a student with theta = 2 has about 0.94.
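As a rough illustration of how both P and its 1-P complement enter a scoring algorithm, here is a sketch of maximum likelihood scoring by a simple grid search over theta. The item parameters, the response patterns, and the function names are hypothetical, and the grid search is a stand-in for the iterative methods that real scoring software typically uses.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Sum log(P) for correct responses and log(1 - P) for incorrect ones."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def score_by_grid(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Return the grid point in [lo, hi] with the highest log-likelihood."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical (a, b, c) parameters for four items
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2), (1.5, 0.5, 0.2)]

theta_high = score_by_grid(items, [1, 1, 1, 0])  # three of four correct
theta_low = score_by_grid(items, [1, 0, 0, 0])   # one of four correct
print(theta_high, theta_low)
```

The examinee with more correct responses receives the higher theta estimate here, and the incorrect responses contribute through the 1-P term even though that curve is rarely drawn.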
Polytomous IRT Models
Polytomous models are for items that have more than two possible scores. The most common examples are Likert-type items (Rate on a scale of 1 to 5) and partial credit items (score on an Essay might be 0 to 5 points). IRT models typically assume that the item scores are integers.
What models are polytomous?
Unsurprisingly, the most common polytomous models use names like rating scale and partial credit.
- Rating Scale Model (Andrich, 1978)
- Partial Credit Model (Masters, 1982)
- Generalized Rating Scale Model (Muraki, 1990)
- Generalized Partial Credit Model (Muraki, 1992)
- Graded Response Model (Samejima, 1972)
- Nominal Response Model (Bock, 1972)
What do polytomous models look like?
Polytomous models have a line that models each possible response. The line for the highest point value is typically S-shaped like a dichotomous curve. The line for the lowest point value is typically sloped down like the 1-P dichotomous curve. Point values in the middle typically have a bell-shaped curve. The example on the right is for an Essay scored 0 to 5 points. Only students with theta > 2 are likely to get the full points (blue), while students with 1 < theta < 2 are likely to receive 4 points (green).
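To see where those shapes come from, here is a small sketch of category probabilities under the generalized partial credit model. The step parameters below are hypothetical values I picked so that the pattern resembles the essay example described above; they are not the parameters behind the actual figure.

```python
import math

def gpcm_probabilities(theta, a, steps):
    """Category probabilities for scores 0..len(steps) under the GPCM."""
    # Each score's exponent is the running sum of a * (theta - step);
    # the exponent for a score of 0 is fixed at 0.
    exponents = [0.0]
    for step in steps:
        exponents.append(exponents[-1] + a * (theta - step))
    largest = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(z - largest) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def most_likely_score(probs):
    """Index (score) of the category with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical step parameters for an essay scored 0 to 5
steps = [-2.0, -1.0, 0.0, 1.0, 2.0]

for theta in (-3.0, 0.5, 1.5, 3.0):
    probs = gpcm_probabilities(theta, a=1.0, steps=steps)
    print(theta, most_likely_score(probs), [round(p, 3) for p in probs])
```

With these assumed steps, a score of 4 is the most likely outcome at theta = 1.5 and a score of 5 at theta = 3, while a score of 0 is most likely at theta = -3. Evaluating the function across a range of theta values traces out the falling, bell-shaped, and rising curves.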
I’ve seen “polychotomous.” What does that mean?
It means the same as polytomous. For an interesting discussion by my graduate advisor and co-founder of ASC, David J. Weiss, please read http://conservancy.umn.edu/handle/11299//117413.
How is IRT used in our platform?
We use it to support the test development cycle, including form assembly, scoring, and adaptive testing. You can learn more at this page.
How can I analyze my tests with IRT?
You need specially designed software, like Xcalibre. Classical test theory, by contrast, is simple enough that you can do it with Excel functions.
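To show how simple the classical statistics are, here is a sketch of two of them, the item P-value (proportion correct) and the item-total point-biserial correlation, using only the Python standard library. The response matrix is made up for illustration, and the point-biserial here uses the uncorrected total score (the item itself is included in the total).

```python
from statistics import mean, pstdev

def item_p_value(item_scores):
    """Proportion of examinees answering the item correctly."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean([(x - mx) * (y - my) for x, y in zip(item_scores, total_scores)])
    return cov / (pstdev(item_scores) * pstdev(total_scores))

# Hypothetical data: 5 examinees (rows) by 3 items (columns), scored 0/1
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]

p_first = item_p_value(first_item)
rpbis_first = point_biserial(first_item, totals)
print(round(p_first, 2), round(rpbis_first, 3))
```

Both statistics reduce to means and standard deviations, which is why a spreadsheet is sufficient for this kind of analysis.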
For further reading, see Item Response Theory for Psychologists by Embretson and Reise (2000). These authors also both graduated from the psychometrics program at Minnesota.