The British statistician George Box is credited with the quote, “All models are wrong but some are useful.” As psychometricians, it is important that we never forget this perspective. We cannot be so haughty as to think that our models actually represent the true underlying phenomena and that any data that does not fit nicely is just noise. We need to remember that everything we do is an approximation, and respect the balance between parsimony and parameterization.
Really… all models are wrong?
Yeah, there is no TRUE model that perfectly describes the interaction between an examinee and a test item. Obviously the probability of a correct response is primarily due to important factors such as examinee ability, item difficulty, item quality, the presence of guessing, and the scoring function of the item. There are also additional factors, such as student motivation, timing factors, lighting in the room, screen size, whether they broke up with their girlfriend/boyfriend the previous day, whether their mom made their favorite breakfast that morning… you get the picture. Attempting to model all those factors is certainly overparameterization.
Wikipedia also has a lengthier quote on that aspect:
Most, if not all, psychometricians would agree that my earlier description of overparameterization is valid. The controversy in the field of psychometrics is over which of those “important factors” I mentioned qualify as overparameterization. The Rasch model famously boils down the interaction to a single item parameter (difficulty) and a single person parameter (ability). Many psychometricians consider this to be underparameterization since, for example, items differ widely in their quality (discrimination). The Rasch cohort would consider the two- and three-parameter item response theory (IRT) models to be overparameterization, especially since they necessitated the development of new parameter estimation algorithms in the 1970s. There are some practitioners in each camp who would claim that the other is the “mark of mediocrity.”
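To make the difference between these models concrete, here is a minimal Python sketch of the standard logistic forms. It assumes the usual textbook parameterization, in which the Rasch model is the special case with discrimination fixed at 1 and no guessing. The function and parameter names (`irt_probability`, `theta`, `b`, `a`, `c`) are illustrative choices rather than any library's API, and the 1.7 scaling constant that some formulations include is left out.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta: examinee ability
    b:     item difficulty
    a:     item discrimination (assumed fixed at 1.0 for the Rasch case)
    c:     pseudo-guessing lower asymptote (0.0 means no guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def rasch(theta, b):
    # One item parameter: difficulty only.
    return irt_probability(theta, b)


def two_pl(theta, b, a):
    # Two item parameters: difficulty and discrimination.
    return irt_probability(theta, b, a=a)


def three_pl(theta, b, a, c):
    # Three item parameters: difficulty, discrimination, and guessing.
    return irt_probability(theta, b, a=a, c=c)
```

When ability equals difficulty, the Rasch and two-parameter versions both give a probability of 0.5, while the three-parameter version gives a value halfway between `c` and 1. Each added parameter lets the curve bend to fit more of the data, which is exactly the parsimony tradeoff the two camps argue about.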
Sooo… How do I select a model?
Well, try to be cognizant of that tradeoff, which is one of several tradeoffs when selecting an IRT model. There is no answer that is right all the time; it is more a matter of whether your data fits a model and whether that model satisfies your requirements for a particular situation. That is, whether it is truly useful, which is Box’s original point. But don’t forget that all the models are wrong!
By Nathan Thompson, PhD