All Psychometric Models Are Wrong

The British statistician George Box is credited with the quote, “All models are wrong, but some are useful.”  As psychometricians, it is important that we never forget this perspective.  We cannot be so haughty as to think that our models actually represent the true underlying phenomena and that any data that does not fit nicely is just noise.  We need to remember that everything we do is an approximation, and respect the balance between parsimony and parameterization.

Really… all models are wrong?

Yeah, there is no TRUE model that perfectly describes the interaction between an examinee and a test item.  Obviously the probability of a correct response is primarily due to important factors such as examinee ability, item difficulty, item quality, the presence of guessing, and the scoring function of the item.  There are also additional factors, such as student motivation, timing factors, lighting in the room, screen size, whether the examinee broke up with their girlfriend/boyfriend the previous day, whether their mom made their favorite breakfast that morning… you get the picture.  Attempting to model all those factors is certainly overparameterization.

Wikipedia also has a lengthier quote from Box on that aspect:

Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

Most, if not all, psychometricians would agree that my earlier description of overparameterization is valid.  The controversy in the field of psychometrics is which of those “important factors” I mentioned qualify as overparameterization.  The Rasch model famously boils down the interaction to a single item parameter (difficulty) and a single person parameter (ability).  Many psychometricians consider this to be underparameterization since, for example, items differ widely in their quality (discrimination).  The Rasch cohort would consider the two- and three-parameter item response theory (IRT) models to be overparameterization, especially since they necessitated the development of new parameter estimation algorithms in the 1970s.  There are some practitioners in each camp who would claim that the other is the “mark of mediocrity.”
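To make the parameter counts concrete, here is a minimal sketch of the item response function under the common logistic parameterization.  The function name and argument names are my own for illustration, and the optional scaling constant D = 1.7 that some texts include is omitted.  With the defaults it takes the Rasch form; freeing the discrimination gives the two-parameter model, and adding a lower asymptote for guessing gives the three-parameter model.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the logistic 3PL model.

    theta: examinee ability
    b:     item difficulty
    a:     item discrimination (fixed at 1.0 gives the Rasch form)
    c:     lower asymptote for guessing (fixed at 0.0 gives the 2PL form)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# An examinee whose ability equals the item difficulty:
p_rasch = irt_probability(theta=0.0, b=0.0)                 # one item parameter
p_2pl = irt_probability(theta=0.0, b=0.0, a=2.0)            # two item parameters
p_3pl = irt_probability(theta=0.0, b=0.0, a=2.0, c=0.2)     # three item parameters
```

When ability equals difficulty, the Rasch and two-parameter versions both return 0.5, while the guessing parameter lifts the three-parameter probability to 0.6.  The models are nested, so the debate is really about which of those extra parameters earn their keep.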

Sooo… How do I select a model?

Well, try to be cognizant of that tradeoff, which is one of several tradeoffs when selecting an IRT model.  There is no answer that is right all the time; it is more a matter of whether your data fits a model and whether the model satisfies your requirements for a particular situation.  That is, whether it is truly useful, which is Box’s original point.  But don’t forget that all the models are wrong!
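One common way to put a number on the parsimony tradeoff is an information criterion such as AIC or BIC, which penalizes the log-likelihood by the number of estimated parameters.  The sketch below is only an illustration: the log-likelihoods, the 20-item test, and the sample of 500 examinees are made-up values, not results from real data, and I am assuming only item parameters are counted.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_examinees):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_examinees) - 2 * log_likelihood


# HYPOTHETICAL fits for a 20-item test: (log-likelihood, number of item parameters).
# These numbers are invented purely to illustrate the calculation.
fits = {
    "Rasch": (-5500.0, 20),
    "2PL": (-5470.0, 40),
    "3PL": (-5465.0, 60),
}
n_examinees = 500  # hypothetical sample size

best_by_aic = min(fits, key=lambda m: aic(*fits[m]))
best_by_bic = min(fits, key=lambda m: bic(*fits[m], n_examinees))
print("AIC prefers:", best_by_aic)
print("BIC prefers:", best_by_bic)
```

With these invented numbers, AIC prefers the 2PL while BIC, which penalizes parameters more heavily at this sample size, prefers the Rasch model.  Even the fit statistics do not always agree, which is another reason to weigh usefulness for your situation rather than look for a single correct model.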

Nathan Thompson, PhD

Chief Product Officer at ASC
I am a psychometrician, software developer, author, and researcher, currently serving as Chief Product Officer for Assessment Systems Corporation (ASC). My mission is to elevate the profession of psychometrics by using software to automate the menial stuff like job analysis and Angoff studies, so we can focus on more innovative work. My core goal is to improve assessment throughout the world. I was originally trained as a psychometrician, doing an undergrad at Luther College in Math/Psych/Latin and then a PhD in Psychometrics at the University of Minnesota. I then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. Research and innovation are incredibly important to me. In addition to my own research, I am cofounder and Membership Director at the International Association for Computerized Adaptive Testing. You can often find me at other important conferences like ATP, ICE, CLEAR, and NCME. I've published many papers and presentations, and my favorite remains