
How do I implement item response theory?

I recently received an email from a researcher who wanted to implement item response theory but was not sure where to start.  It occurred to me that there are plenty of resources out there that describe IRT, but few, if any, that provide guidance for how someone new to the topic could apply it.  That is, plenty of resources define the a-b-c parameters and discuss the item response function, but few tell you how to calculate those parameters or what to do with them.

Why do I need to implement item response theory?

First of all, you might want to ask yourself this question.  Don’t use IRT just because you heard it is an advanced psychometric paradigm.  IRT was invented to address shortcomings in classical test theory, and it works best in the situations where those shortcomings matter most.  For example, you might want to design adaptive tests, assemble parallel forms, or equate score scales across years.

What sort of tests/data work with IRT?

The next question you need to ask yourself is whether your test can work with IRT.  IRT assumes unidimensionality and local independence.  Unidimensionality means that all items intercorrelate highly and, from a factor analysis perspective, load strongly on one primary factor.  Local independence means that items are independent of one another after conditioning on the trait, so testlets and “innovative” item types that violate this might not work well.
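Before committing to IRT, a rough screen for unidimensionality is to inspect the eigenvalues of the inter-item correlation matrix: a first eigenvalue several times larger than the second suggests one dominant factor.  A minimal sketch in Python follows; the simulated 0/1 data is purely a placeholder for your own scored response matrix, and for real dichotomous items, tetrachoric correlations are more defensible than the plain Pearson correlations used below.

```python
import numpy as np

# Hypothetical scored response matrix (examinees x items); replace this
# simulated 0/1 data with your own before drawing any conclusions.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(500, 20))

# Eigenvalues of the inter-item correlation matrix, largest first.
# A first eigenvalue several times the second suggests one dominant factor.
eigvals = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))[::-1]
print("First/second eigenvalue ratio:", eigvals[0] / eigvals[1])
```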

IRT assumes that items are scored dichotomously (correct/incorrect) or polytomously (integer points where smarter or high-trait examinees earn higher points).  Surprisingly, this isn’t always the case.  This blog post explores how a certain PARCC item type violated the should-be-obvious assumption that smarter students earn higher points, a great example of pedagogues trying to do psychometrics.
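A quick way to screen your data for that assumption is the classical item-rest correlation: correlate each item’s score with the total score on the remaining items, and flag anything near zero or negative.  A sketch, assuming scores is a NumPy array (examinees × items) of item scores that you supply yourself:

```python
import numpy as np

def item_rest_correlations(scores: np.ndarray) -> np.ndarray:
    """Correlate each item's score with the total score on the other items.

    scores: examinees x items array of item scores (0/1 or integer points).
    A near-zero or negative value flags an item where higher-scoring
    examinees do not earn more points, violating the assumption above.
    """
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])
```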

And, of course, IRT has sample size requirements.  I’ve received plenty of email questions from people who wonder why Xcalibre doesn’t work on their data set… of 6 students.  Well, IRT requires at least 100 examinees for the simplest model and a minimum of around 1,000 for the more complex models.  Six students obviously isn’t enough even for classical test theory, for that matter.

How do I calculate IRT analytics?

Classical test theory is simple enough that anyone can easily calculate statistics like P (proportion-correct), Rpbis (the point-biserial correlation), and coefficient alpha with Microsoft Excel formulas.  CITAS does this.  IRT calculations are much more complex; it takes hundreds of lines of real code to estimate item parameters like a, b, and c.  I recommend the program Xcalibre to do so.  It has a straightforward, user-friendly interface and will automatically create MS Word reports for you.  If you are a member of the Rasch club, the go-to software is Winsteps.  You can also try R packages, but to do so you will need to learn to program in the R language, and the output is greatly inferior to commercial software.
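To make those parameters concrete: estimating a, b, and c takes serious software, but the item response function itself is a one-liner.  Here is a minimal sketch of the three-parameter logistic (3PL) model in Python, with made-up parameter values purely for illustration (some programs also include a scaling constant D = 1.7 in the exponent to approximate the normal ogive):

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL item response function: probability of a correct response
    given ability theta, discrimination a, difficulty b, and
    pseudo-guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An average examinee (theta = 0) on a slightly hard, guessable item
print(p_3pl(theta=0.0, a=1.2, b=0.5, c=0.20))
```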

Some of the secondary analyses in IRT can be calculated easily enough that Excel formulas are an option.  The IRT Scoring Spreadsheet scores a single student with IRT item parameters you supply, in an interactive way that helps you learn how IRT scoring works. I also have a spreadsheet that helps you build parallel forms by calculating the test information function (TIF) and conditional standard error of measurement (CSEM).  However, my TestAssembler program does that with automation, saving you hours of manual labor.
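If you are curious what those spreadsheets are doing under the hood, the TIF, CSEM, and maximum-likelihood scoring calculations are short.  A sketch under the 3PL model (reusing the response function from above so the block stands alone), with a hypothetical three-item “bank” standing in for parameters you would supply yourself:

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information contributed by one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return (a ** 2) * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Hypothetical item bank: (a, b, c) triples you would supply yourself.
items = [(1.0, -1.0, 0.20), (1.3, 0.0, 0.25), (0.8, 1.2, 0.20)]

# Test information function and conditional SEM across the theta scale.
for theta in (-2, -1, 0, 1, 2):
    tif = sum(info_3pl(theta, a, b, c) for a, b, c in items)
    csem = 1.0 / math.sqrt(tif)
    print(f"theta={theta:+d}  TIF={tif:.3f}  CSEM={csem:.3f}")

# Maximum-likelihood scoring of one student (1 = correct), via a simple
# grid search over theta from -4 to +4.
responses = [1, 1, 0]
def log_lik(theta):
    return sum(
        math.log(p_3pl(theta, a, b, c)) if x
        else math.log(1.0 - p_3pl(theta, a, b, c))
        for x, (a, b, c) in zip(responses, items)
    )
theta_hat = max((t / 100 for t in range(-400, 401)), key=log_lik)
print("ML theta estimate:", theta_hat)
```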

There are also a few specific-use tools available on the web.  One of my favorites is IRTEQ, which performs conversion-style equating such as mean/sigma and Stocking-Lord.  That is, it links together scores from different forms of an exam onto a common scale, even if the forms are delivered in different years.
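Mean/sigma is the simplest of these linking methods and illustrates the idea: using the difficulty (b) parameters of anchor items that appear on both forms, solve for the linear transformation that matches their mean and standard deviation, then apply it to the rest of the old form’s parameters.  A minimal sketch with made-up anchor difficulties (Stocking-Lord instead matches the test characteristic curves, which is part of why a dedicated tool like IRTEQ is worth using):

```python
import statistics

def mean_sigma_link(b_old, b_new):
    """Mean/sigma linking: find A, B so that A * theta_old + B places the
    old form's parameters on the new form's scale, using the b parameters
    of items common to both forms (anchor items)."""
    A = statistics.stdev(b_new) / statistics.stdev(b_old)
    B = statistics.mean(b_new) - A * statistics.mean(b_old)
    return A, B

# Hypothetical anchor-item difficulties from two separate calibrations.
b_old = [-1.2, -0.4, 0.3, 1.1]
b_new = [-1.0, -0.1, 0.6, 1.5]
A, B = mean_sigma_link(b_old, b_new)
# The remaining old-form parameters then transform as b* = A*b + B, a* = a/A.
print(f"A={A:.3f}, B={B:.3f}")
```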

So where do I go from here?

If you want to implement item response theory, I recommend that you start by downloading the free version of Xcalibre.  If you are interested in managing an assessment with IRT throughout the test development cycle, sign up for a free account in FastTest, our cloud-based testing ecosystem.  If you still need to learn more about what IRT is, read this introductory article; then, if you want more, I recommend the book Item Response Theory for Psychologists by Embretson and Reise (2000).


Nathan Thompson, PhD

Chief Product Officer at ASC
I am a psychometrician, software developer, author, and researcher, currently serving as Chief Product Officer for Assessment Systems Corporation (ASC). My mission is to elevate the profession of psychometrics by using software to automate the menial stuff like job analysis and Angoff studies, so we can focus on more innovative work. My core goal is to improve assessment throughout the world. I was originally trained as a psychometrician, doing an undergrad at Luther College in Math/Psych/Latin and then a PhD in Psychometrics at the University of Minnesota. I then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, project manager, and business leader. Research and innovation are incredibly important to me. In addition to my own research, I am cofounder and Membership Director at the International Association for Computerized Adaptive Testing. You can often find me at other important conferences like ATP, ICE, CLEAR, and NCME. I've published many papers and presentations, and my favorite remains http://pareonline.net/getvn.asp?v=16&n=1.
