Flagging invalid scores with Time-Score analysis

Psychometric forensics is a surprisingly deep and complex field.  Many of the indices are incredibly sophisticated, but a good, simple, high-level analysis to start with is overall time vs. scores, which I call Time-Score Analysis.  This approach applies simple flags to two easily interpretable metrics (total test time in minutes and number-correct raw score) to identify possible pre-knowledge, clickers, and harvesters/sleepers.  Consider the four quadrants that a bivariate scatterplot of these variables would produce, with time on the horizontal axis and score on the vertical axis.

 

| Quadrant | Interpretation | Possible threat? | Suggested flagging |
|---|---|---|---|
| Upper right | High scores and taking their diligent time | Good examinees | NA |
| Upper left | High scores with low time | Pre-knowledge | Top 50% score and bottom 5% time |
| Lower left | Low scores with low time | “Clickers” or other low motivation | Bottom 5% time and score |
| Lower right | Low scores with high time | Harvesters, sleepers, or just very low ability | Top 5% time and bottom 5% scores |
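The suggested flagging rules can be sketched in a few lines of Python. This is only a minimal illustration, not a production forensics tool: the `Examinee` record, the function names, and the way ties are handled at the cutoffs are all assumptions made for the example, and the 5% and 50% cutoffs are simply the ones suggested in the table.

```python
from typing import NamedTuple


class Examinee(NamedTuple):
    # Hypothetical record layout, assumed for this sketch.
    examinee_id: str
    minutes: float  # total test time in minutes
    score: int      # number-correct raw score


def frac_at_or_below(values, x):
    """Fraction of the group with a value less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def frac_at_or_above(values, x):
    """Fraction of the group with a value greater than or equal to x."""
    return sum(v >= x for v in values) / len(values)


def flag_examinees(examinees, tail=0.05, high_score=0.50):
    """Return {examinee_id: label} for examinees matching a flagging rule.

    tail is the "top/bottom 5%" cutoff and high_score is the "top 50%
    score" cutoff. Unflagged examinees are left out of the result.
    """
    times = [e.minutes for e in examinees]
    scores = [e.score for e in examinees]
    flags = {}
    for e in examinees:
        low_time = frac_at_or_below(times, e.minutes) <= tail
        high_time = frac_at_or_above(times, e.minutes) <= tail
        low_score = frac_at_or_below(scores, e.score) <= tail
        top_half_score = frac_at_or_above(scores, e.score) <= high_score
        if low_time and low_score:
            flags[e.examinee_id] = "clicker"            # lower left
        elif low_time and top_half_score:
            flags[e.examinee_id] = "pre-knowledge"      # upper left
        elif high_time and low_score:
            flags[e.examinee_id] = "harvester/sleeper"  # lower right
    return flags
```

Note that with a 5% tail, a group of 20 examinees can have at most one person in each tail, so the rules only become meaningful with reasonably large groups. A flag is a reason to look closer, not a conclusion.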

An example of Time-Score Analysis

Consider the example data below.  What can this tell us about the performance of the test in general, and about specific examinees?

This test had 100 items, scored classically (number-correct), and a time limit of 60 minutes.  Most examinees took 45-55 minutes, so the time limit was appropriate.  A few examinees spent 58-59 minutes; there will usually be some diligent students like that.  There was a fairly strong positive relationship between time and score, in that examinees who took longer tended to score higher.
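Both test-level observations above can be checked numerically. The sketch below is one way to do it, under stated assumptions: the helper names and the 2-minute margin are hypothetical choices for illustration, and the Pearson correlation is used here as one common summary of the time-score relationship rather than as the method behind the example data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_near_limit(minutes, limit=60.0, margin=2.0):
    """Fraction of examinees who finished within `margin` minutes of the limit.

    A large share suggests the time limit may be too tight; the margin
    of 2 minutes is an arbitrary choice for this sketch.
    """
    return sum(m >= limit - margin for m in minutes) / len(minutes)


# Made-up illustrative values, not the data from the example.
minutes = [45, 48, 50, 52, 55, 58, 59, 47]
scores = [62, 66, 70, 73, 78, 85, 88, 64]

print(f"time-score correlation: {pearson_r(minutes, scores):.2f}")
print(f"share near the limit:   {share_near_limit(minutes):.2f}")
```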

Now, what about the individuals?  I’ve highlighted 5 examples.

  1. This examinee had the shortest time, and one of the lowest scores.  They apparently did not care very much.  They are an example of a low-motivation examinee who moved through quickly.  One of my clients calls these “clickers.”
  2. This examinee also took a short time, but had a suspiciously high score.  They definitely are an outlier on the scatterplot, and should perhaps be investigated.
  3. This examinee is simply super-diligent.  They went right up to the 60 minute limit, and achieved one of the highest scores.
  4. This examinee also went right up to the 60 minute limit, but had one of the lowest scores.  They are likely low ability or low motivation.  That same client of mine calls these “sleepers”: a candidate who is forced to take the exam but doesn’t care, so just sits there and dozes.  Alternatively, it might be a harvester: someone who has been assigned to memorize test content, so they spend all the time they can, but only look at half the items so they can focus on memorization.
  5. This examinee had by far the lowest score, and one of the lowest times.  Perhaps they didn’t even answer every question.  Again, there is a motivation/effort issue here, most likely.

How useful is time-score analysis?

Like other aspects of psychometric forensics, this is primarily useful for flagging purposes.  We do not know yet if #4 is a harvester or just low motivation.  Instead of accusing them, we open an investigation.  How many items did they attempt?  Are they a repeat test-taker?  At what location did they take the test?  Do we have proctor notes, site video, remote proctoring video, or other evidence that we can review?  There is a lot that can go into such an investigation.  Moreover, simple analyses such as this are merely the tip of the iceberg when it comes to psychometric forensics.  In fact, there is so much involved that I’ve heard of some organizations that simply stick their heads in the sand and don’t even bother checking out someone like #4.  It just isn’t in the budget.

However, test security is an essential aspect of validity.  If someone has stolen your test items, the test is now compromised, and you are guaranteed that scores do not mean the same thing they meant when the test was published.  It’s now apples and oranges, even though the items on the test are the same.  You might not challenge individual examinees, but you could instead institute a plan to publish new test forms every 6 months.  Regardless, your organization needs to have some difficult internal discussions and establish a test security plan.

 

Nathan Thompson, PhD

Chief Product Officer at ASC
I am a psychometrician, software developer, author, and researcher, currently serving as Chief Product Officer for Assessment Systems Corporation (ASC). My mission is to elevate the profession of psychometrics by using software (especially AI and machine learning elements) to automate the menial stuff like job analysis and Angoff studies, so we can focus on more innovative work. My core goal is to improve assessment throughout the world. I was originally trained as a psychometrician, doing an undergrad at Luther College in Math/Psych/Latin and then a PhD in Psychometrics at the University of Minnesota. I then worked multiple roles in the testing industry, including item writer, test development manager, essay test marker, consulting psychometrician, software developer, product owner, and business leader. Research and innovation are incredibly important to me. In addition to my own research, I am cofounder and Membership Director at the International Association for Computerized Adaptive Testing. You can often find me at other important conferences like ATP, ICE, CLEAR, and NCME. I've published many papers and presentations, and my favorite remains http://pareonline.net/getvn.asp?v=16&n=1.