Flag exam cheating with Time-Score analysis

Psychometric forensics is a surprisingly deep and complex field. Many of its indices are highly sophisticated, but a good, simple place to start is comparing overall test time with scores, which I call Time-Score Analysis. This approach uses simple flagging on two easily interpretable metrics (total test time in minutes and number-correct raw score) to identify possible pre-knowledge, clickers, and harvesters/sleepers. Consider the four quadrants that a bivariate scatterplot of these variables would produce, with time on the horizontal axis and score on the vertical axis.

Quadrant	Interpretation	Possible threat?	Suggested flagging
Upper right	High scores with high time (diligent)	Good examinees	NA
Upper left	High scores with low time	Pre-knowledge	Top 50% score and bottom 5% time
Lower left	Low scores with low time	“Clickers” or other low motivation	Bottom 5% time and bottom 5% score
Lower right	Low scores with high time	Harvesters, sleepers, or just very low ability	Top 5% time and bottom 5% score
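
The suggested flagging rules in the table can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function names, the nearest-rank way of computing cutoffs, the flag labels, and the example records are all assumptions made for this sketch, and a real analysis would need to decide how to handle ties and small samples.

```python
import math

def bottom_cutoff(values, pct):
    """Largest value still inside the bottom `pct` percent (nearest-rank rule)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]

def top_cutoff(values, pct):
    """Smallest value still inside the top `pct` percent (nearest-rank rule)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[len(ordered) - k]

def flag_examinees(records):
    """records: list of (examinee_id, minutes, raw_score) tuples.

    Returns a dict of examinee_id -> flag for the three suspicious quadrants.
    Examinees in the upper-right quadrant (or in the middle) are not flagged.
    """
    times = [minutes for _, minutes, _ in records]
    scores = [score for _, _, score in records]
    fast = bottom_cutoff(times, 5)        # bottom 5% time
    slow = top_cutoff(times, 5)           # top 5% time
    low = bottom_cutoff(scores, 5)        # bottom 5% score
    upper_half = top_cutoff(scores, 50)   # top 50% score

    flags = {}
    for examinee_id, minutes, score in records:
        if minutes <= fast and score >= upper_half:
            flags[examinee_id] = "possible pre-knowledge"
        elif minutes <= fast and score <= low:
            flags[examinee_id] = "possible clicker / low motivation"
        elif minutes >= slow and score <= low:
            flags[examinee_id] = "possible harvester, sleeper, or very low ability"
    return flags

# Hypothetical data: 36 typical examinees plus 4 unusual ones.
records = [(f"E{i:02d}", 45 + i % 11, 60 + (i * 7) % 26) for i in range(36)]
records += [("X1", 18, 25), ("X2", 20, 95), ("X3", 59, 28), ("X4", 58, 96)]

flags = flag_examinees(records)
for examinee_id, flag in flags.items():
    print(examinee_id, flag)
```

Note that the output is a list of people to look at more closely, not a list of cheaters. The 5% and 50% cutoffs are the ones suggested in the table; they could be passed in as parameters if your program needs different thresholds.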

An example of Time-Score Analysis

Consider the example data below. What can this tell us about the performance of the test in general, and about specific examinees?

This test had 100 items, scored classically (number-correct), and a time limit of 60 minutes. Most examinees took 45-55 minutes, so the time limit was appropriate. A few examinees spent 58-59 minutes; there will usually be some diligent students like that. There was a fairly strong relationship between time and score, in that examinees who took longer tended to score higher.
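
One way to put a number on the time-score relationship is a Pearson correlation. The sketch below is only an illustration under stated assumptions: the `pearson_r` helper and the eight data points are hypothetical and are not the data from this example test.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical total times (minutes) and number-correct scores.
minutes = [38, 42, 45, 47, 50, 52, 55, 58]
scores = [52, 60, 63, 70, 72, 78, 80, 88]

r = pearson_r(minutes, scores)
print(f"time-score correlation: {r:.2f}")
```

A strongly positive value is consistent with the pattern described above. Outliers such as clickers or examinees with pre-knowledge will pull the correlation around, which is one more reason to look at the scatterplot itself rather than a single summary statistic.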

Now, what about the individuals? I’ve highlighted 5 examples.

1. This examinee had the shortest time and one of the lowest scores. They apparently did not care very much. They are an example of a low-motivation examinee who moved through quickly. One of my clients calls these “clickers.”
2. This examinee also took a short time but had a suspiciously high score. They are definitely an outlier on the scatterplot and should perhaps be investigated.
3. This examinee is simply super-diligent. They went right up to the 60-minute limit and achieved one of the highest scores.
4. This examinee also went right up to the 60-minute limit but had one of the lowest scores. They are likely low ability or low motivation. That same client of mine calls these “sleepers” – a candidate who is forced to take the exam but doesn’t care, so just sits there and dozes. Alternatively, it might be a harvester: someone who has been assigned to memorize test content, so they spend all the time they can but only look at half the items so they can focus on memorization.
5. This examinee had by far the lowest score and one of the lowest times. Perhaps they didn’t even answer every question. Again, there is most likely a motivation/effort issue here.

How useful is time-score analysis?

Like other aspects of psychometric forensics, this is primarily useful for flagging purposes. We do not know yet if #4 is a harvester or just low motivation. Instead of accusing them, we open an investigation. How many items did they attempt? Are they a repeat test-taker? At what location did they take the test? Do we have proctor notes, site video, remote proctoring video, or other evidence that we can review?

There is a lot that can go into such an investigation. Moreover, simple analyses such as this are merely the tip of the iceberg when it comes to psychometric forensics. In fact, there is so much involved that I’ve heard some organizations simply stick their heads in the sand and don’t even bother checking out someone like #4. It just isn’t in the budget.

Some of this analysis is best done with specialized software for psychometric forensics, like SIFT.

However, test security is an essential aspect of validity. If someone has stolen your test items, the test is compromised, and you are guaranteed that scores no longer mean the same thing they meant when the test was published. It’s now apples and oranges, even though the items on the test are the same. You might not challenge individual examinees, but you could instead institute a plan to publish new test forms every 6 months. Regardless, your organization needs to have some difficult internal discussions and establish a test security plan.

Nathan Thompson, PhD

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College with a triple major of Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has provided extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/ .
