When you ask an AI to generate a quiz from your training material, it defaults to recall questions, because those are the easiest for a model to write and check against a single right answer. A quiz everyone passes that only measures recognition is not evidence of capability. It is the illusion of competence, industrialized.
You have probably done this. You paste the team’s new SOP into a chatbot, ask for a 10-question quiz, and send it out. Everyone scores 90 percent or higher. Three weeks later, the work still comes back wrong.
It usually is not ignorance that ships the recall quiz. It is the day. You needed a check by Friday, you needed a record that people had read the thing, and the AI handed you ten clean questions in ten seconds. The problem is not that the quiz is fake. The problem is that it answers a different question than the one you actually care about, and it answers it convincingly enough to stop you looking.
Why an AI Quiz Feels Like Proof
Recognition feels like proof, but it is not the same as retrieval or transfer. Seeing the right option in a multiple-choice question means someone recognized it from the material. It does not mean they could produce it unaided, or apply it to a problem they have not seen.
The research on this is old and clean. In 2006, Henry Roediger and Jeffrey Karpicke compared students who re-read a passage with students who practiced retrieving it. Re-readers felt more confident. The group that practiced retrieval, actually trying to recall the material, remembered substantially more a week later. The re-readers had mistaken fluency for mastery: familiar material reads easy, and easy feels like knowing. The feeling of knowing is exactly what an end-of-session quiz rewards.
The Three Things a Question Can Test
It helps to read any quiz question by asking which of three things it actually measures.
The first is recognition. The answer is on the screen and the learner only has to point at it. “Which of these is the correct first step?” Almost every multiple-choice question lives here, and it is the lowest bar there is. Seeing the right answer in a list is not far from guessing it.
The second is recall. The answer is not on the screen, and the learner has to produce it from memory. “What is the first step, and why does it come first?” There is nothing to point at. You either have it or you do not, and that is already a much stronger signal than recognition.
The third is transfer, and it is the one you are actually paying for. The learner takes what they know and applies it to a situation the material never spelled out. “The supplier missed the deadline, the usual process does not fit, and the customer is waiting. What do you do?” That is the job. It is also the level almost no AI quiz reaches unless you make it.
If you have run into Bloom’s taxonomy, this is the bottom few rungs of it: remember, then understand, then apply and judge. You do not need the framework to use the idea. You only need to notice that remembering, whether by recognition or recall, is the floor, and a quiz that never leaves the floor tells you almost nothing about whether someone can do the work.
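One way to make that reading habit concrete is to tag each question by level and look at the mix. The sketch below is only an illustration: the `Level` names and the `level_shares` helper are hypothetical, and the tags are assumed to come from a person reading each question, not from any automatic classifier.

```python
from collections import Counter
from enum import Enum


class Level(Enum):
    RECOGNITION = "recognition"  # the answer is on the screen
    RECALL = "recall"            # the answer is produced from memory
    TRANSFER = "transfer"        # the answer is applied to an unseen situation


def level_shares(tagged_questions):
    """Return the fraction of questions at each level.

    tagged_questions is a list of (question_text, Level) pairs,
    tagged by a human reviewer (an assumption of this sketch).
    """
    if not tagged_questions:
        return {level: 0.0 for level in Level}
    counts = Counter(level for _, level in tagged_questions)
    total = len(tagged_questions)
    return {level: counts[level] / total for level in Level}


# The three example questions from this section, tagged by hand.
quiz = [
    ("Which of these is the correct first step?", Level.RECOGNITION),
    ("What is the first step, and why does it come first?", Level.RECALL),
    ("The supplier missed the deadline, the usual process does not fit, "
     "and the customer is waiting. What do you do?", Level.TRANSFER),
]

shares = level_shares(quiz)
for level, share in shares.items():
    print(f"{level.value}: {share:.0%}")
```

If the transfer share of a real quiz comes out at or near zero, the quiz has not left the floor.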
Why the AI Quiz Makes It Worse
In its default mode, AI does not just permit the cheap quiz. It industrializes it. You can generate an endless stream of recall questions in seconds, perfectly formatted and instantly gradable. The friction that used to make a lazy assessment annoying is gone, so the path of least resistance is now a wall of recognition questions.
There is a reason the default skews this way. A recall question has one checkable answer, which is exactly what a model is built to produce and score. A transfer question has many defensible answers and no clean key, so it is harder for the model to write and harder for it to grade. Left to its own habits, the AI drifts toward the questions it can settle by itself. Those are the ones that measure the least.
The natural home for that quiz makes it worse still: the immediate check at the end of a session. That is exactly when the fluency effect peaks. The team passes, feels ready, and books a capability debt that will not surface until the work does.
How to Climb Off Recall
Climbing the ladder means pushing the questions past recognition toward application and judgment. Two moves do most of the work.
The first fix is in the prompt. Tell the AI to write questions about what learners must do with the material, not whether they saw it.
- Instead of “Define X,” ask it to “write a scenario where a team member applies X to solve a problem the material never spells out.”
- Instead of “Which of these is Y?” ask it to “give two ways to handle situation Z and have the learner pick one and defend it under constraints A and B.”
- Instead of “List the steps for process P,” ask it to “show a flawed attempt at process P and have the learner find the errors and fix them.”
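If you generate quizzes often, the three rewrites above can be baked into a reusable prompt so you do not have to remember them each time. This is a minimal sketch, not a tested recipe: `build_quiz_prompt` is a hypothetical helper, the wording of the instructions is one possible phrasing, and the sample material is a placeholder. It only assembles text; paste the result into whichever chatbot you use.

```python
def build_quiz_prompt(material: str, n_questions: int = 5) -> str:
    """Assemble a prompt that asks for application questions, not recall."""
    instructions = [
        f"Write {n_questions} quiz questions based on the material below.",
        "Do not ask for definitions, lists of steps, or which option is correct.",
        "Each question must do one of the following:",
        "- describe a scenario the material never spells out and ask the "
        "learner to apply the material to it;",
        "- offer two ways to handle a situation and ask the learner to pick "
        "one and defend it under stated constraints;",
        "- show a flawed attempt at a process and ask the learner to find "
        "and fix the errors.",
        "Keep every question open-ended. Do not provide answer options.",
    ]
    return "\n".join(instructions) + "\n\nMATERIAL:\n" + material.strip()


# Placeholder material, standing in for your real SOP or policy text.
prompt = build_quiz_prompt("Refunds are accepted within the policy window.", n_questions=3)
print(prompt)
```

Banning answer options in the prompt matters as much as asking for scenarios, since a scenario followed by four choices is still a recognition question.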
It is worth seeing the same topic at both levels. Say the material is your refund policy. The recall question is “What is the maximum refund window?” The answer is a number on the page, and a new hire gets it right by skimming for it. The climbed version is “A customer is four days past the window, your team gave them the wrong information, and they want a full refund. Walk through what you would do and what in the policy backs it up.” Now the page is necessary but not enough. They have to find the rule, weigh it against the situation, and defend a call. One question tells you they can locate a number. The other tells you they can actually run the policy.
The second fix is timing. Confirm capability with unaided production after a real delay, not recognition in the moment. The test is whether someone can do it next week, on the actual task, without the material open. Even one low-stakes check a few days later tells you more than the perfect score on day one.
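The delay is easy to skip unless it goes on a calendar. As a rough sketch, the helper below turns a session date into follow-up check dates. The function name and the default delays of 3 and 7 days are assumptions chosen to match "a few days later" and "next week", not figures from any study.

```python
from datetime import date, timedelta


def follow_up_dates(session_day, delays_in_days=(3, 7)):
    """Return the dates on which to run unaided, low-stakes checks."""
    return [session_day + timedelta(days=d) for d in delays_in_days]


# Example: a training session held on 3 March 2025.
checks = follow_up_dates(date(2025, 3, 3))
for day in checks:
    print(day.isoformat())
```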
An AI can write you a thousand questions in a minute. Only you can decide whether they test recognition or capability. Before you trust a quiz as proof your team is ready, ask which one it actually measured.
