Failure Modes
The Reliability Gap in AI Essay Grading
Apr 30, 2026
Why LLM-based essay graders score the same essay differently each run, what the MCAS rescoring incident reveals about the category, and the five engineering controls that turn a language model into a reliable scoring instrument.
22 min read · Technical · evaluation, reliability, responsible-ai, case-study