The Irreplaceable Human Skill: Why Generative AI Can’t Teach Students to Judge Their Own Work

A note to readers: I’m writing this in the thick of marking student submissions – the most grinding aspect of academic work. My brain fights against repetitive rote labour and goes on tangents to keep me entertained. What follows emerged from that very human need to find intellectual stimulation in the midst of administrative necessity.

There’s considerable discussion suggesting that what distinguishes us as creators and thinkers from Generative AI content production is creativity and critical thinking linked to innovation. But where does the hair actually split? Are we actually replaceable by robots, or will they atrophy our critical thinking skills by doing the work for us? Will we just get dumber and less able to tie our own shoelaces, as most fear-based reporting suggests? I think we are asking the wrong questions.

Here is a look at what is actually going on, on the ground. A student recently asked me for detailed annotations on their assignment—line-by-line corrections marking every error. They wanted me to do the analytical work of identifying problems in their writing. This request highlights a fundamental challenge in education: the difference between fixing problems and developing the capacity to recognise them. More importantly, it reveals where the Human-Generative AI distinction becomes genuinely meaningful.

Could Generative AI theoretically teach students to judge their own work? Perhaps, through Socratic questioning or scaffolded self-assessment prompts. But that’s not how students actually use these tools. Or want to use them, apparently. A tech developer I spoke with, who works at a tutoring company that uses Generative AI in teaching and learning, mentioned that students got annoyed by the Socratic approach when they encountered it. So there goes that morsel of hope.

The Seductive Trap of Generative AI Writing Assistance

Students increasingly use Generative AI tools for grammar checking, expression polishing, and even content generation. These tools are seductive because they make writing appear better: more polished, more confident, more academically sophisticated. But here’s the problem: Generative AI tools are fundamentally sycophantic and don’t course-correct misapprehensions. They won’t tell a student their framework analysis is conceptually flawed, their citations are inaccurate, or their arguments lack logical consistency. Instead, they’ll make poorly reasoned content sound more convincing.

This creates a dangerous paradox: students use Generative AI to make their work sound rigorous and sophisticated, but this very process prevents them from developing the judgement to recognise what genuine rigour looks like. They can’t evaluate what they clearly don’t know – that their work isn’t conceptually aligned, coherently logical, or correctly interpreting sources – because the AI has dressed their half-formed understanding in authoritative-sounding language.

I have encountered several submissions across different subjects that exemplified this perfectly: beautifully written but containing fundamental errors in framework descriptions, questionable source citations, and confused theoretical applications. The prose was polished, the structure clear, but the content revealed gaps in understanding that no grammar checker could identify or fix. These students had learned to simulate the appearance of academic rigour without developing the capacity to recognise genuine scholarly quality.

Where the Hair Actually Splits

Generative AI can actually be quite “creative” in generating novel combinations of ideas, and it can perform certain types of critical analysis when clearly guided and bounded. What it fundamentally cannot do is develop the evaluative judgement to recognise quality, coherence, and accuracy in complex, contextualised work. It has no capacity for self-reflection and meaning-making (at the moment); we do.

The distinction isn’t between:

  • Generating creative output (which Generative AI can somewhat do)
  • Performing critical analysis (which Generative AI can also somewhat do)

Rather, it’s between:

  • Creating sophisticated-looking content (which Generative AI increasingly excels at)
  • Judging the quality of that content in context (which requires human oversight and discernment)

Generative AI can produce beautifully written, seemingly sophisticated arguments that are conceptually flawed. It can create engaging content that misrepresents sources or conflates different frameworks. What it cannot do is step back and recognise “this sounds polished but the underlying logic is problematic” or “this citation doesn’t actually support this claim.”

The irreplaceable human skill isn’t creativity per se—it’s the capacity for metacognitive evaluation: the ability to assess one’s own thinking, to recognise when arguments are coherent versus merely convincing, to distinguish between surface-level polish and deep understanding.

What Humans Bring That AI Cannot

The irreplaceable human contribution to education isn’t information delivery—AI is increasingly able to do that pretty efficiently (although there is a lot of hidden labour in this). It’s developing the capacity for metacognitive evaluation in our students.

This happens through:

Exposure to expertise modelling: Students need to observe how experts think through problems, make quality judgements, and navigate uncertainty. This isn’t just about seeing perfect examples—it’s about witnessing the thinking process behind quality work.

Calibrated feedback loops: Human educators can match feedback to developmental readiness, escalating complexity as students build capacity. We recognise when to scaffold and when to challenge.

Critical engagement with authentic problems: Unlike AI-generated scenarios, real-world applications come with messy complexities, competing priorities, and value judgements that call for human discernment and social intelligence.

Social construction of standards: Quality isn’t just individual—it’s negotiated within communities of practice. Students learn to recognise “good work” through dialogue, peer comparison, and collective sense-making.

Refusing to spoon-feed solutions: Perhaps most importantly, human educators understand when not to provide answers. When my student asked for line-by-line corrections, providing them would have created dependency rather than developing their evaluative judgement. The metacognitive skill of self-assessment can only develop when students are required to do the analytical work themselves.

The Dependency Problem

When educators provide line-by-line corrections or when students rely on Generative AI for error detection in thinking, writing or creating, we create dependency rather than capacity. Students learn to outsource quality judgement instead of developing their own ability to recognise problems.

The student who asked for detailed annotations was essentially asking me to do their self-assessment for them. But self-regulated learning—the ability to monitor, evaluate, and adjust one’s own work—is perhaps the most crucial skill we can develop. Without it, students remain permanently dependent on external validation and correction.

Teaching Evaluative Judgement in a Generative AI World

This doesn’t mean abandoning Generative AI tools entirely. Rather, it means being intentional about what we ask humans to do versus what we delegate to technology:

Use Generative AI for: Initial drafting, grammar checking, formatting, research organisation—the mechanical aspects of work.

Reserve human judgement for: Source evaluation, argument coherence, conceptual accuracy, ethical reasoning, quality assessment—the thinking that requires wisdom, not just processing.

In my own practice, I provide rubric-based feedback that requires students to match criteria to their own work. This forces them to develop pattern recognition and quality calibration. It’s more cognitively demanding than receiving pre-marked corrections, but it builds the evaluative judgement they’ll need throughout their careers.

The Larger Stakes

The question of human versus Generative AI roles in education isn’t just pedagogical—it’s about what kind of thinkers we’re developing. If students learn to outsource quality judgement to Generative AI tools, we’re creating a generation that can produce polished content but can’t recognise flawed reasoning, evaluate source credibility, or build intellectual capacity and critical reasoning skills.

This is why we need to build self-evaluative judgement in students – not just critical thinking and creative processes more broadly. The standard educational discourse about “21st century skills” focuses on abstract categories like critical thinking and creativity, but misses this more precise distinction: the specific metacognitive capacity to evaluate the quality of one’s own intellectual work.

This self-evaluative judgement operates laterally across disciplines rather than being domain-specific, and it’s fundamentally metacognitive because it requires thinking about thinking. It addresses the actual challenge students face in a Generative AI world: distinguishing between genuine understanding and polished simulation of understanding. A student might articulate sophisticated pedagogical concepts yet be unable to evaluate whether their own framework descriptions are accurate or their citations valid.

The unique human contribution isn’t delivering perfect feedback—it’s teaching students to become their own quality assessors. That capacity for self-evaluation, for recognising what makes work meaningful and rigorous, remains irreplaceably human.

In a world where Generative AI can make anyone’s writing sound professional, the ability to think critically about one’s own work becomes more valuable, not less. That’s the expertise that human educators bring to the table—not just knowing the right answers, but developing in students the judgement to recognise quality thinking when they see it, including in their own work.
