OET Small-Span Correction — Live LLM Prototype

Goal: Make the AI proofreader produce atomic, phrase-level edits (the style of a real human proofreader), not sentence-level rewrites. The clause skeleton the student wrote is preserved — only the errors are touched.

Pipeline: Constrained Claude prompt → streaming NDJSON (one correction object per line, pushed over Server-Sent Events as Claude emits them) → per-correction minimal-diff trim (strip any prefix/suffix words shared between original and replacement) → JavaScript validator (drops spans that cross sentence boundaries, break word boundaries, or exceed 4 tokens) → live letter render. The default below was pre-computed on a sample letter; paste your own letter and the corrections stream in one by one, typically with a first byte in 2–4 seconds.
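The streaming step can be sketched as a small client-side parser. This is a minimal sketch, assuming the SSE payloads deliver raw NDJSON text whose chunk boundaries may fall mid-line; `makeNdjsonParser` is an illustrative name, not part of the prototype:

```javascript
// Incremental NDJSON parser for the streamed corrections.
// Feed it raw text chunks as they arrive; it emits one parsed
// correction object per complete line and buffers partial lines.
function makeNdjsonParser(onCorrection) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // last element may be an incomplete line
    for (const line of lines) {
      if (line.trim() !== "") onCorrection(JSON.parse(line));
    }
  };
}
```

Each parsed object can then be handed straight to the trim → validate → render steps, so corrections appear one-by-one rather than after the full response.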

Sample: COPD discharge referral, 21 corrections, avg span ≈ 14 chars.

How this compares to sentence-level rewrites

Span cap: 4 tokens. Tighter spans hide more corrections — the client originally asked for 4, but their goal sample has spans of up to 7. Slide the cap live to compare.

  • Shown corrections (v1-style: ~8–12)
  • Avg span length (v1-style: 30–80 chars)
  • Avg span tokens (target ≤ cap)
  • Filtered by validator (silently dropped pre-render)
1 · Constrained prompt (the core)

The prompt enforces small-span output through four rules. Full text below — copy and test it yourself on any OET letter.

2 · Minimal-diff trim (the new step)

Even when the LLM picks a small span, the span often still contains words that are unchanged in the replacement — e.g. bilateral ears-ringing → bilateral tympanic membranes. The word "bilateral" appears in both, so it doesn't need to be part of the correction. This post-processor strips those shared prefix/suffix words so the span contains only the words that actually change: ears-ringing → tympanic membranes. It runs after the LLM call, before the validator. Comparison is strict equality — a capitalisation change still counts as a change.
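A sketch of what that trim can look like in JavaScript (`trimSharedWords` is an illustrative name; the prototype's actual implementation may differ):

```javascript
// Strip words that are strictly identical at the start and end of the
// original/replacement pair, keeping only what actually changes.
// Strict equality means "Word" !== "word", so capitalisation fixes survive.
function trimSharedWords(original, replacement) {
  const a = original.split(/\s+/);
  const b = replacement.split(/\s+/);
  // Shared prefix words.
  while (a.length && b.length && a[0] === b[0]) {
    a.shift();
    b.shift();
  }
  // Shared suffix words.
  while (a.length && b.length && a[a.length - 1] === b[b.length - 1]) {
    a.pop();
    b.pop();
  }
  // If both sides trim to empty, the "correction" was a no-op.
  return { original: a.join(" "), replacement: b.join(" ") };
}
```

On the example above, `trimSharedWords("bilateral ears-ringing", "bilateral tympanic membranes")` yields `ears-ringing` → `tympanic membranes`.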

3 · Validator (post-processor, ~35 lines of JS)

After trim, this validator runs before render. It drops any correction whose original_text crosses a sentence boundary, starts or ends mid-word, or exceeds the token budget. Dropped corrections are silently filtered — not shown in the UI, only surfaced as a count in the metrics strip. This pre-render check directly addresses the "broken tracked changes" and "duplicated text" issues from the brief.
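A sketch of the validator's three checks, assuming each correction object carries character offsets (`start`, `end`) into the letter — the field names and heuristics here are illustrative, not the prototype's exact schema:

```javascript
const TOKEN_BUDGET = 4;

// Returns false if the span crosses a sentence boundary, starts or
// ends mid-word, or exceeds the token budget.
function isValid(letter, { start, end, original_text }) {
  const span = letter.slice(start, end);
  if (span !== original_text) return false;      // offsets must match the text
  if (/[.!?]\s+\S/.test(span)) return false;     // crosses a sentence boundary
  const before = letter[start - 1];
  const after = letter[end];
  if (before && /\w/.test(before)) return false; // starts mid-word
  if (after && /\w/.test(after)) return false;   // ends mid-word
  const tokens = span.trim().split(/\s+/).length;
  return tokens <= TOKEN_BUDGET;
}

function validate(letter, corrections) {
  const kept = corrections.filter((c) => isValid(letter, c));
  // Dropped corrections are never rendered; only the count is surfaced.
  return { kept, dropped: corrections.length - kept.length };
}
```

Note the simple sentence-boundary regex would also trip on abbreviations like "Mr. Smith"; a production check would need a smarter sentence splitter.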

4 · How the default sample was generated

The sample you see above is not hand-authored. The pipeline was run once on the COPD letter with a single Claude Sonnet 4.6 API call. The JSON response was saved and is loaded here as the default. The "Try Your Own Letter" button calls the same endpoint live.

  • Model: claude-sonnet-4-6
  • Max tokens: 8000
  • Temperature: default (1.0)
  • System prompt: as above
  • User turn: the letter, verbatim, wrapped in --- delimiters
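Under those settings, the request body would look roughly like this. This is a sketch of the Anthropic Messages API request shape; `buildRequestBody` is an illustrative helper, not the prototype's code:

```javascript
// Builds the body POSTed to https://api.anthropic.com/v1/messages
// (with x-api-key and anthropic-version headers).
function buildRequestBody(letter, systemPrompt) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 8000,
    // temperature omitted, so the API default of 1.0 applies
    system: systemPrompt,
    messages: [
      { role: "user", content: `---\n${letter}\n---` }, // letter verbatim, --- delimited
    ],
  };
}
```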

The validator also demonstrates itself: one of the LLM's 21 corrections ("time. we" → "time. We") crosses a sentence boundary, and the validator drops it — an example of the pipeline catching a real LLM slip.