OET Small-Span Correction — Live LLM Prototype

Goal: Make the AI proofreader produce atomic, phrase-level edits (the style of a real human proofreader), not sentence-level rewrites. The clause skeleton the student wrote is preserved — only the errors are touched.

Pipeline: Constrained Claude prompt → streaming NDJSON (one correction object per line, pushed over Server-Sent Events as Claude emits them) → per-correction minimal-diff trim (strip any prefix/suffix words shared between original and replacement) → JavaScript validator (drops spans that cross sentence boundaries, break word boundaries, or exceed 4 tokens) → live letter render. The default below was pre-computed on a sample letter; paste your own letter and the corrections stream in one by one, typically with a first byte in 2–4 seconds.
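The streaming step can be sketched as a small client-side parser. This is a minimal sketch, assuming the SSE payloads deliver raw NDJSON text whose chunk boundaries may fall mid-line; `makeNdjsonParser` is an illustrative name, not part of the prototype:

```javascript
// Incremental NDJSON parser for the streamed corrections.
// Feed it raw text chunks as they arrive; it emits one parsed
// correction object per complete line and buffers partial lines.
function makeNdjsonParser(onCorrection) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // last element may be an incomplete line
    for (const line of lines) {
      if (line.trim() !== "") onCorrection(JSON.parse(line));
    }
  };
}
```

Each parsed object can then be handed straight to the trim → validate → render steps, so corrections appear one-by-one rather than after the full response.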

Sample: COPD discharge referral, 21 corrections, avg span ≈ 14 chars.

How this compares to sentence-level rewrites

Span cap: 4 tokens. Tighter spans hide more corrections — the client originally asked for 4, but their goal sample has spans of up to 7. Slide the cap live to compare.

  • Shown corrections (v1-style: ~8–12)
  • Avg span length (v1-style: 30–80 chars)
  • Avg span tokens (target ≤ cap)
  • Filtered by validator (silently dropped pre-render)
1 · Constrained prompt (the core)

The prompt enforces small-span output through four rules. Full text below — copy and test it yourself on any OET letter.

2 · Minimal-diff trim (the new step)

Even when the LLM picks a small span, the span often still contains words that are unchanged in the replacement — e.g. bilateral ears-ringing → bilateral tympanic membranes. The word "bilateral" appears in both, so it doesn't need to be part of the correction. This post-processor strips those shared prefix/suffix words so the span contains only the words that actually change: ears-ringing → tympanic membranes. It runs after the LLM call, before the validator. Comparison is strict equality — a capitalisation change still counts as a change.
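A sketch of what that trim can look like in JavaScript (`trimSharedWords` is an illustrative name; the prototype's actual implementation may differ):

```javascript
// Strip words that are strictly identical at the start and end of the
// original/replacement pair, keeping only what actually changes.
// Strict equality means "Word" !== "word", so capitalisation fixes survive.
function trimSharedWords(original, replacement) {
  const a = original.split(/\s+/);
  const b = replacement.split(/\s+/);
  // Shared prefix words.
  while (a.length && b.length && a[0] === b[0]) {
    a.shift();
    b.shift();
  }
  // Shared suffix words.
  while (a.length && b.length && a[a.length - 1] === b[b.length - 1]) {
    a.pop();
    b.pop();
  }
  // If both sides trim to empty, the "correction" was a no-op.
  return { original: a.join(" "), replacement: b.join(" ") };
}
```

On the example above, `trimSharedWords("bilateral ears-ringing", "bilateral tympanic membranes")` yields `ears-ringing` → `tympanic membranes`.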

3 · Validator (post-processor, ~35 lines of JS)

After trim, this validator runs before render. It drops any correction whose original_text crosses a sentence boundary, starts or ends mid-word, or exceeds the token budget. Dropped corrections are silently filtered — not shown in the UI, only surfaced as a count in the metrics strip. This pre-render check directly addresses the "broken tracked changes" and "duplicated text" issues from the brief.
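A sketch of the validator's three checks, assuming each correction object carries character offsets (`start`, `end`) into the letter — the field names and heuristics here are illustrative, not the prototype's exact schema:

```javascript
const TOKEN_BUDGET = 4;

// Returns false if the span crosses a sentence boundary, starts or
// ends mid-word, or exceeds the token budget.
function isValid(letter, { start, end, original_text }) {
  const span = letter.slice(start, end);
  if (span !== original_text) return false;      // offsets must match the text
  if (/[.!?]\s+\S/.test(span)) return false;     // crosses a sentence boundary
  const before = letter[start - 1];
  const after = letter[end];
  if (before && /\w/.test(before)) return false; // starts mid-word
  if (after && /\w/.test(after)) return false;   // ends mid-word
  const tokens = span.trim().split(/\s+/).length;
  return tokens <= TOKEN_BUDGET;
}

function validate(letter, corrections) {
  const kept = corrections.filter((c) => isValid(letter, c));
  // Dropped corrections are never rendered; only the count is surfaced.
  return { kept, dropped: corrections.length - kept.length };
}
```

Note the simple sentence-boundary regex would also trip on abbreviations like "Mr. Smith"; a production check would need a smarter sentence splitter.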

4 · How the default sample was generated

The sample you see above is not hand-authored. The pipeline was run once on the COPD letter with a single Claude Sonnet 4.6 API call. The JSON response was saved and is loaded here as the default. The "Try Your Own Letter" button calls the same endpoint live.

  • Model: claude-sonnet-4-6
  • Max tokens: 8000
  • Temperature: default (1.0)
  • System prompt: as above
  • User turn: the letter, verbatim, wrapped in --- delimiters
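Under those settings, the request body would look roughly like this. This is a sketch of the Anthropic Messages API request shape; `buildRequestBody` is an illustrative helper, not the prototype's code:

```javascript
// Builds the body POSTed to https://api.anthropic.com/v1/messages
// (with x-api-key and anthropic-version headers).
function buildRequestBody(letter, systemPrompt) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 8000,
    // temperature omitted, so the API default of 1.0 applies
    system: systemPrompt,
    messages: [
      { role: "user", content: `---\n${letter}\n---` }, // letter verbatim, --- delimited
    ],
  };
}
```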

The validator also demonstrates itself: one of the LLM's 21 corrections ("time. we" → "time. We") crosses a sentence boundary, and the validator drops it — an example of the pipeline catching a real LLM slip.