Editorial transparency
How entries are verified
Every entry in this archive that was published via the automated news pipeline passed through a 7-stage process: intake, triage, extraction, verification, drafting, review, and final validation. This page documents each step and the checks applied.
Pipeline
7-stage publication process
- 01
Intake
A source article is fetched and its full text extracted. Metadata (publication date, URL, outlet) is recorded verbatim at this step — nothing is inferred yet.
- 02
Triage
An LLM decides whether the article describes a genuinely new incident or an update to an existing one. Confidence is scored 0–1. Low-confidence items are dropped before any further processing.
- 03
Extract
Factual claims are extracted from the article text as discrete, attributable statements. Each claim is typed (action, statement, allegation, figure). No synthesis happens here — only extraction.
- 04
Verify
Claims are cross-checked against independent sources. A corroboration level (unverified / partial / strong) is assigned based on how many independent outlets report the same facts. Tier-1 sources (national press, wire services) are weighted more heavily than tier-2 sources (regional/partisan).
- 05
Draft
A structured archive entry is drafted from the verified claims. The draft follows a fixed schema — title, date, description, summary, sources, tags, timeline — ensuring every entry is machine-readable and citable.
- 06
Review
A separate LLM (not the one used for drafting) runs five editorial checks: factual grounding, source attribution, neutrality of language, date accuracy, and absence of hallucinated content. All five must pass.
- 07
Publish
Final programmatic validation checks for required fields, valid URLs, no future dates, and no duplicate slugs. An optional, randomly triggered deep audit may follow, in which one source is re-fetched and claims are spot-checked against the live article. Only then is the entry written to the archive.
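The programmatic validation in stage 7 can be sketched roughly as follows. This is a minimal illustration, not the archive's actual code: the `Entry` class, its field types, the `slug` field, and the `validate` and `is_valid_url` helpers are all hypothetical names. Only the four checks (required fields, valid URLs, no future dates, no duplicate slugs) and the schema field names come from the description above.

```python
import datetime as dt
from dataclasses import dataclass, field
from urllib.parse import urlparse


# Hypothetical entry shape. Field names follow the schema listed in stage 5;
# the types and the `slug` field are assumptions made for this sketch.
@dataclass
class Entry:
    slug: str
    title: str
    date: dt.date
    description: str
    summary: str
    sources: list[str]  # source URLs
    tags: list[str] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)


# Assumption: these text fields plus `sources` are the "required fields".
REQUIRED_TEXT_FIELDS = ("slug", "title", "description", "summary")


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate(entry: Entry, existing_slugs: set[str], today: dt.date) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        if not getattr(entry, name).strip():
            problems.append(f"missing required field: {name}")
    if not entry.sources:
        problems.append("missing required field: sources")
    for url in entry.sources:
        if not is_valid_url(url):
            problems.append(f"invalid URL: {url}")
    if entry.date > today:
        problems.append("date is in the future")
    if entry.slug in existing_slugs:
        problems.append(f"duplicate slug: {entry.slug}")
    return problems
```

Returning a list of problems rather than a single boolean is a design choice for this sketch: it lets a rejected entry report every failed check at once.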
Review stage
Five editorial checks
Stage 6 uses a model that was not involved in drafting — a deliberate separation to avoid self-validation. All five checks must pass. A single failure quarantines the item for human review.
- Factual grounding: Every claim traces to a cited source
- Source attribution: Publication, author, and date are recorded
- Language neutrality: No editorialising or loaded language
- Date accuracy: Dates match what sources report
- No hallucination: Drafting model did not invent uncited facts
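The all-or-nothing rule for the five checks could be expressed as a small gate function. The sketch below is an assumption about how such a gate might look: the check identifiers, the `review_outcome` name, and the `"pass"` / `"quarantine"` return values are hypothetical, and a check missing from the results is treated as a failure.

```python
# Hypothetical identifiers for the five editorial checks listed above.
CHECKS = (
    "factual_grounding",
    "source_attribution",
    "language_neutrality",
    "date_accuracy",
    "no_hallucination",
)


def review_outcome(results: dict[str, bool]) -> str:
    """Return "pass" only if every check is present and True.

    Any failed check, and (by assumption) any check missing from
    `results`, sends the item to quarantine for human review.
    """
    all_passed = all(results.get(check, False) for check in CHECKS)
    return "pass" if all_passed else "quarantine"
```

For example, a result set in which only `date_accuracy` is `False` yields `"quarantine"`, matching the rule that a single failure is enough to hold the item.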
Corroboration
Source thresholds
Corroboration level is determined by independent reporting — outlets that published separately, not wire-service pickups of the same story.
| Level | Meaning | Entry behaviour |
|---|---|---|
| Strong | 2+ independent tier-1 sources | Published automatically |
| Partial | 1 tier-1 or 2+ tier-2 sources | Published with corroboration note |
| Unverified | Single source, tier unknown | Quarantined for human review |
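Read as a decision rule, the table might translate into something like the sketch below. The function and dictionary names are hypothetical. It assumes the source counts have already been reduced to independent outlets (wire-service pickups removed), and that any combination the table does not list, such as a single tier-2 source, falls through to unverified.

```python
def corroboration_level(tier1: int, tier2: int) -> str:
    """Map counts of independent tier-1 and tier-2 sources to a level.

    Assumption: counts are already deduplicated, and any case not
    covered by the table is treated as "unverified".
    """
    if tier1 >= 2:
        return "strong"
    if tier1 == 1 or tier2 >= 2:
        return "partial"
    return "unverified"


# Entry behaviour per level, taken from the table above.
ENTRY_BEHAVIOUR = {
    "strong": "published automatically",
    "partial": "published with corroboration note",
    "unverified": "quarantined for human review",
}
```

Under these assumptions, one tier-1 source plus any number of tier-2 sources is still only partial, because the strong level requires two independent tier-1 outlets.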
Models
Why different models per stage
Triage, extraction, drafting, and review intentionally use different LLMs. This prevents a single model's systematic biases from propagating through the full pipeline undetected. The review model has no access to the drafter's chain-of-thought; it sees only the finished draft and the source text.
All models run locally via Ollama. No data leaves the machine during processing. Model names are recorded in each entry's provenance chain and are visible on the entry's detail page.
Manual curation
Entries added directly (via CMS or git) bypass the pipeline. These show a Manually curated badge on the detail page instead of pipeline provenance. Manual entries are held to the same factual standards but do not carry machine-generated audit trails.
For forensic evidence of source preservation, see the chain of custody page.