Editorial transparency

How entries are verified

Every entry published to this archive via the automated news pipeline has passed through a 7-stage process: intake, triage, extraction, verification, drafting, review, and final validation. This page documents each stage and the checks applied.

Pipeline

7-stage publication process

01
Intake

A source article is fetched and its full text extracted. Metadata (publication date, URL, outlet) is recorded verbatim at this step — nothing is inferred yet.
02
Triage

An LLM decides whether the article describes a genuinely new incident or an update to an existing one. Confidence is scored 0–1. Low-confidence items are dropped before any further processing.
03
Extract

Factual claims are extracted from the article text as discrete, attributable statements. Each claim is typed (action, statement, allegation, figure). No synthesis happens here — only extraction.
04
Verify

Claims are cross-checked against independent sources. A corroboration level (unverified / partial / strong) is assigned based on how many independent outlets report the same facts. Tier 1 sources (national press, wire services) are weighted more heavily than tier 2 (regional/partisan).
05
Draft

A structured archive entry is drafted from the verified claims. The draft follows a fixed schema — title, date, description, summary, sources, tags, timeline — ensuring every entry is machine-readable and citable.
06
Review

A second model, distinct from the drafting model, runs five editorial checks: factual grounding, source attribution, neutrality of language, date accuracy, and absence of hallucinated content. All five must pass.
07
Publish

Final programmatic validation confirms that required fields are present, URLs are valid, no dates lie in the future, and no slug duplicates an existing one. An optional deep audit, applied at random, may follow: one source is re-fetched and claims are spot-checked against the live article. Only then is the entry written to the archive.
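The hand-offs between stages can be sketched as plain data contracts. The Python below is a minimal illustration under stated assumptions, not the pipeline's actual code: the 0.7 triage threshold, the `slug` and `source_url` fields, the ISO 8601 date format, and every function name are hypothetical. It covers the triage gate, the four claim types, the fixed entry schema, and the final checks for required fields, URLs, future dates, and duplicate slugs.

```python
import datetime as dt
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set
from urllib.parse import urlparse

# --- Triage gate ---------------------------------------------------------
# Hypothetical cut-off; the real threshold is not stated on this page.
MIN_TRIAGE_CONFIDENCE = 0.7


def passes_triage(confidence: float, threshold: float = MIN_TRIAGE_CONFIDENCE) -> bool:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return confidence >= threshold  # below the threshold, the item is dropped


# --- Typed claims --------------------------------------------------------
class ClaimType(Enum):
    ACTION = "action"
    STATEMENT = "statement"
    ALLEGATION = "allegation"
    FIGURE = "figure"


@dataclass(frozen=True)
class Claim:
    text: str
    claim_type: ClaimType
    source_url: str  # hypothetical field: the article the claim was extracted from


# --- Fixed entry schema --------------------------------------------------
@dataclass
class ArchiveEntry:
    slug: str  # hypothetical field, implied by the duplicate-slug check
    title: str
    date: str  # assumed ISO 8601 (YYYY-MM-DD)
    description: str
    summary: str
    sources: List[str]
    tags: List[str] = field(default_factory=list)
    timeline: List[dict] = field(default_factory=list)


# --- Final programmatic validation ---------------------------------------
def is_valid_url(url: str) -> bool:
    # Structural check only (http/https scheme plus a host); nothing is fetched.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_entry(entry: ArchiveEntry, existing_slugs: Set[str],
                   today: Optional[dt.date] = None) -> List[str]:
    """Return a list of problems; an empty list means the entry may be written."""
    today = today or dt.date.today()
    errors = []
    for name in ("slug", "title", "date", "description", "summary", "sources"):
        if not getattr(entry, name):
            errors.append(f"missing field: {name}")
    for url in entry.sources:
        if not is_valid_url(url):
            errors.append(f"invalid source URL: {url}")
    if entry.date:
        try:
            if dt.date.fromisoformat(entry.date) > today:
                errors.append(f"future date: {entry.date}")
        except ValueError:
            errors.append(f"unparseable date: {entry.date}")
    if entry.slug in existing_slugs:
        errors.append(f"duplicate slug: {entry.slug}")
    return errors
```

The validator collects every problem instead of stopping at the first, so a rejected entry can be reported with all failing checks at once. The optional deep audit is left out because it depends on re-fetching live articles.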

Review stage

Five editorial checks

Stage 6 uses a model that was not involved in drafting — a deliberate separation to avoid self-validation. All five checks must pass. A single failure quarantines the item for human review.

Factual grounding: every claim traces to a cited source
Source attribution: publication, author, and date are recorded
Language neutrality: no editorialising or loaded language
Date accuracy: dates match what sources report
No hallucination: the drafting model did not invent uncited facts
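A sketch of how the all-or-nothing rule might be enforced, assuming the review model returns one pass/fail verdict per check. The check identifiers and the `review_gate` and `route` names are illustrative, not taken from the pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List

# Identifiers for the five checks listed above (names are illustrative).
CHECKS = (
    "factual_grounding",
    "source_attribution",
    "language_neutrality",
    "date_accuracy",
    "no_hallucination",
)


@dataclass
class ReviewOutcome:
    passed: bool
    failures: List[str]


def review_gate(results: Dict[str, bool]) -> ReviewOutcome:
    """Combine per-check verdicts from the review model."""
    # Only an explicit True counts as a pass; a missing or non-boolean
    # verdict is treated as a failure, so an incomplete review cannot pass.
    failures = [name for name in CHECKS if results.get(name) is not True]
    return ReviewOutcome(passed=not failures, failures=failures)


def route(results: Dict[str, bool]) -> str:
    # "proceed" hands the draft to final validation; anything else goes to a human.
    return "proceed" if review_gate(results).passed else "quarantine"
```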

Corroboration

Source thresholds

Corroboration level is determined by independent reporting — outlets that published separately, not wire-service pickups of the same story.

Level        Meaning                          Entry behaviour
Strong       2+ independent tier-1 sources    Published automatically
Partial      1 tier-1 or 2+ tier-2 sources    Published with corroboration note
Unverified   Single source, tier unknown      Quarantined for human review
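The table can be read as a small classification function. This sketch assumes each report carries an outlet name, a tier (1, 2, or unknown), and, for wire pickups, a hypothetical wire story ID used to collapse copies of the same story into one independent report. How the real pipeline detects pickups is not documented here, and the handling of a lone tier-2 source (not listed in the table) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceReport:
    outlet: str
    tier: Optional[int]               # 1, 2, or None when the tier is unknown
    wire_story: Optional[str] = None  # hypothetical: wire story ID if this is a pickup


def independent_reports(reports: List[SourceReport]) -> List[SourceReport]:
    """Keep one report per origin: pickups of one wire story count once."""
    seen = set()
    kept = []
    for r in reports:
        key = ("wire", r.wire_story) if r.wire_story else ("outlet", r.outlet)
        if key not in seen:
            seen.add(key)
            kept.append(r)  # the first report seen for an origin is the one counted
    return kept


def corroboration_level(reports: List[SourceReport]) -> str:
    independent = independent_reports(reports)
    tier1 = sum(1 for r in independent if r.tier == 1)
    tier2 = sum(1 for r in independent if r.tier == 2)
    if tier1 >= 2:
        return "strong"    # published automatically
    if tier1 == 1 or tier2 >= 2:
        return "partial"   # published with corroboration note
    # Everything else, including a lone tier-2 source (an assumption of this
    # sketch), falls through to quarantine.
    return "unverified"
```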

Models

Why different models per stage

Triage, extraction, drafting, and review intentionally use different LLMs. This prevents a single model's systematic biases from propagating through the full pipeline undetected. The review model has no access to the drafter's chain of thought; it sees only the finished draft and the source text.

All models run locally via Ollama. No data leaves the machine during processing. Model names are recorded in each entry's provenance chain and are visible on the entry's detail page.
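Because model names are recorded in each entry's provenance chain, the separation described above can be checked after the fact. A hypothetical sketch, assuming the chain is a list of stage/model records; the stage labels, the `ProvenanceStep` type, and the function name are illustrative rather than the archive's real format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceStep:
    stage: str  # e.g. "triage", "extract", "draft", "review"
    model: str  # model name as recorded in the provenance chain


# LLM stages that, per this page, should each use a different model.
LLM_STAGES = ("triage", "extract", "draft", "review")


def model_separation_problems(chain: List[ProvenanceStep]) -> List[str]:
    """Return problems found; an empty list means every LLM stage used its own model."""
    models: Dict[str, str] = {}
    for step in chain:
        if step.stage in LLM_STAGES:
            models[step.stage] = step.model
    problems = [f"no model recorded for {s}" for s in LLM_STAGES if s not in models]

    # Group stages by model so any reuse (for example drafter == reviewer) is reported.
    stages_by_model: Dict[str, List[str]] = {}
    for stage, model in models.items():
        stages_by_model.setdefault(model, []).append(stage)
    for model, stages in stages_by_model.items():
        if len(stages) > 1:
            problems.append(f"{model} used for {', '.join(stages)}")
    return problems
```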

Manual curation

Entries added directly (via CMS or git) bypass the pipeline. These show a "Manually curated" badge on the detail page instead of pipeline provenance. Manual entries are held to the same factual standards but do not carry machine-generated audit trails.

For forensic evidence of source preservation, see the chain of custody page.
