Methodology

How the v2 investigation was designed, pre-registered, and run. These pages document the process — the scientific and engineering scaffolding that turns a hypothesis into a verdict you can audit.

  • Pre-registration — what was locked before generation began (CQS-craft formula, rubric, conditions, model pool, primary statistical criterion) and how amendments are handled.
  • ml-lab debate protocol — the 4-stage critic ↔ defender protocol that surfaced the F1 unclamped-judge-score bug before the main run, with a worked example from the v2 log.
  • Investigation logs — what's in the two append-only INVESTIGATION_LOG.jsonl files, how to read them, and what an outside observer can (and cannot) reconstruct from them alone.
  • Amendments — the five pre-registration drift events (A1–A5) that adjusted the v2 design during pre-flight, each with trigger, change, and justification.
  • Limitations — what the v2 design does not measure, and what would be needed to settle the open questions.
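
As a rough orientation for the investigation logs listed above: a JSONL file holds one JSON value per line, so an append-only log can be replayed in the order it was written. The following is a minimal sketch, not the project's own tooling. The helper name `read_jsonl` is hypothetical, and it makes no assumption about which fields the v2 records contain.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line, in file order.

    Hypothetical helper: the record schema is not assumed, so each
    record is returned exactly as json.loads parses it.
    """
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged entry is easy to locate.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc


if __name__ == "__main__":
    # The file name comes from the list above; its location is an assumption.
    log_path = Path("INVESTIGATION_LOG.jsonl")
    if log_path.exists():
        records = list(read_jsonl(log_path))
        print(f"{len(records)} records in {log_path}")
```

Because records are yielded in file order, the sequence of entries mirrors the order in which they were appended. Failing loudly on a malformed line, rather than skipping it, suits an audit setting where a silently dropped entry would be misleading.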