v4: Preregistered Study

The first ml-lab experiment with a locked pre-registration — hypothesis, metrics, and pass criteria specified before any code ran.

Hypothesis

The debate protocol detects planted methodology flaws at a rate significantly above chance, with verdict accuracy above a pre-specified threshold, when evaluated on benchmark cases with known ground truth.
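As an illustration of what "significantly above chance" plus a threshold could look like in code, here is a minimal sketch using a one-sided exact binomial test. The chance rate, significance level, minimum rate, and case counts are all hypothetical placeholders; the real thresholds live in PREREGISTRATION.json and are not reproduced here, and the function names are invented for this example.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def meets_criteria(
    successes: int, trials: int, chance: float, alpha: float, min_rate: float
) -> bool:
    """Pass only if the rate beats chance at level alpha AND clears the threshold.

    All four numeric parameters are assumptions for illustration; a real run
    would read them from the locked pre-registration instead of hard-coding them.
    """
    rate = successes / trials
    p_value = binomial_p_value(successes, trials, chance)
    return p_value < alpha and rate >= min_rate
```

For example, with 9 detections in 10 cases and an assumed chance rate of 0.5, the p-value is 11/1024 (about 0.011), so `meets_criteria(9, 10, chance=0.5, alpha=0.05, min_rate=0.8)` passes, while 6 detections in 10 would not.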

Design

  • Pre-registration locked before experiment execution
  • Metrics: detection rate, verdict accuracy, per-case breakdown
  • Pass criteria: pre-specified thresholds in PREREGISTRATION.json
  • New in v4: no changes to the hypothesis or metrics are allowed after locking
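One common way to make a lock like this checkable is to record a content hash of the pre-registration file at lock time and compare against it later. The sketch below assumes that approach; it is not necessarily how v4 enforced its lock, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path: str) -> str:
    """Digest of a file's current contents, e.g. a pre-registration JSON file."""
    return digest(Path(path).read_bytes())


def is_unchanged(data: bytes, locked_digest: str) -> bool:
    """True if the content still matches the digest recorded at lock time."""
    return digest(data) == locked_digest
```

Under this scheme the digest would be recorded once when the pre-registration is locked (for instance in a commit message or a separate lock file), and any later edit to the hypothesis or metrics would change the digest and fail the comparison.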

Key finding: specification drift

The most important finding wasn't about detection rates — it was about the evaluation process itself. Implementation changes during the experiment silently violated the pre-registration constraints:

  • The code measured something subtly different from what the pre-registration specified
  • The divergence was small enough to miss in code review
  • It was still large enough to invalidate the result if left uncorrected

Response

v4's post-mortem produced /intent-watch:

  • Automated monitoring of experiment directories against source-of-truth documents
  • Catches drift as it happens rather than after the experiment concludes
  • Integrated into ml-lab's workflow as a mandatory Gate 1 check
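The internals of /intent-watch are not described here, so the following is only a sketch of the general idea of checking an implementation against a source-of-truth document. It assumes, hypothetically, that both the pre-registration and the code expose their metric definitions as plain dictionaries; the field names and the `diff_specs` helper are invented for illustration.

```python
def diff_specs(declared: dict, implemented: dict) -> list[str]:
    """Compare registered metric definitions with what the code says it computes.

    Both arguments map a metric name to its definition (any comparable value).
    Returns a list of human-readable drift reports; empty means no drift found.
    """
    problems = []
    for name in sorted(set(declared) | set(implemented)):
        if name not in implemented:
            problems.append(f"{name}: registered but not implemented")
        elif name not in declared:
            problems.append(f"{name}: implemented but not registered")
        elif declared[name] != implemented[name]:
            problems.append(f"{name}: definition differs from registration")
    return problems


# Hypothetical example: same metric name, subtly different denominator.
registered = {"detection_rate": {"denominator": "all_cases"}}
in_code = {"detection_rate": {"denominator": "flawed_cases_only"}}
reports = diff_specs(registered, in_code)
```

A name-only comparison would miss the example above, since the metric name matches in both places. Comparing full definitions is what makes a subtle divergence visible, and running such a check on every change is one way to surface it during the experiment instead of afterward.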

Artifacts

  • experiments/self_debate_experiment_v4/
  • POST_MORTEM.md — specification drift analysis
  • EXECUTION_PLAN.md — pre-registered execution plan
  • PREREGISTRATION.json — locked hypothesis and metrics