v4: Preregistered Study
The first ml-lab experiment with a locked pre-registration — hypothesis, metrics, and pass criteria specified before any code ran.
Hypothesis
The debate protocol detects planted methodology flaws at a rate significantly above chance, with verdict accuracy above a pre-specified threshold, when evaluated on benchmark cases with known ground truth.
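One way to make "significantly above chance" concrete is a one-sided exact binomial test on the number of detected flaws. The sketch below is illustrative only: the function name `binomial_p_value`, the chance rate of 0.5, and the counts are assumptions chosen for the example, not values from v4 or from PREREGISTRATION.json.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float) -> float:
    """One-sided exact binomial p-value: P(X >= successes) under the chance rate."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical numbers for illustration only, not results from v4.
p = binomial_p_value(successes=9, trials=10, chance=0.5)
print(f"p = {p:.4f}")  # p = 0.0107
```

With these made-up counts, 9 detections in 10 cases would be unlikely under a 50% chance rate. The significance level and the definition of "chance" would have to be fixed in the pre-registration before any data is seen.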
Design
- Pre-registration locked before experiment execution
- Metrics: detection rate, verdict accuracy, per-case breakdown
- Pass criteria: pre-specified thresholds in PREREGISTRATION.json
- Novel: no changes to hypothesis or metrics allowed after locking
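The locking idea in the list above can be sketched as a digest recorded at lock time and re-checked before results are scored. This is a minimal sketch under stated assumptions: the `pass_criteria` key, the threshold values, and the helper names `lock_digest` and `evaluate` are hypothetical, and the real schema of PREREGISTRATION.json may differ.

```python
import hashlib
import json


def lock_digest(prereg: dict) -> str:
    """SHA-256 of a canonical JSON encoding, recorded once at lock time."""
    canonical = json.dumps(prereg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(prereg: dict, locked_digest: str, observed: dict) -> dict:
    """Score observed metrics against locked thresholds.

    Refuses to score if the pre-registration no longer matches its lock.
    """
    if lock_digest(prereg) != locked_digest:
        raise ValueError("pre-registration changed after locking")
    return {
        name: observed[name] >= threshold
        for name, threshold in prereg["pass_criteria"].items()
    }


# Hypothetical schema and values, for illustration only.
prereg = {
    "hypothesis": "placeholder text",
    "pass_criteria": {"detection_rate": 0.7, "verdict_accuracy": 0.8},
}
digest = lock_digest(prereg)  # stored before any experiment code runs

observed = {"detection_rate": 0.75, "verdict_accuracy": 0.6}
print(evaluate(prereg, digest, observed))
# {'detection_rate': True, 'verdict_accuracy': False}
```

Hashing a canonical encoding (sorted keys, fixed separators) means that reordering keys does not break the lock, while any change to a threshold or to the hypothesis text does.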
Key finding: specification drift
The most important finding wasn't about detection rates — it was about the evaluation process itself. Implementation changes during the experiment silently violated the pre-registration constraints:
- Code measured something subtly different from what was specified
- Divergence was small enough to miss in code review
- Large enough to invalidate the result if uncorrected
Response
v4's post-mortem produced /intent-watch:
- Automated monitoring of experiment directories against source-of-truth documents
- Catches drift as it happens rather than after the experiment concludes
- Integrated into ml-lab's workflow as a mandatory Gate 1 check
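The kind of check described above can be sketched as a comparison between the metric definitions in a source-of-truth document and those an implementation declares. This is not the actual /intent-watch implementation, whose internals are not described here. The function `find_drift`, the definition strings, and the gate variable are all hypothetical.

```python
def find_drift(specified: dict, implemented: dict) -> list:
    """List differences between specified and implemented metric definitions."""
    problems = []
    for name in sorted(specified.keys() - implemented.keys()):
        problems.append(f"missing metric: {name}")
    for name in sorted(implemented.keys() - specified.keys()):
        problems.append(f"unregistered metric: {name}")
    for name in sorted(specified.keys() & implemented.keys()):
        if specified[name] != implemented[name]:
            problems.append(f"definition mismatch: {name}")
    return problems


# Hypothetical definitions, for illustration only.
specified = {
    "detection_rate": "flaws detected / flaws planted",
    "verdict_accuracy": "correct verdicts / all cases",
}
implemented = {
    "detection_rate": "flaws detected / cases reviewed",  # subtly different denominator
    "verdict_accuracy": "correct verdicts / all cases",
}

problems = find_drift(specified, implemented)
gate_passes = not problems  # a mandatory gate would block the run when this is False
print(problems)
# ['definition mismatch: detection_rate']
```

Running a comparison like this on every change, rather than once at the end, is what lets drift surface while the experiment is still in progress.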
Artifacts
- experiments/self_debate_experiment_v4/POST_MORTEM.md: specification drift analysis
- EXECUTION_PLAN.md: pre-registered execution plan
- PREREGISTRATION.json: locked hypothesis and metrics