v4: Preregistered Study

The first ml-lab experiment with a locked pre-registration: the hypothesis, metrics, and pass criteria were all specified before any code ran.

Hypothesis

The debate protocol detects planted methodology flaws at a rate significantly above chance, with verdict accuracy above a pre-specified threshold, when evaluated on benchmark cases with known ground truth.
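As a rough illustration of what "significantly above chance" plus an accuracy threshold could mean in practice, the sketch below runs a one-sided exact binomial test on the detection count and then checks verdict accuracy. The chance level, significance level, and accuracy threshold are hypothetical placeholders, not the values locked in PREREGISTRATION.json, and the function names are invented for this example.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def hypothesis_passes(
    detected: int,
    total: int,
    correct_verdicts: int,
    chance: float = 0.5,               # assumed chance level (hypothetical)
    alpha: float = 0.05,               # assumed significance level (hypothetical)
    accuracy_threshold: float = 0.7,   # assumed pass threshold (hypothetical)
) -> bool:
    """Both conditions must hold: detection above chance AND accuracy above threshold."""
    p_value = binomial_p_value(detected, total, chance)
    accuracy = correct_verdicts / total
    return p_value < alpha and accuracy >= accuracy_threshold
```

Fixing `chance`, `alpha`, and `accuracy_threshold` before looking at any results is what makes the test a pre-registered one; choosing them afterwards would defeat the purpose.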

Design

Pre-registration locked before experiment execution
Metrics: detection rate, verdict accuracy, per-case breakdown
Pass criteria: pre-specified thresholds in PREREGISTRATION.json
Novel: no changes to hypothesis or metrics allowed after locking
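One way such a lock could be enforced is to record a hash of the pre-registration file at lock time and compare against it later. The sketch below is an assumption about how locking might work, not a description of ml-lab's actual mechanism; the lock-file path and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(prereg_path: Path) -> str:
    """SHA-256 of the canonicalized JSON, so whitespace or key-order edits are ignored."""
    data = json.loads(prereg_path.read_text(encoding="utf-8"))
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lock(prereg_path: Path, lock_path: Path) -> str:
    """Record the fingerprint once, before any experiment code runs."""
    digest = fingerprint(prereg_path)
    lock_path.write_text(digest + "\n", encoding="utf-8")
    return digest


def is_unchanged(prereg_path: Path, lock_path: Path) -> bool:
    """True if the pre-registration still matches the recorded fingerprint."""
    recorded = lock_path.read_text(encoding="utf-8").strip()
    return fingerprint(prereg_path) == recorded
```

A check like this only detects edits to the document itself. It says nothing about whether the code still measures what the document specifies.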

Key finding: specification drift

The most important finding wasn't about detection rates — it was about the evaluation process itself. Implementation changes during the experiment silently violated the pre-registration constraints:

The code measured something subtly different from what was specified
The divergence was small enough to miss in code review
It was large enough to invalidate the result if left uncorrected

Response

v4's post-mortem produced /intent-watch:

Automated monitoring of experiment directories against source-of-truth documents
Catches drift as it happens rather than after the experiment concludes
Integrated into ml-lab's workflow as a mandatory Gate 1 check
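The core idea of checking an implementation against a source-of-truth document can be sketched as a comparison between the metric definitions that were specified and the ones the code declares. This is not the /intent-watch implementation; the dictionary layout, the `denominator` field, and the function name are hypothetical and exist only to illustrate the check.

```python
def find_drift(
    specified: dict[str, dict],
    implemented: dict[str, dict],
) -> list[str]:
    """Compare metric definitions by name and report every mismatch.

    Both arguments map a metric name to a definition dict (hypothetical layout).
    """
    findings = []
    for name in sorted(specified.keys() | implemented.keys()):
        if name not in implemented:
            findings.append(f"{name}: specified but not implemented")
        elif name not in specified:
            findings.append(f"{name}: implemented but not specified")
        elif specified[name] != implemented[name]:
            findings.append(f"{name}: definition differs from specification")
    return findings


# Hypothetical example: the code quietly changed a denominator.
specified = {
    "detection_rate": {"denominator": "all_cases"},
    "verdict_accuracy": {"denominator": "all_cases"},
}
implemented = {
    "detection_rate": {"denominator": "flawed_cases_only"},
    "verdict_accuracy": {"denominator": "all_cases"},
}
drift = find_drift(specified, implemented)
```

Running a comparison like this on every change, rather than once at the end, is what turns it from a post-mortem tool into a gate. It only works if the implementation declares its metric definitions in a form that can be compared, which is itself a design assumption.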

Artifacts

experiments/self_debate_experiment_v4/
POST_MORTEM.md — specification drift analysis
EXECUTION_PLAN.md — pre-registered execution plan
PREREGISTRATION.json — locked hypothesis and metrics