v8: Defense-Case Calibration (Active)
v8 is the current active experiment, addressing a calibration gap discovered in v7.
Problem
Previous versions validated:
- Detection — can the critic find planted flaws? (v2–v7: yes)
- Ambiguity judgment — does the system handle genuinely ambiguous cases? (v6–v7: yes)
However, defense-case performance (cases where the PoC is actually correct and the critic should find nothing material) had not been rigorously evaluated. A critic that always reports flaws, even when none exist, would score well on detection but fail in practice.
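To make the gap concrete, here is a minimal sketch of a degenerate critic that flags every case. The `Case` type, its `has_flaw` field, and the 6/4 split are hypothetical and chosen only for illustration; they are not taken from the experiment code.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical ground-truth label: True if a flaw was planted in the PoC.
    has_flaw: bool


def always_flag(case: Case) -> bool:
    """Degenerate critic: reports a material flaw on every case."""
    return True


def rates(cases, critic):
    """Return (detection rate on flawed cases, false-positive rate on sound cases)."""
    flawed = [c for c in cases if c.has_flaw]
    sound = [c for c in cases if not c.has_flaw]
    detection = sum(critic(c) for c in flawed) / len(flawed)
    false_positive = sum(critic(c) for c in sound) / len(sound)
    return detection, false_positive


# Illustrative mix: 6 flawed PoCs and 4 sound ones.
cases = [Case(True)] * 6 + [Case(False)] * 4
detection_rate, false_positive_rate = rates(cases, always_flag)
print(detection_rate, false_positive_rate)  # 1.0 1.0
```

The critic gets a perfect detection score while wrongly flagging every sound PoC, which a detection-only evaluation cannot see.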
Hypothesis
The debate protocol correctly identifies "defense wins" cases (where the PoC is sound) at a rate above chance, with false-positive rates below a pre-specified threshold.
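One way the two criteria in this hypothesis could be checked is sketched below, using a one-sided exact binomial test for "above chance". The chance rate of 0.5, the significance level, and the false-positive threshold are all placeholder assumptions; the pre-registered values are not stated here, and the function names are hypothetical.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypothesis_supported(
    n_defense: int,
    n_correct: int,
    n_false_pos: int,
    chance: float = 0.5,   # assumed chance rate, not from the source
    alpha: float = 0.05,   # assumed significance level
    max_fpr: float = 0.2,  # assumed stand-in for the pre-specified threshold
) -> bool:
    """True if accuracy is significantly above chance and the FPR is below the threshold."""
    p_value = binom_tail(n_correct, n_defense, chance)
    fpr = n_false_pos / n_defense
    return p_value < alpha and fpr < max_fpr


# Illustrative numbers only: 20 defense cases, 17 correct verdicts, 3 false positives.
print(hypothesis_supported(20, 17, 3))
```

Fixing the threshold and chance rate before running the benchmark is what keeps a check like this from being tuned to the results.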
Design
- Benchmark cases expanded to include a substantial proportion of "correct" PoCs
- Defense-case ground truth: correct_position = defense_wins
- Metrics: false-positive rate, defense-case verdict accuracy
- Pre-registered with /intent-watch enforcement
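The two metrics above could be computed from per-case records roughly as follows. This is a sketch under stated assumptions: only `correct_position` and `defense_wins` appear in the design; the `verdict` key and the `critique_wins` label are hypothetical stand-ins for whatever the experiment actually records.

```python
def defense_metrics(records):
    """Compute defense-case metrics from an iterable of per-case dicts.

    Each record is assumed to carry 'correct_position' (ground truth) and
    'verdict' (the protocol's output). 'verdict' and 'critique_wins' are
    hypothetical names, not taken from the experiment.
    """
    defense = [r for r in records if r["correct_position"] == "defense_wins"]
    if not defense:
        raise ValueError("no defense cases in records")
    n = len(defense)
    # A defense case is judged correctly when the verdict is also defense_wins.
    correct = sum(r["verdict"] == "defense_wins" for r in defense)
    # A false positive is a sound PoC on which the critic prevails.
    false_pos = sum(r["verdict"] == "critique_wins" for r in defense)
    return {
        "n_defense": n,
        "defense_accuracy": correct / n,
        "false_positive_rate": false_pos / n,
    }


# Illustrative records only; not experiment data.
records = [
    {"correct_position": "defense_wins", "verdict": "defense_wins"},
    {"correct_position": "defense_wins", "verdict": "defense_wins"},
    {"correct_position": "defense_wins", "verdict": "defense_wins"},
    {"correct_position": "defense_wins", "verdict": "critique_wins"},
    {"correct_position": "critique_wins", "verdict": "critique_wins"},
]
metrics = defense_metrics(records)
print(metrics)
```

The two rates are counted separately, so they need not sum to 1 if the protocol can return verdicts other than the two labels used here.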
Status
In progress. Phases, raw outputs, prompts, seeds, and a STATUS.md are available in the experiment directory.
Artifacts
- experiments/self_debate_experiment_v8/: 31 files including phases, raw outputs, prompts, and seeds
- STATUS.md: current progress