Use Debate vs Ensemble Mode
ml-lab offers two review modes. Choose the one that matches your situation.
Debate mode (default)
Use when you want a deep, convergent review that resolves every finding to a verdict.
What happens:
- `ml-critic` produces initial findings with severity levels
- `ml-defender` responds with structured rebuttals drawn from a 7-type taxonomy: CONCEDE, REBUT-DESIGN, REBUT-SCOPE, REBUT-EVIDENCE, REBUT-IMMATERIAL, DEFER, EXONERATE
- Stage B runs 2–4 challenge/response rounds until `derive_verdict()` determines that the debate has converged
- Each finding gets a deterministic verdict: `critique_wins`, `defense_wins`, or `empirical_test_agreed`
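The Stage B loop can be pictured as a bounded round loop that stops as soon as a verdict function reports convergence. The sketch below is a minimal illustration under stated assumptions, not ml-lab's code. `Rebuttal`, `toy_verdict`, and `run_stage_b` are hypothetical names. The mapping from defender replies to verdicts is invented purely for illustration, because the real rules live in `derive_verdict()` and are not described on this page. The sketch also reads "2–4 rounds" as a minimum of 2 and a maximum of 4. When a finding is still contested after the last round, the sketch simply reports that case instead of guessing what ml-lab does.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class Rebuttal(Enum):
    """The seven defender response types from the debate-mode taxonomy."""
    CONCEDE = "CONCEDE"
    REBUT_DESIGN = "REBUT-DESIGN"
    REBUT_SCOPE = "REBUT-SCOPE"
    REBUT_EVIDENCE = "REBUT-EVIDENCE"
    REBUT_IMMATERIAL = "REBUT-IMMATERIAL"
    DEFER = "DEFER"
    EXONERATE = "EXONERATE"


def toy_verdict(history: Sequence[Rebuttal]) -> Optional[str]:
    """HYPOTHETICAL stand-in for derive_verdict(): looks only at the latest reply.

    Returns one of the three documented verdict labels, or None while the
    finding is still contested. The mapping itself is invented.
    """
    last = history[-1]
    if last is Rebuttal.CONCEDE:
        return "critique_wins"
    if last is Rebuttal.EXONERATE:
        return "defense_wins"
    if last is Rebuttal.DEFER:
        return "empirical_test_agreed"
    return None  # any REBUT-* reply keeps the debate open


def run_stage_b(
    respond: Callable[[int], Rebuttal],
    verdict_fn: Callable[[Sequence[Rebuttal]], Optional[str]] = toy_verdict,
    min_rounds: int = 2,  # assumption: "2–4 rounds" means at least 2
    max_rounds: int = 4,  # assumption: and at most 4
) -> Tuple[Optional[str], int]:
    """Run bounded challenge/response rounds until verdict_fn reports convergence."""
    history: List[Rebuttal] = []
    for round_no in range(1, max_rounds + 1):
        history.append(respond(round_no))  # defender's reply to this round's challenge
        if round_no < min_rounds:
            continue  # always complete the minimum number of rounds
        verdict = verdict_fn(history)
        if verdict is not None:
            return verdict, round_no
    return None, max_rounds  # still contested after the final round


# Usage: the defender rebuts twice, then concedes in round 3.
replies = [Rebuttal.REBUT_EVIDENCE, Rebuttal.REBUT_SCOPE, Rebuttal.CONCEDE]
print(run_stage_b(lambda r: replies[r - 1]))  # ('critique_wins', 3)
```

Passing the verdict function in as a parameter keeps the round bookkeeping separate from the decision rule, so a faithful port of `derive_verdict()` could replace `toy_verdict` without changing the loop.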
Best for: hypothesis-driven investigations where you want every issue resolved before experimenting.
Ensemble mode (opt-in)
Use when you want a high-recall sweep and will triage precision manually.
What happens:
- `ml-critic` is dispatched 3 times independently, with no visibility between critics
- Findings are clustered by root cause
- Each issue gets a support count: 3/3, 2/3, or 1/3
- `ENSEMBLE_REVIEW.md` is written with tier-weighted output
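A support count is the number of distinct critics that raised a given root cause. Here is a minimal sketch of that counting step, assuming each finding already carries a normalized root-cause key. How ml-lab actually clusters findings is not described here, so `Finding`, `support_by_root_cause`, `tiered`, and the sample findings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

N_CRITICS = 3  # ensemble mode dispatches ml-critic three times


@dataclass(frozen=True)
class Finding:
    critic_id: int   # which independent critic run reported it (0, 1 or 2)
    root_cause: str  # HYPOTHETICAL pre-normalized clustering key
    summary: str


def support_by_root_cause(findings: List[Finding]) -> Dict[str, int]:
    """Count how many distinct critics raised each root cause."""
    critics: Dict[str, Set[int]] = defaultdict(set)
    for f in findings:
        critics[f.root_cause].add(f.critic_id)  # repeats from one critic count once
    return {cause: len(ids) for cause, ids in critics.items()}


def tiered(findings: List[Finding]) -> List[Tuple[str, str]]:
    """Return (root_cause, "k/3") pairs, strongest agreement first."""
    support = support_by_root_cause(findings)
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cause, f"{k}/{N_CRITICS}") for cause, k in ranked]


# Usage with made-up findings from the three critic runs.
findings = [
    Finding(0, "split-leakage", "validation users also appear in train"),
    Finding(1, "split-leakage", "user IDs overlap across splits"),
    Finding(2, "split-leakage", "per-user features leak labels"),
    Finding(0, "single-seed", "results from one random seed"),
    Finding(2, "single-seed", "no seed sweep"),
    Finding(1, "metric-choice", "accuracy on an imbalanced target"),
    Finding(1, "metric-choice", "no PR-AUC reported"),
]
print(tiered(findings))
# [('split-leakage', '3/3'), ('single-seed', '2/3'), ('metric-choice', '1/3')]
```

The sketch counts distinct critics rather than raw findings so that one critic restating the same issue twice (as critic 1 does for `metric-choice`) cannot inflate that issue's support tier.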
Best for: broad exploratory sweeps where missing a real issue is costlier than triaging false positives. Minority findings (1/3 support) need explicit user confirmation before they enter experiment design.
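The confirmation rule for minority findings amounts to a small gate in front of experiment design. The sketch below assumes support counts have already been computed per root cause. `admit_for_experiment_design` and the `confirm` callback are hypothetical names. The sketch also assumes that 2/3 and 3/3 issues pass without a prompt, which this page does not state explicitly.

```python
from typing import Callable, Dict, List


def admit_for_experiment_design(
    support: Dict[str, int],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Gate ensemble issues before experiment design.

    support maps a root cause to how many of the 3 critics raised it.
    Issues raised by a single critic (1/3) are admitted only if confirm()
    returns True; in this sketch 2/3 and 3/3 issues pass without asking.
    """
    admitted = []
    for cause, k in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if k == 1 and not confirm(cause):
            continue  # minority finding the user did not confirm
        admitted.append(cause)
    return admitted


# Usage: a stand-in "user" who confirms only the metric issue.
support = {"split-leakage": 3, "single-seed": 2, "metric-choice": 1, "typo-in-readme": 1}
print(admit_for_experiment_design(support, lambda cause: cause == "metric-choice"))
# ['split-leakage', 'single-seed', 'metric-choice']
```

In an interactive session, `confirm` could be something like `lambda c: input(f"Include {c}? [y/N] ").strip().lower() == "y"`. Keeping it as a callback makes the gate testable without a terminal.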
Choosing between them
| Factor | Debate | Ensemble |
|---|---|---|
| Issue resolution | Deterministic verdict per finding | Manual triage by support tier |
| Depth | Deep — multi-round convergence | Broad — independent perspectives |
| False positive rate | Low (defender filters) | Higher (union pooling) |
| False negative rate | Higher (single critic) | Low (3× independent critics) |
| When to use | Focused hypothesis testing | Exploratory audit, unknown risk surface |
Tip: After the v8 calibration fixes, debate mode is recommended for all standard investigations. Use ensemble mode when you're exploring a new codebase or problem domain and don't yet know what to worry about.