Run an ML Investigation¶
Start the investigation¶
Invoke the skill:
Or describe your hypothesis in natural language — Claude Code routes to ml-lab automatically.
Provide your hypothesis¶
ml-lab will ask you to sharpen your hypothesis into a falsifiable claim. You need:
- The claim: what you believe is true
- The metric: how you'll measure it
- The pass criteria: the threshold the metric must meet, stated so that a clear outcome would falsify the claim
You'll also choose:
- Review mode: debate (default, adversarial) or ensemble (3× independent critics)
- Report mode: full_report or conclusions_only
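The inputs listed above can be written down as a small structure before the investigation starts. This is a minimal sketch, not ml-lab's actual schema: the class name `HypothesisSpec`, its field names, and the `validate` helper are hypothetical, and the example hypothesis is made up. Only the mode values (debate, ensemble, full_report, conclusions_only) come from this page.

```python
from dataclasses import dataclass

# Mode values are taken from this page; everything else is illustrative.
REVIEW_MODES = {"debate", "ensemble"}
REPORT_MODES = {"full_report", "conclusions_only"}


@dataclass(frozen=True)
class HypothesisSpec:
    """Hypothetical container for the inputs ml-lab asks for (not its real schema)."""

    claim: str
    metric: str
    pass_criteria: str
    review_mode: str = "debate"       # documented default
    report_mode: str = "full_report"  # assumption: this page states no default

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the spec is usable."""
        problems = []
        for name in ("claim", "metric", "pass_criteria"):
            if not getattr(self, name).strip():
                problems.append(f"{name} must not be empty")
        if self.review_mode not in REVIEW_MODES:
            problems.append(f"unknown review mode: {self.review_mode!r}")
        if self.report_mode not in REPORT_MODES:
            problems.append(f"unknown report mode: {self.report_mode!r}")
        return problems


# Made-up example hypothesis, for illustration only.
spec = HypothesisSpec(
    claim="Prompt variant B improves exact-match accuracy over variant A",
    metric="exact-match accuracy on a held-out set",
    pass_criteria="B minus A is at least 2 points; anything less falsifies the claim",
)
print(spec.validate())
```

Writing the pass criteria as a concrete threshold before any results exist is what makes the later verdict checkable.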
Wait for the PoC¶
ml-lab builds a minimal proof-of-concept to confirm the measurement works before investing in review. This is typically a few lines of code and a handful of API calls.
Review the PoC and confirm it measures what you intend.
Review the critique¶
In debate mode, you'll see:
- CRITIQUE.md: initial critic findings with severity levels (FATAL, MATERIAL, MINOR)
- DEFENSE.md: structured rebuttals (CONCEDE, REBUT-*, DEFER, EXONERATE)
- Stage B rounds: a challenge/response loop that repeats until verdicts converge
In ensemble mode, you'll see ENSEMBLE_REVIEW.md with tier-weighted findings (3/3, 2/3, 1/3 support).
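One way to picture the support tiers is to count how many of the three independent critics raised each finding. The sketch below assumes findings have already been matched under shared labels. The function name, the input shape, and the finding labels are hypothetical, and this does not reproduce how ml-lab actually weights findings or writes ENSEMBLE_REVIEW.md.

```python
from collections import Counter


def support_tiers(critic_findings: list[set[str]]) -> dict[str, list[str]]:
    """Group finding labels by how many critics raised them, e.g. '2/3'."""
    total = len(critic_findings)
    # Each critic contributes a set, so a label counts at most once per critic.
    counts = Counter(label for findings in critic_findings for label in findings)
    tiers: dict[str, list[str]] = {}
    for label, n in sorted(counts.items()):
        tiers.setdefault(f"{n}/{total}", []).append(label)
    return tiers


# Made-up finding labels for three critics.
critics = [
    {"metric leakage", "small sample"},
    {"metric leakage", "small sample", "no baseline"},
    {"metric leakage"},
]
tiers = support_tiers(critics)
for tier in ("3/3", "2/3", "1/3"):
    print(tier, tiers.get(tier, []))
```

A finding raised by all three critics is harder to dismiss than one raised by a single critic, which is the intuition behind reading the tiers from 3/3 downward.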
Approve the experiment plan¶
Gate 1 requires your approval. Review:
- All pre-flight items are CLOSED
- The experiment design matches your intent
- Cost estimates are acceptable
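The first item in the checklist above is mechanical enough to sketch in code. This is an illustration only: the function name, the dictionary shape, the item names, and the OPEN status are assumptions, since this page names only the CLOSED status.

```python
def open_preflight_items(items: dict[str, str]) -> list[str]:
    """Return the names of pre-flight items whose status is not CLOSED."""
    return [name for name, status in items.items() if status != "CLOSED"]


# Hypothetical pre-flight items; "OPEN" is an assumed status value.
preflight = {
    "metric defined": "CLOSED",
    "baseline frozen": "OPEN",
    "budget set": "CLOSED",
}
blocking = open_preflight_items(preflight)
if blocking:
    print("Not ready for Gate 1 approval:", blocking)
else:
    print("All pre-flight items are CLOSED")
```

The other two checks, design intent and cost, remain judgment calls that only you can make.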
Optional: promote to a Metaflow flow¶
After Gate 1 approval, a multi-cell investigation may be promoted onto a config-driven Metaflow+Hydra pipeline before Step 6 runs. This is a judgment call, not a requirement — the default signal is more than one cell or more than one distinct analysis. Quick single-cell PoCs skip this entirely and proceed to Step 6 unchanged.
When you promote, Step 6 runs as the Metaflow flow instead of inline. The promotion gate is: flow-lint → pipeline-reviewer → determinism check (prevent → lint → review → prove).
Run /pipeline-init to start the promotion. See Promote an Investigation to a Metaflow Flow for the full procedure.
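The default promotion signal described above reduces to a simple check. The function and parameter names below are hypothetical, and the result is only a prompt to consider promotion, since the decision stays a judgment call.

```python
def should_consider_promotion(n_cells: int, n_analyses: int) -> bool:
    """Default signal from this page: more than one cell or more than one distinct analysis."""
    return n_cells > 1 or n_analyses > 1


print(should_consider_promotion(n_cells=1, n_analyses=1))  # quick single-cell PoC
print(should_consider_promotion(n_cells=4, n_analyses=1))  # multi-cell investigation
```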
Monitor the experiment¶
During Step 6, /intent-watch runs to catch pre-registration drift. If a HIGH or CRITICAL conflict is flagged, resolve it before continuing.
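The stop rule above can be sketched as a filter over flagged conflicts. The record shape, the function name, and the example entries (including the LOW level) are assumptions made for illustration; this page names only the HIGH and CRITICAL levels and does not describe the output format of /intent-watch.

```python
# Severity levels that this page says must be resolved before continuing.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_conflicts(conflicts: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the conflicts that should halt the experiment until resolved."""
    return [c for c in conflicts if c["severity"] in BLOCKING_SEVERITIES]


# Hypothetical conflict records; "LOW" is an assumed non-blocking level.
flagged = [
    {"severity": "LOW", "note": "plot title differs from the plan"},
    {"severity": "HIGH", "note": "metric changed after pre-registration"},
]
must_resolve = blocking_conflicts(flagged)
print([c["note"] for c in must_resolve])
```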
Review conclusions¶
ml-lab synthesizes CONCLUSIONS.md with the primary result, figures, and a verdict against your pre-specified pass criteria.
If findings are surprising enough to falsify a review assumption, ml-lab proposes a macro-iteration — reopening the review cycle with results in hand. Up to 3 macro-iteration cycles are allowed.
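The macro-iteration cap can be pictured as a bounded loop. This is a sketch under stated assumptions: `run_cycle` is a hypothetical callback standing in for one full review-plus-experiment pass, and the sketch assumes the initial pass does not count toward the limit of 3, which this page does not spell out.

```python
from typing import Callable

MAX_MACRO_ITERATIONS = 3  # limit stated on this page


def run_with_macro_iterations(run_cycle: Callable[[int], bool]) -> int:
    """Run cycles until results stop falsifying review assumptions or the cap is hit.

    run_cycle(i) performs pass number i and returns True when its findings are
    surprising enough to reopen the review. Returns the macro-iterations used.
    """
    used = 0
    surprising = run_cycle(0)  # initial pass (assumed not to count toward the cap)
    while surprising and used < MAX_MACRO_ITERATIONS:
        used += 1
        surprising = run_cycle(used)
    return used


# Hypothetical run: only the initial pass produces a surprising result.
print(run_with_macro_iterations(lambda i: i < 1))
```

The cap keeps a string of surprising results from turning into an open-ended loop of re-review.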
Optional: peer review and final report¶
After conclusions, ml-lab offers:
- Step 10: Peer review loop (research-reviewer + research-reviewer-lite, up to 3 rounds)
- Step 11: Final TECHNICAL_REPORT.md in results mode
Both require your confirmation to start.
You have now...¶
Completed a full ml-lab investigation from hypothesis to conclusions, with adversarial review, pre-registered experiments, and optional peer review.