Explanation
Understanding the design decisions, trade-offs, and evolution behind ml-lab. These pages answer why, not how.
| Topic | What it covers |
|---|---|
| Why a Metaflow Pipeline | Why investigations that outgrow a single-cell PoC are promoted onto a config-driven flow: reproducibility, consistency, accuracy, and why the pipeline never carries the PoC forward |
| Project Origin | How a FastText experiment recursed into its own evaluation infrastructure |
| The Experiment Arc | Why eight experiment versions exist and what each one taught |
| Debate Protocol | Why adversarial critique works, how the verdict function enforces convergence, and when ensemble mode is better |
| Evaluation Methodology | Pre-registration, metrics, scoring, cross-vendor evaluation, and statistical methods |
| Post-Mortems & Lessons | What broke across v3-v5 and what the fixes revealed about LLM evaluation |