# Research

ml-lab's methodology has been empirically evaluated across eight experiment versions with pre-registered hypotheses. This section contains the working paper, related work survey, per-version experiment reports, and presentation materials.

| Resource | Description |
|---|---|
| Working Paper | Full research paper on the adversarial debate evaluation methodology |
| Related Work | Literature survey covering LLM-as-judge, multi-agent debate, and evaluation calibration |

## Experiment Reports

| Version | Focus | Status |
|---|---|---|
| v1-v3 | Initial methodology validation and calibration issues | Complete |
| v4 | First pre-registered study | Complete |
| v5 | Harder benchmark case generation pipeline | Complete |
| v6 | Ensemble mode extension and cross-vendor scoring | Complete |
| v7 | Refined evaluation with sensitivity analysis | Complete |
| v8 | Defense-case calibration (active) | In progress |

## Presentations

- Slides: evaluation overview, ml-journal system, and plugin architecture decks