Research

ml-lab's methodology has been evaluated empirically across eight experiment versions (v1 through v8, the last still in progress), with pre-registered hypotheses beginning at v4. This section contains the working paper, a related work survey, per-version experiment reports, and presentation materials.

Resource	Description
Working Paper	Full research paper on the adversarial debate evaluation methodology
Related Work	Literature survey covering LLM-as-judge, multi-agent debate, and evaluation calibration

Experiment Reports

Version	Focus	Status
v1-v3	Initial methodology validation and calibration issues	Complete
v4	First pre-registered study	Complete
v5	Harder benchmark case generation pipeline	Complete
v6	Ensemble mode extension and cross-vendor scoring	Complete
v7	Refined evaluation with sensitivity analysis	Complete
v8	Defense-case calibration	In progress

Presentations

Slides: decks covering the evaluation overview, the ml-journal system, and the plugin architecture