
Research

ml-lab's methodology has been empirically evaluated across eight experiment versions, with pre-registered hypotheses from v4 onward. This section contains the working paper, a related-work survey, per-version experiment reports, and presentation materials.

| Resource | Description |
|---|---|
| Working Paper | Full research paper on the adversarial debate evaluation methodology |
| Related Work | Literature survey covering LLM-as-judge, multi-agent debate, and evaluation calibration |

Experiment Reports

| Version | Focus | Status |
|---|---|---|
| v1-v3 | Initial methodology validation and calibration issues | Complete |
| v4 | First pre-registered study | Complete |
| v5 | Harder benchmark case generation pipeline | Complete |
| v6 | Ensemble mode extension and cross-vendor scoring | Complete |
| v7 | Refined evaluation with sensitivity analysis | Complete |
| v8 | Defense-case calibration | In progress |

Presentations

  • Slides — evaluation overview, ml-journal system, and plugin architecture decks