LLM Preamble Experiments

Empirical investigation of whether coding-agent preambles measurably change the quality of code that LLMs produce. They do — and the channel is load-bearing in both directions. Two pre-registered investigations, 1,290 generations, 25,140 cross-judge ratings.

TL;DR

The preamble channel is genuinely load-bearing: content choices measurably move outputs in either direction relative to a no-preamble baseline. There is no universal "best preamble"; a preamble's effect is governed by the overlap between the dimensions the preamble enumerates and the dimensions your downstream evaluator measures. Effect sizes are modest in either direction (~3–6 points out of 100).

The full headline table with empirical evidence per claim is in the five findings.

Start here

If you're a practitioner deciding what to put in your agent's system prompt

If you're a researcher or methodologist

  • Explanation — mechanism arguments: attention-allocation, enumeration-vs-demonstration, the verified system-vs-user channel asymmetry, why static metrics miss, the v1 → v2 instrument correction.
  • Methodology — pre-registration, ml-lab debate workflow, investigation logs, the five spec amendments, limitations.
  • Statistical methods — Kruskal–Wallis omnibus, mixed-effects M0/M1/M2, bootstrap 95% CI.
  • Related work — situating the investigation in the 2024–2026 literature on persona/system-prompt effects and LLM-as-judge evaluation.
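
For readers unfamiliar with two of the statistical methods named above, here is a minimal standard-library sketch of a Kruskal–Wallis H statistic and a percentile-bootstrap 95% CI for a difference in mean scores. It is an illustration only, not the repo's analysis code: the function names and the toy scores are hypothetical, the H statistic omits the tie correction, and the mixed-effects models (M0/M1/M2) are not covered.

```python
import random
import statistics


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks for ties.

    Simplification (assumption of this sketch): no tie-correction factor.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


def bootstrap_mean_diff_ci(treated, baseline, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(baseline)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each condition independently, with replacement.
        t = rng.choices(treated, k=len(treated))
        b = rng.choices(baseline, k=len(baseline))
        diffs.append(statistics.fmean(t) - statistics.fmean(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical judge scores (0-100); NOT data from the investigation.
    no_preamble = [71, 74, 69, 73, 70, 72]
    preamble_a = [76, 78, 74, 77, 75, 79]
    preamble_b = [68, 70, 66, 71, 67, 69]
    print("H =", kruskal_wallis_h([no_preamble, preamble_a, preamble_b]))
    print("95% CI (A - baseline):", bootstrap_mean_diff_ci(preamble_a, no_preamble))
```

The omnibus statistic asks whether any condition differs in rank distribution; the bootstrap interval then sizes a specific contrast against the no-preamble baseline, which is the scale on which effects of a few points out of 100 would be read.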

If you want to reproduce the runs

What this site is

A Diátaxis-organized companion to the llm-preamble repo. The repo's README.md, PREAMBLES.md, RELATED_WORK.md, and per-cycle CONCLUSIONS.md remain the source-of-truth artifacts; this site reorganizes them for navigation and adds the explanation pages that consolidate mechanism arguments developed during and after the investigation. Every page links back to the underlying source — script, results file, or markdown artifact — under the repo root.

ML-Lab

Designed, executed, and analyzed using ml-lab — a Claude Code plugin for rigorous, pre-registered ML hypothesis investigations (hypothesis → adversarial critique → PoC → empirical resolution → peer review). Every artifact in this repo (HYPOTHESIS.md, SPEC_V2.md, CONCLUSIONS.md, REPORT_ADDENDUM.md, INVESTIGATION_LOG.jsonl) is a canonical output of that workflow.