How-to guides
Task-oriented recipes for putting the v2 findings to work. Each page is a step-by-step procedure with concrete commands and decision points.
- Design a preamble for your system — the six-step procedure derived from the v2 findings.
- Interpret CQS-craft effect sizes — when the measured effect matters, and when it doesn't.
- A/B test a candidate preamble against a baseline — validation path using your own evaluator.
- Extend the rubric for a non-Python domain or different evaluator — add dimensions, calibrate, then run.
- Add a new preamble condition to the v2 main run — concrete edits to the main script with line refs.
- Run a 3-probe identification test on your own preamble — rule out the rubric-overlap confound before claiming a real-world effect.
For why these recipes work, see the findings and the explanation section. For the canonical metric and judge definitions, see the reference section.