Tasks
The v2 main run uses 7 Python coding tasks: 5 creation (one of which is multi-file) and 2 refactor. Each task is run under every (preamble × model × rep) cell.
Source of truth: preamble_quality_v2_main.py lines 278–399 (the TASKS list).
The task task_modeflag_sort was dropped during pre-flight Phase B (Amendment A2) after a behavior-only spec eliminated the smell across all conditions, leaving no discriminative space.
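The "every (preamble × model × rep) cell" design can be sketched as a Cartesian product. This is only an illustration: the task IDs are the seven listed on this page, but `PREAMBLES`, `MODELS`, `N_REPS`, and `enumerate_runs` are hypothetical placeholders, not values or names taken from the real run.

```python
from itertools import product

# Task IDs as listed on this page.
TASK_IDS = [
    "task_lru_ttl_cache",
    "task_expr_parser",
    "task_mini_sql_engine",
    "task_rate_limiter_family",
    "task_kv_store_package",
    "task_flag_class",
    "task_exception_pyramid",
]

# Hypothetical placeholders: the real preamble set, model set, and rep count
# are defined by the experiment, not by this sketch.
PREAMBLES = ["preamble_a", "preamble_b"]
MODELS = ["model_x"]
N_REPS = 3


def enumerate_runs(task_ids, preambles, models, n_reps):
    """Return one (task, preamble, model, rep) tuple per run."""
    return list(product(task_ids, preambles, models, range(n_reps)))


runs = enumerate_runs(TASK_IDS, PREAMBLES, MODELS, N_REPS)
print(len(runs))  # 7 tasks * 2 preambles * 1 model * 3 reps = 42
```

With the placeholder sizes above, every task appears in all 6 cells, which is the property the design relies on.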
Tasks at a glance
| ID | Category | Short description |
|---|---|---|
| task_lru_ttl_cache | creation | Thread-safe LRU cache with per-entry TTL |
| task_expr_parser | creation | Recursive-descent arithmetic expression parser |
| task_mini_sql_engine | creation | In-memory SQL-like query engine with fluent API |
| task_rate_limiter_family | creation | Four rate-limiter algorithms behind one interface |
| task_kv_store_package | multifile_creation | Multi-file KV store package (stdlib only) |
| task_flag_class | refactor | Refactor boolean-flag class with global state |
| task_exception_pyramid | refactor | Flatten nested try/except pyramids |
Creation tasks (5)
task_lru_ttl_cache
Name. Thread-safe LRU cache with per-entry TTL.
Write a Python class implementing a thread-safe LRU cache with per-entry TTL expiry. Requirements: O(1) get and put; per-entry TTL (entries expire independently); LRU eviction (among non-expired entries; expired entries opportunistically reaped); thread-safe; `get(key)` returns value if present and not expired, else None; `put(key, value, ttl_seconds)` stores the entry; `size()` returns the count of non-expired entries. Include type hints and a brief usage example at the bottom. Do not use `functools.lru_cache` or any third-party caching library.
Activates concurrency_safety and example_quality rubric dimensions.
task_expr_parser
Name. Recursive-descent arithmetic expression parser and evaluator.
Write a Python recursive-descent parser and evaluator for arithmetic expressions. Supports integers and floats, `+`, `-`, `*`, `/`, `**` (right-associative), unary minus. Correct operator precedence: `**` > unary minus > `* /` > `+ -`. Parentheses for grouping. Tokenizer + parser + evaluator. Public API: `evaluate(expression: str) -> float`. Include type hints. Include at least 3 usage examples covering precedence, parentheses, and unary minus.
Activates example_quality; does not activate concurrency_safety. This is the single task used for all three confound probes (see confound_probes.py).
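The precedence order in the prompt coincides with Python's own arithmetic grammar, so Python's built-in evaluation can serve as a quick oracle for what a conforming `evaluate()` should return. The sketch below is an illustration only, not part of the harness or a model solution; `EXPECTED` and `oracle` are hypothetical names.

```python
# Expected values for a conforming evaluate(), computed with Python's own
# operators, which follow the same precedence order as the prompt.
EXPECTED = {
    "2 + 3 * 4": 14.0,     # * binds tighter than +
    "(2 + 3) * 4": 20.0,   # parentheses override precedence
    "2 ** 3 ** 2": 512.0,  # ** is right-associative: 2 ** (3 ** 2)
    "-2 ** 2": -4.0,       # ** binds tighter than unary minus
    "10 / 4 - 1": 1.5,     # / binds tighter than -
}


def oracle(expression: str) -> float:
    """Evaluate with Python's own parser; a stand-in, not a model solution."""
    # eval is tolerable here only because the inputs are the fixed literals above.
    return float(eval(expression, {"__builtins__": {}}, {}))


for text, value in EXPECTED.items():
    assert oracle(text) == value
```

The `-2 ** 2` case is the one most likely to separate implementations, since placing unary minus above `**` would yield 4 instead of -4.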
task_mini_sql_engine
Name. In-memory SQL-like query engine with fluent composable API.
Implement a small in-memory SQL-like query engine in Python. A `Table` class holds a list of dict rows. Support: `select(*columns)`, `where(predicate)`, `join(other_table, on=lambda l,r: bool)`, `group_by(column)`, `order_by(column, descending=False)`, `limit(n)`. Composable fluent API. Include type hints and 2–3 usage examples.
Activates example_quality.
task_rate_limiter_family
Name. Family of rate limiters behind a single interface.
Implement a family of rate limiters in Python behind a single interface: token bucket, leaky bucket, sliding-window-log, sliding-window-counter. All implement an `allow(client_id) -> bool` method. Each algorithm is its own class but they share an interface (base class or protocol). Include type hints, brief usage example showing each in use.
Activates example_quality; concurrency_safety is plausibly relevant, but the prompt does not ask for thread safety.
task_kv_store_package
Name. Multi-file KV store package. Category: multifile_creation (not creation).
Design and implement a small Python package for an in-memory key-value store with TTL, atomic updates, and bulk ops. Use stdlib only. Sketch the file layout (you may write each file as a separate `python:path/file.py` fenced block); typical layout: a top-level package, a core module, an optional CLI module, and a tests file. Include type hints and a brief usage example.
The only multi-file task. Extraction joins fenced blocks on # --- file boundary --- (see extract_python_code() at line 518).
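The join step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of `extract_python_code()`: the function name `join_python_blocks`, the regular expression, the exact whitespace around the boundary marker, and the `kv/...` paths in the sample are all hypothetical.

```python
import re

FENCE = "`" * 3  # built dynamically so this example contains no literal fence
FILE_BOUNDARY = "# --- file boundary ---"

# Matches an opening fence whose info string starts with "python" (optionally
# followed by ":path/file.py"), then captures everything up to the closing fence.
BLOCK_RE = re.compile(
    rf"^{FENCE}python[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def join_python_blocks(response: str) -> str:
    """Hypothetical stand-in: join every Python block with the boundary marker."""
    blocks = [m.group(1).rstrip("\n") for m in BLOCK_RE.finditer(response)]
    # Assumption: the marker sits on its own line between consecutive files.
    return f"\n{FILE_BOUNDARY}\n".join(blocks)


# A made-up two-file response; the paths are illustrative only.
sample = "\n".join([
    "Layout sketch:",
    FENCE + "python:kv/core.py",
    "X = 1",
    FENCE,
    "Some prose between files.",
    FENCE + "python:kv/cli.py",
    "import kv",
    FENCE,
])
print(join_python_blocks(sample))
```

Joining into one string lets single-file and multi-file responses pass through the same downstream scoring path, at the cost of losing the per-file paths unless they are recorded separately.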
Refactor tasks (2)
Both refactor tasks ship a "before" code block (TASK5_BEFORE, TASK6_BEFORE) embedded in the prompt and demand the model return a single refactored block.
task_flag_class
Name. Refactor boolean-flag class with global state into clean design.
Refactor the following Python code. The class uses boolean mode-flags to select behavior variants at runtime, and relies on hidden module-global state for caching and statistics. […]
- Eliminate boolean mode-flags; use composition, strategy pattern, or explicit subclasses
- Remove all module-global state; the class must be fully self-contained and independently testable
- Preserve all observable behaviors (filtering, deduplication, normalization, validation, caching, statistics)
- The refactored code must be importable and safe for concurrent use across multiple instances
- Add type hints where missing
The "before" code is TASK5_BEFORE, defined in preamble_quality_v2_main.py above the TASKS list.
task_exception_pyramid
Name. Flatten exception pyramid: make errors explicit and loud.
Refactor the following Python code. It uses deeply nested try/except blocks with broad exception catches and silently swallows most errors. […]
- Eliminate all nested try/except structures (maximum 1 level of nesting)
- Remove all bare `except Exception: pass` clauses; every exception must either be re-raised, logged with full context, or converted to a specific typed exception
- Preserve the function signature: `load_and_merge_configs(primary_path, override_path=None)`
- The function should still return a merged dict (or `{}` on missing primary)
- Make all error conditions visible: callers should be able to distinguish 'primary config missing' from 'primary config malformed' from 'override malformed'
- Add type hints
The "before" code is TASK6_BEFORE.
Creation vs. refactor stratification
The v2 hypothesis predicts a larger preamble effect on creation tasks than on refactor tasks, because refactor prompts are tightly constrained by the input code and leave less surface for preamble-driven craft choices. The per-task and category-stratified CQS tables are in experiment_v2_results/REPORT.md §3.
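The two strata can be sketched as a mapping from the three category labels to two groups. The grouping of multifile_creation under creation follows the "5 creation (one of which is multi-file)" count on this page; the names `TASK_CATEGORIES`, `stratum`, and `tasks_by_stratum` are hypothetical and not taken from the analysis code.

```python
from collections import defaultdict

# Category labels as listed in the "Tasks at a glance" table.
TASK_CATEGORIES = {
    "task_lru_ttl_cache": "creation",
    "task_expr_parser": "creation",
    "task_mini_sql_engine": "creation",
    "task_rate_limiter_family": "creation",
    "task_kv_store_package": "multifile_creation",
    "task_flag_class": "refactor",
    "task_exception_pyramid": "refactor",
}


def stratum(category: str) -> str:
    """Collapse the three category labels into the two comparison strata."""
    if category in {"creation", "multifile_creation"}:
        return "creation"
    return "refactor"


def tasks_by_stratum(categories: dict[str, str]) -> dict[str, list[str]]:
    """Group task IDs by stratum, preserving the input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for task_id, category in categories.items():
        grouped[stratum(category)].append(task_id)
    return dict(grouped)


groups = tasks_by_stratum(TASK_CATEGORIES)
print({name: len(ids) for name, ids in groups.items()})
```

Any per-stratum statistic would then be computed over these two groups of task IDs.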
For the user-message construction (including the trivial_baseline override where the user message is just the task name), see build_user_prompt() at line 642.
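The override described above might look roughly like the sketch below. It is not the code of `build_user_prompt()`: the `Task` fields, the function signature, and the reading of "task name" as the Name field (rather than the task ID) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical shape; the real TASKS entries may use different field names.
    task_id: str
    name: str
    prompt: str


def build_user_prompt_sketch(task: Task, preamble_id: str) -> str:
    """Return the user message for one (task, preamble) pair."""
    if preamble_id == "trivial_baseline":
        # Override: the user message is just the task name.
        return task.name
    # Otherwise send the full task prompt (refactor prompts embed the "before" code).
    return task.prompt


example = Task(
    task_id="task_expr_parser",
    name="Recursive-descent arithmetic expression parser and evaluator",
    prompt="Write a Python recursive-descent parser and evaluator ...",
)
print(build_user_prompt_sketch(example, "trivial_baseline"))
```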