Ferrum
A Python visualization library with a Rust core that applies a unified grammar of graphics to statistical and ML diagnostics (from scatter plots to SHAP summaries) without switching abstractions or hitting row limits.
These projects span agentic frameworks, empirical studies, reusable libraries, and paper reproductions. Some are built with those frameworks; some evaluate the frameworks themselves. All are independent of my day job.
A Claude Code system for structured ML hypothesis investigation. Sharpens a vague claim into a falsifiable one, then puts it through adversarial debate or ensemble critique, empirical testing, peer review, and a coherence audit. Designed for rigor over speed.
Two investigations into whether coding-agent system prompts change LLM code quality. They do, through attention allocation rather than generic lift: preambles improve whichever rubric dimensions they enumerate, so the 'best' preamble is whichever one matches your downstream evaluator.
A controlled FastText-embedding study for account-takeover anomaly detection. Four pre-specified findings, each reproduced on 5/5 seeds, isolate one positive design choice (mean-pooling per feature beats concatenated strings by +0.131 AUC on spoof attacks) and three implementation traps that standard aggregate evaluation conceals.
A PyTorch library of training losses for class-imbalanced classification (Focal Loss, Smooth-AP, Recall-at-Quantile, and Partial-AUC-at-Budget) with DDP all-gather support for globally correct rank estimation under distributed training.
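As a rough illustration of the first loss in that list, here is a minimal plain-Python sketch of binary Focal Loss (Lin et al.) for a single example. It is not the library's API: the function name `binary_focal_loss` and the defaults `gamma=2.0`, `alpha=0.25` are assumptions chosen for illustration, and the library itself operates on PyTorch tensors rather than floats.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example (hypothetical helper, not the library's API).

    p: predicted probability of the positive class, in [0, 1]
    y: true label, 0 or 1
    """
    # Probability assigned to the true class, and the matching class weight.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp away from zero so the log stays finite.
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.10, 1)  # confident and wrong
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is a convenient sanity check.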
A personal research wiki that ingests papers, Zotero libraries, PDFs, and experiment logs into a structured directory of interlinked markdown pages, with semantic fragment search, contradiction detection, and staleness tracking, all orchestrated through Claude Code.
Reproductions of unsupervised representation-learning methods, all evaluated under fixed linear-probing protocols. Three tabular SSL papers (MET, SCARF, VIME) as separate repos under one shared protocol, plus a sweep of autoencoder architectures compared on MNIST.
Four narrowly scoped Python libraries that each solve one repetitive data-prep problem cleanly (classical feature selection, fuzzy string deduplication, time-series feature engineering, and reusable function composition), three of them slotting into the standard scikit-learn Pipeline.
A PyTorch implementation of Deep Sets (Zaheer et al.): permutation-invariant and equivariant neural networks for learning functions on sets, with a test suite that verifies the paper's theorems.
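The permutation-invariance property can be sketched in a few lines of plain Python using the sum-decomposition form from the paper, f(X) = rho(sum of phi(x) over x in X). This is only an illustration, not the repository's code: `phi` and `rho` here are hypothetical fixed functions standing in for learned networks, and the input is assumed to be a non-empty list of floats.

```python
import math
import random


def phi(x):
    # Per-element feature map (fixed stand-in for a learned network).
    return (math.tanh(x), x * x)


def rho(pooled):
    # Readout on the pooled representation (also a fixed stand-in).
    a, b = pooled
    return 0.5 * a - 0.25 * b


def deep_set(xs):
    # Embed every element independently, sum-pool each feature, then read out.
    feats = [phi(x) for x in xs]
    # math.fsum returns a correctly rounded sum, so the result does not depend on order.
    pooled = tuple(math.fsum(col) for col in zip(*feats))
    return rho(pooled)


xs = [0.3, -1.2, 2.0, 0.7]
shuffled = xs[:]
random.Random(0).shuffle(shuffled)

# Both calls see the same set of elements, so the outputs should agree.
print(deep_set(xs), deep_set(shuffled))
```

Because the pooling step is a sum, any reordering of the input leaves the pooled vector unchanged, which is the invariance a test suite for such a model would check.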