Independent ML research & tooling

These projects span agentic frameworks, empirical studies, reusable libraries, and paper reproductions. Some are built with those frameworks; some evaluate the frameworks themselves. All are independent of my day job.

Projects

Ferrum

A Python visualization library with a Rust core that applies a unified grammar of graphics to statistical and ML diagnostics (from scatter plots to SHAP summaries) without switching abstractions or hitting row limits.

ML Lab

A Claude Code system for structured ML hypothesis investigation. Sharpens a vague claim into a falsifiable one, then puts it through adversarial debate or ensemble critique, empirical testing, peer review, and a coherence audit. Designed for rigor over speed.

LLM Preamble

Two investigations into whether coding-agent system prompts change LLM code quality. They do, through attention allocation rather than generic lift: preambles improve whichever rubric dimensions they enumerate, so the 'best' preamble is whichever one matches your downstream evaluator.

ATO Device Embeddings

A controlled FastText-embedding study for account-takeover anomaly detection. Four pre-specified findings, each reproduced on 5/5 seeds, isolate one design that helps (mean-pooling per-feature embeddings beats embedding concatenated strings by +0.131 AUC on spoof attacks) and three implementation traps that standard aggregate evaluation conceals.
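To make the two designs concrete, here is a minimal sketch of the difference between mean-pooling per-feature embeddings and embedding one concatenated string. It does not use FastText or the study's code: `toy_embed`, the `name=value` formatting, and the feature names are all hypothetical stand-ins, chosen only so the example runs with the standard library.

```python
import hashlib

DIM = 8  # toy embedding width (assumption, not the study's setting)


def toy_ngram_vector(ngram, dim=DIM):
    """Deterministic pseudo-random vector for one character n-gram."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(b / 255.0) - 0.5 for b in digest[:dim]]


def toy_embed(text, n=3, dim=DIM):
    """Hypothetical stand-in for a subword embedder: average of n-gram vectors."""
    padded = f"<{text}>"
    ngrams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vectors = [toy_ngram_vector(g, dim) for g in ngrams]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mean_pool_per_feature(features):
    """Embed each feature separately, then average: every feature gets equal weight."""
    vectors = [toy_embed(f"{k}={v}") for k, v in sorted(features.items())]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def concat_string_embedding(features):
    """Embed one joined string: long values contribute more n-grams than short ones."""
    joined = " ".join(f"{k}={v}" for k, v in sorted(features.items()))
    return toy_embed(joined)


device = {"os": "ios", "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"}
pooled = mean_pool_per_feature(device)
joined = concat_string_embedding(device)
```

In the pooled variant, changing the short `os` value moves the vector by exactly half the difference between the two `os` embeddings, no matter how long the `ua` string is. In the concatenated variant the same change touches only a handful of the many n-grams being averaged.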

Imbalanced Losses

A PyTorch library of training losses for class-imbalanced classification (Focal Loss, Smooth-AP, Recall-at-Quantile, and Partial-AUC-at-Budget) with DDP all-gather support for globally correct rank estimation under distributed training.
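As an illustration of the simplest loss in that list, here is a framework-free sketch of binary focal loss in its standard form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not the library's API: the function name, the defaults, and the scalar interface are assumptions made for the example.

```python
import math


def binary_focal_loss(prob_positive, label, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example (hypothetical helper, not the library's interface).

    prob_positive: predicted probability of the positive class, in [0, 1].
    label: 1 for the positive class, 0 for the negative class.
    gamma: focusing parameter; 0 recovers alpha-weighted cross-entropy.
    alpha: weight on the positive class; negatives get 1 - alpha.
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = binary_focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

The modulating factor (1 - p_t)^gamma is what shifts gradient mass away from the many easy examples that dominate an imbalanced dataset.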

ML Wiki

A personal research wiki that ingests papers, Zotero libraries, PDFs, and experiment logs into a structured directory of interlinked markdown pages, with semantic fragment search, contradiction detection, and staleness tracking, all orchestrated through Claude Code.

Representation-learning reproductions

Reproductions of unsupervised representation-learning methods, all evaluated under fixed linear-probing protocols. Three tabular SSL papers (MET, SCARF, VIME) as separate repos under one shared protocol, plus a sweep of autoencoder architectures compared on MNIST.

Small Python libraries

Four narrowly scoped Python libraries that each solve one repetitive data-prep problem cleanly (classical feature selection, fuzzy string deduplication, time-series feature engineering, and reusable function composition), three of which slot into the standard scikit-learn Pipeline.
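For a sense of what reusable function composition means in practice, here is a minimal sketch using only the standard library. The `pipe` helper and the cleaning steps are hypothetical and are not taken from any of the four libraries.

```python
from functools import reduce


def pipe(*steps):
    """Return a function that applies each step left to right (hypothetical helper)."""
    def run(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


# Build a small text-cleaning function once, then reuse it wherever it is needed.
clean = pipe(str.strip, str.lower, lambda s: s.replace("-", " "))
cleaned = clean("  Foo-Bar ")
```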

DeepSets PyTorch

A PyTorch implementation of Deep Sets (Zaheer et al.): permutation-invariant and equivariant neural networks for learning functions on sets, with a test suite that verifies the paper's theorems.
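The two properties can be sketched without any framework. The invariant model below has the paper's sum-decomposition form rho(sum of phi(x)), and the equivariant layer uses the form lambda * x + gamma * sum(x) followed by an elementwise nonlinearity. The particular `phi`, `rho`, and coefficients are arbitrary choices for illustration, not the repository's code.

```python
import math


def phi(x):
    """Per-element encoder (arbitrary fixed features for this sketch)."""
    return (x, x * x, math.tanh(x))


def rho(pooled):
    """Readout applied to the pooled representation (arbitrary for this sketch)."""
    s1, s2, s3 = pooled
    return 0.5 * s1 - 0.1 * s2 + s3


def deep_sets_invariant(elements):
    """rho(sum_i phi(x_i)): the output does not depend on element order."""
    if not elements:
        raise ValueError("this sketch expects a non-empty set")
    encoded = [phi(x) for x in elements]
    # math.fsum returns the correctly rounded sum, so the result is order-independent.
    pooled = tuple(math.fsum(column) for column in zip(*encoded))
    return rho(pooled)


def equivariant_layer(elements, lam=1.5, gam=-0.3):
    """tanh(lam * x_i + gam * sum_j x_j): permuting inputs permutes outputs."""
    total = math.fsum(elements)
    return [math.tanh(lam * x + gam * total) for x in elements]


xs = [1.0, -2.0, 0.5]
same = math.isclose(deep_sets_invariant(xs), deep_sets_invariant(xs[::-1]))
```

A test suite for these properties would typically check both over random permutations, which is the same pattern used on the fixed reversal here.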

About