Ferrum — Project

Overview

Ferrum is a statistical visualization library for Python built on a Rust core. It covers exploratory charts, statistical graphics, and ML model diagnostics under a single grammar of graphics. A scatter plot, a faceted histogram, a confusion matrix, and a SHAP beeswarm are all built the same way.

The problem

Python visualization fragments one activity into too many mental models. Statistical graphics, interactive charts, convenience plotting, and model diagnostics are treated as separate domains, split across libraries such as matplotlib, seaborn, Altair, Plotly, and Yellowbrick, and each switch introduces a new object model, new defaults, and new limitations. Ferrum exists to reject that fragmentation.

One chart model

Every visualization follows the same grammar: data, encodings, marks, scales, coordinate systems, transforms, and views. The library is grammar-first but not grammar-only. Convenience helpers like displot and rocchart exist for speed, but they are sugar over the grammar rather than parallel APIs with their own rules, so a chart from a helper stays as themeable and composable as one written from first principles.
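
One way to picture "sugar over the grammar" is a helper whose entire body is grammar calls. The sketch below is a toy model, not Ferrum's implementation: the Spec class, its methods, and the histogram helper are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Spec:
    """Toy chart spec (hypothetical, not Ferrum's API)."""
    mark: Optional[str] = None
    encoding: Dict[str, str] = field(default_factory=dict)
    transform: Optional[str] = None

    def mark_bar(self) -> "Spec":
        return replace(self, mark="bar")

    def encode(self, **channels: str) -> "Spec":
        # Merge new channels into a copy; the original spec is unchanged.
        return replace(self, encoding={**self.encoding, **channels})

    def binned(self) -> "Spec":
        return replace(self, transform="bin")


def histogram(column: str) -> Spec:
    """Hypothetical helper: a shortcut for three grammar calls, nothing more."""
    return Spec().mark_bar().encode(x=column, y="count").binned()


by_hand = Spec().mark_bar().encode(x="sepal_length", y="count").binned()
from_helper = histogram("sepal_length")

# The helper's result is an ordinary spec, so it can be extended like any other.
restyled = from_helper.encode(color="species")
```

Because the helper returns the same kind of object as the hand-written chain, from_helper equals by_hand, and anything that works on one (here, adding a color channel) works on the other.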

Statistical operations live inside the rendering pipeline. KDEs, LOESS fits, bootstrap confidence intervals, and binning are declarative chart operations, not manual preprocessing, which keeps plotting code short and makes statistical assumptions visible in the spec itself. Interactivity is a render mode, not a rewrite: .interactive() changes the rendering path, not the user’s conceptual model.
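
To make "declarative chart operations" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Ferrum's spec format: the dictionary layout, the field names, and the resolve function are all hypothetical. The point is only that the bin width lives in the chart definition and is applied at render time, instead of being computed by hand beforehand.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical spec: the binning step is declared next to the mark,
# so the bin width is visible to anyone reading the chart definition.
spec = {
    "mark": "bar",
    "transform": {"type": "bin", "field": "x", "step": 10},
}
data = {"x": [1, 4, 12, 15, 19, 31]}


def resolve(spec: dict, data: Dict[str, List[int]]) -> Dict[int, int]:
    """Apply the declared transform at render time and return bin counts."""
    t = spec["transform"]
    if t["type"] != "bin":
        raise ValueError(f"unsupported transform: {t['type']}")
    step = t["step"]
    # Key each value by the lower edge of its bin.
    counts = Counter((v // step) * step for v in data[t["field"]])
    return dict(sorted(counts.items()))


bars = resolve(spec, data)
print(bars)  # {0: 2, 10: 3, 30: 1}
```

Changing the statistical assumption means editing one value in the spec ("step"), which is what keeps the assumption visible alongside the chart.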

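import pandas as pd
import ferrum as fm

# Assumption: the iris dataset is available locally as iris.csv;
# any supported dataframe works here.
iris = pd.read_csv("iris.csv")

# Declare the chart: data, mark, then encodings.
chart = (
    fm.Chart(iris)
    .mark_point()
    .encode(x="sepal_length", y="petal_length", color="species:N")
)
chart.save("plot.svg")  # write the chart as SVG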

Charts compose with operators: + layers charts on top of one another, | places them side by side, and & stacks them vertically.
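
The three composition operators map naturally onto Python's operator overloading. The following is a toy model of how such composition could be represented, not Ferrum's actual classes: Node, Plot, Layer, HConcat, and VConcat are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


class Node:
    """Gives every chart-like object the three composition operators."""

    def __add__(self, other: "Node") -> "Layer":
        return Layer((self, other))

    def __or__(self, other: "Node") -> "HConcat":
        return HConcat((self, other))

    def __and__(self, other: "Node") -> "VConcat":
        return VConcat((self, other))


@dataclass(frozen=True)
class Plot(Node):
    name: str


@dataclass(frozen=True)
class Layer(Node):  # parts drawn on top of one another
    parts: Tuple[Node, ...]


@dataclass(frozen=True)
class HConcat(Node):  # parts placed side by side
    parts: Tuple[Node, ...]


@dataclass(frozen=True)
class VConcat(Node):  # parts stacked vertically
    parts: Tuple[Node, ...]


points, trend, hist, box = Plot("points"), Plot("trend"), Plot("hist"), Plot("box")

# Python precedence: + binds tighter than &, which binds tighter than |.
dashboard = points + trend | hist & box
```

Here dashboard is a layered scatter-plus-trend on the left and a vertical stack of hist over box on the right. The grouping comes from Python's own operator precedence, so parenthesizing mixed expressions is the safer habit when the intended layout differs.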

What you can build

Beyond general charts (scatter, line, histogram, KDE, box, violin, heatmap, bar), Ferrum treats model output as just another dataset. ROC and PR curves, confusion matrices, calibration curves, residuals, partial dependence plots, learning and validation curves, and SHAP beeswarm/bar/waterfall plots all feed the same chart system, so diagnostics are as composable and themeable as any other plot.

Python declares, Rust computes

Python is the declaration layer; Rust is the computation layer. The boundary is crossed once via the Arrow C Data Interface, avoiding row-level copying. Rust handles layout, statistical transforms, and rendering; Python constructs the specification and manages the API. The same chart grammar works at 100 rows and at production scale.

Scale and operational simplicity

Rendering is pure Rust (SVG and PNG output with no Cairo, X11, or display server), so it runs cleanly in notebooks, containers, and CI. Auto-rasterization and GPU-backed rendering scale from hundreds to millions of rows behind the same spec, including full-sample SHAP plots that would be impractical elsewhere. Dataframes are met where they are: pandas, Polars, cuDF, Dask, and others are normalized to Arrow once, so the Rust core only ever sees one shape.

How it was built

Ferrum is also a record of how it was made. Roughly 104,000 lines of Python and Rust (across twelve phases, 975 commits, and nearly 3,900 tests) were written in ten days by one human directing an agentic Claude Code framework.

The velocity came from process, not typing speed. Every phase followed the same order: brainstorm, write a design spec, write an implementation plan, then execute. No phase began until both documents were approved. That front-loaded the hard architectural decisions (the Rust/Python boundary, the no-matplotlib constraint, the serialization format) into early design sessions, so later work executed against settled architecture instead of re-litigating it.

Review was enforced structurally rather than periodically. A layered automation pipeline placed a read-only review gate on every staged commit, ran full subsystem audits at phase boundaries, and dispatched repeatable quality campaigns: combinatorial test sweeps, parallel bug hunts, and gallery audits that render every chart type against seaborn and Yellowbrick. Coding agents never commit; an orchestrator handles staging and gate dispatch, which makes the review pipeline impossible to skip.

The split that made it scale: a high-capability model orchestrates (reading specs, decomposing work, making cross-cutting decisions) while specialist agents do the line-by-line Python and Rust. Architectural judgment is expensive and rare; execution is cheap and frequent, and the system matches work to cost.