Overview
Imbalanced Losses is a PyTorch library of training losses for class-imbalanced classification, the regime where the positive class is rare: fraud, anomaly detection, object detection. It provides focal and ranking-based objectives and, importantly, makes the ranking losses behave correctly under distributed training.
The problem
Cross-entropy and BCE minimize log-loss averaged over the training distribution. When 99% of samples are negative, a model that predicts negative everywhere already sits close to that minimum and scores 99% accuracy, on a metric that no longer means anything. As the project puts it, “the loss on the 99% majority overwhelms any signal from the 1% minority.” The fix is either to down-weight the easy majority or to optimize a ranking metric directly.
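To make the degenerate solution concrete, here is a minimal sketch in plain Python. The 1% positive rate matches the figure above; the constant predicted probability of 0.01 and the dataset size are illustrative assumptions. It shows a constant predictor reaching 99% accuracy and a BCE of roughly 0.056 while recalling no positives at all.

```python
import math

# Toy dataset: 1% positives, as in the text above (10 of 1000 samples).
targets = [1] * 10 + [0] * 990

# A degenerate "model" that outputs the same small probability for every sample.
# The value 0.01 is an illustrative choice, not something the library prescribes.
p = 0.01
preds = [p] * len(targets)

def bce(probs, labels):
    """Mean binary cross-entropy over probabilities and 0/1 labels."""
    total = 0.0
    for q, y in zip(probs, labels):
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    return total / len(labels)

# Thresholding at 0.5 turns every prediction into "negative".
hard = [1 if q >= 0.5 else 0 for q in preds]
accuracy = sum(h == y for h, y in zip(hard, targets)) / len(targets)
recall = sum(h == 1 and y == 1 for h, y in zip(hard, targets)) / sum(targets)
loss = bce(preds, targets)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} bce={loss:.4f}")
```

The loss looks healthy and the accuracy looks excellent, yet the model is useless for the rare class, which is the failure the rest of the library is built around.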
The losses
- Sigmoid / Softmax Focal Loss. Focal objectives that down-weight easy examples so rare-class signal isn’t drowned out; drop-in replacements for BCE / cross-entropy.
- Smooth-AP. A differentiable approximation of Average Precision, for when the area under the precision-recall curve (AUCPR) is itself the target metric.
- Recall-at-Quantile. Optimizes recall above a score threshold set at a chosen quantile, matching fixed operating points like an alert system that can only review the top fraction of cases.
- Partial-AUC-at-Budget. Optimizes partial AUC over a false-positive-rate band around a target operating point, for when the constraint is a fixed false-alarm budget rather than a single threshold (fraud screening at a 50 bps, i.e. 0.5%, false-positive rate, say). Where Recall-at-Quantile pins one decision boundary, this optimizes across the whole band inside the budget.
- Loss Warmup Wrapper. Trains on plain BCE/CE during warmup, blends into the target loss, and decays temperature on a schedule, resetting the rank queue at the phase switch so warmup-era logits don’t poison it.
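As an illustration of the down-weighting idea behind the focal objectives in the list above, the following is a minimal sketch of a binary focal loss on raw logits, written against plain PyTorch. The function name sigmoid_focal_loss_sketch and the defaults gamma=2.0 and alpha=0.25 are assumptions taken from the standard focal loss formulation; they are not the library's own API or defaults.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss_sketch(logits, targets, gamma=2.0, alpha=0.25):
    """Hypothetical helper: binary focal loss on raw logits, mean-reduced."""
    targets = targets.float()
    # Per-sample BCE equals -log(p_t), where p_t is the probability of the true class.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    # (1 - p_t)^gamma shrinks the contribution of well-classified (easy) examples.
    modulating = (1 - p_t) ** gamma
    # alpha weights positives, (1 - alpha) weights negatives.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * modulating * ce).mean()

# Three confidently rejected negatives and one uncertain positive.
logits = torch.tensor([-4.0, -3.0, 0.5, -2.5], requires_grad=True)
targets = torch.tensor([0, 0, 1, 0])
loss = sigmoid_focal_loss_sketch(logits, targets)
loss.backward()  # called like any other PyTorch loss
```

With gamma=0 and alpha=0.5 the sketch reduces to half of the ordinary BCE, which is a convenient sanity check when swapping it in for an existing loss.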
Globally correct ranking under DDP
This is the part most implementations get wrong. Standard losses decompose across samples, so sharding data across GPUs is harmless. Ranking losses do not decompose: a soft rank is a sum over the whole pool, and a quantile threshold is a property of the whole score distribution. Estimated on one GPU’s shard, both are (in the project’s words) “qualitatively wrong” when positives are rare and unevenly spread.
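A small worked example may help show why shard-local estimates go wrong. The sketch below uses plain Python and made-up scores on two hypothetical shards; the helper names are illustrative only. It compares a top-25% threshold and a rank computed on each shard alone with the same quantities computed on the pooled scores.

```python
def top_fraction_threshold(scores, fraction):
    """Score of the last item inside the top `fraction` of `scores` (hard quantile)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, reverse=True)[k - 1]

def hard_rank(score, pool):
    """1-based rank of `score` within `pool` (1 = highest)."""
    return 1 + sum(s > score for s in pool)

# Made-up scores: shard A happens to receive all the high-scoring samples.
shard_a = [0.9, 0.8, 0.7, 0.6]
shard_b = [0.4, 0.3, 0.2, 0.1]
pool = shard_a + shard_b

local_a = top_fraction_threshold(shard_a, 0.25)   # 0.9
local_b = top_fraction_threshold(shard_b, 0.25)   # 0.4
global_t = top_fraction_threshold(pool, 0.25)     # 0.8

rank_local = hard_rank(0.4, shard_b)   # 1: looks like the best sample on its shard
rank_global = hard_rank(0.4, pool)     # 5: mid-pack once all shards are pooled

print(local_a, local_b, global_t, rank_local, rank_global)
```

Neither shard recovers the pooled threshold of 0.8, and the sample scoring 0.4 is ranked first locally but fifth globally, so a loss built on local ranks would push gradients in the wrong direction for it.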
The library’s all-gather helpers collect logits and targets across every worker, preserve gradient flow for the local shard so autograd still works, and handle variable batch sizes, so each worker computes the same globally correct rank and threshold. A memory queue accumulates past batches to stabilize estimates at very low positive rates: at 0.5% positives with batch size 32, roughly 85% of batches (0.995^32 ≈ 0.85) would otherwise contain no positives at all. Temperature soft-ranking replaces the non-differentiable hard rank with a smooth sigmoid that approaches the true rank as temperature drops.
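The two mechanisms described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the library's implementation: gather_with_local_grad and soft_rank are hypothetical names, the gather assumes every worker holds the same batch size (variable sizes would need padding first), and it falls back to a no-op when no process group is initialized.

```python
import torch
import torch.distributed as dist

def gather_with_local_grad(x):
    """Hypothetical helper: concatenate `x` from all workers while keeping
    autograd history for the local shard. Assumes equal batch sizes."""
    if not (dist.is_available() and dist.is_initialized()):
        return x  # single-process fallback: nothing to gather
    gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, x.detach().contiguous())
    # all_gather yields tensors without gradient history, so the local
    # tensor is put back in its slot to let gradients flow through it.
    gathered[dist.get_rank()] = x
    return torch.cat(gathered, dim=0)

def soft_rank(scores, temperature):
    """Smooth descending rank: 1 + sum over j != i of sigmoid((s_j - s_i) / T)."""
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)  # diff[i, j] = s_j - s_i
    soft_gt = torch.sigmoid(diff / temperature)
    # Each row includes the self-comparison sigmoid(0) = 0.5, so
    # 1 + (row sum - 0.5) simplifies to 0.5 + row sum.
    return 0.5 + soft_gt.sum(dim=1)

scores = torch.tensor([0.2, 1.5, -0.7, 0.9], requires_grad=True)
pooled = gather_with_local_grad(scores)        # same tensor in a single process
sharp = soft_rank(pooled, temperature=0.01)    # close to the hard ranks 3, 1, 4, 2
smooth = soft_rank(pooled, temperature=1.0)    # blurrier, but with usable gradients
smooth[0].backward()
```

At low temperature the soft ranks are nearly exact but the sigmoid saturates and gradients vanish; at higher temperature the ranks are blurred but informative gradients reach every score. That trade-off is the usual reason for decaying temperature on a schedule.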
Engineering notes
The losses are instantiated and called like any PyTorch loss: loss = loss_fn(logits, targets); loss.backward(). What sets the library apart is the documentation discipline. An explicit “Assumptions and Failure Modes” guide gives a per-loss breakdown: focal loss stops working below ~0.01% positives; Smooth-AP fails when the pool is too small or the temperature is too low at initialization; Partial-AUC-at-Budget needs the negative pool to comfortably exceed the inverse of the budget (about 200 negatives for a 50 bps band), or its tail estimate biases toward the largest negative score. A diagnostic table maps symptoms (loss stuck near 0.5, rare class never improving, threshold instability) to root causes. The scope is deliberately narrow, and the correctness claims are stated, tested, and bounded.
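The 200-negative figure follows from simple counting, which the sketch below spells out in plain Python. The helper names are hypothetical and the argument is a simplified illustration, not the library's estimator: with a budget expressed in basis points, it counts how many negatives may score above the threshold before the false-positive budget is exceeded.

```python
def min_negatives_for_budget(budget_bps):
    """Smallest negative pool in which one negative above the threshold
    still fits inside the budget (ceiling of 10_000 / budget_bps)."""
    return (10_000 + budget_bps - 1) // budget_bps

def negatives_allowed_above(n_negatives, budget_bps):
    """How many negatives may score above the threshold within the budget."""
    return n_negatives * budget_bps // 10_000

budget_bps = 50  # 50 basis points = 0.5% false-positive rate

floor_pool = min_negatives_for_budget(budget_bps)          # 200
too_small = negatives_allowed_above(199, budget_bps)       # 0
just_enough = negatives_allowed_above(200, budget_bps)     # 1
comfortable = negatives_allowed_above(1_000, budget_bps)   # 5

print(floor_pool, too_small, just_enough, comfortable)
```

Below 200 negatives the allowed count is zero, so the only threshold that respects the budget sits at or above the largest negative score, which matches the bias the guide describes. At exactly 200 the estimate rests on a single negative, hence the advice that the pool should exceed the bound comfortably.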