Getting Started with Tabular SSL¶

Time to complete: 10 minutes

Welcome! This tutorial will get you up and running with Tabular SSL in just a few minutes. You'll explore our interactive demos, understand state-of-the-art corruption strategies, and run your first self-supervised learning experiment.

What You'll Learn¶

How to install and set up Tabular SSL
How to explore corruption strategies with interactive demos
How to train with real credit card transaction data
How to run state-of-the-art SSL experiments (VIME, SCARF, ReConTab)
Basic concepts of tabular self-supervised learning

Prerequisites¶

Python 3.8+
Basic familiarity with command line

Step 1: Installation¶

Let's start by installing Tabular SSL:

# Clone the repository
git clone https://github.com/yourusername/tabular-ssl.git
cd tabular-ssl

# Install the package
pip install -e .

# Set up the Python path
export PYTHONPATH=$PWD/src

✅ Checkpoint: Verify your installation works:

python -c "import tabular_ssl; print('✅ Installation successful!')"

Step 2: Explore Corruption Strategies (Interactive Demo)¶

Before diving into training, let's understand how tabular self-supervised learning works through our interactive demo:

python demo_corruption_strategies.py

This demo shows you: - VIME corruption: Value imputation and mask estimation - SCARF corruption: Contrastive learning with feature swapping - ReConTab corruption: Multi-task reconstruction - Side-by-side comparison of all approaches

✅ What to expect: You'll see how each corruption strategy transforms data, corruption rates, and example outputs.

Step 3: Try Real Data (Credit Card Demo)¶

Now let's work with real transaction data:

python demo_credit_card_data.py

This demo: - Downloads real credit card transaction data from IBM TabFormer - Shows data preprocessing and sequence creation - Demonstrates DataModule integration - Prepares you for actual training

✅ What to expect: Download progress, data statistics, and confirmation that everything is ready for training.

Step 4: Your First SSL Training¶

Now let's train a state-of-the-art self-supervised model:

python train.py +experiment=vime_ssl

This experiment uses VIME (Value Imputation and Mask Estimation): - Corrupts transaction data by masking features - Learns to predict which features were masked (mask estimation) - Learns to reconstruct original values (value imputation) - Creates powerful representations for downstream tasks

✅ What to expect: Training progress with mask estimation and reconstruction losses, checkpoints saved to outputs/.

Step 5: Try Different SSL Methods¶

Let's experiment with other state-of-the-art approaches:

# SCARF: Contrastive learning with feature corruption
python train.py +experiment=scarf_ssl

# ReConTab: Multi-task reconstruction
python train.py +experiment=recontab_ssl

Each approach uses different corruption strategies: - SCARF: Replaces features with values from other samples (contrastive learning) - ReConTab: Combines masking, noise, and swapping (multi-task reconstruction)

Step 6: Customize SSL Training¶

You can easily modify SSL experiments using Hydra's override syntax:

# Adjust corruption rate for VIME
python train.py +experiment=vime_ssl model/corruption.corruption_rate=0.5

# Change SCARF corruption strategy
python train.py +experiment=scarf_ssl model/corruption.corruption_strategy=marginal_sampling

# Use different sequence length
python train.py +experiment=recontab_ssl data.sequence_length=64

# Adjust learning rate and batch size
python train.py +experiment=vime_ssl model.learning_rate=5e-4 data.batch_size=32

Step 7: Check Your Results¶

After running experiments, you'll find results in the outputs/ directory:

ls outputs/  # See your experiment runs

Each run creates a timestamped folder with: - Configuration files (reproducing exact settings) - Training logs (TensorBoard, WandB compatible) - Model checkpoints (best performing models) - Metrics and plots (loss curves, validation metrics)

For SSL experiments, you'll see specific losses: - VIME: mask_estimation_loss, value_imputation_loss - SCARF: contrastive_loss - ReConTab: masked_reconstruction, denoising, unswapping losses

Core Concepts Summary¶

Corruption Strategies: Core of self-supervised learning - VIME: Masks features, learns to predict masks and values - SCARF: Swaps features between samples for contrastive learning
- ReConTab: Multi-task corruption (masking + noise + swapping)

SSL Experiments: Pre-configured state-of-the-art approaches - Located in configs/experiments/ - Use +experiment=vime_ssl (or scarf_ssl, recontab_ssl) to run them

Model Components: Modular architecture - Event encoders (process individual transactions) - Sequence encoders (model temporal patterns) - Corruption modules (transform data for SSL) - Task-specific heads (reconstruction, classification)

Configuration: Hydra-based flexible settings - Override corruption rates: model/corruption.corruption_rate=0.5 - Mix components: model/sequence_encoder=rnn model/corruption=scarf

What's Next?¶

🎯 Ready for more? Continue with: - Custom Components Tutorial - Create your own corruption strategies - How-to: SSL Training - Advanced self-supervised learning - Reference: Corruption Strategies - Technical documentation - Reference: Models - Complete component documentation

Troubleshooting¶

Import errors? Make sure PYTHONPATH is set:

export PYTHONPATH=$PWD/src

CUDA out of memory? Try smaller batch sizes:

python train.py +experiment=simple_mlp data.batch_size=16

Need help? Check our support resources or open an issue on GitHub.

Congratulations! 🎉 You've successfully run your first Tabular SSL experiments. You're now ready to explore more advanced features and customize the library for your specific needs.