Configuring Experiments

This guide explains how to configure and run experiments using Hydra in Tabular SSL.

Introduction

Tabular SSL uses Hydra for configuration management, which enables hierarchical configuration composition, command-line overrides, and experiment tracking. This guide will show you how to:

Use the configuration structure
Create and run experiments
Override default parameters
Run parameter sweeps

Configuration Structure

The configuration files are organized in a hierarchical structure:

configs/
├── config.yaml                # Main configuration
├── model/                     # Model configurations
│   ├── default.yaml          # Default model config
│   ├── event_encoder/        # Event encoder configs
│   ├── sequence_encoder/     # Sequence encoder configs
│   ├── embedding/            # Embedding configs
│   ├── projection_head/      # Projection head configs
│   └── prediction_head/      # Prediction head configs
├── data/                     # Data configurations
├── trainer/                  # Training configurations
├── callbacks/                # Callback configurations
├── logger/                   # Logger configurations
├── experiment/               # Experiment configurations
├── hydra/                    # Hydra-specific configurations
└── paths/                    # Path configurations

Basic Usage

Running with Default Configuration

To run with the default configuration:

python src/train.py

This will use the configuration in configs/config.yaml, which composes configurations from the other directories.

Overriding Parameters

You can override any parameter using the command line:

python src/train.py model.optimizer.lr=0.001 trainer.max_epochs=50

This will override the learning rate and the maximum number of epochs while using the default values for all other parameters.
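As a rough mental model, each override is a dotted path into the nested configuration plus a new value. The sketch below mimics that behaviour with plain dictionaries. It is not Hydra's parser: the helper name `apply_overrides` and the default values are hypothetical, and, like Hydra without the `+` prefix, it refuses to set keys that do not already exist.

```python
import ast
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with each `dotted.key=value` override applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # raises KeyError if the path does not exist
        if leaf not in node:
            raise KeyError(f"unknown key: {dotted_key}")
        try:
            # Interpret numbers, booleans and lists; fall back to a plain string.
            node[leaf] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            node[leaf] = raw_value
    return result


# Hypothetical defaults; the real values live in the configs/ directory.
defaults = {
    "model": {"optimizer": {"lr": 0.01, "weight_decay": 0.0}},
    "trainer": {"max_epochs": 10},
}
cfg = apply_overrides(defaults, ["model.optimizer.lr=0.001", "trainer.max_epochs=50"])
```

Only the two named keys change in `cfg`; every other value, and the `defaults` dictionary itself, is left untouched.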

Using a Specific Configuration

You can select a specific configuration for a component by naming its config group and the option to use:

python src/train.py model/event_encoder=mlp model/sequence_encoder=transformer

This will use the MLP event encoder and Transformer sequence encoder configurations.

Creating Experiments

Experiment Configuration Files

Experiment configuration files are stored in configs/experiment/ and provide a way to group parameter overrides.

Here's an example experiment configuration file:

# configs/experiment/transformer_ssl.yaml
# @package _global_

defaults:
  - override /model/event_encoder: mlp.yaml
  - override /model/sequence_encoder: transformer.yaml
  - override /trainer: default.yaml
  - override /model: default.yaml
  - override /callbacks: default.yaml
  - _self_

tags: ["transformer", "ssl"]

seed: 12345

trainer:
  max_epochs: 100
  gradient_clip_val: 0.5

model:
  optimizer:
    lr: 1.0e-4
    weight_decay: 0.01

Key things to note:

# @package _global_: Places the file's contents at the root of the composed configuration rather than under an experiment key
defaults: Lists the config group options to compose before this file's own values
override /path/to/config: Replaces the option already selected for that config group instead of adding a second one
_self_: Marks where this file's own values are merged; placing it last means they take precedence over the listed defaults
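The effect of the position of `_self_` can be illustrated with a plain recursive merge in which later sources win. This is only a sketch of the ordering idea, not Hydra's composition code; `deep_merge` and the contents of `group_defaults` are hypothetical, while `experiment_body` copies the values from the experiment file shown earlier.

```python
def deep_merge(base, update):
    """Recursively merge `update` into a copy of `base`; `update` wins on conflicts."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical values contributed by the selected group defaults.
group_defaults = {
    "trainer": {"max_epochs": 10, "accelerator": "cpu"},
    "model": {"optimizer": {"lr": 1.0e-3, "weight_decay": 0.0}},
}

# Values from the body of configs/experiment/transformer_ssl.yaml.
experiment_body = {
    "seed": 12345,
    "trainer": {"max_epochs": 100, "gradient_clip_val": 0.5},
    "model": {"optimizer": {"lr": 1.0e-4, "weight_decay": 0.01}},
}

# `_self_` listed last: the experiment's own values are merged after the defaults.
self_last = deep_merge(group_defaults, experiment_body)

# `_self_` listed first: the defaults are merged afterwards and win on conflicts.
self_first = deep_merge(experiment_body, group_defaults)
```

With `_self_` last, `self_last["trainer"]["max_epochs"]` is the experiment's 100; with the order reversed, the hypothetical default of 10 would survive instead.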

Running an Experiment

To run an experiment:

python src/train.py experiment=transformer_ssl

This will use the configuration defined in configs/experiment/transformer_ssl.yaml.

Extending an Experiment

You can extend an experiment by overriding its parameters:

python src/train.py experiment=transformer_ssl trainer.max_epochs=200

This will use the transformer_ssl experiment configuration with the maximum epochs set to 200.

Debugging

Debug Mode

Debug mode shortens runs so that you can reproduce and fix problems quickly:

python src/train.py debug=true

This will typically:

Run on a smaller dataset
Use fewer epochs
Disable certain features like logging

Experiment Tracking

Logging and Output

When you run an experiment, Hydra creates an output directory for that run:

outputs/
└── 2023-06-15/
    └── 12-34-56/
        ├── .hydra/
        │   ├── config.yaml
        │   ├── hydra.yaml
        │   └── overrides.yaml
        ├── checkpoints/
        └── logs/

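The .hydra/ directory records the configuration used for the run: config.yaml holds the fully composed job configuration, hydra.yaml holds Hydra's own settings, and overrides.yaml lists the overrides passed on the command line.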
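Because the date and time directory names sort chronologically as plain strings, the most recent run can be located without parsing them. The helper below is a minimal sketch that assumes the outputs/DATE/TIME layout shown above; the name `latest_run_dir` is hypothetical, and the demo builds a throwaway directory tree rather than touching real outputs.

```python
import tempfile
from pathlib import Path


def latest_run_dir(outputs_root):
    """Return the newest outputs/<date>/<time> directory that contains .hydra/."""
    root = Path(outputs_root)
    runs = [p.parent for p in root.glob("*/*/.hydra") if p.is_dir()]
    # YYYY-MM-DD and HH-MM-SS names order correctly under string comparison.
    return max(runs, key=lambda run: (run.parent.name, run.name), default=None)


# Demo against a temporary tree that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    for day, time in [("2023-06-15", "12-34-56"), ("2023-06-16", "08-00-00")]:
        (Path(tmp) / day / time / ".hydra").mkdir(parents=True)
    latest = latest_run_dir(tmp)
    latest_name = f"{latest.parent.name}/{latest.name}"
```

From the returned directory, `.hydra/config.yaml` and `.hydra/overrides.yaml` can then be opened to check exactly what a run used.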

Tags

You can add tags to your experiments:

# configs/experiment/transformer_ssl.yaml
tags: ["transformer", "ssl"]

Or via the command line:

python src/train.py tags="[transformer, ssl]"

These tags can be used for filtering and grouping experiments.

Parameter Sweeps

Hydra can sweep over parameters: pass the -m (--multirun) flag and give a comma-separated list of values for each parameter you want to vary.

Basic Sweep

python src/train.py -m model.optimizer.lr=1e-3,1e-4,1e-5

This will run three experiments with different learning rates.

Multi-Parameter Sweep

python src/train.py -m model.optimizer.lr=1e-3,1e-4 model.optimizer.weight_decay=0.01,0.001

This will run 4 experiments (2 learning rates × 2 weight decay values).
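The job count is the size of the Cartesian product of the value lists. The sketch below reproduces that expansion with `itertools.product` to show which override sets a sweep produces; `expand_sweep` is a hypothetical helper, not part of Hydra, and the job order it yields is simply that of `itertools.product`.

```python
from itertools import product


def expand_sweep(sweep):
    """Expand {key: [values]} into one list of `key=value` overrides per job."""
    keys = list(sweep)
    value_lists = [sweep[key] for key in keys]
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*value_lists)
    ]


jobs = expand_sweep(
    {
        "model.optimizer.lr": ["1e-3", "1e-4"],
        "model.optimizer.weight_decay": ["0.01", "0.001"],
    }
)

for index, overrides in enumerate(jobs):
    print(index, " ".join(overrides))
```

Adding a third swept parameter with three values would multiply the total to 12 jobs, so the grid grows quickly as parameters are added.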

Sweep with Experiment

python src/train.py -m experiment=transformer_ssl,s4_ssl

This will run both the transformer_ssl and s4_ssl experiments.

Advanced Configuration

Using Environment Variables

You can use environment variables in your configurations:

data:
  path: ${oc.env:DATA_PATH,/default/path}

This will use the DATA_PATH environment variable if it exists, or fall back to /default/path.

Using Interpolation

You can reference other configuration values:

model:
  input_dim: 64
  hidden_dim: ${model.input_dim}  # References input_dim

Dynamic Default Values

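You can compute default values from other parameters with a resolver. Note that eval is not one of OmegaConf's built-in resolvers, so this example only works if the project registers a resolver with that name before the configuration is resolved: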

model:
  input_dim: 64
  hidden_dim: ${eval:2 * ${model.input_dim}}  # Dynamic computation

Best Practices

Naming Conventions

Use descriptive names for experiment files
Group related parameters together
Use consistent naming across configurations

Configuration Structure

Keep configuration files small and focused
Use defaults for common parameters
Override only what's necessary

Experiment Management

Use meaningful tags for experiments
Add a brief description in the experiment file
Document key parameter choices

Typical Workflow

Start with an existing experiment: python src/train.py experiment=transformer_ssl
Make modifications via the command line: python src/train.py experiment=transformer_ssl model.optimizer.lr=1e-5
If the modifications work well, create a new experiment file
Run parameter sweeps to find optimal values: python src/train.py -m experiment=my_new_experiment model.optimizer.lr=1e-3,1e-4,1e-5

Conclusion

Hydra provides a powerful way to configure and track experiments in Tabular SSL. By using experiment configuration files, command-line overrides, and parameter sweeps, you can efficiently explore the parameter space and find the best configurations for your specific task.