Configuring Experiments¶
This guide explains how to configure and run experiments using Hydra in Tabular SSL.
Introduction¶
Tabular SSL uses Hydra for configuration management, which enables hierarchical configuration composition, command-line overrides, and experiment tracking. This guide will show you how to:
- Use the configuration structure
- Create and run experiments
- Override default parameters
- Run parameter sweeps
Configuration Structure¶
The configuration files are organized in a hierarchical structure:
configs/
├── config.yaml # Main configuration
├── model/ # Model configurations
│ ├── default.yaml # Default model config
│ ├── event_encoder/ # Event encoder configs
│ ├── sequence_encoder/ # Sequence encoder configs
│ ├── embedding/ # Embedding configs
│ ├── projection_head/ # Projection head configs
│ └── prediction_head/ # Prediction head configs
├── data/ # Data configurations
├── trainer/ # Training configurations
├── callbacks/ # Callback configurations
├── logger/ # Logger configurations
├── experiment/ # Experiment configurations
├── hydra/ # Hydra-specific configurations
└── paths/ # Path configurations
Basic Usage¶
Running with Default Configuration¶
To run with the default configuration:
python src/train.py
This will use the configuration in configs/config.yaml, which composes configurations from the other directories.
Overriding Parameters¶
You can override any parameter using the command line:
python src/train.py model.optimizer.lr=0.001 trainer.max_epochs=50
This will override the learning rate and the maximum number of epochs while using the default values for all other parameters.
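To make the override semantics concrete, here is a minimal stdlib-only sketch of how dotted `key=value` overrides map onto a nested configuration. It is not Hydra's implementation. The helper names (`parse_value`, `apply_overrides`) and the default values are hypothetical and exist only for illustration.

```python
import ast
from typing import Any


def parse_value(text: str) -> Any:
    """Turn '0.001' into a float and '50' into an int; leave other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            # Walk down the tree, creating intermediate levels if needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return cfg


# Hypothetical defaults, for illustration only.
defaults = {"model": {"optimizer": {"lr": 0.01}}, "trainer": {"max_epochs": 10}}
cfg = apply_overrides(defaults, ["model.optimizer.lr=0.001", "trainer.max_epochs=50"])
print(cfg)
```

Only the two named leaves change; every other key keeps its default value, which mirrors the behaviour described above.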
Using a Specific Configuration¶
You can use a specific configuration for a component:
python src/train.py model/event_encoder=mlp model/sequence_encoder=transformer
This will use the MLP event encoder and Transformer sequence encoder configurations.
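Unlike a dotted value override, a `group=option` override selects a file from a config group directory. The sketch below shows that mapping with plain path arithmetic. It assumes the files `mlp.yaml` and `transformer.yaml` exist under the directories shown in the configuration structure, and the function name `group_choice_to_path` is hypothetical.

```python
from pathlib import PurePosixPath


def group_choice_to_path(override: str, config_root: str = "configs") -> PurePosixPath:
    """Map 'group/subgroup=option' to '<config_root>/group/subgroup/option.yaml'."""
    group, _, option = override.partition("=")
    return PurePosixPath(config_root) / group / f"{option}.yaml"


for override in ["model/event_encoder=mlp", "model/sequence_encoder=transformer"]:
    print(override, "->", group_choice_to_path(override))
```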
Creating Experiments¶
Experiment Configuration Files¶
Experiment configuration files are stored in configs/experiment/ and provide a way to group parameter overrides.
Here's an example experiment configuration file:
# configs/experiment/transformer_ssl.yaml
# @package _global_
defaults:
- override /model/event_encoder: mlp.yaml
- override /model/sequence_encoder: transformer.yaml
- override /trainer: default.yaml
- override /model: default.yaml
- override /callbacks: default.yaml
- _self_
tags: ["transformer", "ssl"]
seed: 12345
trainer:
  max_epochs: 100
  gradient_clip_val: 0.5
model:
  optimizer:
    lr: 1.0e-4
    weight_decay: 0.01
Key things to note:
- # @package _global_: Indicates that the configuration should be merged at the global level
- defaults: Specifies which configurations to use as defaults
- override /path/to/config: Overrides a specific configuration
- _self_: Ensures that the current file's configurations are applied after all others
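The effect of placing `_self_` last can be pictured as a recursive merge in which the experiment file's own values are applied on top of the composed defaults. The following is a stdlib-only sketch of that ordering, not Hydra's actual composition code. The `deep_merge` helper and the values in `from_defaults` are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict where values from `update` override those in `base`, recursively."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values standing in for the composed defaults.
from_defaults = {
    "trainer": {"max_epochs": 10, "accelerator": "cpu"},
    "model": {"optimizer": {"lr": 1e-3}},
}
# Values from the body of the experiment file, applied at the `_self_` position.
from_experiment = {
    "trainer": {"max_epochs": 100, "gradient_clip_val": 0.5},
    "model": {"optimizer": {"lr": 1.0e-4, "weight_decay": 0.01}},
}

composed = deep_merge(from_defaults, from_experiment)
print(composed)
```

Keys that appear only in the defaults survive the merge, while keys set in the experiment file win wherever both define them.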
Running an Experiment¶
To run an experiment:
python src/train.py experiment=transformer_ssl
This will use the configuration defined in configs/experiment/transformer_ssl.yaml.
Extending an Experiment¶
You can extend an experiment by overriding its parameters:
python src/train.py experiment=transformer_ssl trainer.max_epochs=200
This will use the transformer_ssl experiment configuration with the maximum epochs set to 200.
Debugging¶
Debug Mode¶
You can run in debug mode to get faster feedback while debugging:
python src/train.py debug=true
This will typically:
- Run on a smaller dataset
- Use fewer epochs
- Disable certain features like logging
Experiment Tracking¶
Logging and Output¶
When you run an experiment, Hydra creates an output directory for that run:
outputs/
└── 2023-06-15/
    └── 12-34-56/
        ├── .hydra/
        │   ├── config.yaml
        │   ├── hydra.yaml
        │   └── overrides.yaml
        ├── checkpoints/
        └── logs/
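The .hydra/ directory records what was used for the run: config.yaml holds the fully composed configuration, overrides.yaml the command-line overrides, and hydra.yaml Hydra's own settings.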
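When checking what a run actually used, it helps to locate the most recent run directory programmatically. The sketch below assumes the `outputs/<date>/<time>/` layout shown above with zero-padded names. The function name `latest_run_dir` is hypothetical, and a project that customizes Hydra's run directory would need a different pattern.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(outputs_root: Path) -> Optional[Path]:
    """Return the newest <date>/<time> directory under outputs_root that has a .hydra folder."""
    runs = [p.parent for p in outputs_root.glob("*/*/.hydra") if p.is_dir()]
    # Zero-padded dates and times sort chronologically as plain strings.
    return max(runs, key=lambda p: (p.parent.name, p.name), default=None)


run = latest_run_dir(Path("outputs"))
if run is not None:
    # Show the command-line overrides recorded for that run.
    print((run / ".hydra" / "overrides.yaml").read_text())
```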
Tags¶
You can add tags to your experiments:
# configs/experiment/transformer_ssl.yaml
tags: ["transformer", "ssl"]
Or via the command line:
python src/train.py tags="[transformer, ssl]"
These tags can be used for filtering and grouping experiments.
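As an illustration of filtering and grouping by tag, here is a small sketch over in-memory run records. The record format, the run names, and both helper functions are hypothetical; in practice the records would come from your logger or from the saved run configurations.

```python
from collections import defaultdict


def runs_with_tags(runs: list[dict], required: set[str]) -> list[dict]:
    """Keep runs whose tag list contains every tag in `required`."""
    return [run for run in runs if required <= set(run.get("tags", []))]


def group_by_tag(runs: list[dict]) -> dict[str, list[str]]:
    """Map each tag to the names of the runs that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for run in runs:
        for tag in run.get("tags", []):
            groups[tag].append(run["name"])
    return dict(groups)


# Hypothetical run records, for illustration only.
runs = [
    {"name": "run_a", "tags": ["transformer", "ssl"]},
    {"name": "run_b", "tags": ["s4", "ssl"]},
]
print(runs_with_tags(runs, {"transformer"}))
print(group_by_tag(runs))
```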
Parameter Sweeps¶
Hydra allows you to perform parameter sweeps by specifying multiple values for a parameter.
Basic Sweep¶
python src/train.py -m model.optimizer.lr=1e-3,1e-4,1e-5
This will run three experiments with different learning rates.
Multi-Parameter Sweep¶
python src/train.py -m model.optimizer.lr=1e-3,1e-4 model.optimizer.weight_decay=0.01,0.001
This will run 4 experiments (2 learning rates × 2 weight decay values).
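The job count comes from the Cartesian product of the value lists. A minimal sketch of that expansion using `itertools.product` follows. The function name `expand_sweep` is hypothetical, and the order in which Hydra launches the jobs is not something this sketch claims to reproduce.

```python
from itertools import product


def expand_sweep(sweep: dict[str, list[str]]) -> list[list[str]]:
    """Return one list of 'key=value' overrides per job in the Cartesian product."""
    keys = list(sweep)
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*(sweep[key] for key in keys))
    ]


jobs = expand_sweep({
    "model.optimizer.lr": ["1e-3", "1e-4"],
    "model.optimizer.weight_decay": ["0.01", "0.001"],
})
for index, overrides in enumerate(jobs):
    print(index, " ".join(overrides))
```

Adding a third swept parameter with three values would multiply the job count to 12, so it is worth computing the product size before launching a large sweep.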
Sweep with Experiment¶
python src/train.py -m experiment=transformer_ssl,s4_ssl
This will run both the transformer_ssl and s4_ssl experiments.
Advanced Configuration¶
Using Environment Variables¶
You can use environment variables in your configurations:
data:
  path: ${oc.env:DATA_PATH,/default/path}
This will use the DATA_PATH environment variable if it exists, or fall back to /default/path.
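The fallback behaviour is the same idea as a dictionary lookup with a default. The stdlib-only sketch below illustrates it without involving OmegaConf. The function name `env_or_default` and the `environ` parameter (which makes the function easy to test) are hypothetical.

```python
import os
from typing import Mapping


def env_or_default(name: str, default: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return environ[name] if the variable is set, otherwise `default`."""
    return environ.get(name, default)


# Reads the real environment; prints /default/path unless DATA_PATH is set.
print(env_or_default("DATA_PATH", "/default/path"))
```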
Using Interpolation¶
You can reference other configuration values:
model:
  input_dim: 64
  hidden_dim: ${model.input_dim} # References input_dim
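To show what resolving such a reference involves, here is a simplified stdlib-only sketch that handles values consisting of exactly one `${a.b}` reference. It is a conceptual illustration rather than OmegaConf's resolver: the helper names are hypothetical, and it does not handle references embedded in longer strings or detect reference cycles.

```python
import re
from typing import Any

# Matches a value that is exactly one reference such as "${model.input_dim}".
_REFERENCE = re.compile(r"^\$\{([\w.]+)\}$")


def lookup(cfg: dict, dotted_key: str) -> Any:
    """Follow 'a.b.c' through nested dicts."""
    node: Any = cfg
    for name in dotted_key.split("."):
        node = node[name]
    return node


def resolve(cfg: dict, value: Any) -> Any:
    """Resolve a value that is one '${a.b}' reference; return anything else unchanged."""
    while isinstance(value, str) and (match := _REFERENCE.match(value)):
        value = lookup(cfg, match.group(1))
    return value


cfg = {"model": {"input_dim": 64, "hidden_dim": "${model.input_dim}"}}
print(resolve(cfg, cfg["model"]["hidden_dim"]))
```

Because the reference is resolved when the value is read, changing `model.input_dim` also changes the resolved `hidden_dim`.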
Dynamic Default Values¶
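You can compute default values based on other parameters. Note that eval is not one of OmegaConf's built-in resolvers, so this only works if the project registers a resolver named eval (for example with OmegaConf.register_new_resolver) before the configuration is composed:
model:
  input_dim: 64
  hidden_dim: ${eval:2 * ${model.input_dim}} # Dynamic computation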
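If you register such a resolver yourself, passing Python's built-in `eval` would execute arbitrary code found in a configuration file. The sketch below is a restricted arithmetic evaluator built on the `ast` module that could be registered instead, for instance via `OmegaConf.register_new_resolver("eval", safe_arith)` in the training entry point. The function name `safe_arith` is hypothetical, and how Tabular SSL actually wires up its resolver is not shown in this guide.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(expression: str) -> Number:
    """Evaluate +, -, *, /, // and parentheses over numeric literals only."""

    def visit(node: ast.AST) -> Number:
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Names, calls, attribute access and everything else are rejected.
        raise ValueError(f"unsupported expression: {expression!r}")

    return visit(ast.parse(expression.strip(), mode="eval").body)


print(safe_arith("2 * 64"))
```

Anything other than plain arithmetic raises `ValueError`, so a typo or an unexpected expression in a config file fails loudly instead of being executed.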
Best Practices¶
Naming Conventions¶
- Use descriptive names for experiment files
- Group related parameters together
- Use consistent naming across configurations
Configuration Structure¶
- Keep configuration files small and focused
- Use defaults for common parameters
- Override only what's necessary
Experiment Management¶
- Use meaningful tags for experiments
- Add a brief description in the experiment file
- Document key parameter choices
Typical Workflow¶
- Start with an existing experiment:
python src/train.py experiment=transformer_ssl
- Make modifications via the command line:
python src/train.py experiment=transformer_ssl model.optimizer.lr=1e-5
- If the modifications work well, create a new experiment file
- Run parameter sweeps to find optimal values:
python src/train.py -m experiment=my_new_experiment model.optimizer.lr=1e-3,1e-4,1e-5
Conclusion¶
Hydra provides a powerful way to configure and track experiments in Tabular SSL. By using experiment configuration files, command-line overrides, and parameter sweeps, you can efficiently explore the parameter space and find the best configurations for your specific task.