steps package

Submodules

steps.forward module

Forward step selection module.

class steps.forward.ForwardSelector(normalize: bool = False, metric: str = 'aic')

Bases: BaseEstimator, SelectorMixin, StepsMixin

Class for forward stepwise feature selection.

__init__(normalize: bool = False, metric: str = 'aic')

Constructor method.

Parameters:
  • normalize (bool) – Whether to normalize data; default = False, assuming object used in pipeline.

  • metric (str) – Optimization metric to use; one of [‘aic’, ‘bic’].

fit(X: ndarray, y: ndarray) → ForwardSelector

Fit a forward stepwise selection.

Parameters:
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.

Returns:

self (object)
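
The greedy search behind forward stepwise selection can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: `forward_select`, `toy_score`, and the `score` callback are hypothetical names, and the criterion (lower is better, as with AIC or BIC) is supplied by the caller rather than computed from a fitted model.

```python
from typing import Callable, List, Sequence


def forward_select(n_features: int, score: Callable[[Sequence[int]], float]) -> List[int]:
    """Greedy forward selection over feature indices (hypothetical helper).

    `score` receives a tuple of feature indices and returns a criterion
    where lower is better, for example an AIC or BIC value.
    """
    selected: List[int] = []
    best = score(())  # criterion of the model with no features
    remaining = set(range(n_features))
    while remaining:
        # Try adding each remaining feature and keep the best candidate.
        candidate = min(remaining, key=lambda j: score(tuple(selected + [j])))
        candidate_score = score(tuple(selected + [candidate]))
        if candidate_score >= best:
            break  # no candidate improves the criterion, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected


def toy_score(subset: Sequence[int]) -> float:
    """Toy criterion: features 0 and 2 reduce the loss, each feature costs 1."""
    useful = {0: 5.0, 2: 3.0}
    return 10.0 - sum(useful.get(j, 0.0) for j in subset) + 1.0 * len(subset)


chosen = forward_select(4, toy_score)
```

With the toy criterion the search adds feature 0, then feature 2, and stops because any third feature only adds penalty.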

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

static get_estimator(y: ndarray) → Any

Get an estimator for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which estimator is returned.

Returns:

Type[LinearRegression, LogisticRegression] – A Scikit-learn estimator.

get_feature_names_out(input_features=None)

Mask feature names according to selected features.

Parameters:

input_features (array-like of str or None, default=None) – Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out (ndarray of str objects) – Transformed feature names.
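
The masking rule described above can be illustrated without any library. The sketch below is an assumption-laden stand-in, not scikit-learn's code: `masked_feature_names` is a hypothetical helper that takes the boolean support mask directly and returns a plain list instead of an ndarray.

```python
from typing import List, Optional, Sequence


def masked_feature_names(
    support: Sequence[bool], input_features: Optional[Sequence[str]] = None
) -> List[str]:
    """Keep only the names whose mask entry is True (hypothetical helper)."""
    if input_features is None:
        # Generate default names x0, x1, ... when none are supplied.
        input_features = [f"x{i}" for i in range(len(support))]
    if len(input_features) != len(support):
        raise ValueError("input_features must have one name per input feature")
    return [name for name, keep in zip(input_features, support) if keep]


names_default = masked_feature_names([True, False, True])
names_given = masked_feature_names([True, False, True], ["age", "height", "weight"])
```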

static get_loss_func(y: ndarray) → mean_squared_error | log_loss

Get a loss function for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which loss function is returned.

Returns:

Union[mean_squared_error, log_loss] – A Scikit-learn loss function.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:

indices (bool, default=False) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns:

support (array) – An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
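
The two return forms carry the same information. A small plain-Python sketch (lists stand in for arrays, and the mask values are made up for illustration) shows how a boolean mask maps to integer indices and that both select the same columns:

```python
# Boolean mask over 4 input features: True marks a retained feature.
mask = [True, False, True, False]

# Equivalent integer form: positions of the retained features.
indices = [i for i, keep in enumerate(mask) if keep]

# Either form selects the same values from one sample.
row = [1.5, 2.5, 3.5, 4.5]
by_mask = [value for value, keep in zip(row, mask) if keep]
by_index = [row[i] for i in indices]
```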

inverse_transform(X)

Reverse the transformation operation.

Parameters:

X (array of shape [n_samples, n_selected_features]) – The input samples.

Returns:

X_r (array of shape [n_samples, n_original_features]) – X with columns of zeros inserted where features would have been removed by transform().
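
The zero-filling behaviour can be sketched with nested lists. `inverse_rows` is a hypothetical helper written for illustration only; it takes the support mask as an argument, whereas the real method reads it from the fitted selector.

```python
from typing import List, Sequence


def inverse_rows(rows: Sequence[Sequence[float]], support: Sequence[bool]) -> List[List[float]]:
    """Re-expand reduced rows to the original width, filling dropped columns with 0."""
    restored = []
    for row in rows:
        values = iter(row)
        # Take the next kept value where the mask is True, otherwise insert a zero.
        restored.append([next(values) if keep else 0 for keep in support])
    return restored


restored = inverse_rows([[1.5, 3.5], [2.0, 4.0]], [True, False, True, False])
```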

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in scikit-learn version 1.4: the “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
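
The `<component>__<parameter>` naming can be illustrated with a short parser. This is a sketch of the naming convention only: `split_param` is a hypothetical helper, and `selector` is a made-up pipeline step name.

```python
from typing import Optional, Tuple


def split_param(name: str) -> Tuple[Optional[str], str]:
    """Split a parameter name at the first double underscore."""
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name  # plain parameter on the estimator itself
    return component, parameter


# "selector" is a hypothetical pipeline step name used only for illustration.
nested = split_param("selector__metric")
plain = split_param("metric")
```

Splitting at the first double underscore means a deeper name such as `a__b__c` is handed to component `a` as `b__c`, which that component can resolve in turn.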

transform(X)

Reduce X to the selected features.

Parameters:

X (array of shape [n_samples, n_features]) – The input samples.

Returns:

X_r (array of shape [n_samples, n_selected_features]) – The input samples with only the selected features.

steps.metrics module

Custom metrics module.

steps.metrics.get_bic(loss: float, n: int, p: int)

Calculate the BIC score.

Parameters:
  • loss (float) – Model loss (MSE or Log-Loss).

  • n (int) – Number of observations.

  • p (int) – Number of parameters.

Returns:

float – BIC value.

steps.metrics.get_aic(loss: float, n: int, p: int)

Calculate the AIC score.

Parameters:
  • loss (float) – Model loss (MSE or Log-Loss).

  • n (int) – Number of observations.

  • p (int) – Number of parameters.

Returns:

float – AIC value.
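
This page does not state the formulas that get_aic and get_bic use. As a point of reference only, the sketch below shows one common form for a Gaussian model scored by MSE, AIC = n·ln(MSE) + 2p and BIC = n·ln(MSE) + p·ln(n). The names `aic_from_mse` and `bic_from_mse` are hypothetical, and the package's own functions may differ, particularly for log-loss.

```python
import math


def aic_from_mse(mse: float, n: int, p: int) -> float:
    """AIC under a Gaussian-error assumption (not necessarily the package's formula)."""
    return n * math.log(mse) + 2 * p


def bic_from_mse(mse: float, n: int, p: int) -> float:
    """BIC under the same assumption; the penalty grows with ln(n)."""
    return n * math.log(mse) + p * math.log(n)


# A slightly better fit with three extra parameters does not pay for itself here.
aic_small = aic_from_mse(0.50, n=100, p=2)
aic_large = aic_from_mse(0.49, n=100, p=5)
```

Lower values are better under both criteria. Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes each extra parameter more heavily than AIC on all but very small samples.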

steps.mixin module

Step selection mixin module.

class steps.mixin.StepsMixin

Bases: object

Step selection mixin that returns a regressor or classifier estimator and a matching loss function.

This mixin provides an estimator, chosen from the target dtype, through the get_estimator method, and a loss function, also chosen from the target dtype, through the get_loss_func method.

static get_estimator(y: ndarray) → Any

Get an estimator for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which estimator is returned.

Returns:

Type[LinearRegression, LogisticRegression] – A Scikit-learn estimator.

static get_loss_func(y: ndarray) → mean_squared_error | log_loss

Get a loss function for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which loss function is returned.

Returns:

Union[mean_squared_error, log_loss] – A Scikit-learn loss function.
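
The exact dtype rule the mixin applies is not documented on this page. The sketch below assumes one plausible rule, float targets mean regression and anything else means classification, and maps each task to the estimator and loss names mentioned above. `task_for_target` and `CHOICES` are hypothetical and use only strings, so nothing here calls scikit-learn.

```python
from typing import Dict, Sequence, Tuple

# Estimator and loss names paired per task, as listed in this reference.
CHOICES: Dict[str, Tuple[str, str]] = {
    "regression": ("LinearRegression", "mean_squared_error"),
    "classification": ("LogisticRegression", "log_loss"),
}


def task_for_target(y: Sequence) -> str:
    """Assumed rule: any float value marks a continuous (regression) target."""
    if any(isinstance(value, float) for value in y):
        return "regression"
    return "classification"


continuous = CHOICES[task_for_target([0.5, 1.2, 3.4])]
discrete = CHOICES[task_for_target([0, 1, 1, 0])]
```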

steps.subset module

Best subsets selection module.

class steps.subset.SubsetSelector(normalize: bool = False, metric: str = 'aic', max_p: int | None = None)

Bases: BaseEstimator, SelectorMixin, StepsMixin

Class for best subsets feature selection.

__init__(normalize: bool = False, metric: str = 'aic', max_p: int | None = None)

Constructor method.

Parameters:
  • normalize (bool) – Whether to normalize data; default = False, assuming object used in pipeline.

  • metric (str) – Optimization metric to use; one of [‘aic’, ‘bic’].

  • max_p (Optional[int]) – Maximum number of parameters to include; default None.

fit(X: ndarray, y: ndarray) → SubsetSelector

Fit a best subset regression.

Parameters:
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.

Returns:

self (object)
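
Best subsets selection scores every feature combination up to a size limit and keeps the best one. The sketch below illustrates that exhaustive search in plain Python; it is not the package's implementation. `best_subset` and `toy_score` are hypothetical names, the criterion is passed in as a callback (lower is better), and `max_p` is assumed here to cap the number of selected features.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple


def best_subset(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_p: Optional[int] = None,
) -> Tuple[int, ...]:
    """Exhaustively score all subsets of up to max_p features (hypothetical helper)."""
    limit = n_features if max_p is None else min(max_p, n_features)
    best: Tuple[int, ...] = ()
    best_score = score(())  # start from the model with no features
    for size in range(1, limit + 1):
        for subset in combinations(range(n_features), size):
            current = score(subset)
            if current < best_score:
                best, best_score = subset, current
    return best


def toy_score(subset: Sequence[int]) -> float:
    """Toy criterion: features 0 and 2 reduce the loss, each feature costs 1."""
    useful = {0: 5.0, 2: 3.0}
    return 10.0 - sum(useful.get(j, 0.0) for j in subset) + 1.0 * len(subset)


unrestricted = best_subset(4, toy_score)
capped = best_subset(4, toy_score, max_p=1)
```

The number of candidate subsets grows as 2^p, so a cap such as max_p keeps the search tractable when there are many features.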

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

static get_estimator(y: ndarray) → Any

Get an estimator for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which estimator is returned.

Returns:

Type[LinearRegression, LogisticRegression] – A Scikit-learn estimator.

get_feature_names_out(input_features=None)

Mask feature names according to selected features.

Parameters:

input_features (array-like of str or None, default=None) – Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out (ndarray of str objects) – Transformed feature names.

static get_loss_func(y: ndarray) → mean_squared_error | log_loss

Get a loss function for subset/stepwise feature selection.

Parameters:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values; their dtype determines which loss function is returned.

Returns:

Union[mean_squared_error, log_loss] – A Scikit-learn loss function.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:

indices (bool, default=False) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns:

support (array) – An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform(X)

Reverse the transformation operation.

Parameters:

X (array of shape [n_samples, n_selected_features]) – The input samples.

Returns:

X_r (array of shape [n_samples, n_original_features]) – X with columns of zeros inserted where features would have been removed by transform().

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in scikit-learn version 1.4: the “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Reduce X to the selected features.

Parameters:

X (array of shape [n_samples, n_features]) – The input samples.

Returns:

X_r (array of shape [n_samples, n_selected_features]) – The input samples with only the selected features.

Module contents