steps package#
Submodules#
steps.forward module#
Forward step selection module.
- class steps.forward.ForwardSelector(normalize: bool = False, metric: str = 'aic')[source]#
Bases: BaseEstimator, SelectorMixin, StepsMixin
Class for forward stepwise feature selection.
Constructor method.
- Parameters:
normalize (bool) – Whether to normalize the data. Defaults to False, which assumes the object is used in a pipeline.
metric (str) – Optimization metric to use; one of [‘aic’, ‘bic’].
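This page does not show how the `metric` value is computed. As a rough sketch, assuming a Gaussian linear model scored from its residual sum of squares, the two criteria could look like the functions below. The names `aic` and `bic` and the exact formula (which drops additive constants) are illustrative assumptions, not the package's implementation.

```python
import math


def aic(rss: float, n: int, k: int) -> float:
    """Akaike information criterion for a Gaussian linear model.

    rss: residual sum of squares, n: number of samples,
    k: number of fitted parameters. Additive constants are dropped.
    """
    return n * math.log(rss / n) + 2 * k


def bic(rss: float, n: int, k: int) -> float:
    """Bayesian information criterion; penalizes each parameter by log(n)."""
    return n * math.log(rss / n) + k * math.log(n)
```

Under these formulas BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 (where log(n) > 2), so it tends to pick smaller feature sets.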
- fit(X: ndarray, y: ndarray) → ForwardSelector [source]#
Fit a forward stepwise selection.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
self (object)
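For orientation, forward stepwise selection is usually a greedy loop: start with no features, repeatedly add the single feature that most improves the criterion, and stop when no addition helps. The sketch below illustrates that general technique with a caller-supplied `score` function (lower is better). `forward_select` and `toy_score` are hypothetical names, and this is not taken from the `steps` source.

```python
from typing import Callable, FrozenSet, List


def forward_select(
    n_features: int, score: Callable[[FrozenSet[int]], float]
) -> List[int]:
    """Greedy forward selection over feature indices (lower score is better)."""
    selected: List[int] = []
    best = score(frozenset())
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [
            (score(frozenset(selected + [j])), j) for j in sorted(remaining)
        ]
        cand_score, cand = min(candidates)
        if cand_score >= best:
            break  # no candidate improves the criterion
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


def toy_score(subset: FrozenSet[int]) -> float:
    """Toy criterion: features 0 and 2 reduce the loss, every feature costs 1."""
    useful = {0: 5.0, 2: 3.0}
    return 10.0 - sum(useful.get(j, 0.0) for j in subset) + 1.0 * len(subset)


chosen = forward_select(4, toy_score)
```

In a real selector, `score` would refit the model on the candidate columns and return the chosen information criterion.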
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- static get_estimator(y: ndarray) → Any #
Get an estimator for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Type[LinearRegression, LogisticRegression] – A scikit-learn estimator.
- get_feature_names_out(input_features=None)#
Mask feature names according to selected features.
- Parameters:
input_features (array-like of str or None, default=None) – Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
feature_names_out (ndarray of str objects) – Transformed feature names.
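The masking rule described above can be sketched in plain Python. This is an illustration of the documented behavior, assuming a boolean support mask is already available; `masked_feature_names` is a hypothetical helper, not part of the package or of scikit-learn.

```python
from typing import List, Optional, Sequence


def masked_feature_names(
    support: Sequence[bool], input_features: Optional[Sequence[str]] = None
) -> List[str]:
    """Keep only the names whose position is marked True in `support`."""
    if input_features is None:
        # Generated default names: x0, x1, ..., x(n_features - 1).
        input_features = [f"x{i}" for i in range(len(support))]
    if len(input_features) != len(support):
        raise ValueError("input_features must have one name per input feature")
    return [name for name, keep in zip(input_features, support) if keep]
```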
- static get_loss_func(y: ndarray) → mean_squared_error | log_loss #
Get a loss function for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Union[mean_squared_error, log_loss] – A scikit-learn loss function.
- get_metadata_routing()#
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params (dict) – Parameter names mapped to their values.
- get_support(indices=False)#
Get a mask, or integer index, of the features selected.
- Parameters:
indices (bool, default=False) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns:
support (array) – An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
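The two return forms carry the same information. A minimal sketch of the conversion between them, using plain lists rather than arrays (both helper names are hypothetical):

```python
from typing import List, Sequence


def mask_to_indices(mask: Sequence[bool]) -> List[int]:
    """Boolean mask of length n_input_features -> indices of retained features."""
    return [i for i, keep in enumerate(mask) if keep]


def indices_to_mask(indices: Sequence[int], n_features: int) -> List[bool]:
    """Indices of retained features -> boolean mask of length n_features."""
    chosen = set(indices)
    return [i in chosen for i in range(n_features)]
```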
- inverse_transform(X)#
Reverse the transformation operation.
- Parameters:
X (array of shape [n_samples, n_selected_features]) – The input samples.
- Returns:
X_r (array of shape [n_samples, n_original_features]) – X with columns of zeros inserted where features would have been removed by transform().
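To make the zero-filling concrete, here is a small sketch over lists of rows, assuming the support mask is known. `inverse_transform_rows` is a hypothetical stand-in for the array-based method.

```python
from typing import List, Sequence


def inverse_transform_rows(
    X_selected: Sequence[Sequence[float]], mask: Sequence[bool]
) -> List[List[float]]:
    """Expand each row back to the original width, filling dropped columns with 0."""
    n_selected = sum(mask)
    rows = []
    for row in X_selected:
        if len(row) != n_selected:
            raise ValueError("row width must equal the number of selected features")
        values = iter(row)
        # Consume one selected value per True entry; insert 0.0 elsewhere.
        rows.append([next(values) if keep else 0.0 for keep in mask])
    return rows
```

Note that the removed values are not recovered; only the original column layout is restored.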
- set_output(*, transform=None)#
Set output container.
See the scikit-learn example “Introducing the set_output API” for how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self (estimator instance) – Estimator instance.
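The `<component>__<parameter>` convention amounts to splitting each key at its first double underscore. The sketch below shows that split in isolation; `split_nested_params` and the component name `selector` are made up for illustration, and the real method also validates names and applies the values.

```python
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate an estimator's own parameters from those aimed at sub-components."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        name, sep, sub_key = key.partition("__")
        if sep:
            # "selector__max_p" -> component "selector", parameter "max_p"
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested
```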
- transform(X)#
Reduce X to the selected features.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
- Returns:
X_r (array of shape [n_samples, n_selected_features]) – The input samples with only the selected features.
steps.metrics module#
Custom metrics module.
steps.mixin module#
Step selection mixin module.
- class steps.mixin.StepsMixin[source]#
Bases: object
Step selection mixin that returns a regressor/classifier estimator and a loss function.
This mixin provides an estimator based on the target dtype via the get_estimator method, and a loss function based on the target dtype via the get_loss_func method.
- static get_estimator(y: ndarray) → Any [source]#
Get an estimator for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Type[LinearRegression, LogisticRegression] – A scikit-learn estimator.
- static get_loss_func(y: ndarray) → mean_squared_error | log_loss [source]#
Get a loss function for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Union[mean_squared_error, log_loss] – A scikit-learn loss function.
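The page says only that the choice depends on the target dtype. One plausible reading, shown here purely as an assumption, is that floating-point targets are treated as regression and everything else as classification. The helper below returns the names of the documented scikit-learn objects as strings so it runs without scikit-learn installed; `pick_estimator_and_loss` is hypothetical.

```python
from typing import Sequence, Tuple


def pick_estimator_and_loss(y: Sequence) -> Tuple[str, str]:
    """Return (estimator name, loss name) for a target, based on its value types."""
    # Assumption: float targets -> regression; ints, bools, strings -> classification.
    is_continuous = all(isinstance(v, float) for v in y)
    if is_continuous:
        return "LinearRegression", "mean_squared_error"
    return "LogisticRegression", "log_loss"
```

The actual dtype rule used by StepsMixin may differ, for example in how it treats integer-valued targets.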
steps.subset module#
Best subsets selection module.
- class steps.subset.SubsetSelector(normalize: bool = False, metric: str = 'aic', max_p: int | None = None)[source]#
Bases: BaseEstimator, SelectorMixin, StepsMixin
Class for best subsets feature selection.
Constructor method.
- Parameters:
normalize (bool) – Whether to normalize the data. Defaults to False, which assumes the object is used in a pipeline.
metric (str) – Optimization metric to use; one of [‘aic’, ‘bic’].
max_p (Optional[int]) – Maximum number of parameters to include; default None.
- fit(X: ndarray, y: ndarray) → SubsetSelector [source]#
Fit a best subset regression.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
self (object)
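Best subsets selection is, in its textbook form, an exhaustive search: score every combination of features up to some size and keep the best one. The sketch below illustrates that idea with a caller-supplied `score` (lower is better). It assumes `max_p` caps the subset size, which is only one reading of "maximum number of parameters"; `best_subset` and `toy_score` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple


def best_subset(
    n_features: int,
    score: Callable[[Tuple[int, ...]], float],
    max_p: Optional[int] = None,
) -> Tuple[int, ...]:
    """Exhaustively search non-empty feature subsets of size <= max_p."""
    limit = n_features if max_p is None else min(max_p, n_features)
    candidates = (
        subset
        for size in range(1, limit + 1)
        for subset in combinations(range(n_features), size)
    )
    return min(candidates, key=score)


def toy_score(subset: Tuple[int, ...]) -> float:
    """Toy criterion: features 0 and 2 reduce the loss, every feature costs 1."""
    useful = {0: 5.0, 2: 3.0}
    return 10.0 - sum(useful.get(j, 0.0) for j in subset) + 1.0 * len(subset)
```

The number of candidates grows as 2^p for p features, which is why a cap such as `max_p` matters in practice.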
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- static get_estimator(y: ndarray) → Any #
Get an estimator for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Type[LinearRegression, LogisticRegression] – A scikit-learn estimator.
- get_feature_names_out(input_features=None)#
Mask feature names according to selected features.
- Parameters:
input_features (array-like of str or None, default=None) – Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
feature_names_out (ndarray of str objects) – Transformed feature names.
- static get_loss_func(y: ndarray) → mean_squared_error | log_loss #
Get a loss function for subset/stepwise feature selection.
- Parameters:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
- Returns:
Union[mean_squared_error, log_loss] – A scikit-learn loss function.
- get_metadata_routing()#
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params (dict) – Parameter names mapped to their values.
- get_support(indices=False)#
Get a mask, or integer index, of the features selected.
- Parameters:
indices (bool, default=False) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns:
support (array) – An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
- inverse_transform(X)#
Reverse the transformation operation.
- Parameters:
X (array of shape [n_samples, n_selected_features]) – The input samples.
- Returns:
X_r (array of shape [n_samples, n_original_features]) – X with columns of zeros inserted where features would have been removed by transform().
- set_output(*, transform=None)#
Set output container.
See the scikit-learn example “Introducing the set_output API” for how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self (estimator instance) – Estimator instance.
- transform(X)#
Reduce X to the selected features.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
- Returns:
X_r (array of shape [n_samples, n_selected_features]) – The input samples with only the selected features.