tsfeast package

Submodules

tsfeast.funcs module

Time series feature generator functions.

tsfeast.funcs.get_busdays_in_month(dt: pandas._libs.tslibs.timestamps.Timestamp) → int

Get the number of business days in a month period, using US holidays.

Parameters

dt (pd.Timestamp) – Desired month.

Returns

int – Number of business days in the month.
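A minimal sketch of how such a count can be computed. The helper name `busdays_in_month` is hypothetical, and the sketch assumes the US federal holiday calendar that ships with pandas; the calendar the library actually uses may differ.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def busdays_in_month(dt: pd.Timestamp) -> int:
    """Count weekdays in dt's month, excluding US federal holidays (assumed calendar)."""
    start = dt.normalize().replace(day=1)
    end = start + pd.offsets.MonthBegin(1)  # first day of the following month
    holidays = USFederalHolidayCalendar().holidays(start=start, end=end)
    return int(
        np.busday_count(
            start.to_numpy().astype("datetime64[D]"),
            end.to_numpy().astype("datetime64[D]"),  # end date is exclusive
            holidays=holidays.to_numpy().astype("datetime64[D]"),
        )
    )


january_2020 = busdays_in_month(pd.Timestamp("2020-01-15"))
print(january_2020)
```

January 2020 has 23 weekdays, two of which (New Year's Day and Martin Luther King Jr. Day) are federal holidays.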

tsfeast.funcs.get_datetime_features(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None) → pandas.core.frame.DataFrame

Get features based on datetime index, including year, month, week, weekday, quarter, days in month, business days in month and leap year.

Parameters
  • data (pd.DataFrame, pd.Series) – Original data.

  • date_col (Optional[str]) – Column name containing date/timestamp.

  • dt_format (Optional[str]) – Date/timestamp format, e.g. %Y-%m-%d for 2020-01-31.

  • freq (Optional[str]) – Date frequency.

Returns

pd.DataFrame – Date features.
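The calendar features listed above can be derived directly from a pandas DatetimeIndex. The sketch below is illustrative only: the function name `datetime_features` and the column names are assumptions, not the library's output schema, and the business-day count is omitted.

```python
import numpy as np
import pandas as pd


def datetime_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Derive calendar features from a DatetimeIndex (column names are illustrative)."""
    return pd.DataFrame(
        {
            "year": np.asarray(index.year),
            "month": np.asarray(index.month),
            "week": index.isocalendar().week.to_numpy(dtype="int64"),  # ISO week
            "weekday": np.asarray(index.weekday),  # Monday=0 ... Sunday=6
            "quarter": np.asarray(index.quarter),
            "days_in_month": np.asarray(index.days_in_month),
            "leap_year": np.asarray(index.is_leap_year),
        },
        index=index,
    )


idx = pd.date_range("2020-02-28", periods=3, freq="D")
features = datetime_features(idx)
print(features)
```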

tsfeast.funcs.get_lag_features(data: pandas.core.frame.DataFrame, n_lags: int) → pandas.core.frame.DataFrame

Get n-lagged features for data.

Parameters
  • data (pd.DataFrame) – Original data.

  • n_lags (int) – Number of lags to generate.

Returns

pd.DataFrame – Lagged values of specified dataset.
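Lag features amount to shifting each column by 1 through n_lags periods. A minimal pandas sketch of the idea follows; the helper `lag_features` and the `_lag_k` column suffix are assumptions, not the library's naming.

```python
import pandas as pd


def lag_features(data: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Return columns shifted by 1..n_lags periods (suffix naming is illustrative)."""
    lagged = [data.shift(k).add_suffix(f"_lag_{k}") for k in range(1, n_lags + 1)]
    return pd.concat(lagged, axis=1)


df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})
lags = lag_features(df, n_lags=2)
print(lags)
```

The first k rows of the k-th lag are NaN because no earlier observations exist for them.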

tsfeast.funcs.get_rolling_features(data: pandas.core.frame.DataFrame, window_lengths: List[int]) → pandas.core.frame.DataFrame

Get rolling metrics (mean, std, min, max) for each specified window length.

Parameters
  • data (pd.DataFrame) – Original data.

  • window_lengths (List[int]) – List of window lengths to generate.

Returns

pd.DataFrame – Rolling mean, std, min and max for each specified window length.
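A sketch of the four rolling statistics using pandas `rolling`; the helper name and the column suffixes are hypothetical, and the library's handling of incomplete windows may differ.

```python
import pandas as pd


def rolling_features(data: pd.DataFrame, window_lengths: list) -> pd.DataFrame:
    """Rolling mean, std, min and max per window length (naming is illustrative)."""
    frames = []
    for w in window_lengths:
        roll = data.rolling(window=w)
        for stat in ("mean", "std", "min", "max"):
            frames.append(getattr(roll, stat)().add_suffix(f"_roll_{w}_{stat}"))
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})
rolled = rolling_features(df, window_lengths=[2])
print(rolled)
```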

tsfeast.funcs.get_ewma_features(data: pandas.core.frame.DataFrame, window_lengths: List[int]) → pandas.core.frame.DataFrame

Get an exponentially-weighted moving average for each specified window length.

Parameters
  • data (pd.DataFrame) – Original data.

  • window_lengths (List[int]) – List of window lengths to generate.

Returns

pd.DataFrame – Exponentially-weighted moving average for each specified window length.
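For illustration, an exponentially-weighted moving average can be computed with pandas `ewm`. The sketch assumes each window length is passed as the `span` parameter; how the library maps window lengths onto the decay is not stated here, so treat that mapping and the column suffix as assumptions.

```python
import pandas as pd


def ewma_features(data: pd.DataFrame, window_lengths: list) -> pd.DataFrame:
    """EWMA per window length, treating the length as the span (an assumption)."""
    frames = [
        data.ewm(span=w).mean().add_suffix(f"_ewma_{w}") for w in window_lengths
    ]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 2.0, 3.0]})
ewma = ewma_features(df, window_lengths=[2])
print(ewma)
```

With span=2 the smoothing factor is alpha = 2 / (span + 1) = 2/3, so the second value is (2 + 1/3 * 1) / (1 + 1/3) = 1.75 under pandas' default adjusted weighting.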

tsfeast.funcs.get_change_features(data: pandas.core.frame.DataFrame, period_lengths: List[int]) → pandas.core.frame.DataFrame

Get percent change for all features for each specified period length.

Parameters
  • data (pd.DataFrame) – Original data.

  • period_lengths (List[int]) – A list of period lengths to generate.

Returns

pd.DataFrame – Percent changes for all features.
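Percent change over p periods is (x_t - x_{t-p}) / x_{t-p}. A minimal sketch with pandas `pct_change`, using a hypothetical helper name and column suffix:

```python
import pandas as pd


def change_features(data: pd.DataFrame, period_lengths: list) -> pd.DataFrame:
    """Percent change over each period length (naming is illustrative)."""
    frames = [
        data.pct_change(periods=p, fill_method=None).add_suffix(f"_chg_{p}")
        for p in period_lengths
    ]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [100.0, 110.0, 121.0]})
changes = change_features(df, period_lengths=[1, 2])
print(changes)
```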

tsfeast.funcs.get_difference_features(data: pandas.core.frame.DataFrame, n_diffs: int) → pandas.core.frame.DataFrame

Get n differences for all features.

Parameters
  • data (pd.DataFrame) – Original data.

  • n_diffs (int) – Number of differences to return.

Returns

pd.DataFrame – N-differenced data.
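The sketch below reads "n differences" as successive differencing orders 1 through n (first difference, difference of the first difference, and so on). That interpretation, the helper name and the column suffix are assumptions; the library may instead difference against longer lags.

```python
import pandas as pd


def difference_features(data: pd.DataFrame, n_diffs: int) -> pd.DataFrame:
    """Differences of order 1..n_diffs (interpretation and naming are assumptions)."""
    frames, current = [], data
    for k in range(1, n_diffs + 1):
        current = current.diff()  # difference the previous order once more
        frames.append(current.add_suffix(f"_diff_{k}"))
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 4.0, 9.0, 16.0]})
diffs = difference_features(df, n_diffs=2)
print(diffs)
```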

tsfeast.metrics module

Custom scoring metrics.

tsfeast.metrics.bic_score(mse: float, n: int, p: int)

Calculate the Bayesian information criterion (BIC) score.

Parameters
  • mse (float) – Mean-squared error.

  • n (int) – Number of observations.

  • p (int) – Number of parameters.

Returns

float – BIC value.
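One common form of the BIC for a model with Gaussian errors, written in terms of the mean-squared error, is n * ln(mse) + p * ln(n). The sketch below uses that form as an assumption; the exact expression implemented by `bic_score` is not given here, and the function name `bic_from_mse` is hypothetical.

```python
import math


def bic_from_mse(mse: float, n: int, p: int) -> float:
    """BIC under Gaussian errors: n * ln(mse) + p * ln(n) (assumed form)."""
    return n * math.log(mse) + p * math.log(n)


small_model = bic_from_mse(mse=0.5, n=100, p=3)
large_model = bic_from_mse(mse=0.5, n=100, p=5)
print(small_model, large_model)
```

At equal error, the model with more parameters scores higher (worse), which is the penalty that makes BIC useful for model selection.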

tsfeast.metrics.bic_scorer(estimator: tsfeast._base.BaseContainer, X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray])

Score a Scikit-Learn estimator using BIC.

tsfeast.models module

Module for Scikit-Learn Regressor with ARMA Residuals and Scikit-Learn API wrapper for Statsmodels TSA models.

class tsfeast.models.ARMARegressor(estimator: sklearn.linear_model._base.LinearModel = LinearRegression(), order: Tuple[int, int, int] = (1, 0, 0))

Bases: tsfeast._base.BaseContainer

Regressor that pairs a Scikit-Learn linear estimator with an ARMA model of its residuals.

Parameters
  • estimator (LinearModel) – Scikit-Learn linear estimator.

  • order (Tuple[int, int, int]) – ARIMA order for residuals.

estimator

The Scikit-Learn regressor.

Type

LinearModel

order

The (p, d, q) order of the ARMA model.

Type

Tuple[int, int, int]

intercept_

The fitted estimator’s intercept.

Type

float

coef_

The fitted estimator’s coefficients.

Type

np.ndarray

arma_

The fitted ARMA model.

Type

ARIMA

fitted_values_

The combined estimator and ARMA fitted values.

Type

np.ndarray

resid_

The combined estimator and ARMA residual values.

Type

np.ndarray

__init__(estimator: sklearn.linear_model._base.LinearModel = LinearRegression(), order: Tuple[int, int, int] = (1, 0, 0))

Instantiate ARMARegressor object.

Parameters
  • estimator (LinearRegression) – Scikit-Learn linear estimator.

  • order (Tuple[int, int, int]) – ARIMA order for residuals.
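To illustrate the idea of a regression with autocorrelated residuals, here is a NumPy-only sketch of a two-stage fit: ordinary least squares first, then an AR(1) coefficient estimated on the residuals. It is a simplified stand-in, not the class's implementation: the class is documented as fitting an ARIMA model of the given order, whereas this sketch substitutes a least-squares AR(1) estimate, and the synthetic data and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)

# Synthetic target: linear signal plus AR(1) noise with coefficient 0.7.
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.7 * noise[t - 1] + rng.normal(scale=0.1)
y = 2.0 + 3.0 * x + noise

# Stage 1: ordinary least squares on the exogenous feature.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_resid = y - X @ beta

# Stage 2: AR(1) coefficient of the OLS residuals (least squares through the origin).
phi = (ols_resid[:-1] @ ols_resid[1:]) / (ols_resid[:-1] @ ols_resid[:-1])

# Combined fitted values: linear prediction plus predicted residual.
fitted = X @ beta
fitted[1:] += phi * ols_resid[:-1]
combined_resid = y - fitted

print(beta, phi)
```

The combined residuals are smaller than the OLS residuals whenever the first-stage errors are autocorrelated, which is the motivation for modeling them.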

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) → tsfeast._base.BaseContainer

Fit the estimator.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

Returns

BaseContainer – Self.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

predict(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Make predictions with fitted estimator.

Parameters

X (array of shape [n_samples, n_features]) – The input samples.

Returns

np.ndarray – Array of predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as 1 - u/v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score (float) – R^2 of self.predict(X) wrt. y.

Notes

The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
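The definition of R^2 given above can be checked by hand with NumPy. The helper `r_squared` and the sample values are illustrative only.

```python
import numpy as np


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 = 1 - u / v, with u the residual and v the total sum of squares."""
    u = ((y_true - y_pred) ** 2).sum()
    v = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - u / v)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

good_fit = r_squared(y_true, y_pred)
constant_fit = r_squared(y_true, np.full_like(y_true, y_true.mean()))
print(good_fit, constant_fit)
```

Here u = 0.10 and v = 5.0, so the first score is 0.98; predicting the mean everywhere gives u = v and a score of 0.0.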

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

class tsfeast.models.TSARegressor(model: statsmodels.base.model.Model, use_exog: bool = False, **kwargs)

Bases: tsfeast._base.BaseContainer

Estimator wrapping a Statsmodels TSA model.

Parameters
  • model (Model) – An uninstantiated Statsmodels TSA model.

  • use_exog (bool) – Whether to use exogenous features; default False.

  • kwargs – Additional kwargs for Statsmodels model.

fitted_model_

The fitted Statsmodels model object.

Type

Model

summary_

The fitted Statsmodels model summary results.

__init__(model: statsmodels.base.model.Model, use_exog: bool = False, **kwargs)

Instantiate TSARegressor object.

Parameters
  • model (Model) – An uninstantiated Statsmodels TSA model.

  • use_exog (bool) – Whether to use exogenous features; default False.

  • kwargs – Additional kwargs for Statsmodels model.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) → tsfeast._base.BaseContainer

Fit the estimator.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

Returns

BaseContainer – Self.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

predict(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Make predictions with fitted estimator.

Parameters

X (array of shape [n_samples, n_features]) – The input samples.

Returns

np.ndarray – Array of predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as 1 - u/v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score (float) – R^2 of self.predict(X) wrt. y.

Notes

The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

tsfeast.splitter module

Time Series Windows Module.

Note: These classes split data into n equal-length, sliding training/test windows. This differs from Scikit-Learn’s TimeSeriesSplit implementation, where windows are accumulated:

Scikit-Learn

    Win 0 |----|
    Win 1 |--------|
    Win 2 |------------|

TimeSeriesWindows

    Win 0 |----|--------
    Win 1 -|----|-------
    Win 2 --|----|------
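The sliding scheme can be sketched as a generator of index ranges. This is an illustration of fixed-length windows, not the classes' implementation: the function `sliding_windows` and its `step` parameter are hypothetical, and how far the library advances between windows is not stated here.

```python
def sliding_windows(n_samples, train_length, test_length, gap_length=0, step=1):
    """Return (train, test) index ranges of constant length that slide forward."""
    windows = []
    start = 0
    while start + train_length + gap_length + test_length <= n_samples:
        train = range(start, start + train_length)
        test_start = start + train_length + gap_length
        test = range(test_start, test_start + test_length)
        windows.append((train, test))
        start += step  # the training window moves; it does not grow
    return windows


windows = sliding_windows(n_samples=10, train_length=4, test_length=2, step=2)
for train, test in windows:
    print(list(train), list(test))
```

Every training range has the same length, unlike an expanding scheme in which each split reuses all earlier observations.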

class tsfeast.splitter.TimeSeriesWindows(train_length: int, test_length: int, gap_length: int = 0)

Bases: object

split(y: pandas.core.frame.DataFrame, x: pandas.core.frame.DataFrame) → List[pandas.core.frame.DataFrame]

class tsfeast.splitter.EndogSeriesWindows(min_train_length: int, test_length: int, max_train_length: Optional[int] = None, gap_length: int = 0)

Bases: tsfeast.splitter.TimeSeriesWindows

split(y: pandas.core.frame.DataFrame, x=None) → List[pandas.core.frame.DataFrame]

tsfeast.transformers module

Time series feature generators as Scikit-Learn compatible transformers.

class tsfeast.transformers.BaseTransformer(fillna: bool = True)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Base transformer object.

__init__(fillna: bool = True)

Instantiate transformer object.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
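The append-then-trim approach described in these notes can be sketched with plain pandas. The function `transform_with_history` and the lag naming are hypothetical; the sketch only shows why keeping the fitted data lets new rows receive complete lag values.

```python
import pandas as pd


def transform_with_history(history: pd.DataFrame, new: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Append new rows to the fitted data, build lags, and keep only the new rows."""
    combined = pd.concat([history, new])
    lagged = pd.concat(
        [combined.shift(k).add_suffix(f"_lag_{k}") for k in range(1, n_lags + 1)],
        axis=1,
    )
    return lagged.tail(len(new))  # last n rows correspond to the new samples


history = pd.DataFrame({"y": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
new = pd.DataFrame({"y": [4.0, 5.0]}, index=[3, 4])
out = transform_with_history(history, new, n_lags=2)
print(out)
```

Transforming `new` on its own would leave its first rows as NaN; with the history prepended, every lag is filled from observations that were already available, so no future information leaks in.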

get_feature_names() → List[str]

Get list of feature names.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

class tsfeast.transformers.OriginalFeatures(fillna: bool = True)

Bases: tsfeast.transformers.BaseTransformer

Return original features.

__init__(fillna: bool = True)

Instantiate transformer object.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.

class tsfeast.transformers.Scaler

Bases: tsfeast.transformers.BaseTransformer

Wrap StandardScaler to maintain column names.

__init__()

Instantiate transformer object.

fit(X: pandas.core.frame.DataFrame, y=None) → tsfeast.transformers.Scaler

Fit transformer object to data.

Parameters
  • X (pd.DataFrame) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Scaler – Self.

transform(X: pandas.core.frame.DataFrame, y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform data with the fitted scaler.

Parameters
  • X (pd.DataFrame) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Transformed features.

inverse_transform(X: pandas.core.frame.DataFrame, copy: bool = True) → pandas.core.frame.DataFrame

Transform scaled data into original feature space.

Parameters
  • X (pd.DataFrame) – The input samples.

  • copy (bool) – Default True; if False, try to avoid a copy and do inplace scaling instead.

Returns

Data – Data in original feature space.
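The behaviour described for this class, standardizing while keeping column names and then mapping back to the original space, can be sketched with pandas alone. This is not the wrapper's code; it uses the population standard deviation (ddof=0), which is what StandardScaler computes, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# "Fit": learn per-column statistics.
mean = df.mean()
std = df.std(ddof=0)  # population standard deviation, as in StandardScaler

# "Transform": zero mean, unit variance, column labels preserved.
scaled = (df - mean) / std

# "Inverse transform": back to the original feature space.
restored = scaled * std + mean

print(scaled)
print(restored)
```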

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

class tsfeast.transformers.DateTimeFeatures(date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None)

Bases: tsfeast.transformers.BaseTransformer

Generate datetime features.

__init__(date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None)

Instantiate transformer object.

Parameters
  • date_col (Optional[str]) – Column name containing date/timestamp.

  • dt_format (Optional[str]) – Date/timestamp format, e.g. %Y-%m-%d for 2020-01-31.

  • freq (Optional[str]) – Date frequency.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.DateTimeFeatures

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.

class tsfeast.transformers.LagFeatures(n_lags: int, fillna: bool = True)

Bases: tsfeast.transformers.BaseTransformer

Generate lag features.

__init__(n_lags: int, fillna: bool = True)

Instantiate transformer object.

Parameters

n_lags (int) – Number of lags to generate.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.

class tsfeast.transformers.RollingFeatures(window_lengths: List[int], fillna: bool = True)

Bases: tsfeast.transformers.BaseTransformer

Generate rolling features.

__init__(window_lengths: List[int], fillna: bool = True)

Instantiate transformer object.

Parameters

window_lengths (List[int]) – Length of window(s) to create.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.

class tsfeast.transformers.EwmaFeatures(window_lengths: List[int], fillna: bool = True)

Bases: tsfeast.transformers.BaseTransformer

Generate exponentially-weighted moving-average features.

__init__(window_lengths: List[int], fillna: bool = True)

Instantiate transformer object.

Parameters

window_lengths (List[int]) – Length of window(s) to create.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) → tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() → List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.
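The `<component>__<parameter>` convention can be illustrated with standard scikit-learn classes. The example below deliberately uses `PolynomialFeatures` and `LinearRegression` rather than this package's transformers, so the step names `poly` and `model` are arbitrary choices made for the illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("model", LinearRegression()),
])

# Step name, double underscore, parameter name.
returned = pipe.set_params(poly__degree=3, model__fit_intercept=False)

# get_params(deep=True) exposes the same nested keys.
params = pipe.get_params(deep=True)
```

Since `set_params` returns the estimator itself, calls can be chained.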

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

class tsfeast.transformers.ChangeFeatures(period_lengths: List[int], fillna: bool = True)[source]

Bases: tsfeast.transformers.BaseTransformer

Generate period change features.

Instantiate transformer object.

Parameters

period_lengths (List[int]) – Length of period(s) over which to generate change features.
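A period change feature can be read as the percentage change between a value and the value a given number of periods earlier. The sketch below shows that reading with `DataFrame.pct_change`. It is an assumption that the transformer computes percentage (rather than absolute) change and that `fillna=True` replaces missing values with zero; `change_sketch` and the column suffixes are hypothetical.

```python
import pandas as pd


def change_sketch(data: pd.DataFrame, period_lengths: list, fillna: bool = True) -> pd.DataFrame:
    """Hypothetical helper: percentage change over each period length."""
    frames = [
        data.pct_change(periods=p).add_suffix(f"_chg_{p}")
        for p in period_lengths
    ]
    out = pd.concat(frames, axis=1)
    # Assumption: missing leading values are filled with zero.
    return out.fillna(0) if fillna else out


data = pd.DataFrame({"y": [100.0, 110.0, 121.0, 133.1]})
features = change_sketch(data, period_lengths=[1, 2])
```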

__init__(period_lengths: List[int], fillna: bool = True)[source]

Instantiate transformer object.

Parameters

period_lengths (List[int]) – Length of period(s) over which to generate change features.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

class tsfeast.transformers.DifferenceFeatures(n_diffs: int, fillna: bool = True)[source]

Bases: tsfeast.transformers.BaseTransformer

Generate difference features.

Instantiate transformer object.

Parameters

n_diffs (int) – Number of differences to calculate.
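The sketch below shows one possible reading of "number of differences": successive differencing orders, where order k is the first difference applied k times. This is an assumption; the parameter could equally mean differences at lags 1 through `n_diffs`. The helper `difference_sketch`, its column suffixes and the zero fill are hypothetical.

```python
import pandas as pd


def difference_sketch(data: pd.DataFrame, n_diffs: int, fillna: bool = True) -> pd.DataFrame:
    """Hypothetical helper: differences of order 1..n_diffs for every column."""
    frames = []
    current = data
    for order in range(1, n_diffs + 1):
        # Each pass differences the previous result once more.
        current = current.diff()
        frames.append(current.add_suffix(f"_diff_{order}"))
    out = pd.concat(frames, axis=1)
    # Assumption: missing leading values are filled with zero.
    return out.fillna(0) if fillna else out


data = pd.DataFrame({"y": [1.0, 4.0, 9.0, 16.0, 25.0]})
features = difference_sketch(data, n_diffs=2)
```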

__init__(n_diffs: int, fillna: bool = True)[source]

Instantiate transformer object.

Parameters

n_diffs (int) – Number of differences to calculate.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

class tsfeast.transformers.PolyFeatures(degree=2)[source]

Bases: tsfeast.transformers.BaseTransformer

Generate polynomial features.

Instantiate transformer object.

Parameters

degree (int) – Degree of polynomial to use.
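As a simple illustration of polynomial features, the sketch below raises each column element-wise to the powers 2 through `degree`, without cross terms. Whether the transformer omits cross terms or the first-degree columns is not stated here, so treat that as an assumption; `poly_sketch` and the `^p` column suffix are hypothetical.

```python
import pandas as pd


def poly_sketch(data: pd.DataFrame, degree: int = 2) -> pd.DataFrame:
    """Hypothetical helper: element-wise powers 2..degree of every column."""
    if degree < 2:
        raise ValueError("degree must be at least 2 for this sketch")
    frames = [(data ** p).add_suffix(f"^{p}") for p in range(2, degree + 1)]
    return pd.concat(frames, axis=1)


data = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
features = poly_sketch(data, degree=3)
```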

__init__(degree=2)[source]

Instantiate transformer object.

Parameters

degree (int) – Degree of polynomial to use.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

class tsfeast.transformers.InteractionFeatures(fillna: bool = True)[source]

Bases: tsfeast.transformers.BaseTransformer

Wrap PolynomialFeatures to extract interactions and keep column names.

Instantiate transformer object.

__init__(fillna: bool = True)

Instantiate transformer object.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

tsfeast.tsfeatures module

Time series features module.

class tsfeast.tsfeatures.TimeSeriesFeatures(datetime: str, trend: str = 'n', lags: Optional[int] = None, rolling: Optional[List[int]] = None, ewma: Optional[List[int]] = None, pct_chg: Optional[List[int]] = None, diffs: Optional[int] = None, polynomial: Optional[int] = None, interactions: bool = True, fillna: bool = True)[source]

Bases: tsfeast.transformers.BaseTransformer

Generate multiple time series features in one transformer.

Instantiate transformer object.

__init__(datetime: str, trend: str = 'n', lags: Optional[int] = None, rolling: Optional[List[int]] = None, ewma: Optional[List[int]] = None, pct_chg: Optional[List[int]] = None, diffs: Optional[int] = None, polynomial: Optional[int] = None, interactions: bool = True, fillna: bool = True)[source]

Instantiate transformer object.

fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer

Fit transformer object to data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

BaseTransformer – Self.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names() List[str]

Get list of feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.

transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]

Transform fitted data.

Parameters
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (None) – Not used; included for compatibility, only.

Returns

Data – Array-like object of transformed data.

Notes

Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.

We append the new X to our original features and transform the entire dataset, keeping only the last n rows, where n is the number of rows in the new X. This is appropriate for time series transformations only.

tsfeast.utils module

Miscellaneous utility functions.

tsfeast.utils.to_list(x: Union[int, List]) List[int][source]

Ensure parameter is list of integer(s).

tsfeast.utils.array_to_dataframe(x: numpy.ndarray) pandas.core.frame.DataFrame[source]

Convert Numpy array to Pandas DataFrame with default column names.

tsfeast.utils.array_to_series(x: numpy.ndarray) pandas.core.series.Series[source]

Convert Numpy array to Pandas Series with default name.

tsfeast.utils.plot_diag(residuals: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None, estimator: Optional[sklearn.linear_model._base.LinearModel] = None, X: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None, y: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None)[source]

Plot regression diagnostics.

Generate a residuals plot, QQ plot, ACF plot and PACF plot, given either an array-like object of residuals or an estimator together with X and y data arrays.

Parameters
  • residuals (Data) – Model residual errors.

  • estimator (LinearModel) – Scikit-Learn generalized linear model.

  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

Raises

ValueError

  • If neither residuals nor estimator is provided.

  • If estimator is provided without X and y data.
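The argument rules above can be sketched as a small validation step that runs before any plotting. This is an illustration only: `resolve_residuals` is a hypothetical helper, the preference for `residuals` when both inputs are supplied is an assumption, and residuals are taken as observed minus predicted values.

```python
import numpy as np


def resolve_residuals(residuals=None, estimator=None, X=None, y=None):
    """Hypothetical helper: return residuals or raise ValueError per the rules above."""
    if residuals is not None:
        # Assumption: explicit residuals take precedence over an estimator.
        return np.asarray(residuals)
    if estimator is None:
        raise ValueError("Provide either residuals or an estimator.")
    if X is None or y is None:
        raise ValueError("An estimator requires both X and y.")
    return np.asarray(y) - estimator.predict(X)


class ConstantModel:
    """Stand-in estimator that always predicts 2.0 (not a scikit-learn class)."""

    def predict(self, X):
        return np.full(len(X), 2.0)


res = resolve_residuals(estimator=ConstantModel(), X=[[0], [1], [2]], y=[1.0, 2.0, 3.0])
```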

Module contents