tsfeast package
Submodules
tsfeast.funcs module
Time series feature generator functions.
- tsfeast.funcs.get_busdays_in_month(dt: pandas._libs.tslibs.timestamps.Timestamp) -> int
Get the number of business days in a month period, using US holidays.
- Parameters
dt (pd.Timestamp) – Desired month.
- Returns
int – Number of business days in the month.
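As a rough illustration of the idea, the sketch below counts weekdays in a month and excludes US federal holidays. It is not the library's implementation: the function name `busdays_in_month` is hypothetical, and the use of `numpy.busday_count` with pandas' `USFederalHolidayCalendar` is an assumption about one reasonable way to do it.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def busdays_in_month(dt: pd.Timestamp) -> int:
    """Count weekdays in dt's month that are not US federal holidays (hypothetical helper)."""
    start = dt.replace(day=1).normalize()
    end = start + pd.offsets.MonthBegin(1)  # first day of the next month (exclusive bound)
    holidays = USFederalHolidayCalendar().holidays(start=start, end=end)
    return int(
        np.busday_count(
            start.date(),
            end.date(),
            holidays=holidays.values.astype("datetime64[D]"),
        )
    )


january = busdays_in_month(pd.Timestamp("2020-01-15"))
april = busdays_in_month(pd.Timestamp("2021-04-10"))
```

January 2020 has 23 weekdays, two of which (New Year's Day and Martin Luther King Jr. Day) are federal holidays; April 2021 has 22 weekdays and no federal holidays.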
- tsfeast.funcs.get_datetime_features(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None) -> pandas.core.frame.DataFrame
Get features based on datetime index, including year, month, week, weekday, quarter, days in month, business days in month and leap year.
- Parameters
data (pd.DataFrame, pd.Series) – Original data.
date_col (Optional[str]) – Column name containing date/timestamp.
dt_format (Optional[str]) – Date/timestamp format, e.g. %Y-%m-%d for 2020-01-31.
freq (Optional[str]) – Date frequency.
- Returns
pd.DataFrame – Date features.
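The features listed above map closely onto attributes that pandas exposes on a `DatetimeIndex`. The following sketch shows that mapping; the helper name `datetime_features` and the output column names are illustrative assumptions and may not match what the library returns.

```python
import pandas as pd


def datetime_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Derive calendar features from a DatetimeIndex (column names are illustrative)."""
    return pd.DataFrame(
        {
            "year": index.year,
            "quarter": index.quarter,
            "month": index.month,
            "week": index.isocalendar().week.astype(int).to_numpy(),
            "weekday": index.weekday,  # Monday=0 ... Sunday=6
            "days_in_month": index.days_in_month,
            "leap_year": index.is_leap_year,
        },
        index=index,
    )


idx = pd.date_range("2020-01-31", periods=3, freq="D")
features = datetime_features(idx)
```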
- tsfeast.funcs.get_lag_features(data: pandas.core.frame.DataFrame, n_lags: int) -> pandas.core.frame.DataFrame
Get n-lagged features for data.
- Parameters
data (pd.DataFrame) – Original data.
n_lags (int) – Number of lags to generate.
- Returns
pd.DataFrame – Lagged values of specified dataset.
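Lagging can be sketched with `DataFrame.shift`. The helper below is a stand-in written for illustration, assuming that lags 1 through `n_lags` are generated for every column; the `_lag_k` column suffix is an assumption, not the library's naming.

```python
import pandas as pd


def lag_features(data: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Return lags 1..n_lags of every column (hypothetical helper)."""
    lagged = [
        data.shift(lag).add_suffix(f"_lag_{lag}") for lag in range(1, n_lags + 1)
    ]
    return pd.concat(lagged, axis=1)


df = pd.DataFrame({"y": [10.0, 11.0, 12.0, 13.0]})
lags = lag_features(df, n_lags=2)
```

The first `k` rows of the lag-`k` column are NaN because no earlier observations exist for them.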
- tsfeast.funcs.get_rolling_features(data: pandas.core.frame.DataFrame, window_lengths: List[int]) -> pandas.core.frame.DataFrame
Get rolling metrics (mean, std, min, max) for each specified window length.
- Parameters
data (pd.DataFrame) – Original data.
window_lengths (List[int]) – List of window lengths to generate.
- Returns
pd.DataFrame – Rolling mean, std, min and max for each specified window length.
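A minimal sketch of the four rolling statistics, using `DataFrame.rolling`. The helper name and the `_<window>_<stat>` column suffix are assumptions made for this example.

```python
from typing import List

import pandas as pd


def rolling_features(data: pd.DataFrame, window_lengths: List[int]) -> pd.DataFrame:
    """Rolling mean, std, min and max for each window length (hypothetical helper)."""
    frames = []
    for window in window_lengths:
        roll = data.rolling(window)
        for stat in ("mean", "std", "min", "max"):
            frames.append(getattr(roll, stat)().add_suffix(f"_{window}_{stat}"))
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})
rolled = rolling_features(df, window_lengths=[2])
```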
- tsfeast.funcs.get_ewma_features(data: pandas.core.frame.DataFrame, window_lengths: List[int]) -> pandas.core.frame.DataFrame
Get an exponentially-weighted moving average for each specified window length.
- Parameters
data (pd.DataFrame) – Original data.
window_lengths (List[int]) – List of window lengths to generate.
- Returns
pd.DataFrame – Exponentially-weighted moving average for each specified window length.
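One way to compute such features is `DataFrame.ewm`. The sketch below assumes each window length is passed as the `span` argument, which may not be how the library maps window lengths to decay; the helper and column names are also illustrative.

```python
from typing import List

import pandas as pd


def ewma_features(data: pd.DataFrame, window_lengths: List[int]) -> pd.DataFrame:
    """Exponentially weighted mean per window length, treating it as `span` (assumption)."""
    frames = [
        data.ewm(span=window).mean().add_suffix(f"_ewma_{window}")
        for window in window_lengths
    ]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 2.0, 3.0]})
ewma = ewma_features(df, window_lengths=[3])
```

With `span=3` the smoothing factor is 0.5, so the second value is (2 + 0.5 * 1) / 1.5 under pandas' default `adjust=True` weighting.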
- tsfeast.funcs.get_change_features(data: pandas.core.frame.DataFrame, period_lengths: List[int]) -> pandas.core.frame.DataFrame
Get percent change for all features for each specified period length.
- Parameters
data (pd.DataFrame) – Original data.
period_lengths (List[int]) – A list of period lengths to generate.
- Returns
pd.DataFrame – Percent changes for all features.
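Percent change over several period lengths can be sketched with `DataFrame.pct_change`. The helper name and the `_chg_k` suffix below are assumptions for illustration.

```python
from typing import List

import pandas as pd


def change_features(data: pd.DataFrame, period_lengths: List[int]) -> pd.DataFrame:
    """Percent change of every column over each period length (hypothetical helper)."""
    frames = [
        data.pct_change(periods=period).add_suffix(f"_chg_{period}")
        for period in period_lengths
    ]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [100.0, 110.0, 121.0]})
changes = change_features(df, period_lengths=[1, 2])
```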
- tsfeast.funcs.get_difference_features(data: pandas.core.frame.DataFrame, n_diffs: int) -> pandas.core.frame.DataFrame
Get n differences for all features.
- Parameters
data (pd.DataFrame) – Original data.
n_diffs (int) – Number of differences to return.
- Returns
pd.DataFrame – N-differenced data.
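The description leaves open whether "n differences" means successive differencing orders or something else. The sketch below assumes the former: order 1 is the first difference, order 2 is the difference of that, and so on. The helper and column names are hypothetical.

```python
import pandas as pd


def difference_features(data: pd.DataFrame, n_diffs: int) -> pd.DataFrame:
    """Differences of order 1..n_diffs for every column (assumed interpretation)."""
    frames = []
    current = data
    for order in range(1, n_diffs + 1):
        current = current.diff()  # difference the previous order once more
        frames.append(current.add_suffix(f"_diff_{order}"))
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"y": [1.0, 4.0, 9.0, 16.0]})
diffs = difference_features(df, n_diffs=2)
```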
tsfeast.metrics module
Custom scoring metrics.
tsfeast.models module
Scikit-Learn regressor with ARMA residuals, and a Scikit-Learn API wrapper for Statsmodels TSA models.
- class tsfeast.models.ARMARegressor(estimator: sklearn.linear_model._base.LinearModel = LinearRegression(), order: Tuple[int, int, int] = (1, 0, 0))
Bases: tsfeast._base.BaseContainer
Regressor that fits a Scikit-Learn linear estimator and models its residuals with an ARMA process.
- Parameters
estimator (LinearModel) – Scikit-Learn linear estimator.
order (Tuple[int, int, int]) – ARIMA order for residuals.
- estimator¶
The Scikit-Learn regressor.
- Type
LinearModel
- order
The (p, d, q) order of the ARMA model.
- Type
Tuple[int, int, int]
- intercept_¶
The fitted estimator’s intercept.
- Type
float
- coef_¶
The fitted estimator’s coefficients.
- Type
np.ndarray
- arma_¶
The fitted ARMA model.
- Type
ARIMA
- fitted_values_¶
The combined estimator and ARMA fitted values.
- Type
np.ndarray
- resid_¶
The combined estimator and ARMA residual values.
- Type
np.ndarray
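The two-stage idea behind the attributes above (a linear fit, then a time series model on its residuals) can be sketched with NumPy alone. This is a simplified stand-in, not the class's implementation: it uses ordinary least squares and a hand-rolled AR(1) fitted by least squares, whereas the class stores a Statsmodels ARIMA in `arma_`. All variable names and the simulated data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))

# Simulate AR(1) errors: e[t] = 0.7 * e[t-1] + noise[t].
noise = rng.normal(scale=0.5, size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.7 * errors[t - 1] + noise[t]
y = 2.0 + 3.0 * X[:, 0] + errors

# Step 1: ordinary least squares on the features (intercept + slope).
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
ols_fitted = design @ beta
ols_resid = y - ols_fitted

# Step 2: AR(1) coefficient of the regression residuals, by least squares.
phi = (ols_resid[:-1] @ ols_resid[1:]) / (ols_resid[:-1] @ ols_resid[:-1])

# Step 3: combined fit = regression fit + predicted residual from the AR(1) term.
fitted_values = ols_fitted.copy()
fitted_values[1:] += phi * ols_resid[:-1]
resid = y - fitted_values
```

Because `phi` minimizes the squared one-step residual error, the combined residuals can never have a larger sum of squares than the plain regression residuals over the same rows.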
- __init__(estimator: sklearn.linear_model._base.LinearModel = LinearRegression(), order: Tuple[int, int, int] = (1, 0, 0))
Instantiate ARMARegressor object.
- Parameters
estimator (LinearModel) – Scikit-Learn linear estimator.
order (Tuple[int, int, int]) – ARIMA order for residuals.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) tsfeast._base.BaseContainer ¶
Fit the estimator.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- Returns
BaseContainer – Self.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- predict(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Make predictions with fitted estimator.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
- Returns
np.ndarray – Array of predicted values.
- score(X, y, sample_weight=None)¶
Return the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - \frac{u}{v}), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
- Parameters
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns
score (float) – R^2 of self.predict(X) wrt. y.
Notes
The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
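The definition of R^2 given for score can be checked by hand. The short sketch below computes u and v exactly as described; the function name `r2` is only for this example.

```python
import numpy as np


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - u / v."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2(y_true, y_true)                       # predictions match exactly
constant = r2(y_true, np.full(4, y_true.mean()))   # always predict the mean
worse = r2(y_true, y_true[::-1])                   # worse than predicting the mean
```

Here `perfect` is 1.0, `constant` is 0.0, and `worse` is -3.0 (u = 20, v = 5), which shows how the score becomes negative.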
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
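The `<component>__<parameter>` form is easiest to see on a small Scikit-Learn pipeline. The step names `scale` and `reg` below are arbitrary labels chosen for this example.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])

# Update a parameter on each nested step using <component>__<parameter>.
pipe.set_params(scale__with_mean=False, reg__fit_intercept=False)

params = pipe.get_params(deep=True)
```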
- class tsfeast.models.TSARegressor(model: statsmodels.base.model.Model, use_exog: bool = False, **kwargs)
Bases: tsfeast._base.BaseContainer
Estimator wrapping a Statsmodels TSA model.
- Parameters
model (Model) – An uninstantiated Statsmodels TSA model.
use_exog (bool) – Whether to use exogenous features; default False.
kwargs – Additional kwargs for Statsmodels model.
- fitted_model_
The fitted Statsmodels model object.
- Type
Model
- summary_
The fitted Statsmodels model summary results.
- __init__(model: statsmodels.base.model.Model, use_exog: bool = False, **kwargs)[source]¶
Instantiate TSARegressor object.
- model: Model
An uninstantiated Statsmodels TSA model.
- use_exog: bool
Whether to use exogenous features; default False.
- kwargs:
Additional kwargs for Statsmodels model.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) tsfeast._base.BaseContainer ¶
Fit the estimator.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- Returns
BaseContainer – Self.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- predict(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Make predictions with fitted estimator.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
- Returns
np.ndarray – Array of predicted values.
- score(X, y, sample_weight=None)¶
Return the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - \frac{u}{v}), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
- Parameters
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns
score (float) – R^2 of self.predict(X) wrt. y.
Notes
The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
tsfeast.splitter module
Time Series Windows Module.
Note: These classes split data into n equal-length, sliding training/test windows. This differs from Scikit-Learn’s TimeSeriesSplit implementation, where windows are accumulated:
Scikit-Learn
Win 0 |----|
Win 1 |--------|
Win 2 |------------|
TimeSeriesWindows
Win 0 |----|--------
Win 1 -|----|-------
Win 2 --|----|------
- class tsfeast.splitter.TimeSeriesWindows(train_length: int, test_length: int, gap_length: int = 0)[source]¶
Bases:
object
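To make the sliding-window behavior described in the note concrete, here is a pure-Python sketch of fixed-length train/test windows with an optional gap. It is not the class's code: the function `sliding_windows` and its `step` parameter are hypothetical, and the one-row step is an assumption read off the diagram.

```python
from typing import Iterator, List, Tuple


def sliding_windows(
    n_samples: int,
    train_length: int,
    test_length: int,
    gap_length: int = 0,
    step: int = 1,  # hypothetical: how far the window moves each split
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs of constant length."""
    span = train_length + gap_length + test_length
    start = 0
    while start + span <= n_samples:
        train_end = start + train_length
        test_start = train_end + gap_length  # skip the gap between train and test
        yield (
            list(range(start, train_end)),
            list(range(test_start, test_start + test_length)),
        )
        start += step


windows = list(sliding_windows(n_samples=10, train_length=4, test_length=2, gap_length=1))
```

Unlike an expanding split, every training window here has the same length, so older observations drop out as the window moves forward.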
tsfeast.transformers module
Time series feature generators as Scikit-Learn compatible transformers.
- class tsfeast.transformers.BaseTransformer(fillna: bool = True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Base transformer object.
Instantiate transformer object.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] [source]¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines call only the .transform() method during .predict(), which is appropriate for preventing data leakage in predictions. However, most of the transformers in this module take a set of features and generate new ones; there is no inherent way to transform time series features from a fitted estimator alone.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require it to avoid NaNs or zeros.
We therefore append the new X to our original features and transform the entire dataset, keeping only the last n rows. This is appropriate for time series transformations only.
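The append-then-trim approach from the notes can be sketched for lag features as follows. The helper `transform_with_history` is hypothetical, and it assumes n is the number of rows in the new X.

```python
import pandas as pd


def transform_with_history(
    history: pd.DataFrame, new: pd.DataFrame, n_lags: int
) -> pd.DataFrame:
    """Lag features for `new`, computed with `history` prepended (hypothetical helper)."""
    full = pd.concat([history, new])
    lagged = pd.concat(
        [full.shift(lag).add_suffix(f"_lag_{lag}") for lag in range(1, n_lags + 1)],
        axis=1,
    )
    return lagged.iloc[-len(new):]  # keep only the rows that belong to `new`


history = pd.DataFrame({"y": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
new = pd.DataFrame({"y": [4.0, 5.0]}, index=[3, 4])
out = transform_with_history(history, new, n_lags=2)
```

Transforming `new` on its own would leave NaNs in its first rows; with the history prepended, every lag is filled from earlier observations only.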
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer [source]¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- class tsfeast.transformers.OriginalFeatures(fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Return original features.
Instantiate transformer object.
- __init__(fillna: bool = True)¶
Instantiate transformer object.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.Scaler[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Wrap StandardScaler to maintain column names.
Instantiate transformer object.
- fit(X: pandas.core.frame.DataFrame, y=None) -> tsfeast.transformers.Scaler
Fit transformer object to data.
- Parameters
X (pd.DataFrame) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Scaler – Self.
- transform(X: pandas.core.frame.DataFrame, y=None) -> Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]
Transform data using the fitted scaler.
- Parameters
X (pd.DataFrame) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Transformed features.
- inverse_transform(X: pandas.core.frame.DataFrame, copy: bool = True) pandas.core.frame.DataFrame [source]¶
Transform scaled data into original feature space.
- Parameters
X (pd.DataFrame) – The input samples.
copy (bool) – Default True; if False, try to avoid a copy and do inplace scaling instead.
- Returns
Data – Data in original feature space.
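Why a wrapper is useful: by default StandardScaler returns a bare NumPy array, so column labels are lost. The sketch below shows the general pattern of restoring them by hand; it illustrates the idea and is not the Scaler class's own code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

scaler = StandardScaler().fit(df)

# Rebuild a DataFrame around the scaled values so the original labels survive.
scaled = pd.DataFrame(scaler.transform(df), columns=df.columns, index=df.index)

# Map the scaled values back to the original feature space, again keeping labels.
restored = pd.DataFrame(
    scaler.inverse_transform(scaled.to_numpy()),
    columns=df.columns,
    index=df.index,
)
```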
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- class tsfeast.transformers.DateTimeFeatures(date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate datetime features.
Instantiate transformer object.
- date_col: Optional[str]
Column name containing date/timestamp.
- dt_format: Optional[str]
Date/timestamp format, e.g. %Y-%m-%d for 2020-01-31.
- __init__(date_col: Optional[str] = None, dt_format: Optional[str] = None, freq: Optional[str] = None)[source]¶
Instantiate transformer object.
- date_col: Optional[str]
Column name containing date/timestamp.
- dt_format: Optional[str]
Date/timestamp format, e.g. %Y-%m-%d for 2020-01-31.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.DateTimeFeatures [source]¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.LagFeatures(n_lags: int, fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate lag features.
Instantiate transformer object.
- Parameters
n_lags (int) – Number of lags to generate.
- __init__(n_lags: int, fillna: bool = True)[source]¶
Instantiate transformer object.
- Parameters
n_lags (int) – Number of lags to generate.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.RollingFeatures(window_lengths: List[int], fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate rolling features.
Instantiate transformer object.
- Parameters
window_lengths (List[int]) – Length of window(s) to create.
- __init__(window_lengths: List[int], fillna: bool = True)[source]¶
Instantiate transformer object.
- Parameters
window_lengths (List[int]) – Length of window(s) to create.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.EwmaFeatures(window_lengths: List[int], fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate exponentially-weighted moving-average features.
Instantiate transformer object.
- Parameters
window_lengths (List[int]) – Length of window(s) to create.
- __init__(window_lengths: List[int], fillna: bool = True)[source]¶
Instantiate transformer object.
- Parameters
window_lengths (List[int]) – Length of window(s) to create.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.ChangeFeatures(period_lengths: List[int], fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate period change features.
Instantiate transformer object.
- Parameters
period_lengths (List[int]) – Length of period[s] to generate change features.
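Example
A rough pandas sketch of what period change features might look like, assuming "change" means percent change over each period length and that fillna replaces missing values with zero. Both are assumptions; the function name period_changes and the _chg_p suffix are hypothetical.

```python
from typing import List

import pandas as pd


def period_changes(data: pd.DataFrame, period_lengths: List[int], fillna: bool = True) -> pd.DataFrame:
    """Percent change of every column over each requested period length."""
    frames = [
        data.pct_change(periods=p, fill_method=None).add_suffix(f"_chg_{p}")
        for p in period_lengths
    ]
    out = pd.concat(frames, axis=1)
    # Assumption: missing leading values are filled with zero when fillna=True.
    return out.fillna(0.0) if fillna else out


sales = pd.DataFrame({"sales": [100.0, 110.0, 121.0, 133.1]})
changes = period_changes(sales, period_lengths=[1, 2])
print(changes)
```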
- __init__(period_lengths: List[int], fillna: bool = True)[source]¶
Instantiate transformer object.
- Parameters
period_lengths (List[int]) – Length of period[s] to generate change features.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.DifferenceFeatures(n_diffs: int, fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate difference features.
Instantiate transformer object.
- Parameters
n_diffs (int) – Number of differences to calculate.
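Example
A minimal pandas sketch of difference features, assuming n_diffs means differences over 1 through n_diffs periods (it could instead mean successive orders of differencing; the documentation does not say). The function name and the _diff_k suffix are hypothetical.

```python
import pandas as pd


def difference_features(data: pd.DataFrame, n_diffs: int, fillna: bool = True) -> pd.DataFrame:
    """Difference of every column over 1..n_diffs periods."""
    frames = [
        data.diff(periods=k).add_suffix(f"_diff_{k}")
        for k in range(1, n_diffs + 1)
    ]
    out = pd.concat(frames, axis=1)
    # Assumption: missing leading values are filled with zero when fillna=True.
    return out.fillna(0.0) if fillna else out


squares = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0]})
diffs = difference_features(squares, n_diffs=2, fillna=False)
print(diffs)
```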
- __init__(n_diffs: int, fillna: bool = True)[source]¶
Instantiate transformer object.
- Parameters
n_diffs (int) – Number of differences to calculate.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.PolyFeatures(degree=2)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate polynomial features.
Instantiate transformer object.
- Parameters
degree (int) – Degree of polynomial to use.
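Example
One plausible reading of polynomial features is raising each column to the powers 2 through degree. The sketch below assumes that reading; the function name poly_features and the ^d column suffix are hypothetical, and the library may build these features differently.

```python
import pandas as pd


def poly_features(data: pd.DataFrame, degree: int = 2) -> pd.DataFrame:
    """Raise every column to each power from 2 up to `degree`."""
    if degree < 2:
        raise ValueError("degree must be at least 2")
    frames = [data.pow(d).add_suffix(f"^{d}") for d in range(2, degree + 1)]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
powers = poly_features(df, degree=3)
print(powers)
```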
- __init__(degree=2)[source]¶
Instantiate transformer object.
- Parameters
degree (int) – Degree of polynomial to use.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
- class tsfeast.transformers.InteractionFeatures(fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Wrap PolynomialFeatures to extract interactions and keep column names.
Instantiate transformer object.
- __init__(fillna: bool = True)¶
Instantiate transformer object.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
tsfeast.tsfeatures module¶
Time series features module.
- class tsfeast.tsfeatures.TimeSeriesFeatures(datetime: str, trend: str = 'n', lags: Optional[int] = None, rolling: Optional[List[int]] = None, ewma: Optional[List[int]] = None, pct_chg: Optional[List[int]] = None, diffs: Optional[int] = None, polynomial: Optional[int] = None, interactions: bool = True, fillna: bool = True)[source]¶
Bases:
tsfeast.transformers.BaseTransformer
Generate multiple time series features in one transformer.
Instantiate transformer object.
- __init__(datetime: str, trend: str = 'n', lags: Optional[int] = None, rolling: Optional[List[int]] = None, ewma: Optional[List[int]] = None, pct_chg: Optional[List[int]] = None, diffs: Optional[int] = None, polynomial: Optional[int] = None, interactions: bool = True, fillna: bool = True)[source]¶
Instantiate transformer object.
- fit(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) tsfeast.transformers.BaseTransformer ¶
Fit transformer object to data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
BaseTransformer – Self.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names() List[str] ¶
Get list of feature names.
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (dict) – Parameter names mapped to their values.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (estimator instance) – Estimator instance.
- transform(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], y=None) Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray] ¶
Transform fitted data.
- Parameters
X (array of shape [n_samples, n_features]) – The input samples.
y (None) – Not used; included for compatibility, only.
- Returns
Data – Array-like object of transformed data.
Notes
Scikit-Learn Pipelines only call the .transform() method during the .predict() method, which is appropriate to prevent data leakage in predictions. However, most of the transformers in this module take a set of features and generate new features; there’s no inherent method to transform some timeseries features given a fitted estimator.
For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this to avoid NaNs or zeros.
We append new X to our original features and transform on entire dataset, keeping only the last n rows. Appropriate for time series transformations, only.
tsfeast.utils module¶
Miscellaneous utility functions.
- tsfeast.utils.to_list(x: Union[int, List]) List[int] [source]¶
Ensure parameter is list of integer(s).
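Example
The behavior described for to_list can be approximated as follows. This is a hypothetical re-implementation named to_list_sketch, written only to show the int-or-list normalization; the actual function may validate its input differently.

```python
from typing import List, Union


def to_list_sketch(x: Union[int, List[int]]) -> List[int]:
    """Return [x] for a single integer, or a list copy of an existing sequence."""
    if isinstance(x, int):
        return [x]
    return list(x)


single = to_list_sketch(3)
several = to_list_sketch([3, 6, 12])
print(single, several)
```

Normalizing this way would let parameters such as window_lengths accept either one window or several.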
- tsfeast.utils.array_to_dataframe(x: numpy.ndarray) pandas.core.frame.DataFrame [source]¶
Convert Numpy array to Pandas DataFrame with default column names.
- tsfeast.utils.array_to_series(x: numpy.ndarray) pandas.core.series.Series [source]¶
Convert Numpy array to Pandas Series with default name.
- tsfeast.utils.plot_diag(residuals: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None, estimator: Optional[sklearn.linear_model._base.LinearModel] = None, X: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None, y: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None)[source]¶
Plot regression diagnostics.
Generate residuals plot, QQ plot, ACF plot and PACF plot, given either an array-like object of residuals or an estimator together with X and y data arrays.
- Parameters
residuals (Data) – Model residual errors.
estimator (LinearModel) – Scikit-Learn generalized linear model.
X (array of shape [n_samples, n_features]) – The input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- Raises
ValueError – If neither residuals nor an estimator is provided, or if an estimator is provided without X and y data.
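Example
The argument handling described above (residuals directly, or an estimator plus X and y) can be sketched without any plotting. The helper resolve_residuals is hypothetical and computes residuals as y minus the estimator's predictions, which is an assumption about how plot_diag derives them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def resolve_residuals(residuals=None, estimator=None, X=None, y=None) -> np.ndarray:
    """Return residuals, either as given or computed from a fitted estimator."""
    if residuals is not None:
        return np.asarray(residuals)
    if estimator is None:
        raise ValueError("Provide either residuals or an estimator.")
    if X is None or y is None:
        raise ValueError("An estimator requires both X and y.")
    # Assumption: residual = observed target minus prediction.
    return np.asarray(y) - estimator.predict(X)


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])          # exactly y = 2x + 1
model = LinearRegression().fit(X, y)
res = resolve_residuals(estimator=model, X=X, y=y)
print(res)
```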