wetsuit.models module

Models module.

class wetsuit.models.BaseContainer(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Bases: BaseEstimator

Base container class.

__init__(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Instantiate estimator.

Parameters:
  • estimator (H2OEstimator) – An instantiated H2OEstimator.

  • features (List[Union[str, int]]) – A list of column names or indices indicating the predictor variables.

  • response (Union[str, int]) – A column name or index indicating the response variable.

fit(X, y) → BaseContainer

Fit the estimator.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values (None for unsupervised transformations).

Returns:

BaseContainer – Self.

Notes

Conversion to H2OFrame is handled in the .fit() method. Conversion to DataFrame is handled in the .predict() method.

predict(X) → DataFrame

Make predictions with fitted estimator.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

DataFrame – The predicted values.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
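The <component>__<parameter> naming used by get_params(deep=True) and set_params() can be illustrated with a small, library-free sketch. The Component class below is a hypothetical stand-in written for this example; it is not part of wetsuit, H2O, or scikit-learn, and it only mimics the naming convention described above.

```python
class Component:
    """Hypothetical stand-in for an estimator that stores its parameters."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = {}
        for name, value in self._params.items():
            out[name] = value
            # With deep=True, parameters of nested components are also
            # reported under keys of the form <component>__<parameter>.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            # Split "estimator__ntrees" into ("estimator", "__", "ntrees").
            name, sep, sub_key = key.partition("__")
            if name not in self._params:
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Nested key: delegate the update to the named component.
                self._params[name].set_params(**{sub_key: value})
            else:
                self._params[name] = value
        return self


# Assumed example values; "ntrees" and "max_depth" are illustrative names.
inner = Component(ntrees=50, max_depth=5)
outer = Component(estimator=inner, response="y")

shallow = outer.get_params(deep=False)
deep = outer.get_params(deep=True)
result = outer.set_params(estimator__ntrees=100, response="target")
```

Here shallow contains only the top-level keys, deep additionally contains estimator__ntrees and estimator__max_depth, and the set_params() call updates both the nested component and the outer object before returning the outer object itself.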

class wetsuit.models.WetsuitRegressor(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Bases: BaseContainer, RegressorMixin

Scikit-Learn wrapper for H2O Regressors.

__init__(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Instantiate estimator.

Parameters:
  • estimator (H2OEstimator) – An instantiated H2OEstimator.

  • features (List[Union[str, int]]) – A list of column names or indices indicating the predictor variables.

  • response (Union[str, int]) – A column name or index indicating the response variable.

fit(X, y) → BaseContainer

Fit the estimator.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values (None for unsupervised transformations).

Returns:

BaseContainer – Self.

Notes

Conversion to H2OFrame is handled in the .fit() method. Conversion to DataFrame is handled in the .predict() method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

predict(X) → DataFrame

Make predictions with fitted estimator.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

DataFrame – The predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination R^2 is defined as 1 - u/v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score (float) – R^2 of self.predict(X) with respect to y.

Notes

As of scikit-learn version 0.23, the R^2 score used when calling score on a regressor uses multioutput='uniform_average', to stay consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
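As a worked illustration of the coefficient of determination defined above, the following minimal sketch computes 1 - u/v in plain Python. The function name r2_score_sketch and the sample values are assumptions made for this example; the sketch ignores sample_weight and multi-output targets, and it does not handle the degenerate case where y_true is constant (v would be zero).

```python
def r2_score_sketch(y_true, y_pred):
    """Return 1 - u/v for two equal-length sequences of numbers."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [3.0, 5.0, 7.0, 9.0]

# Predictions identical to the targets give the best possible score.
perfect = r2_score_sketch(y_true, y_true)

# Always predicting the mean of y_true (6.0) gives u == v.
constant = r2_score_sketch(y_true, [6.0] * len(y_true))

# Predictions worse than the constant model give a negative score.
worse = r2_score_sketch(y_true, [9.0, 7.0, 5.0, 3.0])
```

With these values, v is 20, so the three cases evaluate to 1.0, 0.0, and -3.0 respectively, matching the three situations the description mentions.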

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

class wetsuit.models.WetsuitClassifier(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Bases: BaseContainer, ClassifierMixin

Scikit-Learn wrapper for H2O Classifiers.

__init__(estimator: H2OEstimator, features: List[Union[str, int]], response: Union[str, int])

Instantiate estimator.

Parameters:
  • estimator (H2OEstimator) – An instantiated H2OEstimator.

  • features (List[Union[str, int]]) – A list of column names or indices indicating the predictor variables.

  • response (Union[str, int]) – A column name or index indicating the response variable.

fit(X, y) → BaseContainer

Fit the estimator.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values (None for unsupervised transformations).

Returns:

BaseContainer – Self.

Notes

Conversion to H2OFrame is handled in the .fit() method. Conversion to DataFrame is handled in the .predict() method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

predict(X) → DataFrame

Make predictions with fitted estimator.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

DataFrame – The predicted values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score (float) – Mean accuracy of self.predict(X) with respect to y.
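The difference between plain accuracy and the stricter subset accuracy described above can be seen in a small plain-Python sketch. The helper name accuracy_sketch and the label values are assumptions made for this example, and sample_weight is ignored for simplicity.

```python
def accuracy_sketch(y_true, y_pred):
    """Return the fraction of samples whose prediction matches exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Single-label case: three of the four predictions are correct.
single = accuracy_sketch(["a", "b", "a", "c"], ["a", "b", "c", "c"])

# Multi-label case: each sample is a tuple of labels, and a sample only
# counts as correct when the whole tuple matches (subset accuracy).
multi_true = [(1, 0, 1), (0, 1, 0), (1, 1, 0)]
multi_pred = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
subset = accuracy_sketch(multi_true, multi_pred)
```

In the multi-label case the second sample has two of its three labels right, yet it contributes nothing to the score, which is why the metric is called harsh.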

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.