Feature selection

GenericUnivariateSelect

Module Sklearn.Feature_selection.GenericUnivariateSelect wraps Python class sklearn.feature_selection.GenericUnivariateSelect.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?mode:[`Percentile | `K_best | `Fpr | `Fdr | `Fwe] ->
  ?param:[`F of float | `Int_depending_on_the_feature_selection_mode of Py.Object.t] ->
  unit ->
  t

Univariate feature selector with configurable strategy.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). For modes 'percentile' or 'k_best' it can return a single array scores.

  • mode : {'percentile', 'k_best', 'fpr', 'fdr', 'fwe'} Feature selection mode.

  • param : float or int depending on the feature selection mode Parameter of the corresponding mode.

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores, None if score_func returned scores only.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import GenericUnivariateSelect, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> transformer = GenericUnivariateSelect(chi2, mode='k_best', param=20)
>>> X_new = transformer.fit_transform(X, y)
>>> X_new.shape
(569, 20)
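As a rough illustration of what mode='k_best' with param=k amounts to, the sketch below keeps the k highest-scoring features. The helper name k_best_mask is hypothetical, and tie-breaking here (earlier index wins) is an assumption that may differ from the library.

```python
def k_best_mask(scores, k):
    """Boolean mask keeping the k highest-scoring features (illustrative only)."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of features")
    # Indices sorted by decreasing score; ties keep the earlier index (assumption).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [i in keep for i in range(len(scores))]


scores = [3.2, 0.4, 7.1, 1.5]
print(k_best_mask(scores, 2))  # [True, False, True, False]
```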

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • mutual_info_classif: Mutual information for a discrete target.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

  • mutual_info_regression: Mutual information for a continuous target.

  • SelectPercentile: Select features based on percentile of the highest scores.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • SelectFwe: Select features based on family-wise error rate.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
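The two return forms carry the same information. A minimal sketch of the relationship, using plain Python lists in place of arrays (the helper name support_as_indices is hypothetical):

```python
def support_as_indices(mask):
    """Convert a boolean support mask into the integer indices of kept features."""
    return [i for i, keep in enumerate(mask) if keep]


mask = [True, False, True, True, False]  # one entry per input feature
indices = support_as_indices(mask)       # one entry per output feature
print(indices)  # [0, 2, 3]
```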

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.
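The zero-filling behaviour can be pictured with the following sketch, which works on plain lists of rows. The helper name inverse_transform_rows is hypothetical and this is not the library's implementation.

```python
def inverse_transform_rows(rows, mask):
    """Re-expand reduced rows to the original width, filling dropped columns with 0."""
    restored = []
    for row in rows:
        values = iter(row)  # consumed in order for every kept column
        restored.append([next(values) if keep else 0.0 for keep in mask])
    return restored


mask = [True, False, True]          # feature 1 was removed by the selector
reduced = [[1.0, 3.0], [4.0, 6.0]]  # shape [n_samples, n_selected_features]
print(inverse_transform_rows(reduced, mask))
# [[1.0, 0.0, 3.0], [4.0, 0.0, 6.0]]
```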

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
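Conceptually, the reduction is a column selection driven by the support mask. A small sketch on plain lists (transform_rows is a hypothetical helper, not part of the API):

```python
def transform_rows(rows, mask):
    """Keep only the columns whose mask entry is True."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


rows = [[1, 2, 3], [4, 5, 6]]
mask = [True, False, True]
print(transform_rows(rows, mask))  # [[1, 3], [4, 6]]
```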

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

RFE

Module Sklearn.Feature_selection.RFE wraps Python class sklearn.feature_selection.RFE.

type t

create

constructor and attributes create
val create :
  ?n_features_to_select:int ->
  ?step:[`F of float | `I of int] ->
  ?verbose:int ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  t

Feature ranking with recursive feature elimination.

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
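The elimination loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: rfe_ranking and importance_fn are hypothetical names, and importance_fn stands in for "fit the estimator on the active features and read its importances".

```python
def rfe_ranking(n_features, importance_fn, n_features_to_select=None, step=1):
    """Return (support, ranking) after recursively pruning the least important features."""
    if n_features_to_select is None:
        n_features_to_select = n_features // 2  # "half of the features"
    support = [True] * n_features
    ranking = [1] * n_features
    while sum(support) > n_features_to_select:
        active = [i for i in range(n_features) if support[i]]
        importances = importance_fn(active)  # one value per active feature
        # Positions within `active`, least important first.
        order = sorted(range(len(active)), key=lambda j: importances[j])
        n_drop = min(step, len(active) - n_features_to_select)
        for j in order[:n_drop]:
            support[active[j]] = False
        # Every eliminated feature falls one rank further behind on each pass.
        for i in range(n_features):
            if not support[i]:
                ranking[i] += 1
    return support, ranking


# Toy importances that do not depend on the active set (an assumption for brevity).
weights = [0.9, 0.1, 0.5, 0.3]
support, ranking = rfe_ranking(
    4, lambda active: [weights[i] for i in active], n_features_to_select=2
)
print(support)  # [True, False, True, False]
print(ranking)  # [1, 3, 1, 2]
```

Features eliminated earliest end up with the largest rank, and all retained features share rank 1.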

Read more in the :ref:User Guide <rfe>.

Parameters

  • estimator : object A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.

  • n_features_to_select : int or None (default=None) The number of features to select. If None, half of the features are selected.

  • step : int or float, optional (default=1) If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.

  • verbose : int, (default=0) Controls verbosity of output.
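The two readings of step can be illustrated as follows. The helper name is hypothetical; taking the fraction of the initial feature count and removing at least one feature per iteration are assumptions of this sketch.

```python
def features_removed_per_iteration(step, n_features):
    """Translate `step` into a number of features to remove per iteration."""
    if step <= 0:
        raise ValueError("step must be > 0")
    if 0.0 < step < 1.0:
        # Fraction of the feature count, rounded down, but never zero (assumption).
        return max(1, int(step * n_features))
    return int(step)


print(features_removed_per_iteration(1, 10))     # 1
print(features_removed_per_iteration(0.25, 10))  # 2
print(features_removed_per_iteration(0.05, 10))  # 1
```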

Attributes

  • n_features_ : int The number of selected features.

  • support_ : array of shape [n_features] The mask of selected features.

  • ranking_ : array of shape [n_features] The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

  • estimator_ : object The external estimator fit on the reduced dataset.

Examples

The following example shows how to retrieve the 5 most informative features in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel='linear')
>>> selector = RFE(estimator, n_features_to_select=5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

See also

  • RFECV : Recursive feature elimination with built-in cross-validated selection of the best number of features

References

.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., 'Gene selection for cancer classification using support vector machines', Mach. Learn., 46(1-3), 389--422, 2002.

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Compute the decision function of X.

Parameters

  • X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns

  • score : array, shape = [n_samples, n_classes] or [n_samples] The decision function of the input samples. The order of the classes corresponds to that in the attribute :term:classes_. Regression and binary classification produce an array of shape [n_samples].

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the RFE model and then the underlying estimator on the selected features.

Parameters

  • X : {array-like, sparse matrix} of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values.

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features and then predict using the underlying estimator.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • y : array of shape [n_samples] The predicted target values.

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class log-probabilities for X.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • p : array of shape (n_samples, n_classes) The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class probabilities for X.

Parameters

  • X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns

  • p : array of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_.

score

method score
val score :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  Py.Object.t

Reduce X to the selected features and then return the score of the underlying estimator.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

  • y : array of shape [n_samples] The target values.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
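To make the <component>__<parameter> naming concrete, the sketch below separates top-level parameters from those addressed to a nested component. The helper split_nested_params is hypothetical, and the parameter names in the example (estimator__C, estimator__kernel) are only illustrative.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component; keep plain keys as-is."""
    own, nested = {}, {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            # 'estimator__C' -> component 'estimator', parameter 'C'
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"step": 2, "estimator__C": 10.0, "estimator__kernel": "linear"}
)
print(own)     # {'step': 2}
print(nested)  # {'estimator': {'C': 10.0, 'kernel': 'linear'}}
```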

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

n_features_

attribute n_features_
val n_features_ : t -> int
val n_features_opt : t -> (int) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

support_

attribute support_
val support_ : t -> [>`ArrayLike] Np.Obj.t
val support_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

ranking_

attribute ranking_
val ranking_ : t -> [>`ArrayLike] Np.Obj.t
val ranking_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

estimator_

attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

RFECV

Module Sklearn.Feature_selection.RFECV wraps Python class sklearn.feature_selection.RFECV.

type t

create

constructor and attributes create
val create :
  ?step:[`F of float | `I of int] ->
  ?min_features_to_select:int ->
  ?cv:[`BaseCrossValidator of [>`BaseCrossValidator] Np.Obj.t | `I of int | `Arr of [>`ArrayLike] Np.Obj.t] ->
  ?scoring:[`Score of [`Explained_variance | `R2 | `Max_error | `Neg_median_absolute_error | `Neg_mean_absolute_error | `Neg_mean_squared_error | `Neg_mean_squared_log_error | `Neg_root_mean_squared_error | `Neg_mean_poisson_deviance | `Neg_mean_gamma_deviance | `Accuracy | `Roc_auc | `Roc_auc_ovr | `Roc_auc_ovo | `Roc_auc_ovr_weighted | `Roc_auc_ovo_weighted | `Balanced_accuracy | `Average_precision | `Neg_log_loss | `Neg_brier_score | `Adjusted_rand_score | `Homogeneity_score | `Completeness_score | `V_measure_score | `Mutual_info_score | `Adjusted_mutual_info_score | `Normalized_mutual_info_score | `Fowlkes_mallows_score | `Precision | `Precision_macro | `Precision_micro | `Precision_samples | `Precision_weighted | `Recall | `Recall_macro | `Recall_micro | `Recall_samples | `Recall_weighted | `F1 | `F1_macro | `F1_micro | `F1_samples | `F1_weighted | `Jaccard | `Jaccard_macro | `Jaccard_micro | `Jaccard_samples | `Jaccard_weighted] | `Callable of Py.Object.t] ->
  ?verbose:int ->
  ?n_jobs:int ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  t

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.

See glossary entry for :term:cross-validation estimator.

Read more in the :ref:User Guide <rfe>.

Parameters

  • estimator : object A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.

  • step : int or float, optional (default=1) If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration. Note that the last iteration may remove fewer than step features in order to reach min_features_to_select.

  • min_features_to_select : int, (default=1) The minimum number of features to be selected. This number of features will always be scored, even if the difference between the original feature count and min_features_to_select isn't divisible by step.

    .. versionadded:: 0.20

  • cv : int, cross-validation generator or an iterable, optional Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross-validation,
    • integer, to specify the number of folds.
    • :term:CV splitter,
    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, :class:sklearn.model_selection.StratifiedKFold is used. In all other cases, :class:sklearn.model_selection.KFold is used.

    Refer to the :ref:User Guide <cross_validation> for the various cross-validation strategies that can be used here.

    .. versionchanged:: 0.22 cv default value of None changed from 3-fold to 5-fold.

  • scoring : string, callable or None, optional, (default=None) A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

  • verbose : int, (default=0) Controls verbosity of output.

  • n_jobs : int or None, optional (default=None) Number of cores to run in parallel while fitting across folds. None means 1 unless in a :obj:joblib.parallel_backend context. -1 means using all processors. See :term:Glossary <n_jobs> for more details.

    .. versionadded:: 0.18

Attributes

  • n_features_ : int The number of selected features with cross-validation.

  • support_ : array of shape [n_features] The mask of selected features.

  • ranking_ : array of shape [n_features] The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

  • grid_scores_ : array of shape [n_subsets_of_features] The cross-validation scores such that grid_scores_[i] corresponds to the CV score of the i-th subset of features.

  • estimator_ : object The external estimator fit on the reduced dataset.

Notes

The size of grid_scores_ is equal to ceil((n_features - min_features_to_select) / step) + 1, where step is the number of features removed at each iteration.
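As a worked example of this formula, the sketch below evaluates it for a few settings, with step given as an integer number of features (n_grid_scores is a hypothetical helper):

```python
import math


def n_grid_scores(n_features, min_features_to_select=1, step=1):
    """Number of feature subsets scored: ceil((n - min) / step) + 1."""
    return math.ceil((n_features - min_features_to_select) / step) + 1


print(n_grid_scores(10))                                     # 10
print(n_grid_scores(10, min_features_to_select=1, step=3))   # 4
print(n_grid_scores(10, min_features_to_select=2, step=3))   # 4
```

With 10 features, step=3 and min_features_to_select=2, the scored subset sizes would be 10, 7, 4 and 2, which matches the count of 4.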

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

The following example shows how to retrieve the 5 informative features, which are not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel='linear')
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

See also

  • RFE : Recursive feature elimination

References

.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., 'Gene selection for cancer classification using support vector machines', Mach. Learn., 46(1-3), 389--422, 2002.

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Compute the decision function of X.

Parameters

  • X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns

  • score : array, shape = [n_samples, n_classes] or [n_samples] The decision function of the input samples. The order of the classes corresponds to that in the attribute :term:classes_. Regression and binary classification produce an array of shape [n_samples].

fit

method fit
val fit :
  ?groups:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the RFE model and automatically tune the number of selected features.

Parameters

  • X : {array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the total number of features.

  • y : array-like of shape (n_samples,) Target values (integers for classification, real numbers for regression).

  • groups : array-like of shape (n_samples,) or None Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a 'Group' :term:cv instance (e.g., :class:~sklearn.model_selection.GroupKFold).

    .. versionadded:: 0.20

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features and then predict using the underlying estimator.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • y : array of shape [n_samples] The predicted target values.

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class log-probabilities for X.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • p : array of shape (n_samples, n_classes) The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class probabilities for X.

Parameters

  • X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

Returns

  • p : array of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_.

score

method score
val score :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  Py.Object.t

Reduce X to the selected features and then return the score of the underlying estimator.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

  • y : array of shape [n_samples] The target values.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

n_features_

attribute n_features_
val n_features_ : t -> int
val n_features_opt : t -> (int) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

support_

attribute support_
val support_ : t -> [>`ArrayLike] Np.Obj.t
val support_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

ranking_

attribute ranking_
val ranking_ : t -> [>`ArrayLike] Np.Obj.t
val ranking_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

grid_scores_

attribute grid_scores_
val grid_scores_ : t -> [>`ArrayLike] Np.Obj.t
val grid_scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

estimator_

attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectFdr

Module Sklearn.Feature_selection.SelectFdr wraps Python class sklearn.feature_selection.SelectFdr.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?alpha:float ->
  unit ->
  t

Filter: Select the p-values for an estimated false discovery rate.

This uses the Benjamini-Hochberg procedure. alpha is an upper bound on the expected false discovery rate.
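For readers unfamiliar with the Benjamini-Hochberg step-up rule, here is a minimal sketch on a plain list of p-values. The helper name benjamini_hochberg_mask is hypothetical; it illustrates the procedure and is not the library's implementation.

```python
def benjamini_hochberg_mask(pvalues, alpha):
    """Boolean mask of features kept by the Benjamini-Hochberg step-up rule."""
    n = len(pvalues)
    sorted_p = sorted(pvalues)
    # Sorted p-values p_(k) that satisfy p_(k) <= alpha * k / n (k is 1-based).
    passing = [p for k, p in enumerate(sorted_p, start=1) if p <= alpha * k / n]
    if not passing:
        return [False] * n
    # Keep every feature whose p-value is at most the largest passing one.
    cutoff = max(passing)
    return [p <= cutoff for p in pvalues]


pvalues = [0.02, 0.024, 0.03, 0.9]
print(benjamini_hochberg_mask(pvalues, alpha=0.05))
# [True, True, True, False]
```

In this example the smallest p-value (0.02) exceeds its own threshold of 0.0125, yet it is still kept because a larger p-value passes its threshold; that is the step-up behaviour.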

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.

  • alpha : float, optional The highest uncorrected p-value for features to keep.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFdr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFdr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores.

References

  • https://en.wikipedia.org/wiki/False_discovery_rate

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • mutual_info_classif: Mutual information for a discrete target.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

  • mutual_info_regression: Mutual information for a continuous target.

  • SelectPercentile: Select features based on percentile of the highest scores.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFwe: Select features based on family-wise error rate.

  • GenericUnivariateSelect: Univariate feature selector with configurable mode.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
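
The two return forms carry the same information. The following plain-Python sketch shows how a boolean mask and an integer index relate; the helper names and the sample values are hypothetical and are not part of the library.

```python
def mask_to_indices(mask):
    """Positions of the True entries, i.e. the indices=True form."""
    return [i for i, keep in enumerate(mask) if keep]


def indices_to_mask(indices, n_features):
    """Boolean mask of length n_features, i.e. the indices=False form."""
    chosen = set(indices)
    return [i in chosen for i in range(n_features)]


# Hypothetical support for a five-feature input.
mask = [False, True, False, True, True]
indices = mask_to_indices(mask)

feature_vector = [10.0, 11.0, 12.0, 13.0, 14.0]
# Both forms select the same retained features.
by_mask = [v for v, keep in zip(feature_vector, mask) if keep]
by_index = [feature_vector[i] for i in indices]
print(indices, by_mask, by_index)
```

The mask has one entry per input feature, while the index list has one entry per retained feature.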

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.
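
The zero-filling described above can be sketched with nested lists. This is only an illustration of the shapes involved; `reduce_columns`, `restore_columns`, and the data are hypothetical names and values, not library functions.

```python
def reduce_columns(rows, mask):
    """Keep only the columns whose mask entry is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def restore_columns(reduced_rows, mask):
    """Rebuild the original width, writing 0.0 where a column was dropped."""
    restored = []
    for row in reduced_rows:
        values = iter(row)
        restored.append([next(values) if keep else 0.0 for keep in mask])
    return restored


mask = [False, True, False]
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

X_reduced = reduce_columns(X, mask)
X_restored = restore_columns(X_reduced, mask)
print(X_reduced, X_restored)
```

The restored array has the original width, but the values of the dropped columns are not recovered.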

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectFpr

Module Sklearn.​Feature_selection.​SelectFpr wraps Python class sklearn.feature_selection.SelectFpr.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?alpha:float ->
  unit ->
  t

Filter: Select the p-values below alpha based on an FPR test.

FPR test stands for False Positive Rate test. It controls the total amount of false detections.
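
As a rough sketch, the test amounts to comparing each p-value with alpha on its own, without any correction for the number of features. The function name `fpr_mask` and the p-values are made up, and the strict comparison is an assumption based on the wording "below alpha".

```python
def fpr_mask(pvalues, alpha=0.05):
    """Keep the features whose p-value is strictly below alpha."""
    # Each feature is tested independently of how many features there are.
    return [p < alpha for p in pvalues]


# Hypothetical p-values for four features.
mask = fpr_mask([0.001, 0.02, 0.04, 0.3], alpha=0.05)
print(mask)
```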

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.

  • alpha : float, optional The highest p-value for features to be kept.

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFpr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFpr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • mutual_info_classif: Mutual information for a discrete target.

  • f_regression: F-value between label/feature for regression tasks.

  • mutual_info_regression: Mutual information between features and the target.

  • SelectPercentile: Select features based on percentile of the highest scores.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • SelectFwe: Select features based on family-wise error rate.

  • GenericUnivariateSelect: Univariate feature selector with configurable mode.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectFromModel

Module Sklearn.​Feature_selection.​SelectFromModel wraps Python class sklearn.feature_selection.SelectFromModel.

type t

create

constructor and attributes create
val create :
  ?threshold:[`F of float | `S of string] ->
  ?prefit:bool ->
  ?norm_order:Py.Object.t ->
  ?max_features:int ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  t

Meta-transformer for selecting features based on importance weights.

.. versionadded:: 0.17

Parameters

  • estimator : object The base estimator from which the transformer is built. This can be both a fitted (if prefit is set to True) or a non-fitted estimator. The estimator must have either a feature_importances_ or coef_ attribute after fitting.

  • threshold : string, float, optional default None The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If 'median' (resp. 'mean'), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., '1.25*mean') may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, 'mean' is used by default.

  • prefit : bool, default False Whether a prefit model is expected to be passed into the constructor directly or not. If True, transform must be called directly and SelectFromModel cannot be used with cross_val_score, GridSearchCV and similar utilities that clone the estimator. Otherwise train the model using fit and then transform to do feature selection.

  • norm_order : non-zero int, inf, -inf, default 1 Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.

  • max_features : int or None, optional The maximum number of features to select. To only select based on max_features, set threshold=-np.inf.

    .. versionadded:: 0.20

Attributes

  • estimator_ : an estimator The base estimator from which the transformer is built. This is stored only when a non-fitted estimator is passed to the SelectFromModel, i.e., when prefit is False.

  • threshold_ : float The threshold value used for feature selection.

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252302 ,  0.83462377,  0.49750423]])
>>> selector.threshold_
0.55245...
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])
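
The threshold in the example above can be reproduced by hand. The sketch below assumes that the importances are the absolute values of the single row of coef_ and that a string threshold is either a statistic name or `<factor>*<statistic>`; `resolve_threshold` and `support_mask` are hypothetical helpers, not library functions.

```python
from statistics import mean, median


def resolve_threshold(importances, threshold="mean"):
    """Turn a number or a string such as 'median' or '1.25*mean' into a float."""
    if isinstance(threshold, (int, float)):
        return float(threshold)
    scale, sep, reference = threshold.partition("*")
    if not sep:
        # No scaling factor was given, e.g. "mean".
        scale, reference = "1", threshold
    stats = {"mean": mean, "median": median}
    return float(scale) * stats[reference.strip()](importances)


def support_mask(importances, threshold="mean"):
    """Keep features whose importance is at least the resolved threshold."""
    cutoff = resolve_threshold(importances, threshold)
    return [value >= cutoff for value in importances]


# Coefficients copied from the example above.
coef = [-0.3252302, 0.83462377, 0.49750423]
importances = [abs(c) for c in coef]

cutoff = resolve_threshold(importances, "mean")
mask = support_mask(importances, "mean")
print(round(cutoff, 5), mask)
```

Under these assumptions the mean of the absolute coefficients is about 0.55245, and only the second feature reaches it, which agrees with the threshold_ and get_support() output shown in the example.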

fit

method fit
val fit :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the SelectFromModel meta-transformer.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like, shape (n_samples,) The target values (integers that correspond to classes in classification, real numbers in regression).

  • **fit_params : Other estimator-specific parameters.

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

partial_fit

method partial_fit
val partial_fit :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the SelectFromModel meta-transformer only once.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like, shape (n_samples,) The target values (integers that correspond to classes in classification, real numbers in regression).

  • **fit_params : Other estimator-specific parameters.

Returns

  • self : object

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
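
The `<component>__<parameter>` naming convention can be illustrated with a small helper that separates top-level keys from nested ones. This sketch only demonstrates the naming scheme; `split_nested_params` and the parameter names passed to it are hypothetical, and the library's own handling may differ.

```python
def split_nested_params(params):
    """Separate plain keys from '<component>__<parameter>' keys."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only, so deeper nesting
        # such as 'a__b__c' is left intact for the component 'a'.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"max_features": 2, "estimator__C": 0.5, "estimator__penalty": "l1"}
)
print(own, nested)
```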

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

estimator_

attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

threshold_

attribute threshold_
val threshold_ : t -> float
val threshold_opt : t -> (float) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectFwe

Module Sklearn.​Feature_selection.​SelectFwe wraps Python class sklearn.feature_selection.SelectFwe.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?alpha:float ->
  unit ->
  t

Filter: Select the p-values corresponding to the family-wise error rate.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.

  • alpha : float, optional The highest uncorrected p-value for features to keep.
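
One common way to control the family-wise error rate is a Bonferroni correction, which compares each uncorrected p-value with alpha divided by the number of features. The sketch below assumes that correction; the text above does not name the procedure, and `fwe_mask` and the p-values are made up for the example.

```python
def fwe_mask(pvalues, alpha=0.05):
    """Keep features whose p-value is below alpha / n_features (Bonferroni)."""
    n = len(pvalues)
    return [p < alpha / n for p in pvalues]


# Hypothetical p-values for four features; the per-feature bound is 0.0125.
mask = fwe_mask([0.001, 0.02, 0.04, 0.3], alpha=0.05)
print(mask)
```

With the same p-values, an uncorrected comparison against alpha would keep three features, so this rule is the more conservative one.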

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFwe, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFwe(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 15)

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores.

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

  • SelectPercentile: Select features based on percentile of the highest scores.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • GenericUnivariateSelect: Univariate feature selector with configurable mode.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectKBest

Module Sklearn.​Feature_selection.​SelectKBest wraps Python class sklearn.feature_selection.SelectKBest.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?k:[`All | `I of int] ->
  unit ->
  t

Select features according to the k highest scores.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below 'See also'). The default function only works with classification tasks.

    .. versionadded:: 0.18

  • k : int or 'all', optional, default=10 Number of top features to select. The 'all' option bypasses selection, for use in a parameter search.

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores, None if score_func returned only scores.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
>>> X_new.shape
(1797, 20)

Notes

Ties between features with equal scores will be broken in an unspecified way.
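
Selecting the k highest scores can be sketched as a ranking of feature indices. In this illustration ties happen to go to the earlier feature because Python's sort is stable; that is an artifact of the sketch, not a guarantee of the library. The name `k_best_mask` and the scores are hypothetical.

```python
def k_best_mask(scores, k):
    """Boolean mask marking the k features with the highest scores."""
    if k == "all":
        # 'all' bypasses selection.
        return [True] * len(scores)
    # Rank feature indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(scores))]


# Hypothetical scores for five features.
scores = [3.0, 9.5, 0.2, 9.5, 7.1]
mask = k_best_mask(scores, k=2)
print(mask)
```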

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • mutual_info_classif: Mutual information for a discrete target.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

  • mutual_info_regression: Mutual information for a continuous target.

  • SelectPercentile: Select features based on percentile of the highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • SelectFwe: Select features based on family-wise error rate.

  • GenericUnivariateSelect: Univariate feature selector with configurable mode.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected.

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation.

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by :meth:transform.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectPercentile

Module Sklearn.​Feature_selection.​SelectPercentile wraps Python class sklearn.feature_selection.SelectPercentile.

type t

create

constructor and attributes create
val create :
  ?score_func:Py.Object.t ->
  ?percentile:int ->
  unit ->
  t

Select features according to a percentile of the highest scores.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below 'See also'). The default function only works with classification tasks.

    .. versionadded:: 0.18

  • percentile : int, optional, default=10 Percent of features to keep.

Attributes

  • scores_ : array-like of shape (n_features,) Scores of features.

  • pvalues_ : array-like of shape (n_features,) p-values of feature scores, None if score_func returned only scores.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectPercentile, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
>>> X_new.shape
(1797, 7)

Notes

Ties between features with equal scores will be broken in an unspecified way.
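
A percentile-based selection can be sketched as keeping the scores strictly above a cutoff. The sketch assumes the cutoff is the (100 - percentile)-th percentile computed with linear interpolation between ranks, and it ignores tie handling; both are simplifying assumptions, and `percentile_cutoff` and `percentile_mask` are hypothetical helpers.

```python
import math


def percentile_cutoff(scores, q):
    """q-th percentile of scores, linearly interpolated between ranks."""
    ordered = sorted(scores)
    position = (len(ordered) - 1) * q / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_mask(scores, percentile):
    """Keep the features whose score lies strictly above the cutoff."""
    if percentile == 100:
        return [True] * len(scores)
    if percentile == 0:
        return [False] * len(scores)
    cutoff = percentile_cutoff(scores, 100 - percentile)
    return [s > cutoff for s in scores]


# 64 distinct hypothetical scores, matching the feature count in the example.
scores = [float(i) for i in range(64)]
mask = percentile_mask(scores, percentile=10)
print(sum(mask))
```

Under these assumptions, 64 distinct scores with percentile=10 leave 7 features, because the interpolated cutoff falls between the 57th and 58th smallest scores. This is one plausible reading of the (1797, 7) shape in the example above.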

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • mutual_info_classif: Mutual information for a discrete target.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

  • mutual_info_regression: Mutual information for a continuous target.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • SelectFwe: Select features based on family-wise error rate.

  • GenericUnivariateSelect: Univariate feature selector with configurable mode.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Run score function on (X, y) and get the appropriate features.

Parameters

  • X : array-like of shape (n_samples, n_features) The training input samples.

  • y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
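The two return forms carry the same information. As a minimal pure-Python sketch (the helper names are hypothetical and not part of the binding), a boolean mask and an index list can be converted into each other as follows:

```python
def mask_to_indices(mask):
    """Positions of the selected features, in increasing order."""
    return [i for i, keep in enumerate(mask) if keep]


def indices_to_mask(indices, n_features):
    """Boolean mask with True at every selected position."""
    selected = set(indices)
    return [i in selected for i in range(n_features)]


mask = [False, True, True, False]   # shape [# input features]
indices = mask_to_indices(mask)     # shape [# output features]
round_trip = indices_to_mask(indices, n_features=len(mask))
```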

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by transform.
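The zero-filling behaviour described above can be sketched in pure Python. This is an illustration on nested lists, assuming the support mask is already known; the function name is hypothetical.

```python
def inverse_transform_rows(X_selected, mask):
    """Re-expand rows to the original width, writing 0 where a feature was dropped."""
    restored = []
    for row in X_selected:
        values = iter(row)
        # Consume one selected value for each True entry, insert 0 otherwise.
        restored.append([next(values) if keep else 0 for keep in mask])
    return restored


mask = [False, True, True, False]
X_selected = [[2, 0], [1, 4], [1, 1]]
X_restored = inverse_transform_rows(X_selected, mask)
```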

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
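The <component>__<parameter> naming can be illustrated with a small pure-Python sketch that separates an estimator's own parameters from those addressed to a nested component. The function and the parameter names (clf__C and so on) are hypothetical; this is not the library's implementation.

```python
def split_nested_params(params):
    """Split {"a": 1, "comp__b": 2} into own params and per-component params."""
    own, nested = {}, {}
    for key, value in params.items():
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Everything after the first "__" is forwarded to the component.
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"percentile": 20, "clf__C": 1.0, "clf__kernel": "rbf"}
)
```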

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

scores_

attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

pvalues_

attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

SelectorMixin

Module Sklearn.Feature_selection.SelectorMixin wraps Python class sklearn.feature_selection.SelectorMixin.

type t

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
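The mixin pattern can be sketched in pure Python: a subclass supplies fit and a support mask, and the mixin derives transform and fit_transform from them. Both classes below are hypothetical illustrations on nested lists, not the wrapped scikit-learn classes.

```python
class TinySelectorMixin:
    """Hypothetical mixin: subclasses provide fit() and _mask()."""

    def transform(self, X):
        mask = self._mask()
        return [[v for v, keep in zip(row, mask) if keep] for row in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Fit first, then transform the same data.
        return self.fit(X, y, **fit_params).transform(X)


class KeepNonConstant(TinySelectorMixin):
    """Hypothetical selector that keeps columns taking more than one value."""

    def fit(self, X, y=None):
        n_features = len(X[0])
        self.mask_ = [len({row[j] for row in X}) > 1 for j in range(n_features)]
        return self

    def _mask(self):
        return self.mask_


X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
X_new = KeepNonConstant().fit_transform(X)
```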

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by transform.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

VarianceThreshold

Module Sklearn.Feature_selection.VarianceThreshold wraps Python class sklearn.feature_selection.VarianceThreshold.

type t

create

constructor and attributes create
val create :
  ?threshold:float ->
  unit ->
  t

Feature selector that removes all low-variance features.

This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.

Read more in the :ref:User Guide <variance_threshold>.

Parameters

  • threshold : float, optional Features with a training-set variance lower than this threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.

Attributes

  • variances_ : array, shape (n_features,) Variances of individual features.

Notes

Allows NaN in the input.

Examples

The following dataset has integer features, two of which are the same in every sample. These are removed with the default setting for threshold:

>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> selector.fit_transform(X)
array([[2, 0],
       [1, 4],
       [1, 1]])
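To see why the first and last columns are dropped, the per-feature variances can be computed by hand. The sketch below uses only the Python standard library and assumes the population variance (dividing by n_samples) and a threshold of 0.0, with a feature kept when its variance is strictly greater than the threshold.

```python
from statistics import pvariance

X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
threshold = 0.0  # assumed default: drop only constant features

# Transpose the rows into columns and take the population variance of each.
columns = list(zip(*X))
variances = [pvariance(column) for column in columns]

# Keep a feature only if its variance exceeds the threshold.
mask = [variance > threshold for variance in variances]
X_new = [[v for v, keep in zip(row, mask) if keep] for row in X]
```

The constant columns have variance 0, so only the two middle columns survive, matching the output above.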

fit

method fit
val fit :
  ?y:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Learn empirical variances from X.

Parameters

  • X : {array-like, sparse matrix}, shape (n_samples, n_features) Sample vectors from which to compute variances.

  • y : any Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns

  • self : object

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

get_support

method get_support
val get_support :
  ?indices:bool ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Get a mask, or integer index, of the features selected

Parameters

  • indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.

Returns

  • support : array An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform

method inverse_transform
val inverse_transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reverse the transformation operation

Parameters

  • X : array of shape [n_samples, n_selected_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_original_features] X with columns of zeros inserted where features would have been removed by transform.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Reduce X to the selected features.

Parameters

  • X : array of shape [n_samples, n_features] The input samples.

Returns

  • X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.

variances_

attribute variances_
val variances_ : t -> [>`ArrayLike] Np.Obj.t
val variances_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

chi2

function chi2
val chi2 :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  unit ->
  ([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)

Compute chi-squared stats between each non-negative feature and class.

This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.

Recall that the chi-square test measures dependence between stochastic variables, so using this function 'weeds out' the features that are the most likely to be independent of class and therefore irrelevant for classification.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • X : {array-like, sparse matrix} of shape (n_samples, n_features) Sample vectors.

  • y : array-like of shape (n_samples,) Target vector (class labels).

Returns

  • chi2 : array, shape = (n_features,) chi2 statistics of each feature.

  • pval : array, shape = (n_features,) p-values of each feature.

Notes

Complexity of this algorithm is O(n_classes * n_features).
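As a rough pure-Python sketch of the statistic (p-values omitted), each feature's observed per-class total is compared with the total expected if the feature were independent of the class, that is, the class frequency times the feature's overall total. The function name is hypothetical, and terms with zero expected count are simply skipped here, which is a simplification.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class labels."""
    classes = sorted(set(y))
    n_samples = len(y)
    n_features = len(X[0])
    feature_total = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for label in classes:
        rows = [row for row, target in zip(X, y) if target == label]
        class_prob = len(rows) / n_samples
        for j in range(n_features):
            observed = sum(row[j] for row in rows)
            expected = class_prob * feature_total[j]
            if expected > 0:  # simplification: skip undefined terms
                scores[j] += (observed - expected) ** 2 / expected
    return scores


# Feature 0 occurs only in class 0; feature 1 is identical in both classes.
X = [[2, 1], [2, 1], [0, 1], [0, 1]]
y = [0, 0, 1, 1]
scores = chi2_scores(X, y)
```

The class-dependent feature scores 4.0 and the class-independent one scores 0.0, so a selector ranking by this score would keep the former.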

See also

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

f_classif

function f_classif
val f_classif :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  unit ->
  ([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)

Compute the ANOVA F-value for the provided sample.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • X : {array-like, sparse matrix} shape = [n_samples, n_features] The set of regressors that will be tested sequentially.

  • y : array of shape (n_samples,) The target vector.

Returns

  • F : array, shape = [n_features,] The set of F values.

  • pval : array, shape = [n_features,] The set of p-values.

See also

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • f_regression: F-value between label/feature for regression tasks.

f_oneway

function f_oneway
val f_oneway :
  Py.Object.t list ->
  Py.Object.t

Performs a 1-way ANOVA.

The one-way ANOVA tests the null hypothesis that 2 or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • *args : array_like, sparse matrices sample1, sample2... The sample measurements should be given as arguments.

Returns

  • F-value : float The computed F-value of the test.

  • p-value : float The associated p-value from the F-distribution.

Notes

The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.

  1. The samples are independent
  2. Each sample is from a normally distributed population
  3. The population standard deviations of the groups are all equal. This property is known as homoscedasticity.

If these assumptions are not true for a given set of data, it may still be possible to use the Kruskal-Wallis H-test (scipy.stats.kruskal) although with some loss of power.

The algorithm is from Heiman[2], pp.394-7.

See scipy.stats.f_oneway that should give the same results while being less efficient.
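For reference, the F-value of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The following pure-Python sketch computes only that ratio on plain lists; the p-value, which requires the F-distribution, is left out, and the function name is hypothetical.

```python
def one_way_anova_f(*samples):
    """F statistic for two or more groups of measurements."""
    n_groups = len(samples)
    n_total = sum(len(sample) for sample in samples)
    grand_mean = sum(sum(sample) for sample in samples) / n_total
    means = [sum(sample) / len(sample) for sample in samples]

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(sample) * (mean - grand_mean) ** 2
        for sample, mean in zip(samples, means)
    )
    ss_within = sum(
        (value - mean) ** 2
        for sample, mean in zip(samples, means)
        for value in sample
    )

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


f_value = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

Here both sums of squares equal 6, with 2 and 6 degrees of freedom respectively, giving F = 3.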

References

[1] Lowry, Richard. 'Concepts and Applications of Inferential Statistics'. Chapter 14. http://faculty.vassar.edu/lowry/ch14pt1.html

[2] Heiman, G.W. Research Methods in Statistics. 2002.

f_regression

function f_regression
val f_regression :
  ?center:[`True | `Bool of bool] ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  unit ->
  ([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)

Univariate linear regression tests.

Linear model for testing the individual effect of each of many regressors. This is a scoring function to be used in a feature selection procedure, not a free standing feature selection procedure.

This is done in 2 steps:

  1. The correlation between each regressor and the target is computed, that is, E[(X[:, i] - mean(X[:, i])) * (y - mean(y))] / (std(X[:, i]) * std(y)), where E denotes the average over samples.
  2. It is converted to an F score then to a p-value.
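The two steps can be sketched for a single regressor in pure Python. This illustration assumes centered data and n_samples - 2 degrees of freedom, so that F = r^2 / (1 - r^2) * (n_samples - 2); the conversion to a p-value is omitted, and the function name is hypothetical.

```python
import math


def f_score_from_correlation(x, y):
    """Return (r, F) for one regressor x and target y, with centering."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Step 1: Pearson correlation between the regressor and the target.
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    r = covariance / (norm_x * norm_y)

    # Step 2: convert the correlation to an F score.
    degrees_of_freedom = n - 2  # assumption: one slope plus the mean
    f_score = r ** 2 / (1 - r ** 2) * degrees_of_freedom
    return r, f_score


r, f_score = f_score_from_correlation([1, 2, 3, 4], [1, 3, 2, 4])
```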

For more on usage see the :ref:User Guide <univariate_feature_selection>.

Parameters

  • X : {array-like, sparse matrix} shape = (n_samples, n_features) The set of regressors that will be tested sequentially.

  • y : array of shape (n_samples,) The target vector.

  • center : bool, default=True If True, X and y will be centered.

Returns

  • F : array, shape=(n_features,) F values of features.

  • pval : array, shape=(n_features,) p-values of F-scores.

See also

  • mutual_info_regression: Mutual information for a continuous target.

  • f_classif: ANOVA F-value between label/feature for classification tasks.

  • chi2: Chi-squared stats of non-negative features for classification tasks.

  • SelectKBest: Select features based on the k highest scores.

  • SelectFpr: Select features based on a false positive rate test.

  • SelectFdr: Select features based on an estimated false discovery rate.

  • SelectFwe: Select features based on family-wise error rate.

  • SelectPercentile: Select features based on percentile of the highest scores.

mutual_info_classif

function mutual_info_classif
val mutual_info_classif :
  ?discrete_features:[`Arr of [>`ArrayLike] Np.Obj.t | `Auto | `Bool of bool] ->
  ?n_neighbors:int ->
  ?copy:bool ->
  ?random_state:int ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Estimate mutual information for a discrete target variable.

Mutual information (MI) [1] between two random variables is a non-negative value, which measures the dependency between the variables. It is equal to zero if and only if two random variables are independent, and higher values mean higher dependency.
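To make the definition concrete, here is a pure-Python sketch of the plug-in MI estimate for two discrete variables, in nats. It only illustrates the quantity being estimated; it is not the nearest-neighbor estimator that this function applies to continuous features, and the function name is hypothetical.

```python
import math
from collections import Counter


def discrete_mutual_information(x, y):
    """Plug-in estimate of I(x; y) in nats from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a, b) / (p(a) * p(b)) written with raw counts.
        ratio = count * n / (count_x[a] * count_y[b])
        mi += p_ab * math.log(ratio)
    return mi


target = [0, 0, 1, 1]
mi_dependent = discrete_mutual_information([0, 0, 1, 1], target)
mi_independent = discrete_mutual_information([0, 1, 0, 1], target)
```

A feature that determines the target reaches log(2) for a balanced binary target, while a feature independent of it scores zero.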

The function relies on nonparametric methods based on entropy estimation from k-nearest neighbors distances as described in [2] and [3]. Both methods are based on the idea originally proposed in [4].

It can be used for univariate features selection, read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Feature matrix.

  • y : array_like, shape (n_samples,) Target vector.

  • discrete_features : {'auto', bool, array_like}, default 'auto' If bool, then determines whether to consider all features discrete or continuous. If array, then it should be either a boolean mask with shape (n_features,) or array with indices of discrete features. If 'auto', it is assigned to False for dense X and to True for sparse X.

  • n_neighbors : int, default 3 Number of neighbors to use for MI estimation for continuous variables, see [2] and [3]. Higher values reduce variance of the estimation, but could introduce a bias.

  • copy : bool, default True Whether to make a copy of the given data. If set to False, the initial data will be overwritten.

  • random_state : int, RandomState instance or None, optional, default None Determines random number generation for adding small noise to continuous variables in order to remove repeated values. Pass an int for reproducible results across multiple function calls. See the Glossary entry for random_state.

Returns

  • mi : ndarray, shape (n_features,) Estimated mutual information between each feature and the target.

Notes

  1. The term 'discrete features' is used instead of 'categorical' because it describes the essence more accurately. For example, pixel intensities of an image are discrete features (but hardly categorical), and you will get better results if you mark them as such. Also note that treating a continuous variable as discrete, or vice versa, will usually give incorrect results, so take care to mark each feature correctly.
  2. True mutual information can't be negative. If its estimate turns out to be negative, it is replaced by zero.

References

[1] Mutual Information (https://en.wikipedia.org/wiki/Mutual_information) on Wikipedia.

[2] A. Kraskov, H. Stogbauer and P. Grassberger, 'Estimating mutual information'. Phys. Rev. E 69, 2004.

[3] B. C. Ross, 'Mutual Information between Discrete and Continuous Data Sets'. PLoS ONE 9(2), 2014.

[4] L. F. Kozachenko, N. N. Leonenko, 'Sample Estimate of the Entropy of a Random Vector'. Probl. Peredachi Inf., 23:2 (1987), 9-16.

mutual_info_regression

function mutual_info_regression
val mutual_info_regression :
  ?discrete_features:[`Arr of [>`ArrayLike] Np.Obj.t | `Auto | `Bool of bool] ->
  ?n_neighbors:int ->
  ?copy:bool ->
  ?random_state:int ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Estimate mutual information for a continuous target variable.

Mutual information (MI) [1] between two random variables is a non-negative value, which measures the dependency between the variables. It is equal to zero if and only if two random variables are independent, and higher values mean higher dependency.

The function relies on nonparametric methods based on entropy estimation from k-nearest neighbors distances as described in [2] and [3]. Both methods are based on the idea originally proposed in [4].

It can be used for univariate features selection, read more in the :ref:User Guide <univariate_feature_selection>.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Feature matrix.

  • y : array_like, shape (n_samples,) Target vector.

  • discrete_features : {'auto', bool, array_like}, default 'auto' If bool, then determines whether to consider all features discrete or continuous. If array, then it should be either a boolean mask with shape (n_features,) or array with indices of discrete features. If 'auto', it is assigned to False for dense X and to True for sparse X.

  • n_neighbors : int, default 3 Number of neighbors to use for MI estimation for continuous variables, see [2] and [3]. Higher values reduce variance of the estimation, but could introduce a bias.

  • copy : bool, default True Whether to make a copy of the given data. If set to False, the initial data will be overwritten.

  • random_state : int, RandomState instance or None, optional, default None Determines random number generation for adding small noise to continuous variables in order to remove repeated values. Pass an int for reproducible results across multiple function calls. See the Glossary entry for random_state.
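The accepted forms of discrete_features ('auto', a bool, a boolean mask, or an array of indices) all reduce to one boolean mask over the features. The pure-Python sketch below shows one way to normalize them; the function name and the is_sparse flag are hypothetical stand-ins, and this is not the library's code.

```python
def resolve_discrete_mask(discrete_features, n_features, is_sparse=False):
    """Normalize discrete_features to a list of n_features booleans."""
    if isinstance(discrete_features, str) and discrete_features == "auto":
        # 'auto': False for dense X, True for sparse X.
        discrete_features = is_sparse
    if isinstance(discrete_features, bool):
        return [discrete_features] * n_features

    values = list(discrete_features)
    if values and all(isinstance(value, bool) for value in values):
        if len(values) != n_features:
            raise ValueError("boolean mask must have n_features entries")
        return values

    # Otherwise treat the values as indices of the discrete features.
    mask = [False] * n_features
    for index in values:
        mask[index] = True
    return mask


auto_dense = resolve_discrete_mask("auto", 3)
auto_sparse = resolve_discrete_mask("auto", 3, is_sparse=True)
from_indices = resolve_discrete_mask([0, 2], 3)
from_mask = resolve_discrete_mask([True, False, True], 3)
```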

Returns

  • mi : ndarray, shape (n_features,) Estimated mutual information between each feature and the target.

Notes

  1. The term 'discrete features' is used instead of 'categorical' because it describes the essence more accurately. For example, pixel intensities of an image are discrete features (but hardly categorical), and you will get better results if you mark them as such. Also note that treating a continuous variable as discrete, or vice versa, will usually give incorrect results, so take care to mark each feature correctly.
  2. True mutual information can't be negative. If its estimate turns out to be negative, it is replaced by zero.

References

[1] Mutual Information (https://en.wikipedia.org/wiki/Mutual_information) on Wikipedia.

[2] A. Kraskov, H. Stogbauer and P. Grassberger, 'Estimating mutual information'. Phys. Rev. E 69, 2004.

[3] B. C. Ross, 'Mutual Information between Discrete and Continuous Data Sets'. PLoS ONE 9(2), 2014.

[4] L. F. Kozachenko, N. N. Leonenko, 'Sample Estimate of the Entropy of a Random Vector'. Probl. Peredachi Inf., 23:2 (1987), 9-16.