Feature selection
GenericUnivariateSelect¶
Module Sklearn.Feature_selection.GenericUnivariateSelect wraps the Python class sklearn.feature_selection.GenericUnivariateSelect.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?mode:[`Percentile | `K_best | `Fpr | `Fdr | `Fwe] ->
?param:[`F of float | `Int_depending_on_the_feature_selection_mode of Py.Object.t] ->
unit ->
t
Univariate feature selector with configurable strategy.
Read more in the :ref:`User Guide <univariate_feature_selection>`.
Parameters
- score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). For modes 'percentile' or 'k_best' it can return a single array of scores.
- mode : {'percentile', 'k_best', 'fpr', 'fdr', 'fwe'} Feature selection mode.
- param : float or int depending on the feature selection mode Parameter of the corresponding mode.
Attributes
- scores_ : array-like of shape (n_features,) Scores of features.
- pvalues_ : array-like of shape (n_features,) p-values of feature scores; None if `score_func` returned scores only.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import GenericUnivariateSelect, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> transformer = GenericUnivariateSelect(chi2, mode='k_best', param=20)
>>> X_new = transformer.fit_transform(X, y)
>>> X_new.shape
(569, 20)
See also
- f_classif: ANOVA F-value between label/feature for classification tasks.
- mutual_info_classif: Mutual information for a discrete target.
- chi2: Chi-squared stats of non-negative features for classification tasks.
- f_regression: F-value between label/feature for regression tasks.
- mutual_info_regression: Mutual information for a continuous target.
- SelectPercentile: Select features based on percentile of the highest scores.
- SelectKBest: Select features based on the k highest scores.
- SelectFpr: Select features based on a false positive rate test.
- SelectFdr: Select features based on an estimated false discovery rate.
- SelectFwe: Select features based on family-wise error rate.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
- X : array-like of shape (n_samples, n_features) The training input samples.
- y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None Target values.
- **fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array An index that selects the retained features from a feature vector. If `indices` is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If `indices` is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
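To make the two return forms concrete, here is a small sketch using the wrapped Python class directly (the OCaml `get_support` mirrors this via the `?indices` argument); the data is made up for illustration:

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, chi2

# Toy data: 4 samples, 3 non-negative features (chi2 requires non-negative X).
X = np.array([[1, 9, 2], [2, 8, 1], [3, 7, 2], [4, 6, 1]])
y = np.array([0, 0, 1, 1])

sel = GenericUnivariateSelect(chi2, mode='k_best', param=2).fit(X, y)
mask = sel.get_support()             # boolean mask over input features
idx = sel.get_support(indices=True)  # integer indices of the kept features
```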
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features] `X` with columns of zeros inserted where features would have been removed by :meth:`transform`.
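A short sketch of the round trip through `transform` and `inverse_transform`, again via the wrapped Python class (toy data, for illustration only):

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, chi2

X = np.array([[1, 9, 2], [2, 8, 1], [3, 7, 2], [4, 6, 1]])
y = np.array([0, 0, 1, 1])

sel = GenericUnivariateSelect(chi2, mode='k_best', param=2).fit(X, y)
X_r = sel.transform(X)               # only the selected columns remain
X_back = sel.inverse_transform(X_r)  # zero columns where features were dropped
```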
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
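The `<component>__<parameter>` form is easiest to see on a nested object. A hedged Python sketch (the pipeline wiring here is illustrative, not part of this module):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([('select', SelectKBest(chi2)), ('clf', LogisticRegression())])

# Address parameters of nested components as <component>__<parameter>.
pipe.set_params(select__k=5, clf__C=0.1)
```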
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
RFE¶
Module Sklearn.Feature_selection.RFE wraps the Python class sklearn.feature_selection.RFE.
type t
create¶
constructor and attributes create
val create :
?n_features_to_select:int ->
?step:[`F of float | `I of int] ->
?verbose:int ->
estimator:[>`BaseEstimator] Np.Obj.t ->
unit ->
t
Feature ranking with recursive feature elimination.
Given an external estimator that assigns weights to features (e.g., the
coefficients of a linear model), the goal of recursive feature elimination
(RFE) is to select features by recursively considering smaller and smaller
sets of features. First, the estimator is trained on the initial set of
features and the importance of each feature is obtained either through a
coef_
attribute or through a feature_importances_
attribute.
Then, the least important features are pruned from current set of features.
That procedure is recursively repeated on the pruned set until the desired
number of features to select is eventually reached.
Read more in the :ref:`User Guide <rfe>`.
Parameters
- estimator : object A supervised learning estimator with a `fit` method that provides information about feature importance either through a `coef_` attribute or through a `feature_importances_` attribute.
- n_features_to_select : int or None (default=None) The number of features to select. If None, half of the features are selected.
- step : int or float, optional (default=1) If greater than or equal to 1, then `step` corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then `step` corresponds to the percentage (rounded down) of features to remove at each iteration.
- verbose : int, (default=0) Controls verbosity of output.
Attributes
- n_features_ : int The number of selected features.
- support_ : array of shape [n_features] The mask of selected features.
- ranking_ : array of shape [n_features] The feature ranking, such that `ranking_[i]` corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
- estimator_ : object The external estimator fit on the reduced dataset.
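The integer-vs-fraction behaviour of `step` can be sketched with plain arithmetic. This is an illustration of the rule stated in the parameters above, not the library's internal code:

```python
def features_removed_per_iter(n_features, step):
    """Features dropped in one elimination round, per the `step` rule."""
    if step >= 1:
        return int(step)                    # integer: remove exactly `step` features
    # fraction: remove floor(step * n_features), but always at least one
    return max(1, int(step * n_features))

print(features_removed_per_iter(10, 3))     # removes 3 features per round
print(features_removed_per_iter(10, 0.25))  # removes 2 features per round
```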
Examples
The following example shows how to retrieve the 5 most informative features in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel='linear')
>>> selector = RFE(estimator, n_features_to_select=5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True, False, False, False, False,
False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
Notes
Allows NaN/Inf in the input if the underlying estimator does as well.
See also
- RFECV : Recursive feature elimination with built-in cross-validated selection of the best number of features
References
.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., 'Gene selection for cancer classification using support vector machines', Mach. Learn., 46(1-3), 389--422, 2002.
decision_function¶
method decision_function
val decision_function :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Compute the decision function of `X`.
Parameters
- X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
Returns
- score : array, shape = [n_samples, n_classes] or [n_samples] The decision function of the input samples. The order of the classes corresponds to that in the attribute `classes_`. Regression and binary classification produce an array of shape [n_samples].
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the RFE model and then the underlying estimator on the selected features.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features) The training input samples.
- y : array-like of shape (n_samples,) The target values.
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None Target values.
- **fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array An index that selects the retained features from a feature vector. If `indices` is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If `indices` is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features] `X` with columns of zeros inserted where features would have been removed by :meth:`transform`.
predict¶
method predict
val predict :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features and then predict using the underlying estimator.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- y : array of shape [n_samples] The predicted target values.
predict_log_proba¶
method predict_log_proba
val predict_log_proba :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Predict class log-probabilities for X.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- p : array of shape (n_samples, n_classes) The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.
predict_proba¶
method predict_proba
val predict_proba :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Predict class probabilities for X.
Parameters
- X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
Returns
- p : array of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.
score¶
method score
val score :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
Py.Object.t
Reduce X to the selected features and then return the score of the underlying estimator.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
- y : array of shape [n_samples] The target values.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
n_features_¶
attribute n_features_
val n_features_ : t -> int
val n_features_opt : t -> (int) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
support_¶
attribute support_
val support_ : t -> [>`ArrayLike] Np.Obj.t
val support_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
ranking_¶
attribute ranking_
val ranking_ : t -> [>`ArrayLike] Np.Obj.t
val ranking_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
estimator_¶
attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
RFECV¶
Module Sklearn.Feature_selection.RFECV wraps the Python class sklearn.feature_selection.RFECV.
type t
create¶
constructor and attributes create
val create :
?step:[`F of float | `I of int] ->
?min_features_to_select:int ->
?cv:[`BaseCrossValidator of [>`BaseCrossValidator] Np.Obj.t | `I of int | `Arr of [>`ArrayLike] Np.Obj.t] ->
?scoring:[`Score of [`Explained_variance | `R2 | `Max_error | `Neg_median_absolute_error | `Neg_mean_absolute_error | `Neg_mean_squared_error | `Neg_mean_squared_log_error | `Neg_root_mean_squared_error | `Neg_mean_poisson_deviance | `Neg_mean_gamma_deviance | `Accuracy | `Roc_auc | `Roc_auc_ovr | `Roc_auc_ovo | `Roc_auc_ovr_weighted | `Roc_auc_ovo_weighted | `Balanced_accuracy | `Average_precision | `Neg_log_loss | `Neg_brier_score | `Adjusted_rand_score | `Homogeneity_score | `Completeness_score | `V_measure_score | `Mutual_info_score | `Adjusted_mutual_info_score | `Normalized_mutual_info_score | `Fowlkes_mallows_score | `Precision | `Precision_macro | `Precision_micro | `Precision_samples | `Precision_weighted | `Recall | `Recall_macro | `Recall_micro | `Recall_samples | `Recall_weighted | `F1 | `F1_macro | `F1_micro | `F1_samples | `F1_weighted | `Jaccard | `Jaccard_macro | `Jaccard_micro | `Jaccard_samples | `Jaccard_weighted] | `Callable of Py.Object.t] ->
?verbose:int ->
?n_jobs:int ->
estimator:[>`BaseEstimator] Np.Obj.t ->
unit ->
t
Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
See glossary entry for :term:`cross-validation estimator`.
Read more in the :ref:`User Guide <rfe>`.
Parameters
- estimator : object A supervised learning estimator with a `fit` method that provides information about feature importance either through a `coef_` attribute or through a `feature_importances_` attribute.
- step : int or float, optional (default=1) If greater than or equal to 1, then `step` corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then `step` corresponds to the percentage (rounded down) of features to remove at each iteration. Note that the last iteration may remove fewer than `step` features in order to reach `min_features_to_select`.
- min_features_to_select : int, (default=1) The minimum number of features to be selected. This number of features will always be scored, even if the difference between the original feature count and `min_features_to_select` isn't divisible by `step`. .. versionadded:: 0.20
- cv : int, cross-validation generator or an iterable, optional Determines the cross-validation splitting strategy. Possible inputs for cv are:
  - None, to use the default 5-fold cross-validation,
  - integer, to specify the number of folds,
  - a :term:`CV splitter`,
  - an iterable yielding (train, test) splits as arrays of indices.
  For integer/None inputs, if the estimator is a classifier and `y` is either binary or multiclass, :class:`sklearn.model_selection.StratifiedKFold` is used. Otherwise, :class:`sklearn.model_selection.KFold` is used. Refer to the :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here. .. versionchanged:: 0.22 `cv` default value of None changed from 3-fold to 5-fold.
- scoring : string, callable or None, optional, (default=None) A string (see model evaluation documentation) or a scorer callable object / function with signature `scorer(estimator, X, y)`.
- verbose : int, (default=0) Controls verbosity of output.
- n_jobs : int or None, optional (default=None) Number of cores to run in parallel while fitting across folds. None means 1 unless in a :obj:`joblib.parallel_backend` context. -1 means using all processors. See :term:`Glossary <n_jobs>` for more details. .. versionadded:: 0.18
Attributes
- n_features_ : int The number of selected features with cross-validation.
- support_ : array of shape [n_features] The mask of selected features.
- ranking_ : array of shape [n_features] The feature ranking, such that `ranking_[i]` corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
- grid_scores_ : array of shape [n_subsets_of_features] The cross-validation scores such that `grid_scores_[i]` corresponds to the CV score of the i-th subset of features.
- estimator_ : object The external estimator fit on the reduced dataset.
Notes
The size of `grid_scores_` is equal to ceil((n_features - min_features_to_select) / step) + 1, where step is the number of features removed at each iteration.
Allows NaN/Inf in the input if the underlying estimator does as well.
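As a quick sanity check of the size formula above, a sketch in plain Python (arithmetic only, no fitting involved):

```python
import math

def n_cv_scores(n_features, min_features_to_select=1, step=1):
    """Number of feature subsets RFECV scores, per the formula above."""
    return math.ceil((n_features - min_features_to_select) / step) + 1

print(n_cv_scores(10))          # 10 subsets: 10, 9, ..., 1 features
print(n_cv_scores(10, step=3))  # ceil(9 / 3) + 1 = 4 subsets
```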
Examples
The following example shows how to retrieve the 5 informative features (not known a priori) in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel='linear')
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True, False, False, False, False,
False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
See also
- RFE : Recursive feature elimination
References
.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., 'Gene selection for cancer classification using support vector machines', Mach. Learn., 46(1-3), 389--422, 2002.
decision_function¶
method decision_function
val decision_function :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Compute the decision function of `X`.
Parameters
- X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
Returns
- score : array, shape = [n_samples, n_classes] or [n_samples] The decision function of the input samples. The order of the classes corresponds to that in the attribute `classes_`. Regression and binary classification produce an array of shape [n_samples].
fit¶
method fit
val fit :
?groups:[>`ArrayLike] Np.Obj.t ->
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the RFE model and automatically tune the number of selected features.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where `n_samples` is the number of samples and `n_features` is the total number of features.
- y : array-like of shape (n_samples,) Target values (integers for classification, real numbers for regression).
- groups : array-like of shape (n_samples,) or None Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a 'Group' :term:`cv` instance (e.g., :class:`~sklearn.model_selection.GroupKFold`). .. versionadded:: 0.20
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None Target values.
- **fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array An index that selects the retained features from a feature vector. If `indices` is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If `indices` is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features] `X` with columns of zeros inserted where features would have been removed by :meth:`transform`.
predict¶
method predict
val predict :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features and then predict using the underlying estimator.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- y : array of shape [n_samples] The predicted target values.
predict_log_proba¶
method predict_log_proba
val predict_log_proba :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Predict class log-probabilities for X.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- p : array of shape (n_samples, n_classes) The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.
predict_proba¶
method predict_proba
val predict_proba :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Predict class probabilities for X.
Parameters
- X : {array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
Returns
- p : array of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.
score¶
method score
val score :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
Py.Object.t
Reduce X to the selected features and then return the score of the underlying estimator.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
- y : array of shape [n_samples] The target values.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
n_features_¶
attribute n_features_
val n_features_ : t -> int
val n_features_opt : t -> (int) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
support_¶
attribute support_
val support_ : t -> [>`ArrayLike] Np.Obj.t
val support_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
ranking_¶
attribute ranking_
val ranking_ : t -> [>`ArrayLike] Np.Obj.t
val ranking_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
grid_scores_¶
attribute grid_scores_
val grid_scores_ : t -> [>`ArrayLike] Np.Obj.t
val grid_scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
estimator_¶
attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectFdr¶
Module Sklearn.Feature_selection.SelectFdr wraps the Python class sklearn.feature_selection.SelectFdr.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?alpha:float ->
unit ->
t
Filter: Select the p-values for an estimated false discovery rate.
This uses the Benjamini-Hochberg procedure. `alpha` is an upper bound on the expected false discovery rate.
Read more in the :ref:`User Guide <univariate_feature_selection>`.
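The Benjamini-Hochberg rule can be sketched in a few lines of NumPy. This is an illustration of the procedure, not sklearn's internal implementation: keep every feature whose p-value is at most the largest sorted p-value p_(i) satisfying p_(i) <= alpha * i / n.

```python
import numpy as np

def benjamini_hochberg_mask(pvalues, alpha=0.05):
    """Boolean mask of features kept under an expected FDR of at most alpha."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    sv = np.sort(p)
    # Ranks i = 1..n; the cutoff is the largest p_(i) with p_(i) <= alpha * i / n.
    below = sv <= alpha * np.arange(1, n + 1) / n
    if not below.any():
        return np.zeros(n, dtype=bool)
    threshold = sv[below].max()
    return p <= threshold

mask = benjamini_hochberg_mask([0.001, 0.008, 0.039, 0.041, 0.9], alpha=0.05)
# Only the two smallest p-values survive in this example.
```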
Parameters
- score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.
- alpha : float, optional The highest uncorrected p-value for features to keep.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFdr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFdr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)
Attributes
- scores_ : array-like of shape (n_features,) Scores of features.
- pvalues_ : array-like of shape (n_features,) p-values of feature scores.
References
- https://en.wikipedia.org/wiki/False_discovery_rate
See also
- f_classif: ANOVA F-value between label/feature for classification tasks.
- mutual_info_classif: Mutual information for a discrete target.
- chi2: Chi-squared stats of non-negative features for classification tasks.
- f_regression: F-value between label/feature for regression tasks.
- mutual_info_regression: Mutual information for a continuous target.
- SelectPercentile: Select features based on percentile of the highest scores.
- SelectKBest: Select features based on the k highest scores.
- SelectFpr: Select features based on a false positive rate test.
- SelectFwe: Select features based on family-wise error rate.
- GenericUnivariateSelect: Univariate feature selector with configurable mode.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
- X : array-like of shape (n_samples, n_features) The training input samples.
- y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None Target values.
- **fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector.
If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
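The two return forms carry the same information; a pure-Python sketch of the conversion between them (illustrative helpers, not part of the wrapper API):

```python
def support_to_indices(mask):
    """Convert a boolean support mask to integer feature indices."""
    return [i for i, keep in enumerate(mask) if keep]

def indices_to_support(indices, n_features):
    """Convert integer feature indices back to a boolean support mask."""
    kept = set(indices)
    return [i in kept for i in range(n_features)]

mask = [False, True, False, True]
print(support_to_indices(mask))       # -> [1, 3]
print(indices_to_support([1, 3], 4))  # -> [False, True, False, True]
```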
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
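A pure-Python sketch of this zero-insertion, driven by the boolean support mask from get_support (illustrative only, not the wrapper API):

```python
def inverse_transform_rows(X_r, support):
    """Insert zero columns where features were removed, per the support mask."""
    out = []
    for row in X_r:
        it = iter(row)
        out.append([next(it) if keep else 0.0 for keep in support])
    return out

support = [True, False, True]  # feature 1 was dropped by transform
X_r = [[1.0, 2.0], [3.0, 4.0]]
print(inverse_transform_rows(X_r, support))
# -> [[1.0, 0.0, 2.0], [3.0, 0.0, 4.0]]
```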
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
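A pure-Python sketch of how the component__parameter convention routes a flat update into a nested structure (illustrative only; the real implementation lives in scikit-learn's BaseEstimator):

```python
def set_params(params, updates):
    """Apply flat updates like {'selector__k': 5} to nested parameter dicts."""
    for key, value in updates.items():
        if "__" in key:
            component, sub = key.split("__", 1)  # split on the first '__'
            params[component][sub] = value
        else:
            params[key] = value
    return params

params = {"selector": {"k": 10}, "alpha": 0.05}
set_params(params, {"selector__k": 5, "alpha": 0.01})
print(params)  # -> {'selector': {'k': 5}, 'alpha': 0.01}
```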
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectFpr¶
Module Sklearn.Feature_selection.SelectFpr
wraps Python class sklearn.feature_selection.SelectFpr
.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?alpha:float ->
unit ->
t
- Filter: Select the p-values below alpha based on an FPR test.
FPR stands for False Positive Rate. This test controls the total amount of false detections.
Read more in the :ref:User Guide <univariate_feature_selection>
.
Parameters
-
score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.
-
alpha : float, optional The highest p-value for features to be kept.
Attributes
-
scores_ : array-like of shape (n_features,) Scores of features.
-
pvalues_ : array-like of shape (n_features,) p-values of feature scores.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFpr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFpr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)
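The FPR rule keeps every feature whose p-value falls below alpha, with no multiple-testing correction. A pure-Python sketch (illustrative only, not the wrapper API):

```python
def select_fpr(pvalues, alpha=0.05):
    """FPR test: keep every feature whose p-value is below alpha."""
    return [p < alpha for p in pvalues]

pvalues = [0.001, 0.20, 0.03, 0.80]
print(select_fpr(pvalues, alpha=0.05))  # -> [True, False, True, False]
```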
See also
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
mutual_info_classif: Mutual information for a discrete target.
-
f_regression: F-value between label/feature for regression tasks.
-
mutual_info_regression: Mutual information between features and the target.
-
SelectPercentile: Select features based on percentile of the highest scores.
-
SelectKBest: Select features based on the k highest scores.
-
SelectFdr: Select features based on an estimated false discovery rate.
-
SelectFwe: Select features based on family-wise error rate.
-
GenericUnivariateSelect: Univariate feature selector with configurable mode.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector.
If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectFromModel¶
Module Sklearn.Feature_selection.SelectFromModel
wraps Python class sklearn.feature_selection.SelectFromModel
.
type t
create¶
constructor and attributes create
val create :
?threshold:[`F of float | `S of string] ->
?prefit:bool ->
?norm_order:Py.Object.t ->
?max_features:int ->
estimator:[>`BaseEstimator] Np.Obj.t ->
unit ->
t
Meta-transformer for selecting features based on importance weights.
.. versionadded:: 0.17
Parameters
-
estimator : object The base estimator from which the transformer is built. This can be both a fitted (if
prefit
is set to True) or a non-fitted estimator. The estimator must have either afeature_importances_
orcoef_
attribute after fitting. -
threshold : string, float, optional default None The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If 'median' (resp. 'mean'), then the
threshold
value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., '1.25*mean') may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, 'mean' is used by default. -
prefit : bool, default False Whether a prefit model is expected to be passed into the constructor directly or not. If True,
transform
must be called directly and SelectFromModel cannot be used withcross_val_score
,GridSearchCV
and similar utilities that clone the estimator. Otherwise train the model usingfit
and thentransform
to do feature selection. -
norm_order : non-zero int, inf, -inf, default 1 Order of the norm used to filter the vectors of coefficients below
threshold
in the case where thecoef_
attribute of the estimator is of dimension 2. -
max_features : int or None, optional The maximum number of features to select. To only select based on max_features, set threshold=-np.inf.
.. versionadded:: 0.20
Attributes
-
estimator_ : an estimator The base estimator from which the transformer is built. This is stored only when a non-fitted estimator is passed to the
SelectFromModel
, i.e. when prefit is False. -
threshold_ : float The threshold value used for feature selection.
Notes
Allows NaN/Inf in the input if the underlying estimator does as well.
Examples
>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34, 0.31 ],
... [-2.79, -0.02, -0.85 ],
... [-1.34, -0.48, -2.55 ],
... [ 1.92, 1.48, 0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252302 , 0.83462377, 0.49750423]])
>>> selector.threshold_
0.55245...
>>> selector.get_support()
array([False, True, False])
>>> selector.transform(X)
array([[-1.34],
[-0.02],
[-0.48],
[ 1.48]])
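Continuing the example above: LogisticRegression is not l1-penalized by default, so the threshold falls back to the mean of the absolute coefficients (0.55245, matching threshold_), which is why only the second feature survives. A pure-Python sketch of the thresholding logic (illustrative only, not the wrapper API):

```python
def select_from_importances(importances, threshold="mean"):
    """Keep features whose importance is >= threshold ('mean', 'median', or a float)."""
    if threshold == "mean":
        t = sum(importances) / len(importances)
    elif threshold == "median":
        vals = sorted(importances)
        mid = len(vals) // 2
        t = vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2
    else:
        t = float(threshold)
    return [imp >= t for imp in importances]

imps = [0.3252302, 0.83462377, 0.49750423]  # abs(coef_) from the example above
print(select_from_importances(imps, "mean"))  # mean ~ 0.5525 -> [False, True, False]
```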
fit¶
method fit
val fit :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the SelectFromModel meta-transformer.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like, shape (n_samples,) The target values (integers that correspond to classes in classification, real numbers in regression).
-
**fit_params : Other estimator-specific parameters
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector.
If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
partial_fit¶
method partial_fit
val partial_fit :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the SelectFromModel meta-transformer only once.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like, shape (n_samples,) The target values (integers that correspond to classes in classification, real numbers in regression).
-
**fit_params : Other estimator-specific parameters
Returns
- self : object
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
estimator_¶
attribute estimator_
val estimator_ : t -> Py.Object.t
val estimator_opt : t -> (Py.Object.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
threshold_¶
attribute threshold_
val threshold_ : t -> float
val threshold_opt : t -> (float) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectFwe¶
Module Sklearn.Feature_selection.SelectFwe
wraps Python class sklearn.feature_selection.SelectFwe
.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?alpha:float ->
unit ->
t
- Filter: Select the p-values corresponding to the family-wise error rate.
Read more in the :ref:User Guide <univariate_feature_selection>
.
Parameters
-
score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below 'See also'). The default function only works with classification tasks.
-
alpha : float, optional The highest uncorrected p-value for features to keep.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFwe, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFwe(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 15)
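The FWE criterion amounts to a Bonferroni-style cut, keeping features whose uncorrected p-value is below alpha divided by the number of features. A pure-Python sketch (illustrative only, not the wrapper API):

```python
def select_fwe(pvalues, alpha=0.05):
    """Family-wise error control via a Bonferroni-style cut: p < alpha / n_features."""
    n = len(pvalues)
    return [p < alpha / n for p in pvalues]

pvalues = [0.001, 0.02, 0.04, 0.80]
print(select_fwe(pvalues, alpha=0.05))  # cut at 0.05/4 = 0.0125 -> [True, False, False, False]
```

This is stricter than the FPR test above: 0.02 and 0.04 pass an uncorrected alpha of 0.05 but fail the corrected cut.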
Attributes
-
scores_ : array-like of shape (n_features,) Scores of features.
-
pvalues_ : array-like of shape (n_features,) p-values of feature scores.
See also
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
f_regression: F-value between label/feature for regression tasks.
-
SelectPercentile: Select features based on percentile of the highest scores.
-
SelectKBest: Select features based on the k highest scores.
-
SelectFpr: Select features based on a false positive rate test.
-
SelectFdr: Select features based on an estimated false discovery rate.
-
GenericUnivariateSelect: Univariate feature selector with configurable mode.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector.
If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectKBest¶
Module Sklearn.Feature_selection.SelectKBest
wraps Python class sklearn.feature_selection.SelectKBest
.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?k:[`All | `I of int] ->
unit ->
t
Select features according to the k highest scores.
Read more in the :ref:User Guide <univariate_feature_selection>
.
Parameters
-
score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below 'See also'). The default function only works with classification tasks.
.. versionadded:: 0.18
-
k : int or 'all', optional, default=10 Number of top features to select. The 'all' option bypasses selection, for use in a parameter search.
Attributes
-
scores_ : array-like of shape (n_features,) Scores of features.
-
pvalues_ : array-like of shape (n_features,) p-values of feature scores, None if
score_func
returned only scores.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
>>> X_new.shape
(1797, 20)
Notes
Ties between features with equal scores will be broken in an unspecified way.
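The k-best rule simply keeps the k highest-scoring features; as the note above says, tie-breaking is unspecified. A pure-Python sketch (illustrative only, not the wrapper API; here ties break by feature index):

```python
def select_k_best(scores, k):
    """Boolean mask keeping the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]

scores = [3.2, 10.5, 0.1, 7.7]
print(select_k_best(scores, k=2))  # -> [False, True, False, True]
```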
See also
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
mutual_info_classif: Mutual information for a discrete target.
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
f_regression: F-value between label/feature for regression tasks.
-
mutual_info_regression: Mutual information for a continuous target.
-
SelectPercentile: Select features based on percentile of the highest scores.
-
SelectFpr: Select features based on a false positive rate test.
-
SelectFdr: Select features based on an estimated false discovery rate.
-
SelectFwe: Select features based on family-wise error rate.
-
GenericUnivariateSelect: Univariate feature selector with configurable mode.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector.
If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectPercentile¶
Module Sklearn.Feature_selection.SelectPercentile
wraps Python class sklearn.feature_selection.SelectPercentile
.
type t
create¶
constructor and attributes create
val create :
?score_func:Py.Object.t ->
?percentile:int ->
unit ->
t
Select features according to a percentile of the highest scores.
Read more in the :ref:User Guide <univariate_feature_selection>
.
Parameters
-
score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below 'See also'). The default function only works with classification tasks.
.. versionadded:: 0.18
-
percentile : int, optional, default=10 Percent of features to keep.
Attributes
-
scores_ : array-like of shape (n_features,) Scores of features.
-
pvalues_ : array-like of shape (n_features,) p-values of feature scores, None if
score_func
returned only scores.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectPercentile, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
>>> X_new.shape
(1797, 7)
Notes
Ties between features with equal scores will be broken in an unspecified way.
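The percentile rule keeps roughly the top percentile percent of features by score. A pure-Python sketch (illustrative only, not the wrapper API; the exact rounding and tie-breaking in scikit-learn may differ):

```python
def select_percentile(scores, percentile):
    """Boolean mask keeping roughly the top `percentile` percent of features by score."""
    k = max(1, int(len(scores) * percentile / 100))  # assumed rounding: truncate, keep >= 1
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]

scores = [3.2, 10.5, 0.1, 7.7, 5.0]
print(select_percentile(scores, 40))  # keeps 2 of 5 -> [False, True, False, True, False]
```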
See also
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
mutual_info_classif: Mutual information for a discrete target.
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
f_regression: F-value between label/feature for regression tasks.
-
mutual_info_regression: Mutual information for a continuous target.
-
SelectKBest: Select features based on the k highest scores.
-
SelectFpr: Select features based on a false positive rate test.
-
SelectFdr: Select features based on an estimated false discovery rate.
-
SelectFwe: Select features based on family-wise error rate.
-
GenericUnivariateSelect: Univariate feature selector with configurable mode.
fit¶
method fit
val fit :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Run score function on (X, y) and get the appropriate features.
Parameters
-
X : array-like of shape (n_samples, n_features) The training input samples.
-
y : array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
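For illustration, here is how the wrapped Python method behaves; a sketch using SelectKBest (which provides the same get_support method) on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(chi2, k=2).fit(X, y)

mask = selector.get_support()             # boolean mask over the 4 input features
idx = selector.get_support(indices=True)  # integer indices of the 2 kept features
```

Both forms describe the same selection; the mask is True exactly at the positions listed in idx.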
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation.
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
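As a sketch of the round trip in Python (using VarianceThreshold, which shares this method): transform drops the constant columns, and inverse_transform re-inserts them as zeros:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])
selector = VarianceThreshold().fit(X)     # columns 0 and 3 are constant
X_r = selector.transform(X)               # shape (3, 2)
X_back = selector.inverse_transform(X_r)  # shape (3, 4), zeros in dropped columns
```

The kept columns survive the round trip unchanged; the dropped columns come back as all-zero, not as their original values.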
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
scores_¶
attribute scores_
val scores_ : t -> [>`ArrayLike] Np.Obj.t
val scores_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
pvalues_¶
attribute pvalues_
val pvalues_ : t -> [>`ArrayLike] Np.Obj.t
val pvalues_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SelectorMixin¶
Module Sklearn.Feature_selection.SelectorMixin
wraps Python class sklearn.feature_selection.SelectorMixin
.
type t
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected.
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation.
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
VarianceThreshold¶
Module Sklearn.Feature_selection.VarianceThreshold
wraps Python class sklearn.feature_selection.VarianceThreshold
.
type t
create¶
constructor and attributes create
val create :
?threshold:float ->
unit ->
t
Feature selector that removes all low-variance features.
This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.
Read more in the :ref:User Guide <variance_threshold>.
Parameters
- threshold : float, optional Features with a training-set variance lower than this threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.
Attributes
- variances_ : array, shape (n_features,) Variances of individual features.
Notes
Allows NaN in the input.
Examples
The following dataset has integer features, two of which are the same in every sample. These are removed with the default setting for threshold:
>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> selector.fit_transform(X)
array([[2, 0],
[1, 4],
[1, 1]])
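A non-default threshold removes every feature whose training-set variance does not exceed it. For example, to drop boolean features that take the same value in more than 80% of samples (Bernoulli variance p(1-p) at most 0.8 * (1 - 0.8) = 0.16), a sketch:

```python
from sklearn.feature_selection import VarianceThreshold

# Boolean features; the first column is 1 in only one of six samples,
# so its variance (1/6)(5/6) ~ 0.139 falls below the 0.16 threshold.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_new = selector.fit_transform(X)  # keeps the last two columns
```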
fit¶
method fit
val fit :
?y:Py.Object.t ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Learn empirical variances from X.
Parameters
-
X : {array-like, sparse matrix}, shape (n_samples, n_features) Sample vectors from which to compute variances.
-
y : any Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
-
y : ndarray of shape (n_samples,), default=None Target values.
-
**fit_params : dict Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
get_support¶
method get_support
val get_support :
?indices:bool ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Get a mask, or integer index, of the features selected.
Parameters
- indices : boolean (default False) If True, the return value will be an array of integers, rather than a boolean mask.
Returns
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
inverse_transform¶
method inverse_transform
val inverse_transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reverse the transformation operation.
Parameters
- X : array of shape [n_samples, n_selected_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by :meth:transform.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Reduce X to the selected features.
Parameters
- X : array of shape [n_samples, n_features] The input samples.
Returns
- X_r : array of shape [n_samples, n_selected_features] The input samples with only the selected features.
variances_¶
attribute variances_
val variances_ : t -> [>`ArrayLike] Np.Obj.t
val variances_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create
above. The first version raises Not_found
if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
chi2¶
function chi2
val chi2 :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
unit ->
([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)
Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.
Recall that the chi-square test measures dependence between stochastic variables, so using this function 'weeds out' the features that are the most likely to be independent of class and therefore irrelevant for classification.
Read more in the :ref:User Guide <univariate_feature_selection>.
Parameters
-
X : {array-like, sparse matrix} of shape (n_samples, n_features) Sample vectors.
-
y : array-like of shape (n_samples,) Target vector (class labels).
Returns
-
chi2 : array, shape = (n_features,) chi2 statistics of each feature.
-
pval : array, shape = (n_features,) p-values of each feature.
Notes
Complexity of this algorithm is O(n_classes * n_features).
See also
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
f_regression: F-value between label/feature for regression tasks.
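A direct call to the wrapped Python function, sketched on the iris data, returns one statistic and one p-value per feature:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

X, y = load_iris(return_X_y=True)  # 4 non-negative features, 3 classes
scores, pvalues = chi2(X, y)       # higher score = stronger class dependence
```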
f_classif¶
function f_classif
val f_classif :
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
unit ->
([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)
Compute the ANOVA F-value for the provided sample.
Read more in the :ref:User Guide <univariate_feature_selection>.
Parameters
-
X : {array-like, sparse matrix} shape = [n_samples, n_features] The set of regressors that will be tested sequentially.
-
y : array-like of shape (n_samples,) The target vector (class labels).
Returns
-
F : array, shape = [n_features,] The set of F values.
-
pval : array, shape = [n_features,] The set of p-values.
See also
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
f_regression: F-value between label/feature for regression tasks.
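Calling the wrapped Python function directly, sketched on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

X, y = load_iris(return_X_y=True)
F, pval = f_classif(X, y)  # one ANOVA F statistic and p-value per feature
```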
f_oneway¶
function f_oneway
val f_oneway :
Py.Object.t list ->
Py.Object.t
Performs a 1-way ANOVA.
The one-way ANOVA tests the null hypothesis that 2 or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
Read more in the :ref:User Guide <univariate_feature_selection>.
Parameters
- *args : array-like, sparse matrices sample1, sample2, ... The sample measurements, given as positional arguments.
Returns
-
F-value : float The computed F-value of the test.
-
p-value : float The associated p-value from the F-distribution.
Notes
The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.
- The samples are independent
- Each sample is from a normally distributed population
- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.
If these assumptions are not true for a given set of data, it may still be possible to use the Kruskal-Wallis H-test (scipy.stats.kruskal), although with some loss of power.
The algorithm is from Heiman [2], pp. 394-7.
See scipy.stats.f_oneway, which should give the same results while being less efficient.
References
.. [1] Lowry, Richard. 'Concepts and Applications of Inferential Statistics'. Chapter 14.
- http://faculty.vassar.edu/lowry/ch14pt1.html
.. [2] Heiman, G.W. Research Methods in Statistics. 2002.
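Since scipy.stats.f_oneway should give the same results as the wrapped helper, a sketch using the SciPy version on three synthetic groups:

```python
import numpy as np
from scipy.stats import f_oneway  # public equivalent of the wrapped helper

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.0, 1.0, size=30)
c = rng.normal(2.0, 1.0, size=30)  # shifted mean, so the null should be rejected
F, p = f_oneway(a, b, c)
```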
f_regression¶
function f_regression
val f_regression :
?center:[`True | `Bool of bool] ->
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
unit ->
([>`ArrayLike] Np.Obj.t * [>`ArrayLike] Np.Obj.t)
Univariate linear regression tests.
Linear model for testing the individual effect of each of many regressors. This is a scoring function to be used in a feature selection procedure, not a free standing feature selection procedure.
This is done in 2 steps:
- The correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) * std(y)).
- It is converted to an F score then to a p-value.
For more on usage see the :ref:User Guide <univariate_feature_selection>.
Parameters
-
X : {array-like, sparse matrix} shape = (n_samples, n_features) The set of regressors that will be tested sequentially.
-
y : array-like of shape (n_samples,) The target vector.
-
center : bool, default=True If True, X and y will be centered.
Returns
-
F : array, shape=(n_features,) F values of features.
-
pval : array, shape=(n_features,) p-values of F-scores.
See also
-
mutual_info_regression: Mutual information for a continuous target.
-
f_classif: ANOVA F-value between label/feature for classification tasks.
-
chi2: Chi-squared stats of non-negative features for classification tasks.
-
SelectKBest: Select features based on the k highest scores.
-
SelectFpr: Select features based on a false positive rate test.
-
SelectFdr: Select features based on an estimated false discovery rate.
-
SelectFwe: Select features based on family-wise error rate.
-
SelectPercentile: Select features based on percentile of the highest scores.
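A sketch of the wrapped Python function on synthetic data where only the first regressor carries a linear signal:

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 is informative
F, pval = f_regression(X, y)  # feature 0 should dominate the F scores
```

Note that f_regression only detects linear association; for nonlinear dependence, mutual_info_regression below is the better choice.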
mutual_info_classif¶
function mutual_info_classif
val mutual_info_classif :
?discrete_features:[`Arr of [>`ArrayLike] Np.Obj.t | `Auto | `Bool of bool] ->
?n_neighbors:int ->
?copy:bool ->
?random_state:int ->
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
unit ->
[>`ArrayLike] Np.Obj.t
Estimate mutual information for a discrete target variable.
Mutual information (MI) [1] between two random variables is a non-negative value, which measures the dependency between the variables. It is equal to zero if and only if two random variables are independent, and higher values mean higher dependency.
The function relies on nonparametric methods based on entropy estimation from k-nearest neighbors distances as described in [2] and [3]. Both methods are based on the idea originally proposed in [4].
It can be used for univariate feature selection; read more in the :ref:User Guide <univariate_feature_selection>.
Parameters
-
X : array_like or sparse matrix, shape (n_samples, n_features) Feature matrix.
-
y : array_like, shape (n_samples,) Target vector.
-
discrete_features : {'auto', bool, array_like}, default 'auto' If bool, then determines whether to consider all features discrete or continuous. If array, then it should be either a boolean mask with shape (n_features,) or an array with indices of discrete features. If 'auto', it is assigned to False for dense X and to True for sparse X.
-
n_neighbors : int, default 3 Number of neighbors to use for MI estimation for continuous variables, see [2] and [3]. Higher values reduce variance of the estimation, but could introduce a bias.
-
copy : bool, default True Whether to make a copy of the given data. If set to False, the initial data will be overwritten.
-
random_state : int, RandomState instance or None, optional, default None Determines random number generation for adding small noise to continuous variables in order to remove repeated values. Pass an int for reproducible results across multiple function calls. See :term:Glossary <random_state>.
Returns
- mi : ndarray, shape (n_features,) Estimated mutual information between each feature and the target.
Notes
- The term 'discrete features' is used instead of naming them 'categorical', because it describes the essence more accurately. For example, pixel intensities of an image are discrete features (but hardly categorical), and you will get better results if you mark them as such. Also note that treating a continuous variable as discrete, and vice versa, will usually give incorrect results, so be attentive about that.
- True mutual information can't be negative. If its estimate turns out to be negative, it is replaced by zero.
References
.. [1] Mutual Information <https://en.wikipedia.org/wiki/Mutual_information> on Wikipedia.
.. [2] A. Kraskov, H. Stogbauer and P. Grassberger, 'Estimating mutual
information'. Phys. Rev. E 69, 2004.
.. [3] B. C. Ross 'Mutual Information between Discrete and Continuous
Data Sets'. PLoS ONE 9(2), 2014.
.. [4] L. F. Kozachenko, N. N. Leonenko, 'Sample Estimate of the Entropy
of a Random Vector', Probl. Peredachi Inf., 23:2 (1987), 9-16
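A sketch of the wrapped Python function on synthetic data where the class label depends on only one feature:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # the class depends only on feature 0
mi = mutual_info_classif(X, y, random_state=0)  # one MI estimate per feature
```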
mutual_info_regression¶
function mutual_info_regression
val mutual_info_regression :
?discrete_features:[`Arr of [>`ArrayLike] Np.Obj.t | `Auto | `Bool of bool] ->
?n_neighbors:int ->
?copy:bool ->
?random_state:int ->
x:[>`ArrayLike] Np.Obj.t ->
y:[>`ArrayLike] Np.Obj.t ->
unit ->
[>`ArrayLike] Np.Obj.t
Estimate mutual information for a continuous target variable.
Mutual information (MI) [1] between two random variables is a non-negative value, which measures the dependency between the variables. It is equal to zero if and only if two random variables are independent, and higher values mean higher dependency.
The function relies on nonparametric methods based on entropy estimation from k-nearest neighbors distances as described in [2] and [3]. Both methods are based on the idea originally proposed in [4].
It can be used for univariate feature selection; read more in the :ref:User Guide <univariate_feature_selection>.
Parameters
-
X : array_like or sparse matrix, shape (n_samples, n_features) Feature matrix.
-
y : array_like, shape (n_samples,) Target vector.
-
discrete_features : {'auto', bool, array_like}, default 'auto' If bool, then determines whether to consider all features discrete or continuous. If array, then it should be either a boolean mask with shape (n_features,) or an array with indices of discrete features. If 'auto', it is assigned to False for dense X and to True for sparse X.
-
n_neighbors : int, default 3 Number of neighbors to use for MI estimation for continuous variables, see [2] and [3]. Higher values reduce variance of the estimation, but could introduce a bias.
-
copy : bool, default True Whether to make a copy of the given data. If set to False, the initial data will be overwritten.
-
random_state : int, RandomState instance or None, optional, default None Determines random number generation for adding small noise to continuous variables in order to remove repeated values. Pass an int for reproducible results across multiple function calls. See :term:Glossary <random_state>.
Returns
- mi : ndarray, shape (n_features,) Estimated mutual information between each feature and the target.
Notes
- The term 'discrete features' is used instead of naming them 'categorical', because it describes the essence more accurately. For example, pixel intensities of an image are discrete features (but hardly categorical), and you will get better results if you mark them as such. Also note that treating a continuous variable as discrete, and vice versa, will usually give incorrect results, so be attentive about that.
- True mutual information can't be negative. If its estimate turns out to be negative, it is replaced by zero.
References
.. [1] Mutual Information <https://en.wikipedia.org/wiki/Mutual_information> on Wikipedia.
.. [2] A. Kraskov, H. Stogbauer and P. Grassberger, 'Estimating mutual
information'. Phys. Rev. E 69, 2004.
.. [3] B. C. Ross 'Mutual Information between Discrete and Continuous
Data Sets'. PLoS ONE 9(2), 2014.
.. [4] L. F. Kozachenko, N. N. Leonenko, 'Sample Estimate of the Entropy
of a Random Vector', Probl. Peredachi Inf., 23:2 (1987), 9-16
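A sketch of the wrapped Python function with a nonlinear dependence that a linear F-test would miss:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)  # nonlinear dependence on feature 0
mi = mutual_info_regression(X, y, random_state=0)  # one MI estimate per feature
```

Because y is a quadratic function of feature 0, its linear correlation with y is near zero, yet the MI estimate for that feature is clearly the largest; this is the scenario where mutual information is preferable to f_regression.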