Impute
KNNImputer¶
Module Sklearn.Impute.KNNImputer wraps Python class sklearn.impute.KNNImputer.
type t
create¶
constructor and attributes create
val create :
?missing_values:[`F of float | `S of string | `Np_nan of Py.Object.t | `I of int | `None] ->
?n_neighbors:int ->
?weights:[`Callable of Py.Object.t | `Distance | `Uniform] ->
?metric:[`Nan_euclidean | `Callable of Py.Object.t] ->
?copy:bool ->
?add_indicator:bool ->
unit ->
t
Imputation for completing missing values using k-Nearest Neighbors.
Each sample's missing values are imputed using the mean value from the n_neighbors nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.
Read more in the :ref:User Guide <knnimpute>.
.. versionadded:: 0.22
Parameters
- missing_values : number, string, np.nan or None, default=np.nan
  The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
- n_neighbors : int, default=5
  Number of neighboring samples to use for imputation.
- weights : {'uniform', 'distance'} or callable, default='uniform'
  Weight function used in prediction. Possible values:
  - 'uniform' : uniform weights. All points in each neighborhood are weighted equally.
  - 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  - callable : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
- metric : {'nan_euclidean'} or callable, default='nan_euclidean'
  Distance metric for searching neighbors. Possible values:
  - 'nan_euclidean'
  - callable : a user-defined function which conforms to the definition of _pairwise_callable(X, Y, metric, **kwds). The function accepts two arrays, X and Y, and a missing_values keyword in kwds and returns a scalar distance value.
- copy : bool, default=True
  If True, a copy of X will be created. If False, imputation will be done in-place whenever possible.
- add_indicator : bool, default=False
  If True, a :class:MissingIndicator transform will stack onto the output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time.
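The difference between the 'uniform', 'distance' and callable options for weights can be illustrated with a small pure-Python sketch. This is not scikit-learn's implementation: the helper name weighted_mean is hypothetical, and the handling of zero distances (exact matches take all the weight) is an assumption made here to avoid dividing by zero.

```python
def weighted_mean(values, distances, weights="uniform"):
    """Combine neighbor values into one imputed value (illustrative only)."""
    if weights == "uniform":
        # Every neighbor counts equally.
        w = [1.0] * len(values)
    elif weights == "distance":
        if any(d == 0 for d in distances):
            # Assumption: exact matches take all the weight.
            w = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            # Closer neighbors get larger weights.
            w = [1.0 / d for d in distances]
    else:
        # A callable maps the distances to weights of the same length.
        w = list(weights(distances))
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


# Two neighbors with values 3 and 5 at distances 1 and 3.
uniform = weighted_mean([3.0, 5.0], [1.0, 3.0])
by_distance = weighted_mean([3.0, 5.0], [1.0, 3.0], weights="distance")
print(uniform, by_distance)
```

With uniform weights the imputed value is the plain mean of the neighbors; with distance weights it is pulled toward the closer neighbor.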
Attributes
- indicator_ : :class:sklearn.impute.MissingIndicator
  Indicator used to add binary indicators for missing values. None if add_indicator is False.
References
- Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays, BIOINFORMATICS Vol. 17 no. 6, 2001 Pages 520-525.
Examples
>>> import numpy as np
>>> from sklearn.impute import KNNImputer
>>> X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
>>> imputer = KNNImputer(n_neighbors=2)
>>> imputer.fit_transform(X)
array([[1. , 2. , 4. ],
       [3. , 4. , 3. ],
       [5.5, 6. , 5. ],
       [8. , 8. , 7. ]])
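The numbers in the example above can be reproduced by hand. The following is a minimal pure-Python sketch of the idea, not scikit-learn's code: the helper names nan_euclidean and knn_impute are hypothetical, the distance follows the usual NaN-aware definition (squared differences over coordinates present in both rows, rescaled by total/present coordinates), and returning infinity when two rows share no coordinates is an assumption of this sketch.

```python
import math

NAN = float("nan")


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows,
    rescaled by (total coordinates / present coordinates)."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    if not pairs:
        return math.inf  # assumption: no shared coordinates means "far away"
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(len(a) / len(pairs) * squared)


def knn_impute(rows, n_neighbors=2):
    """Fill each missing entry with the mean of that feature over the
    n_neighbors closest rows in which the feature is present.
    Assumes every feature has at least one observed value."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Only rows that actually have feature j can donate a value.
            donors = [r for r in rows if not math.isnan(r[j])]
            donors.sort(key=lambda r: nan_euclidean(row, r))
            nearest = donors[:n_neighbors]
            out[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return out


X = [[1, 2, NAN], [3, 4, 3], [NAN, 6, 5], [8, 8, 7]]
imputed = knn_impute(X)
print(imputed)
```

For the first row, the two closest donors for the third feature hold 3 and 5, giving 4; for the third row, the two closest donors for the first feature hold 3 and 8, giving 5.5.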
fit¶
method fit
val fit :
?y:Py.Object.t ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the imputer on X.
Parameters
- X : array-like of shape (n_samples, n_features)
  Input data, where n_samples is the number of samples and n_features is the number of features.
Returns
- self : object
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
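The <component>__<parameter> naming scheme can be sketched with plain dictionaries standing in for estimators. This is only an illustration of how a double underscore routes a value to a nested component; the function set_nested and the dictionary layout are hypothetical and do not mirror the library's internals.

```python
def set_nested(target, **params):
    """Apply parameters to a nested dict, routing 'a__b' to target['a']['b']."""
    for name, value in params.items():
        # Split at the first double underscore only.
        component, sep, rest = name.partition("__")
        if sep:
            set_nested(target[component], **{rest: value})
        else:
            target[name] = value
    return target


# A toy "pipeline" whose 'imputer' component has its own parameters.
pipeline = {"imputer": {"n_neighbors": 5, "weights": "uniform"}, "verbose": False}
set_nested(pipeline, imputer__n_neighbors=3, verbose=True)
print(pipeline)
```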
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Impute all missing values in X.
Parameters
- X : array-like of shape (n_samples, n_features) The input data to complete.
Returns
- X : array-like of shape (n_samples, n_output_features)
  The imputed dataset. n_output_features is the number of features that are not always missing during fit.
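To make n_output_features concrete, here is a small pure-Python sketch that finds the features with at least one observed value in the training data. The helper observed_columns is a hypothetical name used for illustration; only those columns would survive in the transformed output.

```python
import math

NAN = float("nan")


def observed_columns(train_rows):
    """Indices of features with at least one observed value during fit."""
    n_features = len(train_rows[0])
    return [j for j in range(n_features)
            if not all(math.isnan(row[j]) for row in train_rows)]


train = [[1.0, NAN, 3.0],
         [4.0, NAN, NAN]]
kept = observed_columns(train)      # feature 1 is missing in every row
n_output_features = len(kept)
print(kept, n_output_features)
```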
indicator_¶
attribute indicator_
val indicator_ : t -> Py.Object.t
val indicator_opt : t -> (Py.Object.t) option
This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
MissingIndicator¶
Module Sklearn.Impute.MissingIndicator wraps Python class sklearn.impute.MissingIndicator.
type t
create¶
constructor and attributes create
val create :
?missing_values:[`F of float | `S of string | `Np_nan of Py.Object.t | `I of int | `None] ->
?features:string ->
?sparse:[`Auto | `Bool of bool] ->
?error_on_new:bool ->
unit ->
t
Binary indicators for missing values.
Note that this component typically should not be used in a vanilla :class:Pipeline consisting of transformers and a classifier, but rather could be added using a :class:FeatureUnion or :class:ColumnTransformer.
Read more in the :ref:User Guide <impute>.
.. versionadded:: 0.20
Parameters
- missing_values : number, string, np.nan (default) or None
  The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
- features : str, default='missing-only'
  Whether the imputer mask should represent all or a subset of features.
  - If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.
  - If 'all', the imputer mask will represent all features.
- sparse : boolean or 'auto', default='auto'
  Whether the imputer mask format should be sparse or dense.
  - If 'auto' (default), the imputer mask will be of the same type as the input.
  - If True, the imputer mask will be a sparse matrix.
  - If False, the imputer mask will be a numpy array.
- error_on_new : boolean, default=True
  If True (default), transform will raise an error when there are features with missing values in transform that have no missing values in fit. This is applicable only when features='missing-only'.
Attributes
- features_ : ndarray, shape (n_missing_features,) or (n_features,)
  The feature indices which will be returned when calling transform. They are computed during fit. For features='all', it is equal to range(n_features).
Examples
>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
... [4, 0, np.nan],
... [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
... [np.nan, 2, 3],
... [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])
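The mask in the example above can be traced with a pure-Python sketch of the features and error_on_new options. This is an illustration under stated assumptions rather than the library's code: fit_features and indicator_transform are hypothetical names, and the data is dense and numeric.

```python
import math

NAN = float("nan")


def fit_features(rows, features="missing-only"):
    """Column indices the mask will report on."""
    n_features = len(rows[0])
    if features == "all":
        return list(range(n_features))
    # 'missing-only': keep columns that had a missing value during fit.
    return [j for j in range(n_features)
            if any(math.isnan(r[j]) for r in rows)]


def indicator_transform(rows, features_, error_on_new=True):
    """Boolean mask restricted to the fitted columns."""
    n_features = len(rows[0])
    new = [j for j in range(n_features)
           if j not in features_ and any(math.isnan(r[j]) for r in rows)]
    if error_on_new and new:
        raise ValueError(f"features {new} have missing values in transform "
                         "but had none in fit")
    return [[math.isnan(r[j]) for j in features_] for r in rows]


X1 = [[NAN, 1, 3], [4, 0, NAN], [8, 1, 0]]
X2 = [[5, 1, NAN], [NAN, 2, 3], [2, 4, 0]]
features_ = fit_features(X1)          # columns 0 and 2 had missing values
mask = indicator_transform(X2, features_)
print(features_, mask)
```

Passing a row such as [1, NAN, 2] to indicator_transform would raise in this sketch, because column 1 had no missing values during fit; with error_on_new=False the new missing value is simply not reported.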
fit¶
method fit
val fit :
?y:Py.Object.t ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the transformer on X.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
  Input data, where n_samples is the number of samples and n_features is the number of features.
Returns
- self : object Returns self.
fit_transform¶
method fit_transform
val fit_transform :
?y:Py.Object.t ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Generate missing values indicator for X.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features) The input data to complete.
Returns
- Xt : {ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
  The missing indicator for input data. The data type of Xt will be boolean.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Generate missing values indicator for X.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features) The input data to complete.
Returns
- Xt : {ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
  The missing indicator for input data. The data type of Xt will be boolean.
features_¶
attribute features_
val features_ : t -> [>`ArrayLike] Np.Obj.t
val features_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.
SimpleImputer¶
Module Sklearn.Impute.SimpleImputer wraps Python class sklearn.impute.SimpleImputer.
type t
create¶
constructor and attributes create
val create :
?missing_values:[`F of float | `S of string | `Np_nan of Py.Object.t | `I of int | `None] ->
?strategy:[`Mean | `Median | `Most_frequent | `Constant] ->
?fill_value:[`F of float | `S of string | `I of int] ->
?verbose:int ->
?copy:bool ->
?add_indicator:bool ->
unit ->
t
Imputation transformer for completing missing values.
Read more in the :ref:User Guide <impute>.
.. versionadded:: 0.20 SimpleImputer replaces the previous sklearn.preprocessing.Imputer estimator which is now removed.
Parameters
- missing_values : number, string, np.nan (default) or None
  The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
- strategy : string, default='mean'
  The imputation strategy.
  - If 'mean', then replace missing values using the mean along each column. Can only be used with numeric data.
  - If 'median', then replace missing values using the median along each column. Can only be used with numeric data.
  - If 'most_frequent', then replace missing values using the most frequent value along each column. Can be used with strings or numeric data.
  - If 'constant', then replace missing values with fill_value. Can be used with strings or numeric data.
  .. versionadded:: 0.20 strategy='constant' for fixed value imputation.
- fill_value : string or numerical value, default=None
  When strategy == 'constant', fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and 'missing_value' for strings or object data types.
- verbose : integer, default=0
  Controls the verbosity of the imputer.
- copy : boolean, default=True
  If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if copy=False:
  - If X is not an array of floating values;
  - If X is encoded as a CSR matrix;
  - If add_indicator=True.
- add_indicator : boolean, default=False
  If True, a :class:MissingIndicator transform will stack onto the output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time.
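The effect of add_indicator can be pictured as appending mask columns to the imputed rows. The sketch below is a pure-Python illustration, not the library's implementation: impute_with_indicator is a hypothetical helper, the fill values are passed in directly rather than computed, and the mask is encoded as 0.0/1.0 for readability.

```python
import math

NAN = float("nan")


def impute_with_indicator(train, rows, fill):
    """Impute rows with per-column fill values, then append one mask
    column per feature that had a missing value in the training data."""
    flagged = [j for j in range(len(train[0]))
               if any(math.isnan(r[j]) for r in train)]
    out = []
    for r in rows:
        imputed = [fill[j] if math.isnan(v) else v for j, v in enumerate(r)]
        mask = [1.0 if math.isnan(r[j]) else 0.0 for j in flagged]
        out.append(imputed + mask)
    return out


train = [[7, 2, 3], [4, NAN, 6], [10, 5, 9]]
fill = [7.0, 3.5, 6.0]                      # column means of the training data
rows = [[NAN, 2, 3], [4, NAN, 6]]
stacked = impute_with_indicator(train, rows, fill)
print(stacked)
```

Only the second feature was missing during fit, so a single indicator column is appended. The missing first feature in the first row is imputed but gets no indicator column, as described for add_indicator above.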
Attributes
- statistics_ : array of shape (n_features,)
  The imputation fill value for each feature. Computing statistics can result in np.nan values. During :meth:transform, features corresponding to np.nan statistics will be discarded.
- indicator_ : :class:sklearn.impute.MissingIndicator
  Indicator used to add binary indicators for missing values. None if add_indicator is False.
See also
- IterativeImputer : Multivariate imputation of missing values.
Examples
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
SimpleImputer()
>>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
>>> print(imp_mean.transform(X))
[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]
Notes
Columns which only contained missing values at :meth:fit are discarded upon :meth:transform if strategy is not 'constant'.
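The strategies and the column-discarding rule above can be sketched in pure Python for dense numeric data. This is an illustration, not scikit-learn's implementation: fit_statistics and apply_statistics are hypothetical names, and breaking ties in 'most_frequent' by taking the smallest value is an assumption of this sketch.

```python
import math
import statistics
from collections import Counter

NAN = float("nan")


def fit_statistics(rows, strategy="mean", fill_value=0):
    """One fill value per column; NaN when a column has no observed value."""
    stats = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if not math.isnan(r[j])]
        if strategy == "constant":
            stats.append(fill_value)
        elif not present:
            stats.append(NAN)
        elif strategy == "mean":
            stats.append(statistics.mean(present))
        elif strategy == "median":
            stats.append(statistics.median(present))
        elif strategy == "most_frequent":
            counts = Counter(present)
            top = max(counts.values())
            # Assumption: ties go to the smallest value.
            stats.append(min(v for v, c in counts.items() if c == top))
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return stats


def apply_statistics(rows, stats):
    """Fill missing entries and drop columns whose statistic is NaN."""
    keep = [j for j, s in enumerate(stats) if not math.isnan(s)]
    return [[stats[j] if math.isnan(r[j]) else r[j] for j in keep]
            for r in rows]


# Same data as the example above.
stats = fit_statistics([[7, 2, 3], [4, NAN, 6], [10, 5, 9]])
filled = apply_statistics([[NAN, 2, 3], [4, NAN, 6], [10, NAN, 9]], stats)

# A column that is entirely missing during fit is dropped for 'mean'
# but kept (and filled) for 'constant'.
all_missing = [[NAN, 1], [NAN, 3]]
dropped = apply_statistics([[5, NAN]], fit_statistics(all_missing))
kept = apply_statistics([[NAN, NAN]], fit_statistics(all_missing, "constant"))
print(filled, dropped, kept)
```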
fit¶
method fit
val fit :
?y:Py.Object.t ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
t
Fit the imputer on X.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
  Input data, where n_samples is the number of samples and n_features is the number of features.
Returns
- self : SimpleImputer
fit_transform¶
method fit_transform
val fit_transform :
?y:[>`ArrayLike] Np.Obj.t ->
?fit_params:(string * Py.Object.t) list ->
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray array of shape (n_samples, n_features_new) Transformed array.
get_params¶
method get_params
val get_params :
?deep:bool ->
[> tag] Obj.t ->
Dict.t
Get parameters for this estimator.
Parameters
- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any Parameter names mapped to their values.
set_params¶
method set_params
val set_params :
?params:(string * Py.Object.t) list ->
[> tag] Obj.t ->
t
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict Estimator parameters.
Returns
- self : object Estimator instance.
transform¶
method transform
val transform :
x:[>`ArrayLike] Np.Obj.t ->
[> tag] Obj.t ->
[>`ArrayLike] Np.Obj.t
Impute all missing values in X.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features) The input data to complete.
statistics_¶
attribute statistics_
val statistics_ : t -> [>`ArrayLike] Np.Obj.t
val statistics_opt : t -> ([>`ArrayLike] Np.Obj.t) option
This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
indicator_¶
attribute indicator_
val indicator_ : t -> Py.Object.t
val indicator_opt : t -> (Py.Object.t) option
This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
to_string¶
method to_string
val to_string: t -> string
Print the object to a human-readable representation.
show¶
method show
val show: t -> string
Print the object to a human-readable representation.
pp¶
method pp
val pp: Format.formatter -> t -> unit
Pretty-print the object to a formatter.