Discriminant analysis

LinearClassifierMixin

Module Sklearn.Discriminant_analysis.LinearClassifierMixin wraps Python class sklearn.discriminant_analysis.LinearClassifierMixin.

type t

create

constructor and attributes create
val create :
  unit ->
  t

Mixin for linear classifiers.

Handles prediction for sparse and dense X.

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict confidence scores for samples.

The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • array of shape (n_samples,) if n_classes == 2, else (n_samples, n_classes) Confidence scores per (sample, class) combination. In the binary case, the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class labels for samples in X.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • C : array, shape [n_samples] Predicted class label per sample.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

LinearDiscriminantAnalysis

Module Sklearn.Discriminant_analysis.LinearDiscriminantAnalysis wraps Python class sklearn.discriminant_analysis.LinearDiscriminantAnalysis.

type t

create

constructor and attributes create
val create :
  ?solver:[`Svd | `Lsqr | `Eigen] ->
  ?shrinkage:[`F of float | `Auto] ->
  ?priors:[>`ArrayLike] Np.Obj.t ->
  ?n_components:int ->
  ?store_covariance:bool ->
  ?tol:float ->
  unit ->
  t

Linear Discriminant Analysis

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

.. versionadded:: 0.17 LinearDiscriminantAnalysis.

Read more in the scikit-learn User Guide (lda_qda).

Parameters

  • solver : {'svd', 'lsqr', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'lsqr': Least squares solution, can be combined with shrinkage.
    • 'eigen': Eigenvalue decomposition, can be combined with shrinkage.
  • shrinkage : 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    Note that shrinkage works only with the 'lsqr' and 'eigen' solvers; see the construction sketch after this parameter list.

  • priors : array-like of shape (n_classes,), default=None The class prior probabilities. By default, the class proportions are inferred from the training data.

  • n_components : int, default=None Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.

  • store_covariance : bool, default=False If True, explicitly compute the weighted within-class covariance matrix when solver is 'svd'. The matrix is always computed and stored for the other solvers.

    .. versionadded:: 0.17

  • tol : float, default=1.0e-4 Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is 'svd'.

    .. versionadded:: 0.17
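
For illustration, here is a minimal construction sketch under the create signature above; the polymorphic-variant spellings `Lsqr, `Eigen, `Auto and `F come straight from that signature, and the values are illustrative only.

open Sklearn.Discriminant_analysis

(* Ledoit-Wolf shrinkage; only honoured by the `Lsqr and `Eigen solvers. *)
let _clf_auto =
  LinearDiscriminantAnalysis.create ~solver:`Lsqr ~shrinkage:`Auto ()

(* A fixed shrinkage coefficient between 0 and 1 is also accepted. *)
let _clf_fixed =
  LinearDiscriminantAnalysis.create ~solver:`Eigen ~shrinkage:(`F 0.3) ()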

Attributes

  • coef_ : ndarray of shape (n_features,) or (n_classes, n_features) Weight vector(s).

  • intercept_ : ndarray of shape (n_classes,) Intercept term.

  • covariance_ : array-like of shape (n_features, n_features) Weighted within-class covariance matrix. It corresponds to sum_k prior_k * C_k where C_k is the covariance matrix of the samples in class k. The C_k are estimated using the (potentially shrunk) biased estimator of covariance. If solver is 'svd', only exists when store_covariance is True.

  • explained_variance_ratio_ : ndarray of shape (n_components,) Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.

  • means_ : array-like of shape (n_classes, n_features) Class-wise means.

  • priors_ : array-like of shape (n_classes,) Class priors (sum to 1).

  • scalings_ : array-like of shape (rank, n_classes - 1) Scaling of the features in the space spanned by the class centroids. Only available for 'svd' and 'eigen' solvers.

  • xbar_ : array-like of shape (n_features,) Overall mean. Only present if solver is 'svd'.

  • classes_ : array-like of shape (n_classes,) Unique class labels.

See also

  • sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis

Examples

>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis()
>>> clf.fit(X, y)
LinearDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
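
The same fit/predict flow through these OCaml bindings might look like the sketch below. Np.matrixf and Np.vectori are assumed here as the array constructors of the np package these signatures build on; substitute your own helpers if they differ.

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-1.; -1.|]; [|-2.; -1.|]; [|-3.; -2.|];
                        [|1.; 1.|]; [|2.; 1.|]; [|3.; 2.|] |] in
  let y = Np.vectori [| 1; 1; 1; 2; 2; 2 |] in
  (* fit takes the estimator as its last argument and returns it fitted. *)
  let clf = LinearDiscriminantAnalysis.(fit ~x ~y (create ())) in
  (* As in the Python doctest above, the predicted label for [-0.8; -1] is 1. *)
  let _pred =
    LinearDiscriminantAnalysis.predict ~x:(Np.matrixf [| [|-0.8; -1.|] |]) clf
  in
  ()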

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Apply decision function to an array of samples.

The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x). See the lda_qda_math section of the User Guide.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples (test vectors).

Returns

  • C : ndarray of shape (n_samples,) or (n_samples, n_classes) Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples,), giving the log likelihood ratio of the positive class.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit LinearDiscriminantAnalysis model according to the given training data and parameters.

.. versionchanged:: 0.19 store_covariance has been moved to main constructor.

.. versionchanged:: 0.19 tol has been moved to main constructor.

Parameters

  • X : array-like of shape (n_samples, n_features) Training data.

  • y : array-like of shape (n_samples,) Target values.

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class labels for samples in X.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • C : array, shape [n_samples] Predicted class label per sample.

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Estimate log probability.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • C : ndarray of shape (n_samples, n_classes) Estimated log probabilities.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Estimate probability.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • C : ndarray of shape (n_samples, n_classes) Estimated probabilities.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.
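
As a minimal sketch (same assumed Np.matrixf / Np.vectori constructors as in the create example above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-2.; -1.5|]; [|-1.; -1.|]; [|1.; 1.|]; [|2.; 1.5|] |] in
  let y = Np.vectori [| 1; 1; 2; 2 |] in
  let clf = LinearDiscriminantAnalysis.(fit ~x ~y (create ())) in
  (* Mean accuracy of predict x against y; 1.0 here since the training
     data is linearly separable. *)
  Printf.printf "accuracy: %f\n" (LinearDiscriminantAnalysis.score ~x ~y clf)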

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Project data to maximize class separation.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • X_new : ndarray of shape (n_samples, n_components) Transformed data.
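
With two classes there is at most one discriminative direction (n_components <= n_classes - 1), so a dimensionality-reduction sketch could look like this (same assumed Np constructors as above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-1.; -1.|]; [|-2.; -1.|]; [|-3.; -2.|];
                        [|1.; 1.|]; [|2.; 1.|]; [|3.; 2.|] |] in
  let y = Np.vectori [| 1; 1; 1; 2; 2; 2 |] in
  let lda = LinearDiscriminantAnalysis.(fit ~x ~y (create ~n_components:1 ())) in
  (* x_new has shape (n_samples, 1): each sample projected onto the single
     direction that best separates the two classes. *)
  let _x_new = LinearDiscriminantAnalysis.transform ~x lda in
  ()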

coef_

attribute coef_
val coef_ : t -> [>`ArrayLike] Np.Obj.t
val coef_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
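
In other words, the two flavours trade an exception for an option. A sketch of using them on a possibly-unfitted estimator:

open Sklearn.Discriminant_analysis

(* Total: returns None when the attribute is absent or None on the Python side. *)
let has_coef clf =
  match LinearDiscriminantAnalysis.coef_opt clf with
  | Some _coef -> true
  | None -> false

(* Partial: direct and convenient after a successful fit, but raises
   Not_found when the attribute is None. *)
let coef clf = LinearDiscriminantAnalysis.coef_ clf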

intercept_

attribute intercept_
val intercept_ : t -> [>`ArrayLike] Np.Obj.t
val intercept_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

covariance_

attribute covariance_
val covariance_ : t -> [>`ArrayLike] Np.Obj.t
val covariance_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

explained_variance_ratio_

attribute explained_variance_ratio_
val explained_variance_ratio_ : t -> [>`ArrayLike] Np.Obj.t
val explained_variance_ratio_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

means_

attribute means_
val means_ : t -> [>`ArrayLike] Np.Obj.t
val means_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

priors_

attribute priors_
val priors_ : t -> [>`ArrayLike] Np.Obj.t
val priors_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

scalings_

attribute scalings_
val scalings_ : t -> [>`ArrayLike] Np.Obj.t
val scalings_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

xbar_

attribute xbar_
val xbar_ : t -> [>`ArrayLike] Np.Obj.t
val xbar_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

classes_

attribute classes_
val classes_ : t -> [>`ArrayLike] Np.Obj.t
val classes_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

QuadraticDiscriminantAnalysis

Module Sklearn.Discriminant_analysis.QuadraticDiscriminantAnalysis wraps Python class sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.

type t

create

constructor and attributes create
val create :
  ?priors:[>`ArrayLike] Np.Obj.t ->
  ?reg_param:float ->
  ?store_covariance:bool ->
  ?tol:float ->
  unit ->
  t

Quadratic Discriminant Analysis

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class.

.. versionadded:: 0.17 QuadraticDiscriminantAnalysis

Read more in the scikit-learn User Guide (lda_qda).

Parameters

  • priors : ndarray of shape (n_classes,), default=None Class priors. By default, the class proportions are inferred from the training data.

  • reg_param : float, default=0.0 Regularizes the per-class covariance estimates by transforming S2 as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class. See the construction sketch after this parameter list.

  • store_covariance : bool, default=False If True, the class covariance matrices are explicitly computed and stored in the self.covariance_ attribute.

    .. versionadded:: 0.17

  • tol : float, default=1.0e-4 Absolute threshold for a singular value to be considered significant, used to estimate the rank of Xk, where Xk is the centered matrix of samples in class k. This parameter does not affect the predictions. It only controls a warning that is raised when features are considered to be collinear.

    .. versionadded:: 0.17
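
A construction sketch for the regularized case described above (the value 0.3 is illustrative only):

open Sklearn.Discriminant_analysis

(* Each per-class covariance estimate becomes 0.7 * S2 + 0.3 * identity,
   following the reg_param formula above; 0.0 (the default) disables it. *)
let _qda =
  QuadraticDiscriminantAnalysis.create ~reg_param:0.3 ~store_covariance:true ()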

Attributes

  • covariance_ : list of len n_classes of ndarray of shape (n_features, n_features) For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present if store_covariance is True.

  • means_ : array-like of shape (n_classes, n_features) Class-wise means.

  • priors_ : array-like of shape (n_classes,) Class priors (sum to 1).

  • rotations_ : list of len n_classes of ndarray of shape (n_features, n_k) For each class k an array of shape (n_features, n_k), where n_k = min(n_features, number of elements in class k). It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds to V, the matrix of eigenvectors coming from the SVD of Xk = U S Vt, where Xk is the centered matrix of samples from class k.

  • scalings_ : list of len n_classes of ndarray of shape (n_k,) For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds to S^2 / (n_samples - 1), where S is the diagonal matrix of singular values from the SVD of Xk, where Xk is the centered matrix of samples from class k.

  • classes_ : ndarray of shape (n_classes,) Unique class labels.

Examples

>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QuadraticDiscriminantAnalysis()
>>> clf.fit(X, y)
QuadraticDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]

See also

  • sklearn.discriminant_analysis.LinearDiscriminantAnalysis: Linear Discriminant Analysis

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Apply decision function to an array of samples.

The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x). See the lda_qda_math section of the User Guide.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples (test vectors).

Returns

  • C : ndarray of shape (n_samples,) or (n_samples, n_classes) Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples,), giving the log likelihood ratio of the positive class.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the model according to the given training data and parameters.

.. versionchanged:: 0.19
   ``store_covariances`` has been moved to main constructor as
   ``store_covariance``

.. versionchanged:: 0.19
   ``tol`` has been moved to main constructor.

Parameters

  • X : array-like of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like of shape (n_samples,) Target values (integers).

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform classification on an array of test vectors X.

The predicted class C for each sample in X is returned.

Parameters

  • X : array-like of shape (n_samples, n_features)

Returns

  • C : ndarray of shape (n_samples,)

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Return log of posterior probabilities of classification.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples/test vectors.

Returns

  • C : ndarray of shape (n_samples, n_classes) Posterior log-probabilities of classification per class.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Return posterior probabilities of classification.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples/test vectors.

Returns

  • C : ndarray of shape (n_samples, n_classes) Posterior probabilities of classification per class.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

covariance_

attribute covariance_
val covariance_ : t -> Py.Object.t
val covariance_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

means_

attribute means_
val means_ : t -> [>`ArrayLike] Np.Obj.t
val means_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

priors_

attribute priors_
val priors_ : t -> [>`ArrayLike] Np.Obj.t
val priors_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

rotations_

attribute rotations_
val rotations_ : t -> Py.Object.t
val rotations_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

scalings_

attribute scalings_
val scalings_ : t -> Py.Object.t
val scalings_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

classes_

attribute classes_
val classes_ : t -> [>`ArrayLike] Np.Obj.t
val classes_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

StandardScaler

Module Sklearn.Discriminant_analysis.StandardScaler wraps Python class sklearn.preprocessing.StandardScaler.

type t

create

constructor and attributes create
val create :
  ?copy:bool ->
  ?with_mean:bool ->
  ?with_std:bool ->
  unit ->
  t

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using :meth:transform.

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.

Read more in the scikit-learn User Guide (preprocessing_scaler).

Parameters

  • copy : boolean, optional, default True If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

  • with_mean : boolean, True by default If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

  • with_std : boolean, True by default If True, scale the data to unit variance (or equivalently, unit standard deviation).

Attributes

  • scale_ : ndarray or None, shape (n_features,) Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.

    .. versionadded:: 0.17 scale_

  • mean_ : ndarray or None, shape (n_features,) The mean value for each feature in the training set. Equal to None when with_mean=False.

  • var_ : ndarray or None, shape (n_features,) The variance for each feature in the training set. Used to compute scale_. Equal to None when with_std=False.

  • n_samples_seen_ : int or array, shape (n_features,) The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer; otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
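
A rough OCaml rendering of the same steps (Np.matrixf assumed as in the examples above; the result arrays are bound rather than printed, since printing ndarrays depends on your Np helpers):

open Sklearn.Discriminant_analysis

let () =
  let data = Np.matrixf [| [|0.; 0.|]; [|0.; 0.|]; [|1.; 1.|]; [|1.; 1.|] |] in
  let scaler = StandardScaler.(fit ~x:data (create ())) in
  (* mean_ is [0.5 0.5]; the training data standardizes to +/-1. *)
  let _scaled = StandardScaler.transform ~x:data scaler in
  (* A new point [2, 2] lands at [3, 3] in standardized units. *)
  let _projected = StandardScaler.transform ~x:(Np.matrixf [| [|2.; 2.|] |]) scaler in
  ()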

See also

  • scale: Equivalent function without the estimator API.

  • sklearn.decomposition.PCA: Further removes the linear correlation across features with whiten=True.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

For a comparison of the different scalers, transformers, and normalizers, see the scikit-learn example examples/preprocessing/plot_all_scaling.py.

fit

method fit
val fit :
  ?y:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Compute the mean and std to be used for later scaling.

Parameters

  • X : {array-like, sparse matrix}, shape [n_samples, n_features] The data used to compute the mean and standard deviation used for later scaling along the features axis.

  • y : None Ignored.

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

inverse_transform

method inverse_transform
val inverse_transform :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Scale back the data to the original representation.

Parameters

  • X : array-like, shape [n_samples, n_features] The data used to scale along the features axis.

  • copy : bool, optional (default: None) Copy the input X or not.

Returns

  • X_tr : array-like, shape [n_samples, n_features] Transformed array.

partial_fit

method partial_fit
val partial_fit :
  ?y:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Online computation of mean and std on X for later scaling.

All of X is processed as a single batch. This is intended for cases when fit is not feasible due to a very large number of samples, or because X is read from a continuous stream.

The algorithm for incremental mean and std is given in Equations 1.5a,b of Chan, Tony F., Gene H. Golub, and Randall J. LeVeque, 'Algorithms for computing the sample variance: Analysis and recommendations.' The American Statistician 37.3 (1983): 242-247.

Parameters

  • X : {array-like, sparse matrix}, shape [n_samples, n_features] The data used to compute the mean and standard deviation used for later scaling along the features axis.

  • y : None Ignored.

Returns

  • self : object Transformer instance.
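
A sketch of the streaming case: statistics accumulate across partial_fit calls, unlike fit, which resets them (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  let batch1 = Np.matrixf [| [|0.; 0.|]; [|1.; 1.|] |] in
  let batch2 = Np.matrixf [| [|2.; 2.|]; [|3.; 3.|] |] in
  let scaler = StandardScaler.create () in
  (* Each call updates the running mean/std with one batch. *)
  let scaler = StandardScaler.partial_fit ~x:batch1 scaler in
  let scaler = StandardScaler.partial_fit ~x:batch2 scaler in
  (* transform uses the statistics accumulated over both batches. *)
  let _scaled = StandardScaler.transform ~x:batch2 scaler in
  ()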

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform standardization by centering and scaling.

Parameters

  • X : array-like, shape [n_samples, n_features] The data used to scale along the features axis.

  • copy : bool, optional (default: None) Copy the input X or not.

scale_

attribute scale_
val scale_ : t -> [>`ArrayLike] Np.Obj.t
val scale_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

mean_

attribute mean_
val mean_ : t -> [>`ArrayLike] Np.Obj.t
val mean_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

var_

attribute var_
val var_ : t -> [>`ArrayLike] Np.Obj.t
val var_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

n_samples_seen_

attribute n_samples_seen_
val n_samples_seen_ : t -> Py.Object.t
val n_samples_seen_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

check_array

function check_array
val check_array :
  ?accept_sparse:[`S of string | `StringList of string list | `Bool of bool] ->
  ?accept_large_sparse:bool ->
  ?dtype:[`Dtypes of Np.Dtype.t list | `S of string | `Dtype of Np.Dtype.t | `None] ->
  ?order:[`F | `C] ->
  ?copy:bool ->
  ?force_all_finite:[`Allow_nan | `Bool of bool] ->
  ?ensure_2d:bool ->
  ?allow_nd:bool ->
  ?ensure_min_samples:int ->
  ?ensure_min_features:int ->
  ?estimator:[>`BaseEstimator] Np.Obj.t ->
  array:Py.Object.t ->
  unit ->
  Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters

  • array : object Input object to check / convert.

  • accept_sparse : string, boolean or list/tuple of strings (default=False) String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

  • accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

    .. versionadded:: 0.20

  • dtype : string, type, list of types or None (default='numeric') Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

  • order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

  • copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

  • force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

    • True: Force all values of array to be finite.
    • False: accepts np.inf, np.nan, pd.NA in array.
    • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

    .. versionadded:: 0.20 force_all_finite accepts the string 'allow-nan'.

    .. versionchanged:: 0.23 Accepts pd.NA and converts it into np.nan

  • ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

  • allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

  • ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

  • ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.

  • estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns

  • array_converted : object The converted and validated array.

check_classification_targets

function check_classification_targets
val check_classification_targets :
  [>`ArrayLike] Np.Obj.t ->
  Py.Object.t

Ensure that target y is of a non-regression type.

Only the following target types (as defined in type_of_target) are allowed: 'binary', 'multiclass', 'multiclass-multioutput', 'multilabel-indicator', 'multilabel-sequences'

Parameters

  • y : array-like

check_is_fitted

function check_is_fitted
val check_is_fitted :
  ?attributes:[`Arr of [>`ArrayLike] Np.Obj.t | `S of string | `StringList of string list] ->
  ?msg:string ->
  ?all_or_any:[`Callable of Py.Object.t | `PyObject of Py.Object.t] ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  Py.Object.t

Perform is_fitted validation for estimator.

Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message.

This utility is meant to be used internally by estimators themselves, typically in their own predict / transform methods.

Parameters

  • estimator : estimator instance. estimator instance for which the check is performed.

  • attributes : str, list or tuple of str, default=None Attribute name(s) given as a string or a list/tuple of strings, e.g. ['coef_', 'estimator_', ...] or 'coef_'. If None, the estimator is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.

  • msg : string The default error message is 'This %(name)s instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.' For custom messages, if '%(name)s' is present in the message string, it is substituted with the estimator name, e.g. 'Estimator, %(name)s, must be fitted before sparsifying'.

  • all_or_any : callable, {all, any}, default all Specify whether all or any of the given attributes must exist.

Returns

None

Raises

NotFittedError If the attributes are not found.

empirical_covariance

function empirical_covariance
val empirical_covariance :
  ?assume_centered:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Computes the maximum likelihood covariance estimator.

Parameters

  • X : ndarray of shape (n_samples, n_features) Data from which to compute the covariance estimate

  • assume_centered : bool, default=False If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False, data will be centered before computation.

Returns

  • covariance : ndarray of shape (n_features, n_features) Empirical covariance (Maximum Likelihood Estimator).

Examples

>>> from sklearn.covariance import empirical_covariance
>>> X = [[1,1,1],[1,1,1],[1,1,1],
...      [0,0,0],[0,0,0],[0,0,0]]
>>> empirical_covariance(X)
array([[0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25]])

expit

function expit
val expit :
  ?out:Py.Object.t ->
  ?where:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

expit(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Expit (a.k.a. logistic sigmoid) ufunc for ndarrays.

The expit function, also known as the logistic sigmoid function, is defined as expit(x) = 1/(1+exp(-x)). It is the inverse of the logit function.

Parameters

  • x : ndarray The ndarray to apply expit to element-wise.

Returns

  • out : ndarray An ndarray of the same shape as x. Its entries are expit of the corresponding entry of x.

See Also

logit

Notes

As a ufunc, expit takes a number of optional keyword arguments. For more information see the NumPy documentation on ufuncs: https://docs.scipy.org/doc/numpy/reference/ufuncs.html

.. versionadded:: 0.10.0

Examples

>>> import numpy as np
>>> from scipy.special import expit, logit
>>> expit([-np.inf, -1.5, 0, 1.5, np.inf])
array([ 0.        ,  0.18242552,  0.5       ,  0.81757448,  1.        ])

logit is the inverse of expit:

>>> logit(expit([-2.5, 0, 3.1, 5.0]))
array([-2.5,  0. ,  3.1,  5. ])

Plot expit(x) for x in [-6, 6]:

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-6, 6, 121)
>>> y = expit(x)
>>> plt.plot(x, y)
>>> plt.grid()
>>> plt.xlim(-6, 6)
>>> plt.xlabel('x')
>>> plt.title('expit(x)')
>>> plt.show()

ledoit_wolf

function ledoit_wolf
val ledoit_wolf :
  ?assume_centered:bool ->
  ?block_size:int ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  ([>`ArrayLike] Np.Obj.t * float)

Estimates the shrunk Ledoit-Wolf covariance matrix.

Read more in the scikit-learn User Guide (shrunk_covariance).

Parameters

  • X : array-like of shape (n_samples, n_features) Data from which to compute the covariance estimate

  • assume_centered : bool, default=False If True, data will not be centered before computation. Useful to work with data whose mean is significantly equal to zero but is not exactly zero. If False, data will be centered before computation.

  • block_size : int, default=1000 Size of the blocks into which the covariance matrix will be split. This is purely a memory optimization and does not affect results.

Returns

  • shrunk_cov : ndarray of shape (n_features, n_features) Shrunk covariance.

  • shrinkage : float Coefficient in the convex combination used for the computation of the shrunk estimate.

Notes

The regularized (shrunk) covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features
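
A usage sketch showing the pair returned (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|1.; 2.|]; [|2.; 1.|]; [|3.; 4.|]; [|4.; 3.|] |] in
  (* shrinkage is the convex-combination weight from the Notes above:
     (1 - shrinkage) * cov + shrinkage * mu * identity. *)
  let _shrunk_cov, shrinkage = ledoit_wolf ~x () in
  Printf.printf "chosen shrinkage: %f\n" shrinkage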

shrunk_covariance

function shrunk_covariance
val shrunk_covariance :
  ?shrinkage:float ->
  emp_cov:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Calculates a covariance matrix shrunk on the diagonal.

Read more in the scikit-learn User Guide (shrunk_covariance).

Parameters

  • emp_cov : array-like of shape (n_features, n_features) Covariance matrix to be shrunk

  • shrinkage : float, default=0.1 Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].

Returns

  • shrunk_cov : ndarray of shape (n_features, n_features) Shrunk covariance.

Notes

The regularized (shrunk) covariance is given by:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features

softmax

function softmax
val softmax :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Calculate the softmax function.

The softmax function is calculated by np.exp(X) / np.sum(np.exp(X), axis=1)

This will cause overflow when large values are exponentiated. Hence the largest value in each row is subtracted from each data point to prevent this.

Parameters

  • X : array-like of floats, shape (M, N) Argument to the logistic function

  • copy : bool, optional Copy X or not.

Returns

  • out : array, shape (M, N) Softmax function evaluated at every point in x
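
For instance, the max-subtraction makes even very large scores safe, and each output row sums to 1; a minimal sketch (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  (* exp 1000. would overflow; the row maximum is subtracted first. *)
  let scores = Np.matrixf [| [|1000.; 1001.; 1002.|] |] in
  (* The single row comes out as roughly [0.090; 0.245; 0.665]. *)
  let _probs = softmax ~x:scores () in
  ()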

unique_labels

function unique_labels
val unique_labels :
  Py.Object.t list ->
  [>`ArrayLike] Np.Obj.t

Extract an ordered array of unique labels.

We don't allow:

  • mix of multilabel and multiclass (single label) targets
  • mix of label indicator matrix and anything else, because there are no explicit labels
  • mix of label indicator matrices of different sizes
  • mix of string and integer labels

At the moment, we also don't allow 'multiclass-multioutput' input type.

Parameters

  • *ys : array-likes

Returns

  • out : numpy array of shape [n_unique_labels] An ordered array of unique labels.

Examples

>>> from sklearn.utils.multiclass import unique_labels
>>> unique_labels([3, 5, 5, 5, 7, 7])
array([3, 5, 7])
>>> unique_labels([1, 2, 3, 4], [2, 2, 3, 4])
array([1, 2, 3, 4])
>>> unique_labels([1, 2, 10], [5, 11])
array([ 1,  2,  5, 10, 11])