Discriminant analysis

LinearClassifierMixin

Module Sklearn.Discriminant_analysis.LinearClassifierMixin wraps Python class sklearn.discriminant_analysis.LinearClassifierMixin.

type t

create

constructor and attributes create
val create :
  unit ->
  t

Mixin for linear classifiers.

Handles prediction for sparse and dense X.

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict confidence scores for samples.

The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • array of shape (n_samples,) if n_classes == 2, else (n_samples, n_classes) Confidence scores per (sample, class) combination. In the binary case, the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class labels for samples in X.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • C : array, shape [n_samples] Predicted class label per sample.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

LinearDiscriminantAnalysis

Module Sklearn.Discriminant_analysis.LinearDiscriminantAnalysis wraps Python class sklearn.discriminant_analysis.LinearDiscriminantAnalysis.

type t

create

constructor and attributes create
val create :
  ?solver:[`Svd | `Lsqr | `Eigen] ->
  ?shrinkage:[`F of float | `Auto] ->
  ?priors:[>`ArrayLike] Np.Obj.t ->
  ?n_components:int ->
  ?store_covariance:bool ->
  ?tol:float ->
  unit ->
  t

Linear Discriminant Analysis

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

.. versionadded:: 0.17 LinearDiscriminantAnalysis.

Read more in the scikit-learn User Guide (lda_qda).

Parameters

  • solver : {'svd', 'lsqr', 'eigen'}, default='svd' Solver to use, possible values:

    • 'svd': Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
    • 'lsqr': Least squares solution, can be combined with shrinkage.
    • 'eigen': Eigenvalue decomposition, can be combined with shrinkage.
  • shrinkage : 'auto' or float, default=None Shrinkage parameter, possible values:

    • None: no shrinkage (default).
    • 'auto': automatic shrinkage using the Ledoit-Wolf lemma.
    • float between 0 and 1: fixed shrinkage parameter.

    Note that shrinkage works only with the 'lsqr' and 'eigen' solvers; see the construction sketch after this parameter list.

  • priors : array-like of shape (n_classes,), default=None The class prior probabilities. By default, the class proportions are inferred from the training data.

  • n_components : int, default=None Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.

  • store_covariance : bool, default=False If True, explicitly compute the weighted within-class covariance matrix when solver is 'svd'. The matrix is always computed and stored for the other solvers.

    .. versionadded:: 0.17

  • tol : float, default=1.0e-4 Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is 'svd'.

    .. versionadded:: 0.17
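
For illustration, here is a minimal construction sketch under the create signature above; the polymorphic-variant spellings `Lsqr, `Eigen, `Auto and `F come straight from that signature, and the values are illustrative only.

open Sklearn.Discriminant_analysis

(* Ledoit-Wolf shrinkage; only honoured by the `Lsqr and `Eigen solvers. *)
let _clf_auto =
  LinearDiscriminantAnalysis.create ~solver:`Lsqr ~shrinkage:`Auto ()

(* A fixed shrinkage coefficient between 0 and 1 is also accepted. *)
let _clf_fixed =
  LinearDiscriminantAnalysis.create ~solver:`Eigen ~shrinkage:(`F 0.3) ()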

Attributes

  • coef_ : ndarray of shape (n_features,) or (n_classes, n_features) Weight vector(s).

  • intercept_ : ndarray of shape (n_classes,) Intercept term.

  • covariance_ : array-like of shape (n_features, n_features) Weighted within-class covariance matrix. It corresponds to sum_k prior_k * C_k where C_k is the covariance matrix of the samples in class k. The C_k are estimated using the (potentially shrunk) biased estimator of covariance. If solver is 'svd', only exists when store_covariance is True.

  • explained_variance_ratio_ : ndarray of shape (n_components,) Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.

  • means_ : array-like of shape (n_classes, n_features) Class-wise means.

  • priors_ : array-like of shape (n_classes,) Class priors (sum to 1).

  • scalings_ : array-like of shape (rank, n_classes - 1) Scaling of the features in the space spanned by the class centroids. Only available for 'svd' and 'eigen' solvers.

  • xbar_ : array-like of shape (n_features,) Overall mean. Only present if solver is 'svd'.

  • classes_ : array-like of shape (n_classes,) Unique class labels.

See also

  • sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis

Examples

>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis()
>>> clf.fit(X, y)
LinearDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
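
The same fit/predict flow through these OCaml bindings might look like the sketch below. Np.matrixf and Np.vectori are assumed here as the array constructors of the np package these signatures build on; substitute your own helpers if they differ.

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-1.; -1.|]; [|-2.; -1.|]; [|-3.; -2.|];
                        [|1.; 1.|]; [|2.; 1.|]; [|3.; 2.|] |] in
  let y = Np.vectori [| 1; 1; 1; 2; 2; 2 |] in
  (* fit takes the estimator as its last argument and returns it fitted. *)
  let clf = LinearDiscriminantAnalysis.(fit ~x ~y (create ())) in
  (* As in the Python doctest above, the predicted label for [-0.8; -1] is 1. *)
  let _pred =
    LinearDiscriminantAnalysis.predict ~x:(Np.matrixf [| [|-0.8; -1.|] |]) clf
  in
  ()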

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Apply decision function to an array of samples.

The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x). See the lda_qda_math section of the User Guide.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples (test vectors).

Returns

  • C : ndarray of shape (n_samples,) or (n_samples, n_classes) Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples,), giving the log likelihood ratio of the positive class.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit LinearDiscriminantAnalysis model according to the given training data and parameters.

.. versionchanged:: 0.19 store_covariance has been moved to main constructor.

.. versionchanged:: 0.19 tol has been moved to main constructor.

Parameters

  • X : array-like of shape (n_samples, n_features) Training data.

  • y : array-like of shape (n_samples,) Target values.

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Predict class labels for samples in X.

Parameters

  • X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns

  • C : array, shape [n_samples] Predicted class label per sample.

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Estimate log probability.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • C : ndarray of shape (n_samples, n_classes) Estimated log probabilities.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Estimate probability.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • C : ndarray of shape (n_samples, n_classes) Estimated probabilities.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.
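
As a minimal sketch (same assumed Np.matrixf / Np.vectori constructors as in the create example above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-2.; -1.5|]; [|-1.; -1.|]; [|1.; 1.|]; [|2.; 1.5|] |] in
  let y = Np.vectori [| 1; 1; 2; 2 |] in
  let clf = LinearDiscriminantAnalysis.(fit ~x ~y (create ())) in
  (* Mean accuracy of predict x against y; 1.0 here since the training
     data is linearly separable. *)
  Printf.printf "accuracy: %f\n" (LinearDiscriminantAnalysis.score ~x ~y clf)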

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Project data to maximize class separation.

Parameters

  • X : array-like of shape (n_samples, n_features) Input data.

Returns

  • X_new : ndarray of shape (n_samples, n_components) Transformed data.
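
With two classes there is at most one discriminative direction (n_components <= n_classes - 1), so a dimensionality-reduction sketch could look like this (same assumed Np constructors as above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|-1.; -1.|]; [|-2.; -1.|]; [|-3.; -2.|];
                        [|1.; 1.|]; [|2.; 1.|]; [|3.; 2.|] |] in
  let y = Np.vectori [| 1; 1; 1; 2; 2; 2 |] in
  let lda = LinearDiscriminantAnalysis.(fit ~x ~y (create ~n_components:1 ())) in
  (* x_new has shape (n_samples, 1): each sample projected onto the single
     direction that best separates the two classes. *)
  let _x_new = LinearDiscriminantAnalysis.transform ~x lda in
  ()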

coef_

attribute coef_
val coef_ : t -> [>`ArrayLike] Np.Obj.t
val coef_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.
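
In other words, the two flavours trade an exception for an option. A sketch of using them on a possibly-unfitted estimator:

open Sklearn.Discriminant_analysis

(* Total: returns None when the attribute is absent or None on the Python side. *)
let has_coef clf =
  match LinearDiscriminantAnalysis.coef_opt clf with
  | Some _coef -> true
  | None -> false

(* Partial: direct and convenient after a successful fit, but raises
   Not_found when the attribute is None. *)
let coef clf = LinearDiscriminantAnalysis.coef_ clf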

intercept_

attribute intercept_
val intercept_ : t -> [>`ArrayLike] Np.Obj.t
val intercept_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

covariance_

attribute covariance_
val covariance_ : t -> [>`ArrayLike] Np.Obj.t
val covariance_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

explained_variance_ratio_

attribute explained_variance_ratio_
val explained_variance_ratio_ : t -> [>`ArrayLike] Np.Obj.t
val explained_variance_ratio_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

means_

attribute means_
val means_ : t -> [>`ArrayLike] Np.Obj.t
val means_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

priors_

attribute priors_
val priors_ : t -> [>`ArrayLike] Np.Obj.t
val priors_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

scalings_

attribute scalings_
val scalings_ : t -> [>`ArrayLike] Np.Obj.t
val scalings_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

xbar_

attribute xbar_
val xbar_ : t -> [>`ArrayLike] Np.Obj.t
val xbar_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

classes_

attribute classes_
val classes_ : t -> [>`ArrayLike] Np.Obj.t
val classes_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

QuadraticDiscriminantAnalysis

Module Sklearn.Discriminant_analysis.QuadraticDiscriminantAnalysis wraps Python class sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.

type t

create

constructor and attributes create
val create :
  ?priors:[>`ArrayLike] Np.Obj.t ->
  ?reg_param:float ->
  ?store_covariance:bool ->
  ?tol:float ->
  unit ->
  t

Quadratic Discriminant Analysis

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class.

.. versionadded:: 0.17 QuadraticDiscriminantAnalysis

Read more in the scikit-learn User Guide (lda_qda).

Parameters

  • priors : ndarray of shape (n_classes,), default=None Class priors. By default, the class proportions are inferred from the training data.

  • reg_param : float, default=0.0 Regularizes the per-class covariance estimates by transforming S2 as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class. See the construction sketch after this parameter list.

  • store_covariance : bool, default=False If True, the class covariance matrices are explicitly computed and stored in the self.covariance_ attribute.

    .. versionadded:: 0.17

  • tol : float, default=1.0e-4 Absolute threshold for a singular value to be considered significant, used to estimate the rank of Xk, where Xk is the centered matrix of samples in class k. This parameter does not affect the predictions. It only controls a warning that is raised when features are considered to be collinear.

    .. versionadded:: 0.17
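
A construction sketch for the regularized case described above (the value 0.3 is illustrative only):

open Sklearn.Discriminant_analysis

(* Each per-class covariance estimate becomes 0.7 * S2 + 0.3 * identity,
   following the reg_param formula above; 0.0 (the default) disables it. *)
let _qda =
  QuadraticDiscriminantAnalysis.create ~reg_param:0.3 ~store_covariance:true ()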

Attributes

  • covariance_ : list of len n_classes of ndarray of shape (n_features, n_features) For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present if store_covariance is True.

  • means_ : array-like of shape (n_classes, n_features) Class-wise means.

  • priors_ : array-like of shape (n_classes,) Class priors (sum to 1).

  • rotations_ : list of len n_classes of ndarray of shape (n_features, n_k) For each class k an array of shape (n_features, n_k), where n_k = min(n_features, number of elements in class k). It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds to V, the matrix of eigenvectors coming from the SVD of Xk = U S Vt, where Xk is the centered matrix of samples from class k.

  • scalings_ : list of len n_classes of ndarray of shape (n_k,) For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds to S^2 / (n_samples - 1), where S is the diagonal matrix of singular values from the SVD of Xk, where Xk is the centered matrix of samples from class k.

  • classes_ : ndarray of shape (n_classes,) Unique class labels.

Examples

>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QuadraticDiscriminantAnalysis()
>>> clf.fit(X, y)
QuadraticDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]

See also

  • sklearn.discriminant_analysis.LinearDiscriminantAnalysis: Linear Discriminant Analysis

decision_function

method decision_function
val decision_function :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Apply decision function to an array of samples.

The decision function is equal (up to a constant factor) to the log-posterior of the model, i.e. log p(y = k | x). In a binary classification setting this instead corresponds to the difference log p(y = 1 | x) - log p(y = 0 | x). See the lda_qda_math section of the User Guide.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples (test vectors).

Returns

  • C : ndarray of shape (n_samples,) or (n_samples, n_classes) Decision function values related to each class, per sample. In the two-class case, the shape is (n_samples,), giving the log likelihood ratio of the positive class.

fit

method fit
val fit :
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the model according to the given training data and parameters.

.. versionchanged:: 0.19
   ``store_covariances`` has been moved to main constructor as
   ``store_covariance``

.. versionchanged:: 0.19
   ``tol`` has been moved to main constructor.

Parameters

  • X : array-like of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like of shape (n_samples,) Target values (integers).

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform classification on an array of test vectors X.

The predicted class C for each sample in X is returned.

Parameters

  • X : array-like of shape (n_samples, n_features)

Returns

  • C : ndarray of shape (n_samples,)

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Return log of posterior probabilities of classification.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples/test vectors.

Returns

  • C : ndarray of shape (n_samples, n_classes) Posterior log-probabilities of classification per class.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Return posterior probabilities of classification.

Parameters

  • X : array-like of shape (n_samples, n_features) Array of samples/test vectors.

Returns

  • C : ndarray of shape (n_samples, n_classes) Posterior probabilities of classification per class.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : array-like of shape (n_samples, n_features) Test samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

covariance_

attribute covariance_
val covariance_ : t -> Py.Object.t
val covariance_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

means_

attribute means_
val means_ : t -> [>`ArrayLike] Np.Obj.t
val means_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

priors_

attribute priors_
val priors_ : t -> [>`ArrayLike] Np.Obj.t
val priors_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

rotations_

attribute rotations_
val rotations_ : t -> Py.Object.t
val rotations_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

scalings_

attribute scalings_
val scalings_ : t -> Py.Object.t
val scalings_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

classes_

attribute classes_
val classes_ : t -> [>`ArrayLike] Np.Obj.t
val classes_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

StandardScaler

Module Sklearn.Discriminant_analysis.StandardScaler wraps Python class sklearn.preprocessing.StandardScaler.

type t

create

constructor and attributes create
val create :
  ?copy:bool ->
  ?with_mean:bool ->
  ?with_std:bool ->
  unit ->
  t

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using :meth:transform.

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.

Read more in the scikit-learn User Guide (preprocessing_scaler).

Parameters

  • copy : boolean, optional, default True If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

  • with_mean : boolean, True by default If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

  • with_std : boolean, True by default If True, scale the data to unit variance (or equivalently, unit standard deviation).

Attributes

  • scale_ : ndarray or None, shape (n_features,) Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.

    .. versionadded:: 0.17 scale_

  • mean_ : ndarray or None, shape (n_features,) The mean value for each feature in the training set. Equal to None when with_mean=False.

  • var_ : ndarray or None, shape (n_features,) The variance for each feature in the training set. Used to compute scale_. Equal to None when with_std=False.

  • n_samples_seen_ : int or array, shape (n_features,) The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer; otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
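
A rough OCaml rendering of the same steps (Np.matrixf assumed as in the examples above; the result arrays are bound rather than printed, since printing ndarrays depends on your Np helpers):

open Sklearn.Discriminant_analysis

let () =
  let data = Np.matrixf [| [|0.; 0.|]; [|0.; 0.|]; [|1.; 1.|]; [|1.; 1.|] |] in
  let scaler = StandardScaler.(fit ~x:data (create ())) in
  (* mean_ is [0.5 0.5]; the training data standardizes to +/-1. *)
  let _scaled = StandardScaler.transform ~x:data scaler in
  (* A new point [2, 2] lands at [3, 3] in standardized units. *)
  let _projected = StandardScaler.transform ~x:(Np.matrixf [| [|2.; 2.|] |]) scaler in
  ()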

See also

  • scale: Equivalent function without the estimator API.

  • sklearn.decomposition.PCA: Further removes the linear correlation across features with whiten=True.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

For a comparison of the different scalers, transformers, and normalizers, see the scikit-learn example examples/preprocessing/plot_all_scaling.py.

fit

method fit
val fit :
  ?y:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Compute the mean and std to be used for later scaling.

Parameters

  • X : {array-like, sparse matrix}, shape [n_samples, n_features] The data used to compute the mean and standard deviation used for later scaling along the features axis.

  • y : None Ignored.

fit_transform

method fit_transform
val fit_transform :
  ?y:[>`ArrayLike] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)

  • y : ndarray of shape (n_samples,), default=None Target values.

  • **fit_params : dict Additional fit parameters.

Returns

  • X_new : ndarray of shape (n_samples, n_features_new) Transformed array.

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

inverse_transform

method inverse_transform
val inverse_transform :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Scale back the data to the original representation.

Parameters

  • X : array-like, shape [n_samples, n_features] The data used to scale along the features axis.

  • copy : bool, optional (default: None) Copy the input X or not.

Returns

  • X_tr : array-like, shape [n_samples, n_features] Transformed array.

partial_fit

method partial_fit
val partial_fit :
  ?y:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Online computation of mean and std on X for later scaling.

All of X is processed as a single batch. This is intended for cases when fit is not feasible due to a very large number of samples, or because X is read from a continuous stream.

The algorithm for incremental mean and std is given in Equations 1.5a,b of Chan, Tony F., Gene H. Golub, and Randall J. LeVeque, 'Algorithms for computing the sample variance: Analysis and recommendations.' The American Statistician 37.3 (1983): 242-247.

Parameters

  • X : {array-like, sparse matrix}, shape [n_samples, n_features] The data used to compute the mean and standard deviation used for later scaling along the features axis.

  • y : None Ignored.

Returns

  • self : object Transformer instance.
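
A sketch of the streaming case: statistics accumulate across partial_fit calls, unlike fit, which resets them (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  let batch1 = Np.matrixf [| [|0.; 0.|]; [|1.; 1.|] |] in
  let batch2 = Np.matrixf [| [|2.; 2.|]; [|3.; 3.|] |] in
  let scaler = StandardScaler.create () in
  (* Each call updates the running mean/std with one batch. *)
  let scaler = StandardScaler.partial_fit ~x:batch1 scaler in
  let scaler = StandardScaler.partial_fit ~x:batch2 scaler in
  (* transform uses the statistics accumulated over both batches. *)
  let _scaled = StandardScaler.transform ~x:batch2 scaler in
  ()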

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

transform

method transform
val transform :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform standardization by centering and scaling.

Parameters

  • X : array-like, shape [n_samples, n_features] The data used to scale along the features axis.

  • copy : bool, optional (default: None) Copy the input X or not.

scale_

attribute scale_
val scale_ : t -> [>`ArrayLike] Np.Obj.t
val scale_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

mean_

attribute mean_
val mean_ : t -> [>`ArrayLike] Np.Obj.t
val mean_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

var_

attribute var_
val var_ : t -> [>`ArrayLike] Np.Obj.t
val var_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

n_samples_seen_

attribute n_samples_seen_
val n_samples_seen_ : t -> Py.Object.t
val n_samples_seen_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Return a human-readable string representation of the object.

show

method show
val show: t -> string

Return a human-readable string representation of the object.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

check_array

function check_array
val check_array :
  ?accept_sparse:[`S of string | `StringList of string list | `Bool of bool] ->
  ?accept_large_sparse:bool ->
  ?dtype:[`Dtypes of Np.Dtype.t list | `S of string | `Dtype of Np.Dtype.t | `None] ->
  ?order:[`F | `C] ->
  ?copy:bool ->
  ?force_all_finite:[`Allow_nan | `Bool of bool] ->
  ?ensure_2d:bool ->
  ?allow_nd:bool ->
  ?ensure_min_samples:int ->
  ?ensure_min_features:int ->
  ?estimator:[>`BaseEstimator] Np.Obj.t ->
  array:Py.Object.t ->
  unit ->
  Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters

  • array : object Input object to check / convert.

  • accept_sparse : string, boolean or list/tuple of strings (default=False) String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

  • accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

    .. versionadded:: 0.20

  • dtype : string, type, list of types or None (default='numeric') Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

  • order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

  • copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

  • force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

    • True: Force all values of array to be finite.
    • False: accepts np.inf, np.nan, pd.NA in array.
    • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

    .. versionadded:: 0.20 force_all_finite accepts the string 'allow-nan'.

    .. versionchanged:: 0.23 Accepts pd.NA and converts it into np.nan

  • ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

  • allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

  • ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

  • ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.

  • estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns

  • array_converted : object The converted and validated array.

check_classification_targets

function check_classification_targets
val check_classification_targets :
  [>`ArrayLike] Np.Obj.t ->
  Py.Object.t

Ensure that target y is of a non-regression type.

Only the following target types (as defined in type_of_target) are allowed: 'binary', 'multiclass', 'multiclass-multioutput', 'multilabel-indicator', 'multilabel-sequences'

Parameters

  • y : array-like

check_is_fitted

function check_is_fitted
val check_is_fitted :
  ?attributes:[`Arr of [>`ArrayLike] Np.Obj.t | `S of string | `StringList of string list] ->
  ?msg:string ->
  ?all_or_any:[`Callable of Py.Object.t | `PyObject of Py.Object.t] ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  Py.Object.t

Perform is_fitted validation for estimator.

Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message.

This utility is meant to be used internally by estimators themselves, typically in their own predict / transform methods.

Parameters

  • estimator : estimator instance. estimator instance for which the check is performed.

  • attributes : str, list or tuple of str, default=None Attribute name(s) given as a string or a list/tuple of strings, e.g. ['coef_', 'estimator_', ...] or 'coef_'. If None, the estimator is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.

  • msg : string The default error message is 'This %(name)s instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.' For custom messages, if '%(name)s' is present in the message string, it is substituted with the estimator name, e.g. 'Estimator, %(name)s, must be fitted before sparsifying'.

  • all_or_any : callable, {all, any}, default all Specify whether all or any of the given attributes must exist.

Returns

None

Raises

NotFittedError If the attributes are not found.

empirical_covariance

function empirical_covariance
val empirical_covariance :
  ?assume_centered:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Computes the maximum likelihood covariance estimator.

Parameters

  • X : ndarray of shape (n_samples, n_features) Data from which to compute the covariance estimate

  • assume_centered : bool, default=False If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False, data will be centered before computation.

Returns

  • covariance : ndarray of shape (n_features, n_features) Empirical covariance (Maximum Likelihood Estimator).

Examples

>>> from sklearn.covariance import empirical_covariance
>>> X = [[1,1,1],[1,1,1],[1,1,1],
...      [0,0,0],[0,0,0],[0,0,0]]
>>> empirical_covariance(X)
array([[0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25]])

expit

function expit
val expit :
  ?out:Py.Object.t ->
  ?where:Py.Object.t ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

expit(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Expit (a.k.a. logistic sigmoid) ufunc for ndarrays.

The expit function, also known as the logistic sigmoid function, is defined as expit(x) = 1/(1+exp(-x)). It is the inverse of the logit function.

Parameters

  • x : ndarray The ndarray to apply expit to element-wise.

Returns

  • out : ndarray An ndarray of the same shape as x. Its entries are expit of the corresponding entry of x.

See Also

logit

Notes

As a ufunc, expit takes a number of optional keyword arguments. For more information see the NumPy documentation on ufuncs: https://docs.scipy.org/doc/numpy/reference/ufuncs.html

.. versionadded:: 0.10.0

Examples

>>> import numpy as np
>>> from scipy.special import expit, logit
>>> expit([-np.inf, -1.5, 0, 1.5, np.inf])
array([ 0.        ,  0.18242552,  0.5       ,  0.81757448,  1.        ])

logit is the inverse of expit:

>>> logit(expit([-2.5, 0, 3.1, 5.0]))
array([-2.5,  0. ,  3.1,  5. ])

Plot expit(x) for x in [-6, 6]:

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-6, 6, 121)
>>> y = expit(x)
>>> plt.plot(x, y)
>>> plt.grid()
>>> plt.xlim(-6, 6)
>>> plt.xlabel('x')
>>> plt.title('expit(x)')
>>> plt.show()

ledoit_wolf

function ledoit_wolf
val ledoit_wolf :
  ?assume_centered:bool ->
  ?block_size:int ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  ([>`ArrayLike] Np.Obj.t * float)

Estimates the shrunk Ledoit-Wolf covariance matrix.

Read more in the scikit-learn User Guide (shrunk_covariance).

Parameters

  • X : array-like of shape (n_samples, n_features) Data from which to compute the covariance estimate

  • assume_centered : bool, default=False If True, data will not be centered before computation. Useful to work with data whose mean is significantly equal to zero but is not exactly zero. If False, data will be centered before computation.

  • block_size : int, default=1000 Size of the blocks into which the covariance matrix will be split. This is purely a memory optimization and does not affect results.

Returns

  • shrunk_cov : ndarray of shape (n_features, n_features) Shrunk covariance.

  • shrinkage : float Coefficient in the convex combination used for the computation of the shrunk estimate.

Notes

The regularized (shrunk) covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features
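
A usage sketch showing the pair returned (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  let x = Np.matrixf [| [|1.; 2.|]; [|2.; 1.|]; [|3.; 4.|]; [|4.; 3.|] |] in
  (* shrinkage is the convex-combination weight from the Notes above:
     (1 - shrinkage) * cov + shrinkage * mu * identity. *)
  let _shrunk_cov, shrinkage = ledoit_wolf ~x () in
  Printf.printf "chosen shrinkage: %f\n" shrinkage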

shrunk_covariance

function shrunk_covariance
val shrunk_covariance :
  ?shrinkage:float ->
  emp_cov:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Calculates a covariance matrix shrunk on the diagonal.

Read more in the scikit-learn User Guide (shrunk_covariance).

Parameters

  • emp_cov : array-like of shape (n_features, n_features) Covariance matrix to be shrunk

  • shrinkage : float, default=0.1 Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].

Returns

  • shrunk_cov : ndarray of shape (n_features, n_features) Shrunk covariance.

Notes

The regularized (shrunk) covariance is given by:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features

softmax

function softmax
val softmax :
  ?copy:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  unit ->
  [>`ArrayLike] Np.Obj.t

Calculate the softmax function.

The softmax function is calculated by np.exp(X) / np.sum(np.exp(X), axis=1)

This will cause overflow when large values are exponentiated. Hence the largest value in each row is subtracted from each data point to prevent this.

Parameters

  • X : array-like of floats, shape (M, N) Argument to the logistic function

  • copy : bool, optional Copy X or not.

Returns

  • out : array, shape (M, N) Softmax function evaluated at every point in x
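
For instance, the max-subtraction makes even very large scores safe, and each output row sums to 1; a minimal sketch (Np.matrixf assumed as above):

open Sklearn.Discriminant_analysis

let () =
  (* exp 1000. would overflow; the row maximum is subtracted first. *)
  let scores = Np.matrixf [| [|1000.; 1001.; 1002.|] |] in
  (* The single row comes out as roughly [0.090; 0.245; 0.665]. *)
  let _probs = softmax ~x:scores () in
  ()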

unique_labels

function unique_labels
val unique_labels :
  Py.Object.t list ->
  [>`ArrayLike] Np.Obj.t

Extract an ordered array of unique labels.

We don't allow:

  • mix of multilabel and multiclass (single label) targets
  • mix of label indicator matrix and anything else, because there are no explicit labels
  • mix of label indicator matrices of different sizes
  • mix of string and integer labels

At the moment, we also don't allow 'multiclass-multioutput' input type.

Parameters

  • *ys : array-likes

Returns

  • out : numpy array of shape [n_unique_labels] An ordered array of unique labels.

Examples

>>> from sklearn.utils.multiclass import unique_labels
>>> unique_labels([3, 5, 5, 5, 7, 7])
array([3, 5, 7])
>>> unique_labels([1, 2, 3, 4], [2, 2, 3, 4])
array([1, 2, 3, 4])
>>> unique_labels([1, 2, 10], [5, 11])
array([ 1,  2,  5, 10, 11])