Dummy

DummyClassifier

Module Sklearn.Dummy.DummyClassifier wraps Python class sklearn.dummy.DummyClassifier.

type t

create

constructor and attributes create
val create :
  ?strategy:string ->
  ?random_state:int ->
  ?constant:[`Arr of [>`ArrayLike] Np.Obj.t | `S of string | `I of int] ->
  unit ->
  t

DummyClassifier is a classifier that makes predictions using simple rules.

This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems.

Read more in the scikit-learn User Guide (dummy_estimators).

.. versionadded:: 0.13

Parameters

  • strategy : str, default='stratified' Strategy to use to generate predictions.

    • 'stratified': generates predictions by respecting the training set's class distribution.
    • 'most_frequent': always predicts the most frequent label in the training set.
    • 'prior': always predicts the class that maximizes the class prior (like 'most_frequent') and predict_proba returns the class prior.
    • 'uniform': generates predictions uniformly at random.
    • 'constant': always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.

    .. versionchanged:: 0.22 The default value of strategy will change to 'prior' in version 0.24. Starting from version 0.22, a warning will be raised if strategy is not explicitly set.

    .. versionadded:: 0.17 Dummy Classifier now supports prior fitting strategy using parameter prior.

  • random_state : int, RandomState instance or None, optional, default=None Controls the randomness used to generate predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See the scikit-learn Glossary entry for random_state.

  • constant : int or str or array-like of shape (n_outputs,) The explicit constant as predicted by the 'constant' strategy. This parameter is useful only for the 'constant' strategy.

Attributes

  • classes_ : array or list of array of shape (n_classes,) Class labels for each output.

  • n_classes_ : array or list of array of shape (n_classes,) Number of labels for each output.

  • class_prior_ : array or list of array of shape (n_classes,) Probability of each class for each output.

  • n_outputs_ : int, Number of outputs.

  • sparse_output_ : bool, True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.

Examples

>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy='most_frequent')
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75
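
The same baseline can be reproduced through these bindings. The sketch below mirrors the Python session above using only the functions documented on this page; the array-building helpers Np.matrixi / Np.vectori and the Np.Obj.to_pyobject conversion are assumptions about the Np wrapper and may need adjusting for your version.

let () =
  let open Sklearn.Dummy in
  (* build a (4, 1) feature matrix and a label vector; helpers assumed *)
  let x = Np.matrixi [| [| -1 |]; [| 1 |]; [| 1 |]; [| 1 |] |] in
  let y = Np.vectori [| 0; 1; 1; 1 |] in
  let clf = DummyClassifier.create ~strategy:"most_frequent" () in
  (* to force a fixed label instead: create ~strategy:"constant" ~constant:(`I 1) () *)
  (* the wrapped estimator is the last argument; fit returns the fitted model *)
  let clf = DummyClassifier.fit ~x:(Np.Obj.to_pyobject x) ~y clf in
  let _predictions = DummyClassifier.predict ~x clf in  (* always predicts 1 *)
  Format.printf "accuracy: %g@." (DummyClassifier.score ~x:(`Arr x) ~y clf)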

fit

method fit
val fit :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:Py.Object.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the random classifier.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) Target values.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • self : object

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform classification on test vectors X.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

Returns

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) Predicted target values for X.

predict_log_proba

method predict_log_proba
val predict_log_proba :
  x:Py.Object.t ->
  [> tag] Obj.t ->
  Py.Object.t

Return log probability estimates for the test vectors X.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

Returns

  • P : array-like or list of array-like of shape (n_samples, n_classes) Returns the log probability of the sample for each class in the model, where classes are ordered arithmetically for each output.

predict_proba

method predict_proba
val predict_proba :
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Return probability estimates for the test vectors X.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

Returns

  • P : array-like or list of array-like of shape (n_samples, n_classes) Returns the probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[`Arr of [>`ArrayLike] Np.Obj.t | `None] ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

  • X : {array-like, None} Test samples with shape = (n_samples, n_features) or None. Passing None as test samples gives the same result as passing real test samples, since DummyClassifier operates independently of the sampled observations.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float Mean accuracy of self.predict(X) wrt. y.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.
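
On the OCaml side the **params mapping becomes an association list of raw Python values, so updating parameters looks roughly like the sketch below. Py.String.of_string and Py.Int.of_int come from pyml, which these bindings appear to build on; treat the exact conversion helpers as an assumption.

let switch_to_constant clf =
  (* returns the estimator updated to always predict the label 1 *)
  Sklearn.Dummy.DummyClassifier.set_params
    ~params:[ "strategy", Py.String.of_string "constant";
              "constant", Py.Int.of_int 1 ]
    clf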

classes_

attribute classes_
val classes_ : t -> [>`ArrayLike] Np.Obj.t
val classes_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

n_classes_

attribute n_classes_
val n_classes_ : t -> Py.Object.t
val n_classes_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

class_prior_

attribute class_prior_
val class_prior_ : t -> Py.Object.t
val class_prior_opt : t -> (Py.Object.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

n_outputs_

attribute n_outputs_
val n_outputs_ : t -> int
val n_outputs_opt : t -> (int) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

sparse_output_

attribute sparse_output_
val sparse_output_ : t -> bool
val sparse_output_opt : t -> (bool) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

DummyRegressor

Module Sklearn.Dummy.DummyRegressor wraps Python class sklearn.dummy.DummyRegressor.

type t

create

constructor and attributes create
val create :
  ?strategy:string ->
  ?constant:[`Arr of [>`ArrayLike] Np.Obj.t | `F of float | `I of int] ->
  ?quantile:float ->
  unit ->
  t

DummyRegressor is a regressor that makes predictions using simple rules.

This regressor is useful as a simple baseline to compare with other (real) regressors. Do not use it for real problems.

Read more in the scikit-learn User Guide (dummy_estimators).

.. versionadded:: 0.13

Parameters

  • strategy : str, default='mean' Strategy to use to generate predictions.

    • 'mean': always predicts the mean of the training set.
    • 'median': always predicts the median of the training set.
    • 'quantile': always predicts a specified quantile of the training set, provided with the quantile parameter.
    • 'constant': always predicts a constant value that is provided by the user.
  • constant : int or float or array-like of shape (n_outputs,) The explicit constant as predicted by the 'constant' strategy. This parameter is useful only for the 'constant' strategy.

  • quantile : float in [0.0, 1.0] The quantile to predict using the 'quantile' strategy. A quantile of 0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to the maximum.

Attributes

  • constant_ : array, shape (1, n_outputs) Mean or median or quantile of the training targets or constant value given by the user.

  • n_outputs_ : int, Number of outputs.

Examples

>>> import numpy as np
>>> from sklearn.dummy import DummyRegressor
>>> X = np.array([1.0, 2.0, 3.0, 4.0])
>>> y = np.array([2.0, 3.0, 5.0, 10.0])
>>> dummy_regr = DummyRegressor(strategy='mean')
>>> dummy_regr.fit(X, y)
DummyRegressor()
>>> dummy_regr.predict(X)
array([5., 5., 5., 5.])
>>> dummy_regr.score(X, y)
0.0
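
An OCaml rendering of the same session, under the same assumptions as the classifier sketch earlier on this page (Np.matrixf, Np.vectorf and Np.Obj.to_pyobject are presumed helpers of the Np wrapper):

let () =
  let open Sklearn.Dummy in
  let x = Np.matrixf [| [| 1.0 |]; [| 2.0 |]; [| 3.0 |]; [| 4.0 |] |] in
  let y = Np.vectorf [| 2.0; 3.0; 5.0; 10.0 |] in
  let reg = DummyRegressor.create ~strategy:"mean" () in
  let reg = DummyRegressor.fit ~x:(Np.Obj.to_pyobject x) ~y reg in
  let _predictions = DummyRegressor.predict ~x reg in  (* every entry is the training mean, 5.0 *)
  Format.printf "r2: %g@." (DummyRegressor.score ~x:(`Arr x) ~y reg)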

fit

method fit
val fit :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:Py.Object.t ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  t

Fit the random regressor.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) Target values.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • self : object

get_params

method get_params
val get_params :
  ?deep:bool ->
  [> tag] Obj.t ->
  Dict.t

Get parameters for this estimator.

Parameters

  • deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any Parameter names mapped to their values.

predict

method predict
val predict :
  ?return_std:bool ->
  x:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  [>`ArrayLike] Np.Obj.t

Perform prediction on test vectors X.

Parameters

  • X : {array-like, object with finite length or shape} Training data, requires length = n_samples

  • return_std : boolean, optional Whether to return the standard deviation of posterior prediction. All zeros in this case.

    .. versionadded:: 0.20

Returns

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) Predicted target values for X.

  • y_std : array-like of shape (n_samples,) or (n_samples, n_outputs) Standard deviation of predictive distribution of query points.

score

method score
val score :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  x:[`Arr of [>`ArrayLike] Np.Obj.t | `None] ->
  y:[>`ArrayLike] Np.Obj.t ->
  [> tag] Obj.t ->
  float

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
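
Written out, the same definition is

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where \hat{y}_i is the prediction for sample i and \bar{y} is the mean of the observed targets.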

Parameters

  • X : {array-like, None} Test samples with shape = (n_samples, n_features) or None. For some estimators this may be a precomputed kernel matrix instead, with shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. Passing None as test samples gives the same result as passing real test samples, since DummyRegressor operates independently of the sampled observations.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs) True values for X.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • score : float R^2 of self.predict(X) wrt. y.

set_params

method set_params
val set_params :
  ?params:(string * Py.Object.t) list ->
  [> tag] Obj.t ->
  t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

  • **params : dict Estimator parameters.

Returns

  • self : object Estimator instance.

constant_

attribute constant_
val constant_ : t -> [>`ArrayLike] Np.Obj.t
val constant_opt : t -> ([>`ArrayLike] Np.Obj.t) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

n_outputs_

attribute n_outputs_
val n_outputs_ : t -> int
val n_outputs_opt : t -> (int) option

This attribute is documented in create above. The first version raises Not_found if the attribute is None. The _opt version returns an option.

to_string

method to_string
val to_string: t -> string

Print the object to a human-readable representation.

show

method show
val show: t -> string

Print the object to a human-readable representation.

pp

method pp
val pp: Format.formatter -> t -> unit

Pretty-print the object to a formatter.

check_array

function check_array
val check_array :
  ?accept_sparse:[`S of string | `StringList of string list | `Bool of bool] ->
  ?accept_large_sparse:bool ->
  ?dtype:[`Dtypes of Np.Dtype.t list | `S of string | `Dtype of Np.Dtype.t | `None] ->
  ?order:[`F | `C] ->
  ?copy:bool ->
  ?force_all_finite:[`Allow_nan | `Bool of bool] ->
  ?ensure_2d:bool ->
  ?allow_nd:bool ->
  ?ensure_min_samples:int ->
  ?ensure_min_features:int ->
  ?estimator:[>`BaseEstimator] Np.Obj.t ->
  array:Py.Object.t ->
  unit ->
  Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters

  • array : object Input object to check / convert.

  • accept_sparse : string, boolean or list/tuple of strings (default=False) String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

  • accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

    .. versionadded:: 0.20

  • dtype : string, type, list of types or None (default='numeric') Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

  • order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

  • copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

  • force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

    • True: Force all values of array to be finite.
    • False: accepts np.inf, np.nan, pd.NA in array.
    • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

    .. versionadded:: 0.20 force_all_finite accepts the string 'allow-nan'.

    .. versionchanged:: 0.23 Accepts pd.NA and converts it into np.nan

  • ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

  • allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

  • ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

  • ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.

  • estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns

  • array_converted : object The converted and validated array.
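
A minimal sketch of calling check_array from OCaml, assuming the raw input is built with pyml's list constructors (Py.List.of_list_map, Py.Float.of_float); any value convertible to a Python array-like should work the same way.

let _validated : Py.Object.t =
  (* a 1-D float list is accepted once ensure_2d is turned off *)
  Sklearn.Dummy.check_array
    ~ensure_2d:false
    ~dtype:(`S "numeric")
    ~array:(Py.List.of_list_map Py.Float.of_float [1.0; 2.0; 3.0])
    ()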

check_consistent_length

function check_consistent_length
val check_consistent_length :
  Py.Object.t list ->
  Py.Object.t

Check that all arrays have consistent first dimensions.

Checks whether all objects in arrays have the same shape or length.

Parameters

  • *arrays : list or tuple of input objects. Objects that will be checked for consistent length.

check_is_fitted

function check_is_fitted
val check_is_fitted :
  ?attributes:[`Arr of [>`ArrayLike] Np.Obj.t | `S of string | `StringList of string list] ->
  ?msg:string ->
  ?all_or_any:[`Callable of Py.Object.t | `PyObject of Py.Object.t] ->
  estimator:[>`BaseEstimator] Np.Obj.t ->
  unit ->
  Py.Object.t

Perform is_fitted validation for estimator.

Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message.

This utility is meant to be used internally by estimators themselves, typically in their own predict / transform methods.

Parameters

  • estimator : estimator instance. estimator instance for which the check is performed.

  • attributes : str, list or tuple of str, default=None Attribute name(s) given as a string or a list/tuple of strings, e.g. ['coef_', 'estimator_', ...] or 'coef_'. If None, the estimator is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.

  • msg : string The default error message is: 'This %(name)s instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.' For custom messages, if '%(name)s' is present in the message string, it is substituted with the estimator name, e.g. 'Estimator, %(name)s, must be fitted before sparsifying'.

  • all_or_any : callable, {all, any}, default all Specify whether all or any of the given attributes must exist.

Returns

None

Raises

NotFittedError If the attributes are not found.

check_random_state

function check_random_state
val check_random_state :
  [`Optional of [`I of int | `None] | `RandomState of Py.Object.t] ->
  Py.Object.t

Turn seed into a np.random.RandomState instance

Parameters

  • seed : None | int | instance of RandomState If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
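
For example, requesting a reproducible generator from an integer seed (a sketch built directly from the signature above):

let _rng : Py.Object.t =
  (* `Optional (`I 42) seeds a fresh RandomState; `Optional `None would return
     NumPy's global singleton instead *)
  Sklearn.Dummy.check_random_state (`Optional (`I 42))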

class_distribution

function class_distribution
val class_distribution :
  ?sample_weight:[>`ArrayLike] Np.Obj.t ->
  y:[`Arr of [>`ArrayLike] Np.Obj.t | `Sparse_matrix_of_size of Py.Object.t] ->
  unit ->
  (Py.Object.t * Py.Object.t * Py.Object.t)

Compute class priors from multioutput-multiclass target data

Parameters

  • y : array like or sparse matrix of size (n_samples, n_outputs) The labels for each example.

  • sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns

  • classes : list of size n_outputs of arrays of size (n_classes,) List of classes for each column.

  • n_classes : list of integers of size n_outputs Number of classes in each column

  • class_prior : list of size n_outputs of arrays of size (n_classes,) Class distribution of each column.
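
A sketch of calling it from OCaml: y is whatever [>`ArrayLike] Np.Obj.t holds the labels (for example the y built in the classifier sketch earlier), and the three results come back as raw Py.Object.t values in the order documented above.

let class_stats y =
  (* returns (classes, n_classes, class_prior), each still a raw Py.Object.t *)
  Sklearn.Dummy.class_distribution ~y:(`Arr y) ()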