scikit-learn for OCaml, version sk0.23-0.3.1
ocaml-sklearn allows using Python's scikit-learn machine learning library from OCaml.
As of version 0.22-0.3.0, most classes and functions from scikit-learn and Numpy should be usable. Many examples have been ported from Python to OCaml successfully (see below). However, the APIs have not yet proved stable and will probably evolve in the next releases.
Example: support vector regression with RBF kernel
module Np = Np.Numpy;;

let n_samples, n_features = 10, 5 in
Np.Random.seed 0;
let y = Np.Random.uniform ~size:[n_samples] () in
let x = Np.Random.uniform ~size:[n_samples; n_features] () in
let open Sklearn.Svm in
let clf = SVR.create ~c:1.0 ~epsilon:0.2 () in
Format.printf "%a\n" SVR.pp @@ SVR.fit clf ~x ~y;
Format.printf "%a\n" Np.pp @@ SVR.support_vectors_ clf;;
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.2, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
[[0.14509922 0.16277752 0.99033894 0.84013554 0.96508279]
 [0.8865312 0.80655193 0.07459775 0.36058768 0.22130337]
 [0.21844203 0.09612442 0.49908686 0.1154579 0.98202969]
 [0.07306658 0.97225754 0.20558949 0.16423512 0.57400651]
 [0.08153976 0.41462111 0.66190418 0.70208221 0.3600998 ]
 [0.20502873 0.04244781 0.21800856 0.28184598 0.4282653 ]
 [0.89211037 0.51466381 0.23432621 0.29850877 0.13323457]]
opam install sklearn
Finding Python's scikit-learn at runtime
You do not need a Python installation when compiling your OCaml program using ocaml-sklearn. However, when running, your program will need to load the sklearn, numpy and scipy Python libraries, so these must be installed where the OCaml program is deployed.
A version of ocaml-sklearn is tied to a version of Python's scikit-learn, numpy and scipy. For instance, a version of ocaml-sklearn for Python's scikit-learn 0.22.2 will refuse to initialize (by throwing an exception) if scikit-learn's version is not 0.22 (it can however be 0.22.1, 0.22.2 or 0.22.2.post1).
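The compatibility rule described above can be illustrated with a small standalone sketch. It is not the check that ocaml-sklearn performs internally; the helper names `major_minor` and `compatible` are hypothetical, and the sketch assumes that only the major and minor components of the version string are compared.

```ocaml
(* Hypothetical helper, not part of ocaml-sklearn: keep only the
   "major.minor" prefix of a version string such as "0.22.2.post1". *)
let major_minor version =
  match String.split_on_char '.' version with
  | major :: minor :: _ -> Some (major, minor)
  | _ -> None

(* Assumption of this sketch: two versions are compatible when both
   parse and their major and minor components are equal. *)
let compatible ~expected ~found =
  match major_minor expected, major_minor found with
  | Some e, Some f -> e = f
  | _ -> false

let () =
  (* Print each candidate version with the result of the check. *)
  List.iter
    (fun found ->
      Printf.printf "%-14s %b\n" found (compatible ~expected:"0.22.2" ~found))
    [ "0.22.1"; "0.22.2"; "0.22.2.post1"; "0.23.2" ]
```

Under this assumption, bindings built for 0.22.2 accept the first three versions in the list and reject 0.23.2.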
One way to make sure you run with the right versions is to create a virtualenv, install the Python packages inside, and run your OCaml program in the activated virtualenv.
Do this once to create the virtualenv in directory .venv and install scikit-learn, numpy and scipy inside:
python3 -mvenv .venv
source .venv/bin/activate
pip install scikit-learn==0.23.2 numpy==1.19.4 scipy==1.5.4 pytest
Then run your compiled OCaml program inside the virtualenv:
source .venv/bin/activate
./my_ocaml_program.exe
We attempt to bind all of scikit-learn's APIs. However, not all of the APIs are currently tested, and some are probably hard to use or unusable at the moment.
Each Python module or class gets its own OCaml module. For instance, sklearn.svm.SVC can be found in OCaml module Sklearn.Svm.SVC. This module has a create function to construct an SVC and functions corresponding to the Python methods and attributes.
Most data is passed in and out of sklearn through module
You should generally build a dense array using the constructors in
module Np = Np.Numpy let x = Np.matrixi [|[| 1; 2 |]; [| 3; 4 |]|]
To get data out of an
to_string_array (all of these return a flattened copy of the
data, and will raise an exception if the data type is wrong).
Attributes are exposed read-only, each with two getters: one that raises Not_found if the attribute is None, and the other that returns an option.
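The two-getter convention can be pictured with a self-contained model. The record type and the names `coef` and `coef_opt` below are hypothetical stand-ins chosen for this sketch, not the generated bindings; the point is only how a raising getter and an option-returning getter sit side by side.

```ocaml
(* Hypothetical stand-in for an estimator; in the real bindings the value
   would come from a Python attribute that may be None. *)
type estimator = { coef : float array option }

(* Option-returning getter: None when the attribute is absent. *)
let coef_opt est = est.coef

(* Raising getter: Not_found when the attribute is absent. *)
let coef est =
  match est.coef with
  | Some v -> v
  | None -> raise Not_found

(* The option getter lets the caller handle the missing case explicitly. *)
let describe est =
  match coef_opt est with
  | Some v -> Printf.sprintf "%d coefficients" (Array.length v)
  | None -> "not fitted"

let () =
  let unfitted = { coef = None } in
  let fitted = { coef = Some [| 0.5; 1.5 |] } in
  print_endline (describe unfitted);
  print_endline (describe fitted);
  (* The raising getter suits code that already knows the value is set. *)
  Printf.printf "first coefficient: %g\n" (coef fitted).(0)
```

The option form tends to fit code that must cope with an unfitted estimator, while the raising form keeps call sites short when the attribute is known to be present.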
Bunches (as returned from the sklearn.datasets APIs) are exposed as objects.
Arguments taking string values are converted (in most cases) to polymorphic variants.
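As an illustration of what such a conversion looks like, here is a minimal standalone sketch for the kernel argument of a support vector machine. The variant names and the function `kernel_to_string` are assumptions made for this example and may differ from what ocaml-sklearn generates; the strings are the kernel names accepted by scikit-learn.

```ocaml
(* Hypothetical kernel argument; the constructor names are assumptions made
   for this sketch, not necessarily those generated by ocaml-sklearn. *)
type kernel = [ `Linear | `Poly | `Rbf | `Sigmoid ]

(* What a binding has to do before calling Python: map each variant back
   to the string that scikit-learn expects. *)
let kernel_to_string : kernel -> string = function
  | `Linear -> "linear"
  | `Poly -> "poly"
  | `Rbf -> "rbf"
  | `Sigmoid -> "sigmoid"

let () = print_endline (kernel_to_string `Rbf)
```

One benefit of this design is that a misspelled option becomes a type error at compile time instead of a Python exception at run time.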
Each module has a conversion function to
to_pyobject), so that you can always escape and use
if the API provided here is incomplete.
No attempt is made to expose features marked as deprecated.
ocaml-sklearn's sources are generated using a Python program (see
lib/skdoc.py) that loads up sklearn and uses introspection to
generate bindings based on
pyml. To determine types, it parses
python3 -mvenv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
opam switch create . 4.11.1 --deps-only
dune runtest
The requirements for developing (not using) the bindings are in file requirements-dev.txt. Install them using:
# sudo apt install python3-venv
python3 -mvenv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
The tests are in examples/auto. They are based on examples extracted from the Python documentation. A good way to develop is to pick one of the files and start porting examples.
The following examples have been ported completely:
- calibration
- cluster
- compose
- ensemble
- impute
- manifold
- metrics
- model_selection
- neighbors
- neural_network
- pipeline
- preprocessing
- svm
- tree
The following examples still need to be ported:
- covariance
- cross_decomposition
- datasets
- decomposition
- discriminant_analysis
- feature_extraction
- feature_selection
- gaussian_process
- inspection
- isotonic
- kernel_approximation
- linear_model
- multiclass
- multioutput
- naive_bayes
- random_projection
- semi_supervised
- utils
Documentation can then be found in
it locally with something like:
python3 -mhttp.server --directory html_doc
xdg-open http://localhost:8000
BSD-3. See file LICENSE.