Scikit-learn is a free, open-source Python package, written largely in Python (with performance-critical parts in Cython). It provides a machine learning library that supports many popular algorithms for classification, clustering, regression, and more. Scikit-learn is especially helpful for machine learning novices. It can be installed by running the following command:
pip install scikit-learn
To check whether the package was installed successfully, run the following code in a Jupyter Notebook or the Python interpreter:
import sklearn
If the preceding statement runs without errors, the package has been installed successfully.
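Beyond a bare import, it can help to confirm which versions are installed, because some APIs (the dataset loaders in particular) change between releases. Scikit-learn relies on NumPy and SciPy, so a minimal check, assuming a standard pip installation, might print all three versions:

```python
import numpy
import scipy
import sklearn

# An ImportError on any line above means that package is missing
print("scikit-learn:", sklearn.__version__)
print("NumPy:", numpy.__version__)
print("SciPy:", scipy.__version__)

# show_versions() prints a fuller report: Python, platform, and dependency versions
sklearn.show_versions()
```

The `show_versions()` report is also what the scikit-learn maintainers typically ask for in bug reports.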
Scikit-learn depends on NumPy and SciPy, which pip installs automatically if they are missing. We will discuss their functionality in the following sections. Scikit-learn also ships with several built-in datasets, such as:
- Iris dataset
- Breast cancer dataset
- Diabetes dataset
- Boston house prices dataset (deprecated in scikit-learn 1.0 and removed in 1.2)
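The loaders for these datasets live in `sklearn.datasets`, and each returns a dict-like `Bunch` object with (at least) `data` and `target` arrays. A minimal sketch that loads three of them and records their shapes; the Boston loader is left out because current releases no longer include it:

```python
from sklearn import datasets

# Map a short label to each built-in loader function
loaders = {
    "iris": datasets.load_iris,
    "breast_cancer": datasets.load_breast_cancer,
    "diabetes": datasets.load_diabetes,
}

shapes = {}
for name, loader in loaders.items():
    bunch = loader()  # Bunch: attribute access to .data (features) and .target
    shapes[name] = (bunch.data.shape, bunch.target.shape)
    print(name, bunch.data.shape, bunch.target.shape)
```

Iris has 150 samples with 4 features, breast cancer 569 samples with 30 features, and diabetes 442 samples with 10 features. These datasets are bundled with the package, so no network access is needed.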
Other public datasets distributed in the libsvm/svmlight format, such as the collection at the following address, can also be loaded with sklearn.datasets.load_svmlight_file:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
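In the svmlight/libsvm text format, each line holds a label followed by `index:value` pairs for the nonzero features only. `load_svmlight_file` parses such a file into a SciPy sparse feature matrix and a label array. The sketch below parses a tiny, made-up two-sample file held in memory (the values are illustrative, not taken from the libsvm collection); for a downloaded file, pass its path instead:

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Two samples in svmlight/libsvm format: "<label> <index>:<value> ..."
# Indices in libsvm files are conventionally 1-based
sample = BytesIO(b"1 1:0.5 3:1.0\n-1 2:2.0 3:0.25\n")

# X is a SciPy CSR sparse matrix, y a dense float array of labels.
# With the default zero_based="auto", the 1-based indices are detected and shifted to 0-based.
X, y = load_svmlight_file(sample)

print(X.shape)  # (2, 3)
print(X.toarray())
print(y)
```

Keeping `X` sparse matters for the larger libsvm datasets, which often have many thousands of mostly-zero features.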
A sample script that uses scikit-learn to load data is as follows:
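from sklearn import datasets
# load_boston() was removed in scikit-learn 1.2, so the diabetes dataset is loaded here instead
diabetes = datasets.load_diabetes()
print(diabetes.data.shape)  # (442, 10)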
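The object a loader returns bundles the data with metadata such as feature names and a text description. Loaders also accept `return_X_y=True` to return just the feature matrix and the target array. A short sketch using the diabetes loader, chosen because it ships with current scikit-learn releases:

```python
from sklearn.datasets import load_diabetes

# The Bunch fields can be read as attributes
diabetes = load_diabetes()
print(diabetes.feature_names)  # names of the 10 feature columns
print(diabetes.DESCR[:200])    # first part of the dataset description

# return_X_y=True skips the Bunch and returns (features, targets) directly
X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)  # (442, 10) (442,)
```

The `return_X_y` form is convenient when the arrays go straight into a model, while the full `Bunch` is useful for exploring what a dataset contains.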