Despite being called a regression, logistic regression is actually a widely used supervised classification technique. Logistic regression and its extensions, like multinomial logistic regression, allow us to predict the probability that an observation is of a certain class using a straightforward and well-understood approach. In this chapter, we will cover training a variety of classifiers using scikit-learn.
Train a logistic regression in scikit-learn using LogisticRegression:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data with only two classes
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0)

# Train model
model = logistic_regression.fit(features_standardized, target)
Despite having “regression” in its name, a logistic regression is actually a widely used binary classifier (i.e., the target vector can take only two values). In a logistic regression, a linear model (e.g., β0 + β1x) is included in a logistic (also called sigmoid) function, 1 / (1 + e^−z), such that:

P(yi = 1 | X) = 1 / (1 + e^−(β0 + β1xi))
where P(yi = 1 | X) is the probability of the ith observation’s target value, yi, being class 1, X is the training data, β0 and β1 are the parameters to be learned, and e is Euler’s number. The effect of the logistic function is to constrain the value of the function’s output to between 0 and 1 so that it can be interpreted as a probability. If P(yi = 1 | X) is greater than 0.5, class 1 is predicted; otherwise, class 0 is predicted.
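To make the mapping concrete, here is a minimal sketch (using NumPy; the scores are hypothetical and not part of the recipe above) that applies the logistic function to a few linear scores and thresholds the resulting probabilities at 0.5:

# Load library
import numpy as np

# Logistic (sigmoid) function: maps any real number into the interval (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical linear scores (beta_0 + beta_1 * x) for three observations
scores = np.array([-2.0, 0.0, 3.0])

# Convert scores into probabilities of being class 1
probabilities = sigmoid(scores)      # approx. [0.12, 0.5, 0.95]

# Predict class 1 when the probability exceeds 0.5, otherwise class 0
(probabilities > 0.5).astype(int)    # array([0, 0, 1])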
In scikit-learn, we can learn a logistic regression model using
LogisticRegression. Once it is trained, we can use the model to predict the class of new observations:
# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
model.predict(new_observation)
array([1])
In this example, our observation was predicted to be class 1. Additionally, we can see the probability that an observation is a member of each class:
# View predicted probabilities
model.predict_proba(new_observation)
array([[ 0.18823041, 0.81176959]])
Our observation had an 18.8% chance of being class 0 and an 81.2% chance of being class 1.
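If we want to know which column of predict_proba corresponds to which class, the fitted model’s classes_ attribute gives the ordering. A quick check, assuming the model trained above:

# The columns of predict_proba line up with model.classes_
model.classes_                                # array([0, 1])

# Probability that the new observation belongs to class 1
model.predict_proba(new_observation)[0, 1]    # 0.8117..., the second column above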
Train a logistic regression in scikit-learn with LogisticRegression
using one-vs-rest or multinomial methods:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create one-vs-rest logistic regression object
logistic_regression = LogisticRegression(random_state=0, multi_class="ovr")

# Train model
model = logistic_regression.fit(features_standardized, target)
On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, two clever extensions to logistic regression do just that. First, in one-vs-rest logistic regression (OVR) a separate model is trained for each class, predicting whether an observation is a member of that class or not (thus making it a binary classification problem). It assumes that each classification problem (e.g., class 0 or not) is independent.
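One way to see this “one model per class” structure, assuming the one-vs-rest model fitted in the solution above, is to look at the shape of the learned coefficient matrix, which has one row of parameters for each of the three iris classes:

# One row of coefficients per class, one column per feature
model.coef_.shape    # (3, 4)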
Alternatively, in multinomial logistic regression (MLR) the logistic function we saw in Recipe 15.1 is replaced with a softmax function:

P(yi = k | X) = e^(βk xi) / Σ(j=1 to K) e^(βj xi)

where P(yi = k | X) is the probability of the ith observation’s target value, yi, being class k, and K is the total number of classes. One practical advantage of MLR is that its predicted probabilities using the predict_proba method are more reliable (i.e., better calibrated).
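As a quick illustration of what the softmax does, here is a minimal NumPy sketch (the raw scores are hypothetical and not part of the recipe) that converts scores for K = 3 classes into probabilities that sum to 1:

# Load library
import numpy as np

# Softmax: converts K raw scores into K probabilities that sum to 1
def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for K = 3 classes
softmax(np.array([2.0, 1.0, 0.1]))    # approx. [0.66, 0.24, 0.10]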
When using LogisticRegression we can select which of the two techniques we want, with OVR, ovr, being the default argument of the multi_class parameter. We can switch to MLR by setting the argument to multinomial.
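For example, keeping the data from the solution above, a multinomial model can be created by swapping the multi_class argument (the lbfgs solver shown here supports the multinomial option):

# Create multinomial logistic regression object
logistic_regression = LogisticRegression(
    random_state=0, multi_class="multinomial", solver="lbfgs")

# Train model
model = logistic_regression.fit(features_standardized, target)

# Predicted probabilities for the first observation now come from the softmax
model.predict_proba(features_standardized[:1])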
Tune the regularization strength hyperparameter, C:
# Load libraries
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create cross-validated logistic regression object
logistic_regression = LogisticRegressionCV(
    penalty='l2', Cs=10, random_state=0, n_jobs=-1)

# Train model
model = logistic_regression.fit(features_standardized, target)
Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 and L2 penalties. In the L1 penalty:

α Σ(j=1 to p) |βj|

where βj is the parameter of the jth of p features being learned and α is a hyperparameter denoting the regularization strength. With the L2 penalty:

α Σ(j=1 to p) βj²
Higher values of α increase the penalty for larger parameter values (i.e., more complex models). scikit-learn follows the common convention of using C instead of α, where C is the inverse of the regularization strength: C = 1/α. To reduce variance while using logistic regression, we can treat C as a hyperparameter to be tuned to find the value of C that creates the best model. In scikit-learn we can use the LogisticRegressionCV class to efficiently tune C. LogisticRegressionCV’s parameter, Cs, can either accept a range of values for C to search over (if a list of floats is supplied as an argument) or, if supplied an integer, will generate a list of that many candidate values drawn from a logarithmic scale between 0.0001 and 10,000.
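After fitting, the cross-validated object exposes both the candidate values it searched and the value of C it selected. A quick look, assuming the model trained in the solution above:

# Candidate values of C drawn from the logarithmic scale
model.Cs_

# Value of C selected by cross-validation (reported per class)
model.C_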
Unfortunately, LogisticRegressionCV does not allow us to search over
different penalty terms. To do this we have to use the less efficient
model selection techniques discussed in Chapter 12.
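A sketch of one such approach, using GridSearchCV to search over both the penalty type and C (the liblinear solver is used here because it supports both L1 and L2 penalties; the candidate values are illustrative):

# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Candidate penalties and regularization strengths to search over
search_space = {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10, 100]}

# Create grid search using 5-fold cross-validation
grid_search = GridSearchCV(
    LogisticRegression(solver="liblinear", random_state=0),
    search_space, cv=5, n_jobs=-1)

# Train models and view the best hyperparameter combination
best_model = grid_search.fit(features_standardized, target)
best_model.best_params_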
Train a logistic regression in scikit-learn with LogisticRegression
using the stochastic average gradient (SAG) solver:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object using the SAG solver
logistic_regression = LogisticRegression(random_state=0, solver="sag")

# Train model
model = logistic_regression.fit(features_standardized, target)
scikit-learn’s LogisticRegression offers a number of techniques for
training a logistic regression, called solvers. Most of the time
scikit-learn will select the best solver automatically for us or warn us
that we cannot do something with that solver. However, there is one
particular case we should be aware of.
While an exact explanation is beyond the scope of this book (for more
information see Mark Schmidt’s slides in “See Also”), stochastic average gradient descent allows us to train a model much faster than other solvers when our data is very large. However, it is also very sensitive to feature scaling, so standardizing our features is particularly important. We can set our learning algorithm to use this
solver by setting solver='sag'.
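As a rough illustration of the setting where sag pays off, here is a sketch (the synthetic dataset is hypothetical and not part of the recipe; exact timings will vary by machine) that standardizes a larger dataset and times the fit:

# Load libraries
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Create a larger synthetic dataset, for illustration only
features_large, target_large = make_classification(
    n_samples=100000, n_features=20, random_state=0)

# Standardize features -- sag is very sensitive to feature scaling
features_large = StandardScaler().fit_transform(features_large)

# Time the fit using the sag solver
start = time.time()
LogisticRegression(solver="sag", random_state=0, max_iter=1000).fit(
    features_large, target_large)
print(time.time() - start)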
You need to train a classifier on data with highly imbalanced classes.
Train a logistic regression in scikit-learn using LogisticRegression with class_weight="balanced":
# Load libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Make class highly imbalanced by removing first 40 observations
features = features[40:,:]
target = target[40:]

# Create target vector indicating if class 0, otherwise 1
target = np.where((target == 0), 0, 1)

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object with balanced class weights
logistic_regression = LogisticRegression(random_state=0, class_weight="balanced")

# Train model
model = logistic_regression.fit(features_standardized, target)
Like many other learning algorithms in scikit-learn, LogisticRegression comes with a built-in method of handling imbalanced classes. If we have highly imbalanced classes and have not addressed them during preprocessing, we have the option of using the class_weight parameter to weight the classes, making certain we have a balanced mix of each class. Specifically, the balanced argument will automatically weight classes inversely proportional to their frequency:

wj = n / (k nj)

where wj is the weight for class j, n is the number of observations, nj is the number of observations in class j, and k is the total number of classes.
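To see the exact weights that balanced produces for this data, we can compute them directly or use scikit-learn’s compute_class_weight utility. A quick check, assuming the target vector from the solution above (10 observations of class 0 and 100 of class 1):

# Load libraries
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weights scikit-learn assigns with class_weight="balanced"
compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=target)
# array([5.5, 0.55])

# Equivalent manual calculation: n / (k * n_j)
n = len(target)                  # 110 observations
k = len(np.unique(target))       # 2 classes
n / (k * np.bincount(target))    # array([5.5, 0.55])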