Despite being called a regression, logistic regression is actually a widely used supervised classification technique. Logistic regression and its extensions, like multinomial logistic regression, allow us to predict the probability that an observation is of a certain class using a straightforward and well-understood approach. In this chapter, we will cover training a variety of classifiers using scikit-learn.
Train a logistic regression in scikit-learn using LogisticRegression:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data with only two classes
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0)

# Train model
model = logistic_regression.fit(features_standardized, target)
Despite having “regression” in its name, a logistic regression is actually a widely used binary classifier (i.e., the target vector can take only two values). In a logistic regression, a linear model (e.g., β0 + β1x) is included in a logistic (also called sigmoid) function, 1 / (1 + e^−z), such that:

P(yi = 1 | X) = 1 / (1 + e^−(β0 + β1xi))
where P(yi = 1 | X) is the probability of the ith observation’s target value, yi, being class 1, X is the training data, β0 and β1 are the parameters to be learned, and e is Euler’s number. The effect of the logistic function is to constrain the value of the function’s output to between 0 and 1 so that it can be interpreted as a probability. If P(yi = 1 | X) is greater than 0.5, class 1 is predicted; otherwise, class 0 is predicted.
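To make the mapping concrete, here is a minimal sketch (using NumPy; the scores are hypothetical and not part of the recipe above) that applies the logistic function to a few linear scores and thresholds the resulting probabilities at 0.5:

# Load library
import numpy as np

# Logistic (sigmoid) function: maps any real number into the interval (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical linear scores (beta_0 + beta_1 * x) for three observations
scores = np.array([-2.0, 0.0, 3.0])

# Convert scores into probabilities of being class 1
probabilities = sigmoid(scores)      # approx. [0.12, 0.5, 0.95]

# Predict class 1 when the probability exceeds 0.5, otherwise class 0
(probabilities > 0.5).astype(int)    # array([0, 0, 1])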
In scikit-learn, we can learn a logistic regression model using
LogisticRegression. Once it is trained, we can use the model to predict the class of new observations:
# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
model.predict(new_observation)
array([1])
In this example, our observation was predicted to be class 1. Additionally, we can see the probability that an observation is a member of each class:
# View predicted probabilities
model.predict_proba(new_observation)
array([[ 0.18823041, 0.81176959]])
Our observation had an 18.8% chance of being class 0 and an 81.2% chance of being class 1.
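If we want to know which column of predict_proba corresponds to which class, the fitted model’s classes_ attribute gives the ordering. A quick check, assuming the model trained above:

# The columns of predict_proba line up with model.classes_
model.classes_                                # array([0, 1])

# Probability that the new observation belongs to class 1
model.predict_proba(new_observation)[0, 1]    # 0.8117..., the second column above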
Train a logistic regression in scikit-learn with LogisticRegression
using one-vs-rest or multinomial methods:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create one-vs-rest logistic regression object
logistic_regression = LogisticRegression(random_state=0, multi_class="ovr")

# Train model
model = logistic_regression.fit(features_standardized, target)
On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, two clever extensions to logistic regression do just that. First, in one-vs-rest logistic regression (OVR) a separate model is trained for each class, predicting whether an observation is a member of that class or not (thus making it a binary classification problem). It assumes that each classification problem (e.g., class 0 or not) is independent.
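One way to see this “one model per class” structure, assuming the one-vs-rest model fitted in the solution above, is to look at the shape of the learned coefficient matrix, which has one row of parameters for each of the three iris classes:

# One row of coefficients per class, one column per feature
model.coef_.shape    # (3, 4)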
Alternatively, in multinomial logistic regression (MLR) the logistic function we saw in Recipe 15.1 is replaced with a softmax function:

P(yi = k | X) = e^(βk xi) / Σ(j=1 to K) e^(βj xi)

where P(yi = k | X) is the probability of the ith observation’s target value, yi, being class k, and K is the total number of classes. One practical advantage of MLR is that its predicted probabilities using the predict_proba method are more reliable (i.e., better calibrated).
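As a quick illustration of what the softmax does, here is a minimal NumPy sketch (the raw scores are hypothetical and not part of the recipe) that converts scores for K = 3 classes into probabilities that sum to 1:

# Load library
import numpy as np

# Softmax: converts K raw scores into K probabilities that sum to 1
def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for K = 3 classes
softmax(np.array([2.0, 1.0, 0.1]))    # approx. [0.66, 0.24, 0.10]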
When using LogisticRegression we can select which of the two techniques we want, with OVR, ovr, being the default argument of the multi_class parameter. We can switch to MLR by setting the argument to multinomial.
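For example, keeping the data from the solution above, a multinomial model can be created by swapping the multi_class argument (the lbfgs solver shown here supports the multinomial option):

# Create multinomial logistic regression object
logistic_regression = LogisticRegression(
    random_state=0, multi_class="multinomial", solver="lbfgs")

# Train model
model = logistic_regression.fit(features_standardized, target)

# Predicted probabilities for the first observation now come from the softmax
model.predict_proba(features_standardized[:1])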
Tune the regularization strength hyperparameter, C:
# Load libraries
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create cross-validated logistic regression object
logistic_regression = LogisticRegressionCV(
    penalty='l2', Cs=10, random_state=0, n_jobs=-1)

# Train model
model = logistic_regression.fit(features_standardized, target)
Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 and L2 penalties. In the L1 penalty:

α Σ(j=1 to p) |βj|

where βj is the parameter of the jth of p features being learned and α is a hyperparameter denoting the regularization strength. With the L2 penalty:

α Σ(j=1 to p) βj²
Higher values of α increase the penalty for larger parameter values (i.e., more complex models). scikit-learn follows the common convention of using C instead of α, where C is the inverse of the regularization strength: C = 1/α. To reduce variance while using logistic regression, we can treat C as a hyperparameter to be tuned to find the value of C that creates the best model. In scikit-learn we can use the LogisticRegressionCV class to efficiently tune C. LogisticRegressionCV’s parameter, Cs, can either accept a range of values for C to search over (if a list of floats is supplied as an argument) or, if supplied an integer, will generate a list of that many candidate values drawn from a logarithmic scale between 0.0001 and 10,000.
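After fitting, the cross-validated object exposes both the candidate values it searched and the value of C it selected. A quick look, assuming the model trained in the solution above:

# Candidate values of C drawn from the logarithmic scale
model.Cs_

# Value of C selected by cross-validation (reported per class)
model.C_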
Unfortunately, LogisticRegressionCV does not allow us to search over
different penalty terms. To do this we have to use the less efficient
model selection techniques discussed in Chapter 12.
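A sketch of one such approach, using GridSearchCV to search over both the penalty type and C (the liblinear solver is used here because it supports both L1 and L2 penalties; the candidate values are illustrative):

# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Candidate penalties and regularization strengths to search over
search_space = {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10, 100]}

# Create grid search using 5-fold cross-validation
grid_search = GridSearchCV(
    LogisticRegression(solver="liblinear", random_state=0),
    search_space, cv=5, n_jobs=-1)

# Train models and view the best hyperparameter combination
best_model = grid_search.fit(features_standardized, target)
best_model.best_params_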
Train a logistic regression in scikit-learn with LogisticRegression
using the stochastic average gradient (SAG) solver:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object using the SAG solver
logistic_regression = LogisticRegression(random_state=0, solver="sag")

# Train model
model = logistic_regression.fit(features_standardized, target)
scikit-learn’s LogisticRegression offers a number of techniques for
training a logistic regression, called solvers. Most of the time
scikit-learn will select the best solver automatically for us or warn us
that we cannot do something with that solver. However, there is one
particular case we should be aware of.
While an exact explanation is beyond the scope of this book (for more
information see Mark Schmidt’s slides in “See Also”), stochastic average gradient descent allows us to train a model much faster than other solvers when our data is very large. However, it is also very sensitive to feature scaling, so standardizing our features is particularly important. We can set our learning algorithm to use this
solver by setting solver='sag'.
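As a rough illustration of the setting where sag pays off, here is a sketch (the synthetic dataset is hypothetical and not part of the recipe; exact timings will vary by machine) that standardizes a larger dataset and times the fit:

# Load libraries
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Create a larger synthetic dataset, for illustration only
features_large, target_large = make_classification(
    n_samples=100000, n_features=20, random_state=0)

# Standardize features -- sag is very sensitive to feature scaling
features_large = StandardScaler().fit_transform(features_large)

# Time the fit using the sag solver
start = time.time()
LogisticRegression(solver="sag", random_state=0, max_iter=1000).fit(
    features_large, target_large)
print(time.time() - start)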
You need to train a classifier on data with highly imbalanced classes.
Train a logistic regression in scikit-learn using LogisticRegression with class_weight="balanced":
# Load libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Make class highly imbalanced by removing first 40 observations
features = features[40:,:]
target = target[40:]

# Create target vector indicating if class 0, otherwise 1
target = np.where((target == 0), 0, 1)

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object with balanced class weights
logistic_regression = LogisticRegression(random_state=0, class_weight="balanced")

# Train model
model = logistic_regression.fit(features_standardized, target)
Like many other learning algorithms in scikit-learn, LogisticRegression comes with a built-in method of handling imbalanced classes. If we have highly imbalanced classes and have not addressed them during preprocessing, we have the option of using the class_weight parameter to weight the classes, making certain we have a balanced mix of each class. Specifically, the balanced argument will automatically weight classes inversely proportional to their frequency:

wj = n / (k nj)

where wj is the weight for class j, n is the number of observations, nj is the number of observations in class j, and k is the total number of classes.
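To see the exact weights that balanced produces for this data, we can compute them directly or use scikit-learn’s compute_class_weight utility. A quick check, assuming the target vector from the solution above (10 observations of class 0 and 100 of class 1):

# Load libraries
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weights scikit-learn assigns with class_weight="balanced"
compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=target)
# array([5.5, 0.55])

# Equivalent manual calculation: n / (k * n_j)
n = len(target)                  # 110 observations
k = len(np.unique(target))       # 2 classes
n / (k * np.bincount(target))    # array([5.5, 0.55])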