This is a generalized ARIMA model that allows non-integer values of the differencing parameter. It is used for time series with long memory.
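To make non-integer differencing more concrete, the following minimal sketch computes the first few weights of the fractional differencing operator (1 - B)^d, where B is the backshift operator, from its binomial expansion. The function name `frac_diff_weights` and the number of weights kept are illustrative choices and do not come from any library.

```python
def frac_diff_weights(d, n_weights):
    """Return the first n_weights coefficients of (1 - B)**d.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k,
    which follows from the binomial series expansion.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# With a fractional d, the weights shrink gradually instead of cutting off,
# so observations far in the past still contribute (long memory).
print(frac_diff_weights(0.5, 4))  # [1.0, -0.5, -0.125, -0.0625]
```

With d = 1 the same recursion gives the weights 1 and -1 followed by zeros, which is ordinary first differencing.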
Now that we have discussed the details of each of these stochastic models, we will fit a model to the baseline network data. We will use the statsmodels library for the ARIMA stochastic model and pass it the p, d, and q values. The autoregressive lag order (p) is set to 10, the degree of differencing (d) is set to 1, and the moving-average order (q) is set to 0. We use the fit function to fit the training/baseline data. The model is fitted as follows:
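# fitting the model
# note: statsmodels.tsa.arima_model is the legacy ARIMA API (removed in statsmodels 0.13)
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(new_count_df, order=(10,1,0))  # order = (p, d, q)
fitted_model = model.fit(disp=0)  # disp=0 suppresses the convergence output
print(fitted_model.summary())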
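The middle value in order=(10,1,0) means that the series is differenced once before the autoregressive terms are estimated. As a rough illustration of that step only, here is a plain-Python sketch; the helper names `difference` and `undo_difference` and the connection counts are made up for this example.

```python
def difference(series):
    """First-order differencing: y_t = x_t - x_{t-1}."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]


def undo_difference(last_observed, predicted_change):
    """Map a forecast of the differenced series back to the original scale."""
    return last_observed + predicted_change


counts = [120, 125, 123, 130, 128]  # made-up connection counts
diffs = difference(counts)
print(diffs)  # [5, -2, 7, -2]

# If the model predicts a change of +3 for the next step,
# the forecast on the original scale is the last count plus 3.
print(undo_difference(counts[-1], 3))  # 131
```

The autoregressive part then models each differenced value from the 10 differenced values before it.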
The preceding model is fitted on the entire training set, and we can use it to find the residuals in the data. Examining the residuals helps us to understand the data better, but it is not used for forecasting. To find the residuals in the data, we can do the following in Python:
# plot residual errors
from pandas import DataFrame
from matplotlib import pyplot
residuals_train_data = DataFrame(fitted_model.resid)
residuals_train_data.plot()
pyplot.show()
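A residual is simply an observed value minus the value the fitted model produced for the same time step. For a reasonable fit, the residuals should be centered near zero without an obvious pattern. The sketch below shows the calculation on made-up numbers; the `residuals` helper is hypothetical and is not part of statsmodels.

```python
from statistics import mean, pstdev


def residuals(observed, fitted):
    """Return observed minus fitted for each time step."""
    return [o - f for o, f in zip(observed, fitted)]


observed = [10.0, 12.0, 11.0, 13.0]  # made-up observations
fitted = [9.5, 12.5, 11.5, 12.5]     # made-up fitted values
res = residuals(observed, fitted)

print(res)          # [0.5, -0.5, -0.5, 0.5]
print(mean(res))    # 0.0
print(pstdev(res))  # 0.5
```

A mean far from zero would suggest that the model is systematically over- or under-estimating the baseline traffic.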
Finally, we will use the forecast() function to predict what the current pattern in the data should look like. To do so, we bring in our current data, which supposedly contains the data from the DDoS attack:
- Our training data: Baseline network data for a period of a month
- Our testing data: DDoS attack data
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
ddos_predictions = list()
history = new_count_df  # baseline data only; it is not extended with attack data
for ddos in range(len(ddos_data)):
    model = ARIMA(history, order=(10,1,0))
    fitted_model = model.fit(disp=0)
    output = fitted_model.forecast()
    # forecast() returns (forecast, stderr, conf_int); keep the point forecast
    pred = output[0]
    ddos_predictions.append(pred)
We can quantify the error, that is, the difference between the forecasted DDoS-free network data and the data observed during the attack, by computing the mean squared error between them, as shown in the following code:
error = mean_squared_error(ddos_data, ddos_predictions)
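The mean squared error is the average of the squared differences between the observed and forecasted values. The following plain-Python sketch spells that out and also lists the time steps whose squared error exceeds a cutoff. The function names, the sample numbers, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def flag_anomalies(actual, predicted, threshold):
    """Indices where the squared error is larger than threshold."""
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if (a - p) ** 2 > threshold
    ]


forecast = [100, 102, 101, 103]  # made-up DDoS-free forecast
observed = [101, 250, 400, 104]  # made-up traffic during an attack

print(mean_squared_error_manual(observed, forecast))      # 27826.75
print(flag_anomalies(observed, forecast, threshold=100))  # [1, 2]
```

A large error relative to what is seen on baseline data indicates that the current traffic deviates from the forecasted pattern.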
The following plot shows the result, where the solid line is the forecasted data and the dotted line is the data from the DDoS attack.