Hands-On Machine Learning for Cybersecurity

Since our computations are done per minute, we truncate each timestamp to the minute by dropping its seconds, as shown in the following code:

_time = pdata_frame['Time']  # Time column of the data frame
edited_time = []
for t in _time:  # iterate over the individual timestamps
    arr = str(t).split(':')  # e.g. ['2018-03-18 21', '17', '58']
    time_till_mins = arr[0] + ':' + arr[1] + ':00'  # keep hours and minutes, zero the seconds
    edited_time.append(time_till_mins)  # the time truncated to the minute
source = pdata_frame['Source']  # source address

The output of the preceding code is the time truncated to the minute; for example, 2018-03-18 21:17:58 becomes 2018-03-18 21:17:00, as shown:

'2018-03-18 21:17:00'
'2018-03-18 21:18:00'
'2018-03-18 21:19:00'
'2018-03-18 21:20:00'
'2018-03-19 21:17:00'
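Dropping the seconds truncates a timestamp, which is not the same as rounding it to the nearest minute. The following is a minimal sketch of the difference using only the standard library. The helper names truncate_to_minute and round_to_minute are hypothetical, and the timestamp layout is assumed from the sample output above:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # assumed layout, taken from the sample output


def truncate_to_minute(stamp: str) -> str:
    """Drop the seconds, so 21:17:58 becomes 21:17:00."""
    parsed = datetime.strptime(stamp, FMT)
    return parsed.replace(second=0).strftime(FMT)


def round_to_minute(stamp: str) -> str:
    """Round to the nearest minute, so 21:17:58 becomes 21:18:00."""
    parsed = datetime.strptime(stamp, FMT)
    shifted = parsed + timedelta(seconds=30)  # push the upper half minute over the boundary
    return shifted.replace(second=0).strftime(FMT)


print(truncate_to_minute("2018-03-18 21:17:58"))  # 2018-03-18 21:17:00
print(round_to_minute("2018-03-18 21:17:58"))     # 2018-03-18 21:18:00
```

For per-minute connection counts, truncation is usually the better fit, because every connection then falls into the minute in which it actually started.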

We count the number of connections that each source establishes per minute by iterating over the time and source arrays:

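import pandas as pd
connection_count = {}  # maps (minute, source) to the number of connections
for t, s in zip(edited_time, source):  # pair each timestamp with its source
    key = (t, s)
    if key in connection_count:
        connection_count[key] += 1
    else:
        connection_count[key] = 1
# One row per (Time, Source) pair; column layout assumed from the output table
new_count_df = pd.DataFrame(
    [(t, s, n) for (t, s), n in connection_count.items()],
    columns=['Time', 'Source', 'Number of Connections'])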

The connection_count dictionary holds the number of connections per minute for each source. The output of the preceding code looks like this:

`Time`	`Source`	`Number of Connections`
`2018-03-18 21:17:00`	`192.168.0.2`	`5`
`2018-03-18 21:18:00`	`192.168.0.2`	`1`
`2018-03-18 21:19:00`	`192.168.0.2`	`10`
`2018-03-18 21:17:00`	`192.168.0.3`	`2`
`2018-03-18 21:20:00`	`192.168.0.2`	`3`
`2018-03-19 22:17:00`	`192.168.0.2`	`3`
`2018-03-19 22:19:00`	`192.168.0.2`	`1`
`2018-03-19 22:22:00`	`192.168.0.2`	`1`
`2018-03-19 21:17:00`	`192.168.0.3`	`20`
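The same tally can be expressed more compactly with collections.Counter. This is only a sketch under assumptions: the helper name count_connections is hypothetical, and the four sample records are made up for illustration rather than taken from the capture:

```python
from collections import Counter


def count_connections(minutes, sources):
    """Count connections per (minute, source) pair."""
    # zip pairs each truncated timestamp with the source on the same row
    return Counter(zip(minutes, sources))


# Illustrative records only (not real capture data)
minutes = [
    "2018-03-18 21:17:00",
    "2018-03-18 21:17:00",
    "2018-03-18 21:17:00",
    "2018-03-18 21:18:00",
]
sources = ["192.168.0.2", "192.168.0.2", "192.168.0.3", "192.168.0.2"]

counts = count_connections(minutes, sources)
for (minute, src), n in sorted(counts.items()):
    print(minute, src, n)
```

Keying the counter on the (minute, source) pair keeps the counts of different sources separate, which a dictionary keyed on the minute alone would not do.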

Next, we decompose the data to look for trend and seasonality. Separating these components makes anomalous behavior, such as a DDoS attack, easier to detect, as shown in the following code:

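from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose
# Assumes new_count_df holds the per-minute counts on a regularly spaced DatetimeIndex;
# otherwise pass the cycle length explicitly (period= in recent statsmodels releases)
result = seasonal_decompose(new_count_df, model='additive')
result.plot()  # observed, trend, seasonal, and residual panels
pyplot.show()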

The resulting graph lets us recognize the overall seasonality and trend of the data.
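To make the idea of an additive decomposition concrete, here is a simplified pure-Python sketch in which observed = trend + seasonal + residual. It is not the statsmodels implementation: it handles only odd periods, the function name additive_decompose is hypothetical, and the series is synthetic (a repeating pattern with one injected spike standing in for an attack):

```python
def additive_decompose(series, period):
    """Simplified classical additive decomposition for an odd period."""
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch only handles odd periods >= 3")
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full cycles of data")
    half = period // 2

    # Trend: centred moving average, undefined (None) at both edges
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period

    # Seasonal: mean detrended value at each position in the cycle,
    # shifted so that the seasonal terms sum to zero over one cycle
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: whatever trend and seasonality do not explain
    resid = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, resid


# Synthetic per-minute counts: a cycle of length 3 with a spike at index 7
counts = [5, 1, 10] * 4
counts[7] = 41

trend, seasonal, resid = additive_decompose(counts, period=3)
defined = [i for i, r in enumerate(resid) if r is not None]
spike_index = max(defined, key=lambda i: abs(resid[i]))
print(spike_index)  # the injected spike has the largest residual
```

Once the regular pattern is removed, the residual component is where unusual bursts of connections become visible, which is why decomposition helps with anomaly detection.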

Next, we compute the autocorrelation function (ACF) of the data to understand how strongly the series correlates with lagged copies of itself, using the following piece of code:

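from matplotlib import pyplot
# pandas.tools.plotting was removed from pandas; the function now lives in pandas.plotting
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(new_count_df)  # autocorrelation at every lag
pyplot.show()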
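The quantity behind such a plot can be sketched directly. The following is a minimal pure-Python version of the sample autocorrelation at a single lag, assuming the usual definition (lagged covariance divided by the lag-zero covariance). The function name autocorrelation is hypothetical and the series is synthetic:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    # Lag-zero term: total squared deviation from the mean
    c0 = sum((x - mean) ** 2 for x in series)
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    # Lagged term: products of deviations that are `lag` steps apart
    ck = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return ck / c0


# Synthetic per-minute counts with a cycle of length 3
counts = [5, 1, 10] * 4

for lag in range(4):
    print(lag, round(autocorrelation(counts, lag), 3))
```

For this series the value at lag 3 is strongly positive while lag 1 is negative, which is the signature of a pattern that repeats every three minutes.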
