Hands-On Machine Learning for Cybersecurity

Our data is stored in a CSV file, that is, a text file whose values are separated by commas. While importing the data, we also name its columns. Because we are dealing with packet capture data from the network, the captured columns are as follows:

Sl Num: Serial number
Time: Time of record capture
Source: Source address or origin of the network packet
Destination: Destination address of the network packet
Volume: Data volume exchanged, in kilobytes (KB)
Protocol: The network protocol, for example, SMTP, FTP, or HTTP:

import pandas as pd
# names= assigns the column labels; index_col= uses "Sl Num" as the row index
pdata_frame = pd.read_csv("path/to/file.csv", sep=',', index_col='Sl Num', names=["Sl Num", "Time", "Source", "Destination", "Volume", "Protocol"])
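To see how the names and index_col arguments interact, the following minimal sketch reads a small in-memory sample instead of a file on disk. The sample_csv, columns, and sample_frame names are invented for illustration, the three rows are taken from the capture shown later in this section, and the sketch assumes the file carries no header row of its own:

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for the CSV file.
# Assumption: the file has no header row, only data rows.
sample_csv = io.StringIO(
    "1,1521039662,192.168.0.1,igmp.mcast.net,5,IGMP\n"
    "2,1521039663,192.168.0.2,239.255.255.250,1,IGMP\n"
    "3,1521039666,192.168.0.2,192.168.10.1,2,UDP\n"
)

columns = ["Sl Num", "Time", "Source", "Destination", "Volume", "Protocol"]

# names= supplies the column labels; index_col= then promotes
# the "Sl Num" column to the row index.
sample_frame = pd.read_csv(sample_csv, sep=",", names=columns, index_col="Sl Num")

print(sample_frame.index.name)        # name of the row index
print(sample_frame.columns.tolist())  # remaining data columns
```

If the file did begin with its own header row, that row would be read as data here; passing header=0 together with names is the usual way to replace an existing header instead.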

Let's dump the first few rows of the data frame and have a look at the data. The following code displays the first nine rows of the packet capture dataset:

pdata_frame.head(n=9)

The output of the preceding code is as follows:

`Sl Num`	`Time`	`Source`	`Destination`	`Volume`	`Protocol`
`1`	`1521039662`	`192.168.0.1`	`igmp.mcast.net`	`5`	`IGMP`
`2`	`1521039663`	`192.168.0.2`	`239.255.255.250`	`1`	`IGMP`
`3`	`1521039666`	`192.168.0.2`	`192.168.10.1`	`2`	`UDP`
`4`	`1521039669`	`192.168.10.2`	`192.168.0.8`	`20`	`DNS`
`5`	`1521039671`	`192.168.10.2`	`192.168.0.8`	`1`	`TCP`
`6`	`1521039673`	`192.168.0.1`	`192.168.0.2`	`1`	`TCP`
`7`	`1521039674`	`192.168.0.2`	`192.168.0.1`	`1`	`TCP`
`8`	`1521039675`	`192.168.0.1`	`192.168.0.2`	`5`	`DNS`
`9`	`1521039676`	`192.168.0.2`	`192.168.10.8`	`2`	`DNS`
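The Time values in this output look like Unix epoch timestamps in seconds, although the text does not say so. Under that assumption, a minimal sketch of turning them into readable timestamps could look like the following. The sample_rows frame and the Captured At column name are invented for illustration, and the rows are copied from the output above:

```python
import pandas as pd

# Illustrative frame: three rows copied from the output above.
sample_rows = pd.DataFrame(
    {
        "Time": [1521039662, 1521039663, 1521039666],
        "Source": ["192.168.0.1", "192.168.0.2", "192.168.0.2"],
        "Volume": [5, 1, 2],
        "Protocol": ["IGMP", "IGMP", "UDP"],
    },
    index=pd.Index([1, 2, 3], name="Sl Num"),
)

# Assumption: Time holds seconds since the Unix epoch (UTC).
sample_rows["Captured At"] = pd.to_datetime(sample_rows["Time"], unit="s")

# head(n) returns the first n rows, so n=2 keeps serial numbers 1 and 2.
print(sample_rows.head(n=2))
```

If the capture tool records time in another unit, such as milliseconds, the unit argument would need to change accordingly.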
