Our data is stored in a CSV file, that is, a text file whose fields are separated by commas. While importing the data, we also supply the column headers. Since we are dealing with packet capture data from the network, the captured columns are as follows:
- Sl Num: Serial number
- Time: Time of record capture
- Source: Source address or origin of the network packet
- Destination: Destination address of the network packet
- Volume: Data volume exchanged, in kilobytes (KB)
- Protocol: The network protocol, for example SMTP, FTP, or HTTP
The following code loads the file into a data frame:
import pandas as pd
pdata_frame = pd.read_csv("path/to/file.csv", sep=',', index_col='Sl Num', names=["Sl Num", "Time", "Source", "Destination", "Volume", "Protocol"])
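Whether the file starts with its own header row changes how names should be used. The following is a minimal sketch, not the original loading code: it reads two small in-memory stand-ins for the file (built with io.StringIO; the header labels in the second one are hypothetical) to show that names alone suits a file without a header row, while a file that already has one needs header=0 so that the existing row is replaced rather than read as data.

```python
import io

import pandas as pd

# Column labels taken from the list above.
COLUMNS = ["Sl Num", "Time", "Source", "Destination", "Volume", "Protocol"]

# Case 1: no header row in the file, so `names` supplies the labels.
no_header_csv = io.StringIO(
    "1,1521039662,192.168.0.1,igmp.mcast.net,5,IGMP\n"
    "2,1521039663,192.168.0.2,239.255.255.250,1,IGMP\n"
)
frame_a = pd.read_csv(no_header_csv, sep=",", index_col="Sl Num", names=COLUMNS)

# Case 2: the file has its own header row (hypothetical labels).
# header=0 tells pandas to drop that row and use `names` instead.
with_header_csv = io.StringIO(
    "serial,time,src,dst,kb,proto\n"
    "1,1521039662,192.168.0.1,igmp.mcast.net,5,IGMP\n"
    "2,1521039663,192.168.0.2,239.255.255.250,1,IGMP\n"
)
frame_b = pd.read_csv(
    with_header_csv, sep=",", header=0, index_col="Sl Num", names=COLUMNS
)

print(frame_a)
print(frame_b)
```

Both calls should yield the same two-row frame indexed by Sl Num. Without header=0 in the second case, the file's own header row would be parsed as a data row.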
Let's dump the first few rows of the data frame and have a look at the data. The following code displays the first 9 rows of the packet capture dataset:
pdata_frame.head(n=9)
The output of the preceding code is as follows:
| Sl Num | Time | Source | Destination | Volume | Protocol |
| 1 | 1521039662 | 192.168.0.1 | igmp.mcast.net | 5 | IGMP |
| 2 | 1521039663 | 192.168.0.2 | 239.255.255.250 | 1 | IGMP |
| 3 | 1521039666 | 192.168.0.2 | 192.168.10.1 | 2 | UDP |
| 4 | 1521039669 | 192.168.10.2 | 192.168.0.8 | 20 | DNS |
| 5 | 1521039671 | 192.168.10.2 | 192.168.0.8 | 1 | TCP |
| 6 | 1521039673 | 192.168.0.1 | 192.168.0.2 | 1 | TCP |
| 7 | 1521039674 | 192.168.0.2 | 192.168.0.1 | 1 | TCP |
| 8 | 1521039675 | 192.168.0.1 | 192.168.0.2 | 5 | DNS |
| 9 | 1521039676 | 192.168.0.2 | 192.168.10.8 | 2 | DNS |
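To try this without the original capture file, the nine rows shown above can be loaded from an in-memory string. This is only a sketch under that assumption: io.StringIO stands in for the CSV file, and the names sample_frame, first_nine, and default_head are illustrative. It also shows that head returns at most n rows and defaults to five when n is omitted.

```python
import io

import pandas as pd

# The nine rows from the output above, used as a stand-in for the CSV file.
sample_csv = io.StringIO(
    "1,1521039662,192.168.0.1,igmp.mcast.net,5,IGMP\n"
    "2,1521039663,192.168.0.2,239.255.255.250,1,IGMP\n"
    "3,1521039666,192.168.0.2,192.168.10.1,2,UDP\n"
    "4,1521039669,192.168.10.2,192.168.0.8,20,DNS\n"
    "5,1521039671,192.168.10.2,192.168.0.8,1,TCP\n"
    "6,1521039673,192.168.0.1,192.168.0.2,1,TCP\n"
    "7,1521039674,192.168.0.2,192.168.0.1,1,TCP\n"
    "8,1521039675,192.168.0.1,192.168.0.2,5,DNS\n"
    "9,1521039676,192.168.0.2,192.168.10.8,2,DNS\n"
)

sample_frame = pd.read_csv(
    sample_csv,
    sep=",",
    index_col="Sl Num",
    names=["Sl Num", "Time", "Source", "Destination", "Volume", "Protocol"],
)

first_nine = sample_frame.head(n=9)   # at most nine rows
default_head = sample_frame.head()    # n defaults to 5

print(first_nine)
print(len(first_nine), len(default_head))
```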