In the previous article, we discussed the what, where, why, and how of PyShark and walked through simple code examples such as capturing live packets and reading a PCAP file.
In this article, we will look into data visualization using PyShark.
So, let’s get started …
The first step, capturing live packets, is covered in my previous article. You can check out the link below.
https://medium.com/@sbswaroop69/introduction-to-pyshark-71dfd390536d
The second step, reading the captured packets, is covered in the same article linked above.
The complete code, including these two steps, is at the end of this article for your reference.
Third step
# capture1 is the pyshark.FileCapture opened in the second step
ip = []  # collects [source address, destination port] pairs
for pkt in capture1:
    if "IP" in pkt:
        if "UDP" in pkt:
            print(pkt.ip.src, pkt.udp.dstport)
            ip.append([pkt.ip.src, pkt.udp.dstport])
        elif "TCP" in pkt:
            print(pkt.ip.src, pkt.tcp.dstport)
            ip.append([pkt.ip.src, pkt.tcp.dstport])
    elif "IPV6" in pkt:
        if "UDP" in pkt:
            print(pkt.ipv6.src, pkt.udp.dstport)
            ip.append([pkt.ipv6.src, pkt.udp.dstport])
        elif "TCP" in pkt:
            print(pkt.ipv6.src, pkt.tcp.dstport)
            ip.append([pkt.ipv6.src, pkt.tcp.dstport])
In the code above, we first create an empty list and then loop over the capture, decoding one packet at a time. For each packet we check whether it contains an IP or IPv6 layer, and then whether it carries TCP or UDP. A packet may also contain higher-level protocols such as DNS or TLS, but these run on top of TCP or UDP, so checking the transport layer is enough to get the port. Each match is appended as a [source IP, destination port] pair, which gives us a nested list.
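The branching above can also be folded into a small helper, which makes the logic easy to try out without a live capture. The sketch below is an illustration rather than part of the original script: `extract_src_and_port` and the `FakePacket` stand-in are hypothetical names, and `FakePacket` only mimics the two PyShark behaviours the loop relies on (the `in` check for a layer name and attribute access such as `pkt.ip.src`).

```python
from types import SimpleNamespace


def extract_src_and_port(pkt):
    """Return [source address, destination port], or None if the packet is skipped.

    Hypothetical helper that mirrors the nested if/elif checks of the loop.
    """
    for net_name in ("IP", "IPV6"):
        if net_name in pkt:
            net_layer = getattr(pkt, net_name.lower())
            for transport_name in ("UDP", "TCP"):
                if transport_name in pkt:
                    transport_layer = getattr(pkt, transport_name.lower())
                    return [net_layer.src, transport_layer.dstport]
            return None  # IP/IPv6 packet without UDP or TCP (e.g. ICMP)
    return None  # neither IP nor IPv6 (e.g. ARP)


class FakePacket:
    """Stand-in for a PyShark packet, used only to exercise the helper."""

    def __init__(self, **layers):
        self._layers = {name.lower(): SimpleNamespace(**fields)
                        for name, fields in layers.items()}

    def __contains__(self, layer_name):
        return layer_name.lower() in self._layers

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        layers = self.__dict__.get("_layers", {})
        if name in layers:
            return layers[name]
        raise AttributeError(name)


# Made-up sample packets: one TCP over IPv4, one UDP over IPv6, one ARP.
packets = [
    FakePacket(ip={"src": "192.168.1.5"}, tcp={"dstport": "443"}),
    FakePacket(ipv6={"src": "fe80::1"}, udp={"dstport": "53"}),
    FakePacket(arp={"src": "unused"}),
]
pairs = [p for p in map(extract_src_and_port, packets) if p is not None]
print(pairs)
```

With a real capture, the same list comprehension could be applied to `capture1` instead of the made-up `packets` list.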
Fourth step
import pandas as pd

data = pd.DataFrame(ip, columns=['sourceip', 'port'])
data['port'] = data['port'].astype(int)
Now I'm going to convert the list of PyShark results into a pandas DataFrame so that the data can be plotted properly. The graph I'm going to plot is source IP vs. destination port.
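The cast to int in the snippet above matters because, to my knowledge, PyShark hands field values back as strings, so the ports would otherwise be ordered as text. Here is a minimal illustration with made-up port values (no PyShark or pandas required):

```python
# Port values as they would arrive from PyShark fields: strings, not numbers.
# These sample values are made up for illustration.
ports_as_text = ["8080", "53", "443"]

# Text sorting compares character by character, so "443" sorts before "53".
sorted_as_text = sorted(ports_as_text)

# After converting to int, the ports sort in numeric order.
sorted_as_int = sorted(int(p) for p in ports_as_text)

print(sorted_as_text)
print(sorted_as_int)
```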
The fifth step is to plot the graph
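import matplotlib.pyplot as plt

# Count how often each (source IP, port) pair occurs
data_crosstab = pd.crosstab(data['sourceip'], data['port'])
print(data_crosstab)
# One bar per source IP, one stacked segment per port
data_crosstab.plot.bar(stacked=True)
plt.show()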
I used the pandas crosstab function to build a cross-tabulation table, which shows how often each combination of source IP and port appears, and then plotted it as a stacked bar graph.
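To make clear what the cross-tabulation contains, here is the same counting done in plain Python with `collections.Counter`. This is only a sketch of the idea, not how pandas implements `crosstab`, and the sample rows are made up.

```python
from collections import Counter

# Made-up [source IP, port] rows, in the same shape as the `ip` list.
rows = [
    ["192.168.1.5", 443],
    ["192.168.1.5", 443],
    ["192.168.1.5", 53],
    ["10.0.0.7", 53],
]

# Count how often each (source IP, port) combination occurs.
counts = Counter((src, port) for src, port in rows)

sources = sorted({src for src, _ in rows})
ports = sorted({port for _, port in rows})

# One row per source IP, one column per port; a missing combination counts as 0.
table = {src: [counts[(src, port)] for port in ports] for src in sources}

print("ports:", ports)
for src, row in table.items():
    print(src, row)
```

Each row of `table` corresponds to one bar in the stacked chart, and each column to one segment of that bar.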
Here is the final code
import pyshark
import pandas as pd
import matplotlib.pyplot as plt

filename = input("Please enter OUTPUT filename with Extension csv/pcap example- file.csv or file.pcap:: ")

# Step 1: capture live packets until interrupted with Ctrl+C
try:
    capture = pyshark.LiveCapture(interface="wlan0", output_file=filename)
    capture.sniff()
except KeyboardInterrupt:
    print(capture)

if len(capture) > 10:
    # Step 2: read the saved packets back from the file
    capture1 = pyshark.FileCapture(filename)

    # Step 3: collect [source address, destination port] pairs
    ip = []
    for pkt in capture1:
        if "IP" in pkt:
            if "UDP" in pkt:
                print(pkt.ip.src, pkt.udp.dstport)
                ip.append([pkt.ip.src, pkt.udp.dstport])
            elif "TCP" in pkt:
                print(pkt.ip.src, pkt.tcp.dstport)
                ip.append([pkt.ip.src, pkt.tcp.dstport])
        elif "IPV6" in pkt:
            if "UDP" in pkt:
                print(pkt.ipv6.src, pkt.udp.dstport)
                ip.append([pkt.ipv6.src, pkt.udp.dstport])
            elif "TCP" in pkt:
                print(pkt.ipv6.src, pkt.tcp.dstport)
                ip.append([pkt.ipv6.src, pkt.tcp.dstport])

    # Step 4: convert the list to a DataFrame
    data = pd.DataFrame(ip, columns=['sourceip', 'port'])
    data['port'] = data['port'].astype(int)

    # Step 5: cross-tabulate and plot a stacked bar graph
    data_crosstab = pd.crosstab(data['sourceip'], data['port'])
    print(data_crosstab)
    data_crosstab.plot.bar(stacked=True)
    plt.show()
else:
    print("[-] YOU HAVE LESS PACKETS TO PLOT THE GRAPH")
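Output
[Figure: stacked bar graph produced by the code above, with one bar per source IP and packet counts stacked by port]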
Summary
This article gives an outline of data visualization using PyShark in a short, relevant, and focused manner. I genuinely hope it has helped someone get a better understanding of data visualization using PyShark.