I recently had someone ask me how you can have scapy reassemble full duplex packets for you. That is what Wireshark does when you ask it to "Follow TCP Stream". In SANS SEC573: Automating Information Security with Python we discuss how to use scapy's native session reassembly capabilities, but its default behavior is to reassemble unidirectional streams. In other words, two separate sessions are created: one for traffic that flows from Host A to Host B and another for the traffic in the same session that flows from Host B to Host A. To use scapy's native session reassembly you call a packet list's .sessions() method, which returns a Python dictionary of followed streams. The keys of that dictionary are strings indicating who is communicating with whom, in the format:
Protocol SourceIP:SourcePort > DestinationIP:DestinationPort
The value stored under each key is a list of all the packets that are part of that stream.
In this example you can tell by looking at the keys that there are two sessions. In the course we go into more detail about how to reassemble these packets to extract payloads so you can look for useful pieces of attack signatures. But, the question posed to me was "How can I do this with full duplex streams combining those two into a single session like Wireshark?" There are a couple of ways you could do this. One option is just to combine the two sets of packets.
This works well for a few packets, but at scale it can be problematic. For example, if you step through a large file you have to keep track of which streams you have already combined so that when you reach the second half of the conversation you don't process the same stream a second time. A better option is to pass the scapy PacketList sessions() method a function that tells it how to reassemble the packets in full duplex. The function takes a single packet and returns a string that contains the relevant bits of data that uniquely identify the stream. Every packet in your list of packets that produces the same string will be automatically grouped together into a stream by the sessions() method. This behaves somewhat like a key function for Python's sort capability. For our packets to be full duplex we need the string to be identical for both Host A to Host B and Host B to Host A communications. You can accomplish this by just sorting a list of the attributes that make our stream unique. Here is a function called full_duplex() that will reassemble all of the same protocols that the default session assembly process currently supports but will do it with bi-directional streams.
To use the function, pass full_duplex to the sessions() method, which will then call it on each packet to decide which stream that packet belongs to.
Here is an example of trying the full_duplex() function on a larger pcap file with multiple sessions.
You can see that without full_duplex you have 182 unidirectional streams and with full_duplex you have 91 bidirectional streams. Now, using various techniques, you can reassemble the payload from the full duplex streams and search for evil, forensic data, or other useful pieces of information. For more information on this and other techniques check out SANS SEC573: Automating Information Security with Python.
The Scapy Packet reassembly:
https://gist.github.com/MarkBaggett/d8933453f431c111169158ce7f4e2222#file-scapy_helper-py