Graphing Packet Retransmission Rates with Wireshark
As network engineers, our lives revolve around making sure data gets from point A to point B. Fortunately, TCP does a great job of ensuring this happens without much intervention. Unfortunately, we still need to step in every once in a while to make sure things are working as designed.
That said, let’s talk about TCP retransmissions.
I’m going into this post with the assumption that we all understand what a retransmission is, and that TCP retransmissions are a symptom of a problem, not a cause. With this post, I want to share how to chart the count of retransmissions over time. The idea is that once the retransmissions are charted, they are easier to compare to things like spikes in throughput, error count increments, or even server CPU/memory utilization.
Identifying TCP Retransmissions in Wireshark
The first step is to identify the retransmissions within the packet list with this display filter:
tcp.analysis.retransmission
Once this filter is applied, we can see how many retransmissions are in the trace.
It’s important to note that there is no flag or unique identifier associated with a TCP retransmission. Wireshark infers retransmissions from SEQ/ACK numbers, IP ID, source and destination IP addresses, TCP ports, and the time the frame was received. That makes it very easy for Wireshark to count a duplicate packet as a retransmission, so make sure you haven’t captured the same frame twice. This is very common in data center capture architectures.
If you open a trace file and see something that looks like the below screenshot, you’ll want to review the process for removing duplicate frames here.
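As a rough illustration of the difference between a captured duplicate and a genuine retransmission, here is a small Python sketch. It rests on an assumption that is not part of Wireshark's own logic as described above: a frame captured twice is identical down to its IP ID, while a segment re-sent by the host usually carries a new IP ID. The heuristic breaks down for hosts that send a constant IP ID. `Segment` and `classify_repeats` are hypothetical names for this example, and the destination address is made up.

```python
from collections import namedtuple

# Hypothetical record holding the fields discussed above for one TCP segment.
Segment = namedtuple("Segment", "src dst sport dport seq length ip_id")

def classify_repeats(segments):
    """Label each repeated data segment as 'duplicate' or 'retransmission'.

    Assumption: a repeat with an IP ID already seen is the same frame captured
    twice; a repeat with a new IP ID was re-sent by the host.
    """
    seen = {}    # (flow, seq, length) -> set of IP IDs already observed
    labels = []
    for index, seg in enumerate(segments):
        if seg.length == 0:
            continue  # pure ACKs carry no data to retransmit
        key = (seg.src, seg.dst, seg.sport, seg.dport, seg.seq, seg.length)
        ip_ids = seen.setdefault(key, set())
        if ip_ids:
            kind = "duplicate" if seg.ip_id in ip_ids else "retransmission"
            labels.append((index, kind))
        ip_ids.add(seg.ip_id)
    return labels

segments = [
    Segment("192.168.0.12", "10.0.0.5", 50000, 80, 1000, 100, 41),
    Segment("192.168.0.12", "10.0.0.5", 50000, 80, 1000, 100, 41),  # same IP ID
    Segment("192.168.0.12", "10.0.0.5", 50000, 80, 1000, 100, 57),  # new IP ID
]
print(classify_repeats(segments))
# [(1, 'duplicate'), (2, 'retransmission')]
```

If most of the repeats in a trace come back as duplicates, the capture setup is the first thing to check before blaming the network.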
Graphing Retransmissions in Wireshark
We’re going to walk through the process of graphing retransmissions for a single host, then look at how to compare the count of retransmissions for two hosts.
The examples below were all created from this trace on pcapr. Feel free to use these traces to work through the examples.
Graphing a Single Host
Let’s start by opening the IO Graph in Wireshark (it lives under the Statistics menu):
The IO Graph window opens and by default displays the total count of packets per second. The first thing that jumps out is that the PPS (packets per second) dropped. In this case, we already know why that happened: we started taking errors on an upstream interface and experienced packet loss of roughly 50%.
Let’s add the retransmissions to the Graph 2 field. We’ll be using the same retransmission filter as above:
tcp.analysis.retransmission
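With the default settings, that’s not helpful at all: the retransmission count is tiny next to the total packet count, so it barely registers on the graph. We’ll need to make some modifications to make this display a bit better.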
In the next screenshot, I’ve made the following changes:
- Change the Pixels per tick to 10 – This widens the X axis and allows us to have more granularity
- Select View as time of day – this changes the time display on the X axis from seconds since beginning of capture to time of day
- Change the Y Axis scale to logarithmic – this displays the count of retransmissions in the context of the count of packets per second.
- Change the style of Graph 2 to FBar
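Conceptually, each series in the IO Graph is just a bucketed count: every packet that matches the filter lands in a time interval, and the intervals are tallied. The Python sketch below reproduces that idea for a list of packet timestamps. The function name `count_per_interval` and the sample timestamps are made up for illustration. In practice the timestamps could be exported with something like `tshark -r trace.pcap -Y tcp.analysis.retransmission -T fields -e frame.time_relative`, where `trace.pcap` is a placeholder file name.

```python
from collections import Counter

def count_per_interval(timestamps, interval=1.0):
    """Count packets per tick, keeping empty ticks, like one graph series."""
    if not timestamps:
        return []
    # Map each timestamp (seconds since start of capture) to its tick index.
    buckets = Counter(int(ts // interval) for ts in timestamps)
    last_tick = max(buckets)
    return [buckets.get(tick, 0) for tick in range(last_tick + 1)]

# Sample relative timestamps of retransmitted packets, in seconds.
retransmissions = [0.2, 0.9, 1.1, 3.4, 3.5, 3.6]
print(count_per_interval(retransmissions))
# [2, 1, 0, 3]
```

Keeping the empty ticks matters when the series is later lined up against another per-second metric.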
Graphing Multiple Hosts
In the graph below, I’ve created a comparison between the two hosts in the trace. The goal with this graph is to display the count of retransmissions per second for each host.
I’ve added the following display filter to each of the colored graphs. I chose red and blue only because the default Wireshark green kills my eyes!
ip.src == 192.168.0.12 && tcp.analysis.retransmission
ip.addr == 22.214.171.124 && tcp.analysis.retransmission
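The same per-host comparison can be sketched outside Wireshark by grouping retransmissions by source address before counting them per second. This is only an illustration: `retransmissions_by_host` is a hypothetical helper, the rows stand in for `(relative time, source IP)` pairs exported from a trace, and the second address is invented for the example.

```python
from collections import defaultdict

def retransmissions_by_host(rows, interval=1.0):
    """Return {source_ip: {tick: count}} for (timestamp, source_ip) rows."""
    per_host = defaultdict(lambda: defaultdict(int))
    for timestamp, source_ip in rows:
        per_host[source_ip][int(timestamp // interval)] += 1
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {host: dict(ticks) for host, ticks in per_host.items()}

rows = [
    (0.4, "192.168.0.12"),
    (0.7, "192.168.0.12"),
    (1.2, "198.51.100.7"),  # documentation-range address, not from the trace
    (1.8, "192.168.0.12"),
]
print(retransmissions_by_host(rows))
# {'192.168.0.12': {0: 2, 1: 1}, '198.51.100.7': {1: 1}}
```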
The point here is that graphing data in Wireshark serves two purposes.
- A surprising number of people don’t get their jollies from staring at a list of thousands of raw packets like you and I do. Turns out, they understand it much better when you give them a chart to look at.
- It’s much easier to establish correlation with other issues when you can visually present the data in a similar format over a period of time. Consider overlaying these graphs over a heavily utilized network link. When the peaks line up, you can begin to see a pattern.
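To put a number on "the peaks line up", one option is a Pearson correlation between the per-second retransmission counts and another per-second metric such as link utilization. This goes a step beyond the visual comparison and is offered as a sketch only: `pearson` is a plain-Python helper written for this example, and both series are invented sample data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Invented per-second samples covering the same six seconds.
retransmissions_per_sec = [0, 1, 0, 12, 15, 2]
link_utilization_pct = [20, 25, 22, 91, 97, 30]
print(round(pearson(retransmissions_per_sec, link_utilization_pct), 2))
```

A value close to 1 says the two series rise and fall together. It does not say which one is driving the other, which fits the point that retransmissions are a symptom rather than a cause.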
TCP retransmissions are just one of the many fields that can be used for graphing in troubleshooting scenarios. Try some of these others using the same trace file.
HTTP responses that took 400 ms or longer:
http.time >= 0.4
TCP ACKs that took 50 ms or longer to arrive:
tcp.analysis.ack_rtt >= 0.050