Running Path MTU Discovery from Windows

When debugging network connectivity issues, it is sometimes helpful to know the largest packet that can be transferred from one host to another without being fragmented. This is especially useful when Jumbo Frames or VPNs are involved. The method for finding this size is called Path MTU Discovery (PMTUD). Your OS performs it in the background, but you can also do it manually. On Linux this is easily done, but it’s also possible on Windows. Here’s how.

Principle

The general idea is quite simple. All you need is a target on the other side that responds to ping. The ping packet contains a data field whose size you can choose (within reasonable limits), and if the resulting packet exceeds the MTU, it will be fragmented.

To prevent this, there’s a flag in the IP header called “Don’t fragment” (DF), which you can set in order to forbid fragmentation. These two options combined allow for MTU discovery: if you send a packet that is larger than the MTU and also forbid fragmentation, any router that is unable to forward the packet will drop it and send back an ICMP message telling you so, along with the MTU it can handle. If you then reduce the size to fit that MTU and the target replies, you’ve found the Path MTU. If instead another router further along replies with an even smaller MTU, you’ll need to repeat the process until the target itself responds.
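The loop described above can be sketched as a small simulation. Nothing is sent over the network here: the function name simulate_pmtud and the list of per-hop MTUs are made up for illustration, and each "router" is just a number in the argument list.

```shell
# Simulated Path MTU Discovery (illustrative only, no packets are sent).
# Usage: simulate_pmtud <initial_size> <link_mtu>...
# Each <link_mtu> is the hypothetical MTU of one hop along the path, in order.
simulate_pmtud() {
    local size=$1
    shift
    local dropped=1 mtu
    while [ "$dropped" -eq 1 ]; do
        dropped=0
        for mtu in "$@"; do
            if [ "$size" -gt "$mtu" ]; then
                # This hop cannot forward the packet: it drops it and
                # reports its own MTU, so we retry with that size.
                echo "hop with MTU ${mtu} dropped a ${size}-byte packet" >&2
                size=$mtu
                dropped=1
                break
            fi
        done
    done
    # The packet passed every hop, so this size is the Path MTU.
    echo "$size"
}

# A path whose links have MTUs 1500, 1420, 1500 and 1300:
simulate_pmtud 1500 1500 1420 1500 1300
```

In this made-up path the sender is told about 1420 first and about 1300 on the second attempt, so the function ends up printing the smallest link MTU on the path.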

PMTUD on Linux

In Linux, this can be done using two options to the ping command:

  • -M do: This tells ping to set the DF flag,
  • -s 1472: Sets the size of the content field.

Note that the size of the content field does not include the headers, so you need to subtract 20 bytes for the IPv4 header plus 8 bytes for the ICMP header from the MTU. So if you want to send a packet that completely fills an MTU of 1500 bytes, you need to specify 1472 bytes of data. An easier way to do this is to have bash do the calculation for you: -s $((1500-28)).
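If you do this a lot, the subtraction can be wrapped in a tiny helper. This is only a sketch: the function name icmp_payload_for_mtu is invented here, and it assumes a plain 20-byte IPv4 header without options.

```shell
# Largest ICMP echo payload that fits into one IPv4 packet of the given MTU.
# Assumes a 20-byte IPv4 header (no options) plus the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 1420   # prints 1392
```

The result can be passed straight to ping, for example -s "$(icmp_payload_for_mtu 1500)".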

If a router on the path can’t forward a packet of that size, the response will look like this:

# ping -M do -s $((1500-28)) 10.159.1.2
PING 10.159.1.2 (10.159.1.2) 1472(1500) bytes of data.
From 192.168.211.1 icmp_seq=1 Frag needed and DF set (mtu = 1420)
ping: local error: Message too long, mtu=1420
ping: local error: Message too long, mtu=1420
^C
--- 10.159.1.2 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 9ms

Here we see that 192.168.211.1 replied that it cannot forward the packet because it exceeds the MTU of the next link, which is 1420 bytes. Linux stores this information in its routing cache:

# ip route get 10.159.1.2
10.159.1.2 via 192.168.211.1 dev eth0 src 192.168.211.5 uid 0
    cache expires 596sec mtu 1420

So if we reduce our packet to 1420 bytes, it’ll go through:

# ping -M do -s $((1420-28)) 10.159.1.2
PING 10.159.1.2 (10.159.1.2) 1392(1420) bytes of data.
1400 bytes from 10.159.1.2: icmp_seq=1 ttl=63 time=28.4 ms
1400 bytes from 10.159.1.2: icmp_seq=2 ttl=63 time=29.7 ms
1400 bytes from 10.159.1.2: icmp_seq=3 ttl=63 time=28.2 ms
^C
--- 10.159.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 6ms
rtt min/avg/max/mdev = 28.197/28.781/29.736/0.708 ms

Thus we found that the Path MTU for this destination is 1420 bytes.
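To avoid reading the reported MTU off the screen, it can be pulled out of ping’s output with sed. The following is a rough sketch that relies on the message formats shown in the output above, which may differ between ping versions; the function name extract_reported_mtu is made up.

```shell
# Read ping output on stdin and print the first MTU value it reports.
# Matches both "(mtu = 1420)" and "mtu=1420" as they appear in the output above.
extract_reported_mtu() {
    sed -nE 's/.*mtu *= *([0-9]+).*/\1/p' | head -n 1
}

# Example using the router's reply from above instead of a live ping:
echo 'From 192.168.211.1 icmp_seq=1 Frag needed and DF set (mtu = 1420)' | extract_reported_mtu

# With a live target, the input could come from ping itself, e.g.:
#   ping -M do -c 1 -s $((1500-28)) 10.159.1.2 2>&1 | extract_reported_mtu
```

If ping reports no MTU at all, for instance because the packet already fits, the function prints nothing.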

PMTUD on Windows

On Windows, the commands are a bit different, but the principle stays the same: -f sets the DF flag, and -l sets the size of the data field. First we’ll try sending a packet that is too large:

PS C:\> ping -f -l (1500-28) 10.159.1.2

Ping wird ausgeführt für 10.159.1.2 mit 1472 Bytes Daten:
Antwort von 192.168.211.1: Paket müsste fragmentiert werden, DF-Flag ist jedoch gesetzt.
Paket müsste fragmentiert werden, DF-Flag ist jedoch gesetzt.

Ping-Statistik für 10.159.1.2:
    Pakete: Gesendet = 2, Empfangen = 1, Verloren = 1
    (50% Verlust),
STRG-C

Unfortunately, ping on Windows does not show the new MTU. We can query it from the destination cache though:

PS C:\> netsh interface ipv4 show destinationcache address=10.159.1.2
Ziel              : 10.159.1.2
Adresse des nächsten Hops         : 192.168.211.1
Quelle                   : 192.168.211.5
Schnittstelle                : Ethernet0
Pfad-MTU                 : 1300
MTU obere Ebene          : 1280
RTT-Durchschnitt                 : 1000
RTT-Abweichung            : 0
Pfadübertragungsgeschwindigkeit (Bit/s): 0
Pfadempfangsgeschwindigkeit (Bit/s) : 0
Verbindungsübertragungsgeschwindigkeit (Bit/s): 1000000000
Verbindungsempfangsgeschwindigkeit (Bit/s) : 1000000000
Kennzeichen                    : 0x48

The field we’re looking for is “Pfad-MTU”, which is German for “Path MTU”. So if we now send a ping that fits into 1300 bytes, it’ll work:

PS C:\> ping -f -l (1300-28) 10.159.1.2

Ping wird ausgeführt für 10.159.1.2 mit 1272 Bytes Daten:
Antwort von 10.159.1.2: Bytes=1272 Zeit=7ms TTL=126
Antwort von 10.159.1.2: Bytes=1272 Zeit=6ms TTL=126
Antwort von 10.159.1.2: Bytes=1272 Zeit=6ms TTL=126

Ping-Statistik für 10.159.1.2:
    Pakete: Gesendet = 3, Empfangen = 3, Verloren = 0
    (0% Verlust),
Ca. Zeitangaben in Millisek.:
    Minimum = 6ms, Maximum = 7ms, Mittelwert = 6ms

Happy debugging!