TCP connection timeout mystery
For over a year, I’ve been investigating a strange issue with the internet in my apartment. I’m posting the details here in case someone more knowledgeable than me can solve the mystery.
I live in graduate student housing with internet provided by the university campus. Usually things work reasonably well, with one exception. On the wired network, I cannot connect to IP addresses used by Bunny CDN.[1]
Many websites use Bunny CDN to serve static assets. The connection timeouts break these sites. Some fail to load images, some load only after a 60-second timeout, and some cannot load at all. I’ve seen consistent failures on all of the following sites:
- plausible.io is served from Bunny CDN.
- dunnedwards.com uses h6a8m2f3.rocketcdn.me backed by Bunny CDN.
- ravelry.com loads stylesheets from style-cdn.ravelrycache.com and analytics scripts from plausible.io, both backed by Bunny CDN.
- fosstodon.org previously used Bunny CDN for images (but now uses Fastly).
- Nintendo’s US support site loads a stylesheet from cdn.icomoon.io backed by Bunny CDN.
- The Bunny CDN website itself loads fonts from… Bunny CDN!
This issue occurs only on the campus network, and only when connected to the wired network (the university also provides wireless eduroam and guest networks, which work correctly). It happens on every device and operating system I’ve tested (laptop/phone/desktop and Linux/Windows/macOS/iOS). It happens whether I connect through a Netgear WiFi router or plug my computer directly into the campus network via Ethernet. It happens with both HTTP and HTTPS (but ping works fine).
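A check like this could be scripted. The following is only a minimal sketch, not the method used for the tests in this post. The function name `probe_tcp` and its return labels are made up for illustration. Because the packet captures suggest the client believes the handshake has completed, the sketch sends a small request and waits for reply bytes rather than only calling `connect`.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect, send a tiny HTTP request, and wait for any reply bytes.

    Returns one of: "connect_failed", "timed_out", "error",
    "closed", or "response". (Labels are hypothetical.)
    """
    try:
        # The timeout also applies to later send/recv calls on this socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "connect_failed"
    with sock:
        try:
            request = (
                f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            )
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1)
        except socket.timeout:
            # Connected locally, but nothing ever came back.
            return "timed_out"
        except OSError:
            # For example, a connection reset.
            return "error"
    # An empty read means the peer closed without sending anything.
    return "response" if data else "closed"
```

For example, `probe_tcp("169.150.221.147", 80)` targets the server address from the packet capture in this post. If the behavior described here holds, the expected result on the wired network would presumably be `"timed_out"`, and `"response"` elsewhere, but that is an assumption rather than something measured with this script.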
It happens consistently, every time I test it. Packet captures of the timeouts always show the exact same behavior:
- Packet #1: Client (136.152.38.228) sends SYN to server (169.150.221.147).
- Packet #2: Server responds with SYN+ACK.
- Packet #3: Client responds with ACK to complete the TCP 3-way handshake.
- Packet #8: Server resends SYN+ACK, indicating that it never received the client ACK.
- Client and server repeatedly resend ACK and SYN+ACK.
- Packet #17: Client times out and terminates the connection with FIN+ACK.
This looks to me like the client ACK never reaches the server.
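The reasoning behind that reading can be written down as a small check. This is a sketch over a hand-entered summary of the capture, not a real pcap parser. The tuple layout and the name `ack_appears_lost` are assumptions for illustration, and only the packets explicitly listed above are included.

```python
from typing import List, Tuple

# (packet number, sender, TCP flags), as summarized in the list above.
Packet = Tuple[int, str, str]


def ack_appears_lost(packets: List[Packet]) -> bool:
    """True if the server repeats SYN+ACK after the client already sent ACK."""
    client_acked = False
    for _, sender, flags in packets:
        if sender == "client" and flags == "ACK":
            client_acked = True
        elif sender == "server" and flags == "SYN+ACK" and client_acked:
            # A server that had received the ACK would have no reason
            # to retransmit its SYN+ACK.
            return True
    return False


# Packets not individually described in the post are omitted here.
capture = [
    (1, "client", "SYN"),
    (2, "server", "SYN+ACK"),
    (3, "client", "ACK"),
    (8, "server", "SYN+ACK"),
    (17, "client", "FIN+ACK"),
]

print(ack_appears_lost(capture))  # True
```

The check cannot tell where the ACK was lost, only that the server behaves as though it never arrived.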
My best guess is that asymmetric routing causes the campus firewall to drop packets:
- Client SYN goes through the firewall.
- Server SYN+ACK is routed back to the client, incorrectly bypassing the firewall.
- Client ACK goes to the firewall and gets dropped because the firewall never received the server SYN+ACK.
However, I don’t know what routing misconfiguration could cause this or why it affects only IP addresses from Bunny CDN.
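To make the guess concrete, here is a toy model of a strict stateful firewall. It is a deliberately simplified sketch and says nothing about how the campus firewall actually works. The class `ToyStatefulFirewall` and the function `handshake` are hypothetical, and real firewalls track far more state than this.

```python
class ToyStatefulFirewall:
    """Allows a client ACK only if it has seen the matching SYN and SYN+ACK."""

    def __init__(self) -> None:
        # (client, server) -> "SYN_SEEN", "SYNACK_SEEN", or "ESTABLISHED"
        self.state = {}

    def outbound(self, client: str, server: str, flags: str) -> bool:
        """Client-to-server packet. Returns True if forwarded."""
        key = (client, server)
        if flags == "SYN":
            self.state[key] = "SYN_SEEN"
            return True
        if flags == "ACK" and self.state.get(key) in ("SYNACK_SEEN", "ESTABLISHED"):
            self.state[key] = "ESTABLISHED"
            return True
        return False  # Unexpected packet for this connection state: drop.

    def inbound(self, client: str, server: str, flags: str) -> bool:
        """Server-to-client packet. Returns True if forwarded."""
        key = (client, server)
        if flags == "SYN+ACK" and self.state.get(key) == "SYN_SEEN":
            self.state[key] = "SYNACK_SEEN"
            return True
        return False


def handshake(reply_bypasses_firewall: bool) -> bool:
    """Returns True if the client's final ACK gets through the firewall."""
    fw = ToyStatefulFirewall()
    client, server = "136.152.38.228", "169.150.221.147"  # from the capture
    fw.outbound(client, server, "SYN")
    if not reply_bypasses_firewall:
        fw.inbound(client, server, "SYN+ACK")
    # With asymmetric routing the firewall never saw the SYN+ACK,
    # so from its point of view this ACK belongs to no valid handshake.
    return fw.outbound(client, server, "ACK")


print(handshake(reply_bypasses_firewall=False))  # True
print(handshake(reply_bypasses_firewall=True))   # False
```

In this model the client still receives the SYN+ACK in both cases, which matches the capture: the client thinks the handshake succeeded, while the server keeps waiting for an ACK that was dropped along the way.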
I spent over a week trying to escalate this problem through Berkeley Student Tech Services. After I explained the problem to three different teams, someone from the Network Operations and Services team closed the ticket, telling me:
“Since it is happening with only certain websites, you should ensure that there is not a static DNS entry in the wired connection of your device that is in conflict with our DNS service. We are not seeing any indication that this is an issue with our network at this point in time… This is all functioning as it was designed to per the Principal Wifi Engineer for the campus.”
As you might imagine, I found this response unconvincing.[2]
I’m still very curious what could cause an issue like this. Why are the client ACK packets to those specific IP addresses being dropped? Is there something special about Bunny CDN that could interact badly with the campus network configuration?
Update: Thanks to everyone who emailed me with comments and suggestions! After I wrote this post, I learned some new information that (mostly) solves the mystery. Please see my follow-up post here.
[1] Traceroute to the IP address always shows a subdomain of “bunnyinfra.net.”
[2] I really don’t understand how they think a static DNS entry could cause a TCP handshake failure. (Or, for that matter, what it means for a DNS entry to be “in the wired connection” of a device.) The IP address in the packet capture matches the address returned by Google and Cloudflare DNS resolvers, so it’s definitely correct.