AT&T Wireless traffic shaping apparently making some websites unusable
Post-incident Summary⌗
To summarize this post, I was seemingly throttled by AT&T for days. See the Original Post for the full write up.
While gathering data for this post and attempting solutions, I changed my LTE router’s APN, which caused it to re-authenticate to AT&T, and after I was re-authenticated I was no longer being throttled. One Hacker News user suggested that re-authenticating may have caused me to route through a different PGW (Packet Data Network Gateway) with a different traffic shaping policy.
I won’t know for certain unless AT&T responds directly to my request for more information.
The bottom of this post contains multiple updates from throughout the incident and beyond.
Original Post⌗
When I’m living in my RV, wireless service providers are my primary source of connectivity. So when either AT&T or Verizon makes major changes, I take notice.
I recently noticed that multiple websites are quite slow when browsing with my AT&T business plan, listed in AT&T Premier (business account management UI) as “Wireless Broadband Ultra for Router or Hotspt (sic)”. This is an “unlimited” 100Mbit plan with 50GB of Business Fast Track (prioritized) data. Because I was far below the 50GB of monthly Fast Track data, my data should have had top priority, so I became suspicious. To be honest, I’ve never noticed a discernible difference between Fast Track and non-Fast Track data rates. This is all to say that I have no reason to believe that I’m being deprioritized due to usage.
Naturally, the first thing I did was conduct a speed test. I already knew from previous experience that for some reason, AT&T traffic to fast.com is throttled. Why AT&T wants bandwidth to appear lower than reality is a mystery to me, but I digress. Linode.com has speed tests that AT&T has no special treatment for, and the nearest one to me was in Fremont, CA.
The speedtest revealed 21Mbps down and 4.5Mbps up – pretty reasonable in a relatively rural area like Durango, CO. Latency was ~130ms. That speed certainly wouldn’t explain why it took anywhere between 15 seconds and 2 minutes to load strava.com.
So I opened up the “Network” tab in Firefox and could clearly see that dozens of resources from cloudfront.net were taking multiple seconds to load. The problem clearly has something to do with Cloudfront.
Is Cloudfront having problems? That’s easy enough to verify; my Sierra Wireless RV55 CAT-12 LTE-A router also contains an unlimited Verizon Business SIM card that I can use to conduct tests on Cloudfront, independent of AT&T.
I noticed that one of Strava’s JavaScript resources clocked in at 1.68MB, making it a nice test subject for speed tests (https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js). At the time of writing, web-assets.strava.com resolves to dgpcy4fyk1eox.cloudfront.net, so rest assured, we are dealing with Cloudfront.
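The CNAME check described above can be scripted. What follows is a minimal sketch, not the method used in this post: the helper names `cdn_alias` and `is_cloudfront` are made up for illustration, and the parsing assumes the "is an alias for" wording that `host` prints in the DNS listings later in this post.

```shell
# cdn_alias (hypothetical helper): read `host` output on stdin and print the
# CNAME target without its trailing dot, if an alias line is present.
cdn_alias() {
    awk '/ is an alias for / { sub(/\.$/, "", $NF); print $NF }'
}

# is_cloudfront NAME (hypothetical helper): succeed when NAME ends in
# .cloudfront.net, fail otherwise.
is_cloudfront() {
    case "$1" in
        *.cloudfront.net) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (needs network access, so it is left commented out):
#   target=$(host web-assets.strava.com | cdn_alias)
#   is_cloudfront "$target" && echo "served via Cloudfront: $target"
```

The suffix match only shows that the name is delegated to Cloudfront; it says nothing about which edge location ends up serving the request.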
After switching to Verizon, I could see that Cloudfront was having no problems. Our friend 827.js downloaded in just over 1 second at 1.4MB/s. I clearly saw earlier in the Firefox network tab that this resource took nearly 1 minute to load on AT&T.
While wget is not my go-to for command line HTTP fetching, it displays transfer rates in a human-friendly format by default, so I used the following as my test case:
wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js
So the problem isn’t Cloudfront, because Verizon was fast enough. It wasn’t blazing fast by any means, but I also didn’t have to wait 2 minutes to learn whether Strava awarded me King of the Mountain on a local trail (I wasn’t).
Maybe this is a global AT&T problem. That’s easy to test as well – my iPhone is also on the same AT&T business account as my data-only plan, so I turned on the iPhone hotspot and made it the router’s WAN device to make sure we’re changing a single variable at a time. I conducted another speed test, revealing 23Mbps down and 3Mbps up. Nothing surprising there – normal bandwidth fluctuations for a wireless device. How about our wget test? 1.7MB/s. The problem is clearly not all of AT&T wireless.
Let’s go back to the original configuration: connect directly to my AT&T data-only plan with my router and re-run the wget test. Maybe I was imagining things. I’m somewhat surprised to see the wget test with a transfer rate of ~30KB/s. I believe this rules out a souring AT&T peering agreement somewhere between me and Cloudfront. My phone traffic to Cloudfront is unaffected; it’s only my data-only plan that is affected.
I now have a pretty clear picture of what is likely happening. Good old-fashioned traffic shaping. Now that the router is connected directly to AT&T, the true test of traffic shaping is transfer rates while connected to a VPN. I’ll let the image speak for itself.
As of the time of writing, I’m unsure what is causing such a significant slowdown. It has rendered some websites effectively useless. Everything in this writeup indicates, to me, that AT&T is engaged in extremely aggressive traffic shaping for some plans, rendering many websites nearly unusable.
Do you have any ideas how to diagnose this problem further? Do you know the best way to engage AT&T’s technical folks to take this seriously? Write me at att-traffic-shaping @ this domain. I’ll add updates here if anything changes, or I get a response on https://bizcommunity.att.com.
Updates⌗
Update #1 (19:03:09 UTC)⌗
Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w capture.pcap: capture.pcap
I’ve also attempted to disable/enable IPv6 on my local machine and my VPN connection to check for any differences. No differences were observed.
Update #2 (19:43:00 UTC)⌗
Traceroutes
Connected to AT&T iPhone via Wifi hotspot
traceroute web-assets.strava.com
traceroute to web-assets.strava.com (99.84.208.10), 30 hops max, 60 byte packets
1 _gateway (172.20.10.1) 4.386 ms 4.280 ms 5.292 ms
2 107.243.82.3 (107.243.82.3) 139.792 ms * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 12.117.216.210 (12.117.216.210) 244.538 ms 369.608 ms 933.487 ms
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 server-99-84-208-10.iad79.r.cloudfront.net (99.84.208.10) 206.664 ms 206.643 ms 206.623 ms
Connected to router on AT&T business plan
traceroute web-assets.strava.com
traceroute to web-assets.strava.com (99.84.208.115), 30 hops max, 60 byte packets
1 _gateway (192.168.13.31) 61.896 ms 61.950 ms 62.064 ms
2 172.26.96.161 (172.26.96.161) 132.156 ms 132.775 ms 132.749 ms
3 107.72.231.188 (107.72.231.188) 132.743 ms 132.718 ms 132.199 ms
4 * * *
5 12.83.179.49 (12.83.179.49) 148.138 ms 148.187 ms 148.162 ms
6 slkut21crs.ip.att.net (12.122.1.186) 148.361 ms 137.643 ms 142.401 ms
7 dvmco22crs.ip.att.net (12.122.28.45) 146.657 ms 146.627 ms 149.389 ms
8 cgcil21crs.ip.att.net (12.122.28.78) 149.369 ms 149.345 ms 216.643 ms
9 12.122.28.206 (12.122.28.206) 149.302 ms 146.467 ms 149.256 ms
10 wshdc84crs.ip.att.net (12.122.135.230) 146.421 ms 149.211 ms 149.161 ms
11 wshdc406me9.ip.att.net (12.123.10.125) 143.819 ms 139.466 ms 139.378 ms
12 12.117.216.210 (12.117.216.210) 139.342 ms 139.314 ms 187.019 ms
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 server-99-84-208-115.iad79.r.cloudfront.net (99.84.208.115) 152.577 ms 140.190 ms 141.460 ms
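To compare two traces like the ones above without reading them by eye, the responding hops can be reduced to "hop number, first address" pairs. A small sketch, assuming the Linux traceroute output format shown above; `hop_addresses` and the file names in the usage comment are made up for illustration.

```shell
# hop_addresses (hypothetical helper): read traceroute output on stdin and
# print "<hop> <first responding IPv4 address>" for each hop that answered.
# Hops shown as "* * *" and the header line are skipped.
hop_addresses() {
    awk '
        $1 ~ /^[0-9]+$/ && match($0, /\(([0-9]+\.)+[0-9]+\)/) {
            # RSTART/RLENGTH cover the parenthesized address; strip the parens.
            print $1, substr($0, RSTART + 1, RLENGTH - 2)
        }
    '
}

# Example usage with saved traces (file names are placeholders):
#   hop_addresses < iphone-trace.txt > iphone-hops.txt
#   hop_addresses < router-trace.txt > router-hops.txt
#   diff iphone-hops.txt router-hops.txt
```

Because the iPhone trace has so many silent hops, a diff like this mostly shows where the two paths are visible at all; both traces do pass through 12.117.216.210 shortly before reaching Cloudfront.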
Update #3 (21:54:03 UTC)⌗
Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w iphone-capture.pcap while tethered to my AT&T iPhone and fetching 827.js with wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js: iphone-capture.pcap
Update #4 (23:39:00 UTC)⌗
As my initial post mentioned, I had hoped to get some sort of resolution by posting to http://bizcommunity.att.com. As of right now, that post https://bizcommunity.att.com/conversations/the-break-room/im-apparently-experiencing-extremely-aggressive-att-traffic-shaping/643c3d81aba4c269accb1a2b has been marked as “Private”. This URL also 404s, unless I’m logged in under the account with which I posted it.
I can’t say whether this is normal practice on the bizcommunity forum, but I posted a followup post with a much more benign title, “Are all new posts here marked as private?” https://bizcommunity.att.com/conversations/the-break-room/are-all-new-posts-here-marked-as-private/643c866bb67b2d067eb0985c, which as of right now is not private and still accessible to the public.
Update #5 (23:54:00 UTC)⌗
Hello reader, I’m another HN member trying to help get to the bottom of this! I’ve analyzed all the data provided and feel confident there is some kind of manipulation / throttling happening with Adriano’s AT&T business plan, but if anyone wants to correct my interpretation or provide other insights feel free to reach out to me on my blog!
We start by analyzing the packet capture from the throttled connection. There are a few takeaways from this:
- TCP segment & window sizes have normal values.
- There are almost no retransmissions or lost packets.
- There appears to be ample room for the TCP stream to scale up to a much faster speed.
- The TCP stream starts scaling up to 800Kbps before dramatically going down.
- The TCP stream regularly blocks for 50-200ms intervals waiting for new data to arrive from cloudfront.
- These blocks coincide with throughput scaling up to ~350Kbps.
All of the spikes in round trip time correspond to a segment of the stream where the client acknowledges receipt of new data and is forced to wait 50-200ms for any new data to arrive. This happens at a regular cadence that appears artificial in nature (i.e., not regular network congestion) because the throughput chart stays within a well defined boundary.
If this were a problem with AT&T’s network or a side effect of LTE networks being inherently high latency, we would expect to observe similar patterns over the iPhone connection:
We can draw a few conclusions from these charts:
- The iPhone AT&T connection doesn’t exhibit the same RTT or throughput pattern despite having similar latency & network routing.
- The throughput chart is indicative of a healthy TCP stream that slowly scales up the total bandwidth to make the most out of available network resources.
- We are never waiting very long for fresh data to arrive.
In my opinion, these patterns rule out any extraneous factors affecting the network and are indicative of intentional manipulation of the TCP stream. If AT&T were to occasionally drop an incoming packet, then the client would never send an acknowledgement to cloudfront, and after a little while cloudfront would automatically retransmit that packet. In our client-side packet capture, we would observe this as a packet coming in with a big delay. If we were being rate limited, we would expect to see these delayed packets coming in regularly around the time we started to hit a threshold. It seems likely AT&T is dropping certain inbound packets once the incoming rate reaches ~350Kbps.
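For readers who want to check the throughput ceiling themselves without Wireshark's graphs, per-second throughput can be computed from a list of packet times and sizes. This is a rough sketch and not the analysis behind the charts above: `throughput_kbps` is a made-up helper, and the usage comment assumes tshark is installed.

```shell
# throughput_kbps (hypothetical helper): read lines of
#   "<seconds since capture start> <frame length in bytes>"
# on stdin and print "<second> <kilobits per second>" for each elapsed second.
throughput_kbps() {
    awk '
        BEGIN { last = 0 }
        NF >= 2 {
            bucket = int($1)           # whole second this frame falls into
            bytes[bucket] += $2
            if (bucket > last) last = bucket
        }
        END {
            for (s = 0; s <= last; s++)
                printf "%d %.1f\n", s, bytes[s] * 8 / 1000
        }
    '
}

# Example usage (assumes tshark is available; counts frames in both directions):
#   tshark -r capture.pcap -T fields -e frame.time_relative -e frame.len \
#       | throughput_kbps
```

If the connection is being held to a fixed rate, the second column should level off near the same value for most of the transfer instead of climbing the way it does in the iPhone capture.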
Update #6 (00:43:00 UTC)⌗
Refer to Update #4 above: I’m getting shut down on the AT&T bizcommunity site. My posts are being marked private, so I’ll save them here for posterity.
Forum Post 1: I’m apparently experiencing extremely aggressive AT&T traffic shaping⌗
URL (was public): https://bizcommunity.att.com/conversations/the-break-room/im-apparently-experiencing-extremely-aggressive-att-traffic-shaping/643c3d81aba4c269accb1a2b
The short of it is that I’m experiencing what appears to be aggressive traffic shaping by AT&T. Many sites are affected, because Cloudflare is one content delivery network for which AT&T is apparently shaping my traffic.
This is a business account. I have both cell phone devices and a data-only plan “Wireless Broadband Ultra for Router or Hotspt (sic)” on the account. Only the data-only plan is affected.
Because Cloudflare is such a large content delivery network, this affects millions of websites. A large percentage of the websites I’ve accessed over the last few days have been affected, forcing me to use a VPN to subvert the problem.
I’ve added more concrete data in a writeup on my personal website: https://adriano.fyi/post/2023/2023-04-16-att-traffic-shaping-makes-websites-unusable
The data paint a fairly clear picture of traffic shaping.
[ GIF IMAGE FROM ABOVE HERE ] [ NOTE: web-assets.strava.com resolves to Cloudflare, see the writeup on my site for details ]
I’d love to get a response from AT&T on the matter because this is a major performance degradation to the point that many websites are unusable. Is anyone else experiencing similar issues?
Cheers, AC
Forum Post 2: Are all new posts here marked as private?⌗
URL (currently public): https://bizcommunity.att.com/conversations/the-break-room/are-all-new-posts-here-marked-as-private/643c866bb67b2d067eb0985c
I posted earlier today regarding a traffic shaping issue that I’m experiencing. The title of the post is “I’m apparently experiencing extremely aggressive AT&T traffic shaping”. When I came back to check for any responses, I see that my previous post is marked as “Private”, which I don’t believe was under my control, and certainly wasn’t my intention.
This post is a test to see whether posts are automatically flagged as private, or if someone took action on my last post to mark it as private.
[edit] As of 2023-04-16 23:37 UTC, this post is not Private like my previous post.
Response and followup to Forum Post 2⌗
Notice the Private label on my response. So it looks to others like I impolitely never respond to their polite response.
Update #6 (2023-04-17 04:12:00 UTC)⌗
This ended with the least satisfying end one could expect. Over on Hacker News, kevin_nisbet recommended changing the APN on the device to see if there were any routing differences. I changed the APN to NXTGENPHONE, which I figured would work as a “generic LTE device”, per https://www.att.com/support/article/wireless/KM1062162/. However, the router was not able to authenticate with the NXTGENPHONE APN, so I switched back to broadband.
After switching back and re-authenticating with the broadband APN, I’m no longer being throttled.
I’ll keep monitoring for future throttling and add any responses from the bizcommunity admins.
For now, we’re done here.
Update #7 (2023-04-17 13:43:00)⌗
My original Forum Post 1 on the bizcommunity is still marked as private and not deleted. In a followup to Forum Post 2, I asked that my original post be made public so I can have a discussion with other business community members to see if they’re having similar problems.
My Response and followup to Forum Post 2 has been deleted by a forum moderator. I presume if they provide any explanation, it will be that I linked to an external site. In any case, despite claiming to have direct messaged me, I never received a direct message from anyone, and they didn’t seem to care that I didn’t receive it.
Update #8 (2023-04-17 14:30:00)⌗
I figured it would be helpful to post some “post-incident” (I doubt it’s actually over) data for completeness.
Traceroute⌗
traceroute web-assets.strava.com
traceroute to web-assets.strava.com (13.225.103.44), 30 hops max, 60 byte packets
1 _gateway (192.168.13.31) 2.368 ms 2.292 ms 2.263 ms
2 172.26.96.161 (172.26.96.161) 73.895 ms 85.170 ms 85.204 ms
3 107.72.233.28 (107.72.233.28) 85.119 ms 107.72.233.4 (107.72.233.4) 85.447 ms 107.72.233.28 (107.72.233.28) 85.418 ms
4 12.83.188.161 (12.83.188.161) 85.394 ms 85.370 ms 85.346 ms
5 12.83.179.49 (12.83.179.49) 91.268 ms 91.242 ms 91.218 ms
6 ggr2.la2ca.ip.att.net (12.122.128.97) 91.180 ms 76.697 ms 76.656 ms
7 be3013.ccr41.lax05.atlas.cogentco.com (154.54.13.149) 1477.110 ms 79.291 ms 93.815 ms
8 be3243.ccr41.lax01.atlas.cogentco.com (154.54.27.117) 93.796 ms 93.779 ms 93.762 ms
9 be3176.ccr21.sjc01.atlas.cogentco.com (154.54.31.190) 377.948 ms be2327.ccr21.hkg02.atlas.cogentco.com (154.54.0.6) 237.357 ms 239.443 ms
10 be2414.rcr51.hkg01.atlas.cogentco.com (154.54.88.50) 239.426 ms 377.871 ms 239.644 ms
11 154.18.36.170 (154.18.36.170) 239.619 ms 239.602 ms be3585.ccr51.per01.atlas.cogentco.com (154.54.47.30) 307.318 ms
12 be3929.ccr71.tyo01.atlas.cogentco.com (154.54.83.189) 283.628 ms 283.597 ms 241.812 ms
13 * be2012.ccr51.tpe01.atlas.cogentco.com (66.28.4.234) 241.576 ms 241.701 ms
14 * be2226.ccr21.hkg02.atlas.cogentco.com (154.54.40.137) 1778.342 ms *
15 be2414.rcr51.hkg01.atlas.cogentco.com (154.54.88.50) 1776.152 ms 1776.096 ms *
16 154.18.36.170 (154.18.36.170) 1776.049 ms * amazon.demarc.cogentco.com (154.18.7.2) 238.541 ms
17 * * *
18 * * *
19 * * *
20 * * *
21 * server-13-225-103-44.hkg60.r.cloudfront.net (13.225.103.44) 336.093 ms *
We can see it’s taking a distinctly different route than the “Connected to router on AT&T business plan” trace from Update #2. Current resolver is 1.1.1.1. I imagine some of you will spot that this was routed to Hong Kong…from Durango, CO. The internet is a mysterious place.
Current DNS Resolution⌗
host web-assets.strava.com
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.106
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.119
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.16
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.44
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:8e00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:1400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:a600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:9400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:2600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:ac00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:ec00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:3800:17:4613:2840:93a1
Note this is a completely different set of hosts from “during the incident”. I’m not sure that this new host list is in any way related to this being “post-incident”. Cloudfront runs a lot of POPs, and I’m not about to try to understand the interplay between my resolver 1.1.1.1 and Cloudfront’s global DNS. It’s notable, in any case.
During incident DNS resolution⌗
For reference, these are the hosts from “during the incident”. I can’t say with 100% certainty, but I believe I was using 1.1.1.1 as a resolver at the time, and whether I was connected via iPhone tether or LTE router, I received the same host list:
host web-assets.strava.com
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.46
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.56
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.10
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.115
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:8000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:e600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:c600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:b600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:d000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:6400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:f000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:400:17:4613:2840:93a1
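The two `host` listings above can be compared mechanically instead of by eye. A minimal sketch, assuming each listing has been saved to a text file; `ipv4_addresses` and the file names in the usage comment are made up for illustration.

```shell
# ipv4_addresses (hypothetical helper): read `host` output on stdin and print
# the IPv4 addresses it lists, one per line, in a stable byte-wise sort order.
ipv4_addresses() {
    awk '/ has address / { print $NF }' | LC_ALL=C sort
}

# Example usage with saved listings (file names are placeholders):
#   ipv4_addresses < during-incident.txt > during.txt
#   ipv4_addresses < post-incident.txt  > after.txt
#   comm -12 during.txt after.txt    # prints addresses present in both lists
```

For the listings in this post the overlap is empty: the addresses during the incident were all in 99.84.208.x, and the later ones are all in 13.33.21.x.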
Pcap data & analysis⌗
Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w post-incident-capture.pcap while my router is connected to my AT&T business plan and fetching 827.js with wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js: post-incident-capture.pcap
Throughput and RTT graphs from Wireshark of the above pcap file
Update #9 (2023-04-17 15:39:00)⌗
For posterity, since bizcommunity has deleted at least one post, I want to record my current comment awaiting response.
Update #10 (2023-04-20 17:17:00)⌗
My connection continues not to be throttled, but I still want an answer from AT&T. On that front, they’ve been dragging their feet.
AT&T’s bizcommunity forum is comically poor. It’s not really what one thinks of as a forum. Nobody is having real conversations, and most posts are met by a canned response from the moderator saying “We want to help. Let’s meet in direct message on this…”. Handling everything in private eliminates the value in a forum. One person (maybe) gets helped, and everyone reading the post with the same problem is left wondering if the problem was ever solved.
In any case, this is AT&T’s latest “help” for me. They’ll only respond to my followup post that asks why my original post was marked as private. I still haven’t received a single response on the original post regarding the throttling problem, which links back to this post and all of its data.