Post-incident Summary

Hacker News Discussion

To summarize this post: I was seemingly throttled by AT&T for days. See the Original Post for the full write-up.

While gathering data for this post and attempting solutions, I changed my LTE router’s APN, which caused it to re-authenticate to AT&T, and after I was re-authenticated I was no longer being throttled. One Hacker News user suggested that re-authenticating may have caused me to route through a different PGW (Packet Data Network Gateway) with a different traffic shaping policy.

I won’t know for certain unless AT&T responds directly to my request for more information.

The bottom of this post contains multiple updates from throughout the incident and beyond.

Original Post

When I’m living in my RV, wireless service providers are my primary source of connectivity. So when either AT&T or Verizon make major changes, I take notice.

I recently noticed that multiple websites are quite slow when browsing with my AT&T business plan, listed in AT&T Premier (business account management UI) as “Wireless Broadband Ultra for Router or Hotspt (sic)”. This is an “unlimited” 100Mbit plan with 50GB of Business Fast Track (prioritized) data. Because I was far below the 50GB of monthly Fast Track data, my data should have had top priority, so I became suspicious. To be honest, I’ve never noticed a discernible difference between Fast Track and non-Fast Track data rates. This is all to say that I have no reason to believe that I’m being deprioritized due to usage.

Naturally, the first thing I did was conduct a speed test. I already knew from previous experience that for some reason, AT&T traffic to fast.com is throttled. Why AT&T wants bandwidth to appear lower than reality is a mystery to me, but I digress. Linode.com has speed tests that AT&T has no special treatment for, and the nearest one to me was in Fremont, CA.

The speedtest revealed 21Mbps down and 4.5Mbps up – pretty reasonable in a relatively rural area like Durango, CO. Latency was ~130ms. That speed certainly wouldn’t explain why it took anywhere between 15 seconds and 2 minutes to load strava.com.

So I opened up the “Network” tab in Firefox and could clearly see that dozens of resources from cloudfront.net were taking multiple seconds to load. The problem clearly has something to do with Cloudfront.

Is Cloudfront having problems? That’s easy enough to verify; my Sierra Wireless RV55 CAT-12 LTE-A router also contains an unlimited Verizon Business SIM card that I can use to conduct tests on Cloudfront, independent of AT&T.

I noticed that one of Strava’s javascript resources clocked in at 1.68MB, making it a nice test subject for speed tests (https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js). At the time of writing web-assets.strava.com resolves to dgpcy4fyk1eox.cloudfront.net, so rest assured, we are dealing with Cloudfront.

After switching to Verizon, I could see that Cloudfront was having no problems. Our friend 827.js downloaded in just over 1 second at 1.4MB/s. I clearly saw earlier in the Firefox network tab that this resource took nearly 1 minute to load on AT&T.

While wget is not my go-to for command line HTTP fetching, it displays transfer rates in a human-friendly format by default, so I used the following as my test case: wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js
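For readers without wget, the same measurement can be sketched in Python using only the standard library. This is an illustration rather than the tool used in this post: the function names `human_rate` and `timed_fetch` are made up here, and the 1024-based units are an assumption about how wget rounds its display.

```python
import time
import urllib.request

def human_rate(num_bytes: int, seconds: float) -> str:
    """Format an average transfer rate using 1024-based units (assumed to mirror wget)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = num_bytes / seconds
    for unit in ("B/s", "KB/s", "MB/s"):
        # Stop at the first unit that keeps the number below 1024, or at MB/s.
        if rate < 1024 or unit == "MB/s":
            return f"{rate:.1f} {unit}"
        rate /= 1024

def timed_fetch(url: str, chunk_size: int = 64 * 1024) -> str:
    """Download url, discard the body (like -O /dev/null), and return the average rate."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as response:
        while chunk := response.read(chunk_size):
            total += len(chunk)
    return human_rate(total, time.monotonic() - start)
```

Calling `timed_fetch` with the 827.js URL above gives one average figure for the whole transfer, whereas wget's progress bar shows the rate as it changes.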

So the problem isn’t Cloudfront, because Verizon was fast enough. It wasn’t blazing fast by any means, but I also didn’t have to wait 2 minutes to learn whether Strava awarded me King of the Mountain on a local trail (I wasn’t).

Maybe this is a global AT&T problem. That’s easy to test as well – my iPhone is also on the same AT&T business account as my data-only plan, so I turned on the iPhone hotspot and made it the router’s WAN device to make sure we’re changing a single variable at a time. I conducted another speed test, revealing 23Mbps down and 3Mbps up. Nothing surprising there – normal bandwidth fluctuations for a wireless device. How about our wget test? 1.7MB/s. The problem is clearly not all of AT&T wireless.

Let’s go back to the original configuration: connect directly to my AT&T data-only plan with my router and re-run the wget test. Maybe I was imagining things. I’m somewhat surprised to see the wget test with a transfer rate of ~30KB/s. I believe this rules out a soured AT&T peering agreement somewhere between me and Cloudfront. My phone traffic to Cloudfront is unaffected; it’s only my data-only plan that is affected.
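As a sanity check on the numbers so far, here is the arithmetic as a short Python sketch. The rates and the 1.68MB file size are the ones quoted in this post. Treating them as decimal units and ignoring latency and TCP ramp-up are simplifying assumptions, and the helper names are illustrative.

```python
# Figures quoted in the post; decimal units (1 MB = 1,000,000 bytes) are an assumption.
ASSET_BYTES = 1.68e6  # size of 827.js

def seconds_to_download(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Ideal transfer time at a constant rate, ignoring latency and TCP ramp-up."""
    return size_bytes / rate_bytes_per_s

def mbps_to_bytes_per_s(mbps: float) -> float:
    """Convert megabits per second (speed test units) to bytes per second."""
    return mbps * 1e6 / 8

RATES = {
    "AT&T router (wget, ~30KB/s)": 30e3,
    "Verizon (wget, 1.4MB/s)": 1.4e6,
    "AT&T iPhone hotspot (wget, 1.7MB/s)": 1.7e6,
    "AT&T router (speed test, 21Mbps)": mbps_to_bytes_per_s(21),
}

for label, rate in RATES.items():
    print(f"{label}: {seconds_to_download(ASSET_BYTES, rate):.1f} s")
```

At ~30KB/s the file needs about 56 seconds, which lines up with the “nearly 1 minute” seen in the Firefox network tab, while the 21Mbps speed test result implies the same file should arrive in well under a second.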

I now have a pretty clear picture of what is likely happening. Good old-fashioned traffic shaping. Now that the router is connected directly to AT&T, the true test of traffic shaping is transfer rates while connected to a VPN. I’ll let the image speak for itself.


As of the time of writing, I’m unsure what is causing such a significant slowdown. It has rendered some websites effectively useless. Everything in this writeup indicates, to me, that AT&T is engaged in extremely aggressive traffic shaping for some plans, rendering many websites nearly unusable.

Do you have any ideas how to diagnose this problem further? Do you know the best way to engage AT&T’s technical folks to take this seriously? Write me at att-traffic-shaping @ this domain. I’ll add updates here if anything changes, or I get a response on https://bizcommunity.att.com.

Updates


Update #1 (19:03:09 UTC)

Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w capture.pcap: capture.pcap

I’ve also attempted to disable/enable IPv6 on my local machine and my VPN connection to check for any differences. No differences were observed.

Update #2 (19:43:00 UTC)

Traceroutes

Connected to AT&T iPhone via Wifi hotspot

traceroute web-assets.strava.com
traceroute to web-assets.strava.com (99.84.208.10), 30 hops max, 60 byte packets
 1  _gateway (172.20.10.1)  4.386 ms  4.280 ms  5.292 ms
 2  107.243.82.3 (107.243.82.3)  139.792 ms * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  12.117.216.210 (12.117.216.210)  244.538 ms  369.608 ms  933.487 ms
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  server-99-84-208-10.iad79.r.cloudfront.net (99.84.208.10)  206.664 ms  206.643 ms  206.623 ms

Connected to router on AT&T business plan

traceroute web-assets.strava.com
traceroute to web-assets.strava.com (99.84.208.115), 30 hops max, 60 byte packets
 1  _gateway (192.168.13.31)  61.896 ms  61.950 ms  62.064 ms
 2  172.26.96.161 (172.26.96.161)  132.156 ms  132.775 ms  132.749 ms
 3  107.72.231.188 (107.72.231.188)  132.743 ms  132.718 ms  132.199 ms
 4  * * *
 5  12.83.179.49 (12.83.179.49)  148.138 ms  148.187 ms  148.162 ms
 6  slkut21crs.ip.att.net (12.122.1.186)  148.361 ms  137.643 ms  142.401 ms
 7  dvmco22crs.ip.att.net (12.122.28.45)  146.657 ms  146.627 ms  149.389 ms
 8  cgcil21crs.ip.att.net (12.122.28.78)  149.369 ms  149.345 ms  216.643 ms
 9  12.122.28.206 (12.122.28.206)  149.302 ms  146.467 ms  149.256 ms
10  wshdc84crs.ip.att.net (12.122.135.230)  146.421 ms  149.211 ms  149.161 ms
11  wshdc406me9.ip.att.net (12.123.10.125)  143.819 ms  139.466 ms  139.378 ms
12  12.117.216.210 (12.117.216.210)  139.342 ms  139.314 ms  187.019 ms
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  server-99-84-208-115.iad79.r.cloudfront.net (99.84.208.115)  152.577 ms  140.190 ms  141.460 ms

Update #3 (21:54:03 UTC)

Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w iphone-capture.pcap while tethered to my AT&T iPhone and fetching 827.js with wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js : iphone-capture.pcap

Update #4 (23:39:00 UTC)

As my initial post mentioned, I had hoped to get some sort of resolution by posting to http://bizcommunity.att.com. As of right now, that post https://bizcommunity.att.com/conversations/the-break-room/im-apparently-experiencing-extremely-aggressive-att-traffic-shaping/643c3d81aba4c269accb1a2b has been marked as “Private”. This URL also 404s, unless I’m logged in under the account with which I posted it.

I can’t say whether this is normal practice on the bizcommunity forum, but I posted a followup post with a much more benign title, “Are all new posts here marked as private?” https://bizcommunity.att.com/conversations/the-break-room/are-all-new-posts-here-marked-as-private/643c866bb67b2d067eb0985c, which as of right now is not private and still accessible to the public.

Update #5 (23:54:00 UTC)

Hello reader, I’m another HN member trying to help get to the bottom of this! I’ve analyzed all the data provided and feel confident there is some kind of manipulation / throttling happening with Adriano’s AT&T business plan, but if anyone wants to correct my interpretation or provide other insights feel free to reach out to me on my blog!

We start by analyzing the packet capture from the throttled connection. There are a few takeaways from this:

  • TCP segment & window sizes have normal values.
  • There are almost no retransmissions or lost packets.
  • There appears to be ample room for the TCP stream to scale up to a much faster speed.
  • The TCP stream starts scaling up to 800Kbps before dramatically going down.
  • The TCP stream regularly blocks for 50-200ms intervals waiting for new data to arrive from cloudfront.
  • These blocks coincide with throughput scaling up to ~350Kbps.

All of the spikes in round trip time correspond to a segment of the stream where the client acknowledges receipt of new data and is forced to wait 50-200ms for any new data to arrive. This happens at a regular cadence that appears artificial in nature (i.e., not regular network congestion) because the throughput chart stays within a well-defined boundary.
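The stalls described above could be located programmatically along these lines. This is a minimal sketch, not the analysis that was actually run: it assumes the packet arrival times have already been exported from the pcap as seconds since the start of the capture, the name `find_stalls` is illustrative, and the sample timestamps are synthetic rather than taken from capture.pcap.

```python
from typing import List, Tuple

def find_stalls(arrival_times: List[float], min_gap: float = 0.05) -> List[Tuple[float, float]]:
    """Return (time_of_last_packet, gap_seconds) for every pause of at least min_gap."""
    stalls = []
    for earlier, later in zip(arrival_times, arrival_times[1:]):
        gap = later - earlier
        if gap >= min_gap:
            stalls.append((earlier, gap))
    return stalls

# Synthetic example: short bursts of packets separated by ~120 ms pauses.
SYNTHETIC_ARRIVALS = [0.000, 0.002, 0.004, 0.124, 0.126, 0.128, 0.248]

for start, gap in find_stalls(SYNTHETIC_ARRIVALS):
    print(f"stall after t={start:.3f}s lasting {gap * 1000:.0f} ms")
```

A regular spacing between the reported stalls, as opposed to irregular gaps, is the kind of cadence the paragraph above describes as artificial.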

If this were a problem with AT&T’s network or a side effect of LTE networks being inherently high latency, we would expect to observe similar patterns over the iPhone connection:

We can draw a few conclusions from these charts:

  • The iPhone AT&T connection doesn’t exhibit the same RTT or throughput pattern despite having similar latency & network routing.
  • The throughput chart is indicative of a healthy TCP stream that slowly scales up the total bandwidth to make the most out of available network resources.
  • We are never waiting very long for fresh data to arrive.

In my opinion these patterns rule out any extraneous factors affecting the network and are indicative of intentional manipulation of the TCP stream. If AT&T were to occasionally drop an incoming packet, then the client would never send an acknowledgement to cloudfront, and after a little while cloudfront would automatically retransmit that packet. On our client-side packet capture we would observe this as a packet coming in with a big delay. If we were being rate limited, we would expect to see these delayed packets coming in regularly around the time when we were starting to hit a threshold. It seems likely AT&T is dropping certain inbound packets once the incoming rate reaches ~350Kbps.
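One way to check for a ceiling like this is to bucket the received bytes into fixed windows and look at the rate in each. The sketch below assumes the capture has already been reduced to (timestamp in seconds, payload bytes) pairs. The name `throughput_kbps` and the sample packets are illustrative, not data from the capture.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def throughput_kbps(packets: Iterable[Tuple[float, int]], window: float = 1.0) -> Dict[int, float]:
    """Sum payload bytes per window and convert each sum to kilobits per second."""
    bytes_per_window: Dict[int, int] = defaultdict(int)
    for timestamp, size in packets:
        bytes_per_window[int(timestamp // window)] += size
    return {
        index: total * 8 / 1000 / window
        for index, total in sorted(bytes_per_window.items())
    }

# Synthetic packets: (seconds since capture start, payload bytes).
SAMPLE_PACKETS = [(0.1, 1000), (0.5, 1000), (1.2, 500)]

rates = throughput_kbps(SAMPLE_PACKETS)
print(rates)
print("peak Kbps:", max(rates.values()))
```

If a policer is active, the per-window rates should flatten out near the limit no matter how long the transfer runs. For scale, the ~30KB/s that wget reported works out to roughly 240Kbps, the same order of magnitude as the ~350Kbps ceiling seen in the capture.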

Update #6 (00:43:00 UTC)

Refer to Update #4 above: I’m getting shut down on the AT&T bizcommunity site. My posts are being marked private, so I’ll save them here for posterity.

Forum Post 1: I’m apparently experiencing extremely aggressive AT&T traffic shaping

URL (was public): https://bizcommunity.att.com/conversations/the-break-room/im-apparently-experiencing-extremely-aggressive-att-traffic-shaping/643c3d81aba4c269accb1a2b

The short of it is that I’m experiencing what appears to be aggressive traffic shaping by AT&T. Many sites are affected, because Cloudflare is one content delivery network for which AT&T is apparently shaping my traffic.

This is a business account. I have both cell phone devices and a data-only plan “Wireless Broadband Ultra for Router or Hotspt (sic)” on the account. Only the data-only plan is affected.

Because Cloudflare is such a large content delivery network, this affects millions of websites. A large percentage of the websites I’ve accessed over the last few days have been affected, forcing me to use a VPN to subvert the problem.

I’ve added more concrete data in a writeup on my personal website: https://adriano.fyi/post/2023/2023-04-16-att-traffic-shaping-makes-websites-unusable

The data paint a fairly clear picture of traffic shaping.

[ GIF IMAGE FROM ABOVE HERE ] [ NOTE: web-assets.strava.com resolves to Cloudflare, see the writeup on my site for details ]

I’d love to get a response from AT&T on the matter because this is a major performance degradation to the point that many websites are unusable. Is anyone else experiencing similar issues?

Cheers, AC

Forum Post 2: Are all new posts here marked as private?

URL (currently public): https://bizcommunity.att.com/conversations/the-break-room/are-all-new-posts-here-marked-as-private/643c866bb67b2d067eb0985c

I posted earlier today regarding a traffic shaping issue that I’m experiencing. The title of the post is “I’m apparently experiencing extremely aggressive AT&T traffic shaping”. When I came back to check for any responses, I see that my previous post is marked as “Private”, which I don’t believe was under my control, and certainly wasn’t my intention.

This post is a test to see whether posts are automatically flagged as private, or if someone took action on my last post to mark it as private.

[edit] As of 2023-04-16 23:37 UTC, this post is not Private like my previous post.

Response and followup to Forum Post 2

Notice the Private label on my response. So it looks to others like I impolitely never respond to their polite response.

Update #6 (2023-04-17 04:12:00 UTC)

This ended with the least satisfying end one could expect. Over on Hacker News kevin_nisbet recommended changing the APN on the device to see if there were any routing differences. I changed the APN to NXTGENPHONE, which I figured would work as a “generic LTE device”, per https://www.att.com/support/article/wireless/KM1062162/. However, the router was not able to authenticate with the NXTGENPHONE APN, so I switched back to broadband.

After switching back and re-authenticating with the broadband APN, I’m no longer being throttled.

I’ll keep monitoring for future throttling and add any responses from the bizcommunity admins.

For now, we’re done here.

Update #7 (2023-04-17 13:43:00)

My original Forum Post 1 on the bizcommunity is still marked as private and not deleted. In a followup to Forum Post 2, I asked that my original post be made public so I can have a discussion with other business community members to see if they’re having similar problems.

My Response and followup to Forum Post 2 has been deleted by a forum moderator. I presume if they provide any explanation, it will be that I linked to an external site. In any case, despite claiming to have direct messaged me, I never received a direct message from anyone, and they didn’t seem to care that I didn’t receive it.

Update #8 (2023-04-17 14:30:00)

I figured it would be helpful to post some “post-incident” (I doubt it’s actually over) data for completeness.

Traceroute

traceroute web-assets.strava.com
traceroute to web-assets.strava.com (13.225.103.44), 30 hops max, 60 byte packets
 1  _gateway (192.168.13.31)  2.368 ms  2.292 ms  2.263 ms
 2  172.26.96.161 (172.26.96.161)  73.895 ms  85.170 ms  85.204 ms
 3  107.72.233.28 (107.72.233.28)  85.119 ms 107.72.233.4 (107.72.233.4)  85.447 ms 107.72.233.28 (107.72.233.28)  85.418 ms
 4  12.83.188.161 (12.83.188.161)  85.394 ms  85.370 ms  85.346 ms
 5  12.83.179.49 (12.83.179.49)  91.268 ms  91.242 ms  91.218 ms
 6  ggr2.la2ca.ip.att.net (12.122.128.97)  91.180 ms  76.697 ms  76.656 ms
 7  be3013.ccr41.lax05.atlas.cogentco.com (154.54.13.149)  1477.110 ms  79.291 ms  93.815 ms
 8  be3243.ccr41.lax01.atlas.cogentco.com (154.54.27.117)  93.796 ms  93.779 ms  93.762 ms
 9  be3176.ccr21.sjc01.atlas.cogentco.com (154.54.31.190)  377.948 ms be2327.ccr21.hkg02.atlas.cogentco.com (154.54.0.6)  237.357 ms  239.443 ms
10  be2414.rcr51.hkg01.atlas.cogentco.com (154.54.88.50)  239.426 ms  377.871 ms  239.644 ms
11  154.18.36.170 (154.18.36.170)  239.619 ms  239.602 ms be3585.ccr51.per01.atlas.cogentco.com (154.54.47.30)  307.318 ms
12  be3929.ccr71.tyo01.atlas.cogentco.com (154.54.83.189)  283.628 ms  283.597 ms  241.812 ms
13  * be2012.ccr51.tpe01.atlas.cogentco.com (66.28.4.234)  241.576 ms  241.701 ms
14  * be2226.ccr21.hkg02.atlas.cogentco.com (154.54.40.137)  1778.342 ms *
15  be2414.rcr51.hkg01.atlas.cogentco.com (154.54.88.50)  1776.152 ms  1776.096 ms *
16  154.18.36.170 (154.18.36.170)  1776.049 ms * amazon.demarc.cogentco.com (154.18.7.2)  238.541 ms
17  * * *
18  * * *
19  * * *
20  * * *
21  * server-13-225-103-44.hkg60.r.cloudfront.net (13.225.103.44)  336.093 ms *

We can see it’s taking a distinctly different route from the “Connected to router on AT&T business plan” traceroute in Update #2. Current resolver is 1.1.1.1. I imagine some of you will spot that this was routed to Hong Kong…from Durango, CO. The internet is a mysterious place.
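To make the route comparison less of an eyeball exercise, the two traceroutes can be parsed and diffed. The sketch below uses the first six hops of each router-on-AT&T traceroute from this post as sample input. The function names are illustrative, and the regular expressions only handle the IPv4 output format shown here.

```python
import re
from typing import Dict, List, Optional

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4_IN_PARENS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")

def parse_hops(traceroute_output: str) -> Dict[int, List[str]]:
    """Map hop number to the distinct IPv4 addresses that answered ([] for '* * *')."""
    hops: Dict[int, List[str]] = {}
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        # dict.fromkeys removes duplicates while keeping first-seen order.
        hops[int(match.group(1))] = list(dict.fromkeys(IPV4_IN_PARENS.findall(match.group(2))))
    return hops

def first_divergence(a: Dict[int, List[str]], b: Dict[int, List[str]]) -> Optional[int]:
    """Return the first hop number at which the two paths answer from different addresses."""
    for hop in sorted(set(a) | set(b)):
        if set(a.get(hop, [])) != set(b.get(hop, [])):
            return hop
    return None

DURING = """
 1  _gateway (192.168.13.31)  61.896 ms  61.950 ms  62.064 ms
 2  172.26.96.161 (172.26.96.161)  132.156 ms  132.775 ms  132.749 ms
 3  107.72.231.188 (107.72.231.188)  132.743 ms  132.718 ms  132.199 ms
 4  * * *
 5  12.83.179.49 (12.83.179.49)  148.138 ms  148.187 ms  148.162 ms
 6  slkut21crs.ip.att.net (12.122.1.186)  148.361 ms  137.643 ms  142.401 ms
"""

AFTER = """
 1  _gateway (192.168.13.31)  2.368 ms  2.292 ms  2.263 ms
 2  172.26.96.161 (172.26.96.161)  73.895 ms  85.170 ms  85.204 ms
 3  107.72.233.28 (107.72.233.28)  85.119 ms 107.72.233.4 (107.72.233.4)  85.447 ms 107.72.233.28 (107.72.233.28)  85.418 ms
 4  12.83.188.161 (12.83.188.161)  85.394 ms  85.370 ms  85.346 ms
 5  12.83.179.49 (12.83.179.49)  91.268 ms  91.242 ms  91.218 ms
 6  ggr2.la2ca.ip.att.net (12.122.128.97)  91.180 ms  76.697 ms  76.656 ms
"""

print("first differing hop:", first_divergence(parse_hops(DURING), parse_hops(AFTER)))
```

On these excerpts the paths already differ at hop 3, inside AT&T’s 107.72.x.x range, well before traffic is handed to Cogent. A hop that answers with `* * *` in one trace and an address in the other also counts as a difference here, so this is a coarse comparison.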

Current DNS Resolution

host web-assets.strava.com
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.106
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.119
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.16
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.44
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:8e00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:1400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:a600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:9400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:2600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:ac00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:ec00:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2363:3800:17:4613:2840:93a1

Note this is a completely different set of hosts from “during the incident”. I’m not sure that this new host list is in any way related to this being “post-incident”. Cloudfront runs a lot of POPs, and I’m not about to try to understand the interplay between my resolver 1.1.1.1 and Cloudfront’s global DNS. It’s notable, in any case.

During incident DNS resolution

For reference, these are the hosts from “during the incident”. I can’t say with 100% certainty, but I believe I was using 1.1.1.1 as a resolver at the time, and whether I was connected via iPhone tether or LTE router, I received the same host list:

host web-assets.strava.com
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.46
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.56
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.10
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.115
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:8000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:e600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:c600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:b600:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:d000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:6400:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:f000:17:4613:2840:93a1
dgpcy4fyk1eox.cloudfront.net has IPv6 address 2600:9000:2199:400:17:4613:2840:93a1
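The two host lists can be compared mechanically by parsing the `host` output into sets. The sketch below is illustrative: the function name `addresses` is made up, and the samples are abbreviated to the alias line plus the IPv4 answers from the two listings in this post.

```python
import re
from typing import Set

ADDRESS_LINE = re.compile(r"has (?:IPv6 )?address (\S+)$")

def addresses(host_output: str) -> Set[str]:
    """Collect every IPv4/IPv6 address reported by `host`, ignoring alias lines."""
    found = set()
    for line in host_output.splitlines():
        match = ADDRESS_LINE.search(line.strip())
        if match:
            found.add(match.group(1))
    return found

DURING_INCIDENT = """
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.46
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.56
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.10
dgpcy4fyk1eox.cloudfront.net has address 99.84.208.115
"""

POST_INCIDENT = """
web-assets.strava.com is an alias for dgpcy4fyk1eox.cloudfront.net.
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.106
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.119
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.16
dgpcy4fyk1eox.cloudfront.net has address 13.33.21.44
"""

during = addresses(DURING_INCIDENT)
post = addresses(POST_INCIDENT)
print("shared addresses:", sorted(during & post))
print("13.225.103.44 in either list:", "13.225.103.44" in (during | post))
```

The IPv4 sets share no addresses. The traceroute target from Update #8 (13.225.103.44) appears in neither list, which suggests the answers for this hostname change between lookups, so the difference on its own says little about the throttling.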

Pcap data & analysis

Here is a pcap file captured with sudo tcpdump host 'dgpcy4fyk1eox.cloudfront.net' -w post-incident-capture.pcap while my router is connected to my AT&T business plan and fetching 827.js with wget -O /dev/null -q --show-progress https://web-assets.strava.com/assets/federated/find-and-invite-friends/827.js : post-incident-capture.pcap

Throughput and RTT graphs from Wireshark of the above pcap file

Update #9 (2023-04-17 15:39:00)

For posterity, since bizcommunity has deleted at least one post, I want to record my current comment awaiting response.

Update #10 (2023-04-20 17:17:00)

My connection continues not to be throttled, but I still want an answer from AT&T. On that front, they’ve been dragging their feet.

AT&T’s bizcommunity forum is comically poor. It’s not really what one thinks of as a forum. Nobody is having real conversations, and most posts are met by a canned response from the moderator saying “We want to help. Let’s meet in direct message on this…”. Handling everything in private eliminates the value in a forum. One person (maybe) gets helped, and everyone reading the post with the same problem is left wondering if the problem was ever solved.

In any case, this is AT&T’s latest “help” for me. They’ll only respond to my followup post that asks why my original post was marked as private. I still haven’t received a single response on the original post regarding the throttling problem, which links back to this post and all of its data.