How do mobile carriers know video resolution over HTTPS connections?

This is an active area of research. I happen to have done some work in this area, so I'll share what I can about the basic idea (this work was with industry partners and I can't share the secret details :) ).

The tl;dr is that it's often possible to identify an encrypted traffic stream as carrying video, and it's often possible to estimate its resolution - but it's complicated, and not always accurate. There are a lot of people working on ways to do this more consistently and more accurately.

Video traffic has some specific characteristics that can distinguish it from other kinds of traffic. Here I refer specifically to video on demand - not live streaming video. Video on demand doesn't often have those priority tags mentioned in this answer. Also I refer specifically to adaptive video, meaning that the video is divided into segments (each about 2-10 seconds long), and each segment of video is encoded at multiple quality levels (quality level meaning: long-term video bitrate, codec, and resolution). As you play the video, the quality level at which the next segment is downloaded depends on what data rate the application thinks your network can support. (That's the DASH protocol referred to in this answer.)
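To make the "quality level" idea concrete, here is a minimal sketch of the kind of rate-based selection an adaptive player might do. The ladder values and the pick-the-highest-bitrate-that-fits rule are illustrative assumptions, not any particular player's actual algorithm (real ABR logic also considers buffer occupancy and more):

    # Illustrative only: a toy quality ladder and a naive rate-based selection rule.
    QUALITY_LADDER = [
        # (resolution, long-term video bitrate in kbit/s) - made-up example values
        ("1080p", 4500),
        ("720p", 2500),
        ("480p", 1200),
        ("360p", 700),
        ("240p", 400),
    ]

    def pick_next_quality(estimated_throughput_kbps):
        """Pick the highest quality whose bitrate fits under the estimated throughput."""
        for resolution, bitrate in QUALITY_LADDER:
            if bitrate <= estimated_throughput_kbps:
                return resolution, bitrate
        return QUALITY_LADDER[-1]  # fall back to the lowest quality

    print(pick_next_quality(3000))   # ('720p', 2500)
    print(pick_next_quality(800))    # ('360p', 700)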

If your phone is playing a video, and you look at the (weighted moving average of) data rate of the traffic going to your phone over time, it might look something like this:

[Figure: data rate over time]

(This was captured from a YouTube session over Verizon; it shows both a 15-second moving average and a short-term average.)
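For context, a curve like that can be computed from a packet trace along these lines - a minimal sketch, assuming 1-second bins and a 15-second sliding window to mirror the plot, not the exact tooling used to produce it:

    from collections import deque

    def rate_timeseries(packets, bin_size=1.0, window=15.0):
        """packets: iterable of (timestamp_seconds, size_bytes) for one flow.
        Returns (short_term_mbps, smoothed_mbps), one value per bin_size-second bin."""
        bins = {}
        for ts, size in packets:                       # bin bytes into fixed-width time bins
            b = int(ts // bin_size)
            bins[b] = bins.get(b, 0) + size

        short_term, smoothed = [], []
        recent = deque(maxlen=int(window / bin_size))  # sliding 15-second window
        for b in range(min(bins), max(bins) + 1):
            mbps = bins.get(b, 0) * 8 / bin_size / 1e6
            recent.append(mbps)
            short_term.append(mbps)
            smoothed.append(sum(recent) / len(recent))
        return short_term, smoothed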

There are a few different parts to this session:

First, the video application (YouTube player) tries to fill the buffer up to the buffer capacity. During this time, it is pulling data at whatever rate the network can support. At this stage, it's basically indistinguishable from a large file download, unless you can infer that it's video traffic from the remote address (as mentioned in this answer).

Once the buffer is full, then you get "bursts" at sort-of-regular intervals. Suppose your buffer can hold 200 seconds of video. When the buffer has 200 seconds of video in it, the application stops downloading. Then after a segment of video has played back (say 5 seconds), there is room in the buffer again, so it'll download the next segment, then stop again. That's what causes this bursty pattern.
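A toy steady-state model shows where that periodicity comes from. The numbers (200-second buffer, 5-second segments, 1-second download time) are illustrative assumptions:

    # Toy steady-state model: once the buffer is full, the player fetches one
    # segment, then idles until playback frees up room for the next one.
    capacity_s = 200.0   # buffer capacity, in seconds of video
    buffer_s   = 200.0   # currently buffered video (starts full)
    segment_s  = 5.0     # seconds of video per segment
    download_s = 1.0     # wall-clock time to fetch one segment on this network

    t = 0.0
    for _ in range(5):
        # OFF period: playback drains the buffer until a whole segment fits again
        idle = max(0.0, buffer_s + segment_s - capacity_s)
        t += idle
        buffer_s -= idle
        # ON period: a burst of traffic while the next segment downloads
        print(f"t={t:5.1f}s  burst: ~{segment_s:.0f}s of video fetched in {download_s:.0f}s")
        t += download_s
        buffer_s += segment_s - download_s   # playback keeps draining during the download

Under these assumptions the bursts settle into one per segment duration, which is exactly the regular ON/OFF signature described above.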

This pattern is very characteristic of video - traffic from other applications doesn't have this pattern - so a network service provider can pretty easily pick out flows that carry video traffic. In some cases, you might not ever observe this pattern - for example, if the video is so short that the entire thing is loaded into the buffer at once and then the client stops downloading. Under those circumstances, it's very difficult to distinguish video traffic from a file download (unless you can figure it out by remote address).
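On the network side, the classification might look roughly like this - a hedged sketch with hand-picked thresholds; real classifiers are usually trained on labeled traffic rather than hard-coded like this:

    def looks_like_video(burst_gaps, min_bursts=5, tolerance=0.3):
        """burst_gaps: idle times (seconds) between observed download bursts for one flow.
        Heuristic: many bursts separated by roughly regular gaps suggests adaptive video.
        The thresholds here are arbitrary placeholders for illustration."""
        if len(burst_gaps) < min_bursts:
            return False  # too short to tell - could be a small file download
        mean_gap = sum(burst_gaps) / len(burst_gaps)
        variance = sum((g - mean_gap) ** 2 for g in burst_gaps) / len(burst_gaps)
        # coefficient of variation: low => regular ON/OFF pattern
        cv = (variance ** 0.5) / mean_gap if mean_gap > 0 else float("inf")
        return cv < tolerance

    print(looks_like_video([5.1, 4.8, 5.0, 5.3, 4.9, 5.0]))  # True: regular gaps
    print(looks_like_video([0.2, 9.0, 1.5, 30.0, 2.2]))      # False: irregular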

Anyway, once you have identified the flow as carrying video traffic - either by the remote address (not always possible, since major video providers use content distribution networks that are not exclusive to video) or by its traffic pattern (possible if the video session is long, much more difficult if it is so short that the whole video is loaded into the buffer all at once)...

Now, as Hector said, you can try to guess the resolution from the bitrate by looking at the size (in bytes) of each "burst" of data:

From the size per duration you could make a reasonable estimate of the resolution - especially if you keep a rolling average.
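A rough sketch of that size-per-duration estimate is below. The bitrate-to-resolution thresholds are assumptions loosely in line with a typical H.264 encode; as the caveats that follow show, they shift a lot with codec and content:

    # Very rough illustration. The bitrate->resolution ranges below are assumptions;
    # VP9/AV1 encodes and high-motion content move them considerably.
    RESOLUTION_GUESS = [
        (3500, "1080p or higher"),
        (1800, "720p"),
        (900,  "480p"),
        (0,    "360p or lower"),
    ]

    def guess_resolution(burst_bytes, segment_seconds):
        """Estimate resolution from one burst: bytes / playback-seconds -> kbit/s."""
        kbps = burst_bytes * 8 / segment_seconds / 1000
        for threshold, label in RESOLUTION_GUESS:
            if kbps >= threshold:
                return kbps, label

    print(guess_resolution(3_000_000, 5))  # ~4800 kbit/s -> "1080p or higher"
    print(guess_resolution(600_000, 5))    # ~960 kbit/s  -> "480p"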

But, this can be difficult. Take the YouTube session in my example:

  • Not all segments are the same duration - the duration of video requested at a time depends on several factors (the quality level, network status, what kind of device you are playing the video on, and others). So you can't necessarily look at a "burst" and say, "OK, this was X bytes representing 5 seconds of video, so I know the video data rate". Sometimes you can figure out the likely segment duration but other times it is tricky.
  • For a given video quality level and segment duration, different segments will have different sizes (depending on things like how much motion takes place in that part of the video).
  • Even for the same video resolution, the long-term data rate can vary - a 1080p video encoded with VP9 won't have the same long-term data rate as one encoded with H.264.
  • The video quality level changes according to perceived network quality (which is visible to the network service provider) and buffer status (which is not). So you can look at long-term data rates over 30 seconds, but it's possible that the actual video quality level changed several times over that 30 seconds.
  • During periods when the buffer is draining or filling as fast as possible (when you don't have those "bursts"), it's much harder to estimate what's going on in the video.
  • To complicate things even further: sometimes a video flow will be "striped" across multiple lower-layer flows. Sometimes part of the video will be retrieved from one address, and then it will switch to retrieving the video from a different address.

That graph of data rate I showed you just above? Here's what the video resolution was over that time interval:

[Figure: video resolution over time]

Here, the color indicates the video resolution. So... you can sort of estimate what's going on just from the traffic patterns. But it's a difficult problem! There are other markers in the traffic that you can look at. I can't say definitively how any one service provider is doing it. But at least as far as the academic state-of-the-art goes, there isn't any way to do this with perfect accuracy, all of the time (unless you have the cooperation of the video providers...)

If you're interested in learning more about the techniques used for this kind of problem, there's a lot of academic literature out there - see for example BUFFEST: Predicting Buffer Conditions and Real-time Requirements of HTTP(S) Adaptive Streaming Clients as a starting point. (Not my paper - just one I happen to have read recently.)


Nothing maxes out bandwidth at a consistent rate other than streaming video.

Also, in order to make sure that the stream is handled with priority (and not like a big file download, for instance), streaming sources tag the packets in a way that tells the carriers that it is streaming video. The rest of the packet is encrypted, but the metadata that tells the ISP how to route it is not, so the carrier can see that part. If they did not do this, there would be a high chance that the stream would get interrupted or degraded as the ISP tried to balance all the needs of the network traffic at that time.
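To make "the rest is encrypted, but the routing metadata is not" concrete: fields like the IP header's DSCP/ToS byte travel outside the TLS payload, so any on-path device can read them. A minimal sketch of extracting DSCP from a raw IPv4 header (the header bytes are a made-up example):

    def dscp_from_ipv4_header(header: bytes) -> int:
        """Return the DSCP value (top 6 bits of the second byte of an IPv4 header).
        This field is plaintext on the wire even when the payload is TLS-encrypted."""
        tos = header[1]   # "Type of Service" / traffic class byte
        return tos >> 2   # upper 6 bits are the DSCP code point

    # Made-up 20-byte IPv4 header with ToS byte 0xB8 (DSCP 46 = Expedited Forwarding)
    example_header = bytes([0x45, 0xB8]) + bytes(18)
    print(dscp_from_ipv4_header(example_header))  # 46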

And here is how Verizon said they will do it:

Verizon apparently won't be converting videos to lower resolutions itself. Instead, it will set a bandwidth limit that video applications will have to adjust to. "We manage HD video throughput by setting speeds at no more than 10Mbps, which provides HD video at up to 1080p video," Verizon told Ars. The Mbps will presumably be lower than that in cases where Verizon limits video to 480p or 720p.

That means the traffic is shaped a certain way for a given subscriber precisely because it is recognized as a certain type of video - in other words, it is effectively tagged.

How? Verizon has a video optimization system that has been shown to limit Netflix and YouTube to 10 Mbps even before the Aug 2017 announcement of the new caps.

Verizon acknowledged using a new video optimization system but said it was part of a temporary test and did not affect the actual quality of video. The video optimization appears to apply to both unlimited and limited mobile plans.

But some YouTube users are reporting degraded video, saying that using a VPN service can bypass the Verizon throttling.

This points to Verizon's ability to identify video streams and limit their bandwidth accordingly, even when the content is delivered over HTTPS (though not over VPNs).


Schroeder is almost certainly right in that it's just a marketing way of saying they restrict bandwidth to certain sites' IP addresses or look for priority markers on the packets.

It is worth noting, however, that there are theoretically ways they could make this work better if the sole aim were to force users to a certain resolution while streaming video, and nothing else.

Much internet streaming these days uses a process called DASH (Dynamic Adaptive Streaming over HTTP). The way this works is to request a small chunk of video, measure the bandwidth while it downloads, and select the next chunk at a resolution / compression scheme that allows it to arrive before the current chunk finishes playing.

This means there are hints in the requests as to what the user is doing. If your device sends a request to a website every 3 seconds, requesting a file that takes just under 3 seconds to download, then there is a very high chance that site is streaming video. From the size per duration you could make a reasonable estimate of the resolution - especially if you keep a rolling average. You can then just restrict bandwidth to that IP address.
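A hedged sketch of that heuristic is below; the 3-second cadence, the tolerance, the history length, and the bitrate cap are all illustrative placeholders, not anything a real carrier is known to use:

    from collections import defaultdict, deque

    # Rolling per-server-IP view of "bytes delivered per request interval".
    history = defaultdict(lambda: deque(maxlen=10))

    def observe_request(server_ip, interval_s, bytes_downloaded):
        history[server_ip].append((interval_s, bytes_downloaded))

    def should_throttle(server_ip, expected_interval=3.0, cap_kbps=1500):
        samples = history[server_ip]
        if len(samples) < 5:
            return False
        # Regular request cadence close to the expected segment duration?
        if not all(abs(i - expected_interval) < 0.5 for i, _ in samples):
            return False
        # Rolling average bitrate implied by size-per-interval
        avg_kbps = sum(b * 8 / i / 1000 for i, b in samples) / len(samples)
        return avg_kbps > cap_kbps   # i.e. higher than the target resolution needs

    for _ in range(6):
        observe_request("203.0.113.7", 3.0, 1_200_000)  # ~3.2 Mbit/s per interval
    print(should_throttle("203.0.113.7"))               # True under these placeholders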

By using known IP addresses of major video providers (googlevideo for YouTube, Netflix, etc.) in the decision weighting, you could make the algorithm more aggressive without too many false positives.