How do all of these "Save video from YouTube" services work?

There is a very popular open source command-line downloader called youtube-dl, which does exactly that. It grabs the actual video and audio file links from a given YouTube link – or any other popular web video site like Vimeo, Yahoo! Video, uStream, etc.

To see how that's done, look into the YouTube extractor. That's just too much to show here. Other extractors exist for simpler sites. Steven Penny has a simple JavaScript downloader for YouTube too, which is a little more straightforward.

Basically, a Flash video player must be initialized and configured through some JavaScript. Simply put, the Flash player object receives the URL of a video stream to load.

To find the video stream, you'd have to parse the HTML and JS code of the video page to find the relevant initialization code, and from there try to find the link to the actual MP4 file. It might be there in plaintext, but it could also be generated on the fly with some specific download tokens. Often, the JavaScript is obfuscated to make it harder to reverse-engineer. Or the video information might be contained in an XML file that's loaded asynchronously by JS.

For HTML5 progressive download video, the actual source file is usually mentioned directly in the source child of the video tag, so searching the page source for mp4 or similar will usually turn it up. For example, on the German news show Tagesschau 100, you'll find:

<source src="http://media.tagesschau.de/video/2014/0626/TV-20140626-1649-5801.webl.h264.mp4" type="video/mp4">
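As a rough illustration of that search, here is a minimal sketch that pulls the src attribute out of every source tag in an HTML string. The function name findSourceUrls and the sample markup around the tag are made up for this example, and a regular expression is only a quick approximation of real HTML parsing.

```javascript
// Collect the src attribute of every <source> tag in an HTML string.
// A regular expression is enough for a quick look at a page; it is not
// a full HTML parser and will miss unusual markup.
function findSourceUrls(html) {
  const pattern = /<source\b[^>]*?\bsrc\s*=\s*["']([^"']+)["'][^>]*>/gi;
  const urls = [];
  for (const match of html.matchAll(pattern)) {
    urls.push(match[1]); // first capture group is the attribute value
  }
  return urls;
}

// Sample markup wrapped around the tag quoted above (the wrapper is invented).
const sample =
  "<video controls>" +
  '<source src="http://media.tagesschau.de/video/2014/0626/TV-20140626-1649-5801.webl.h264.mp4" type="video/mp4">' +
  "</video>";

console.log(findSourceUrls(sample));
```

Inside a browser, document.querySelectorAll("video source") would do the same job more robustly, since the page has already been parsed.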

For more advanced playback technologies like MPEG DASH or Apple's HTTP Live Streaming (HLS), you have to parse a meta-information file to get the actual video stream. The meta file (.mpd for example in DASH, and .m3u8 for HLS) will contain links to segments of video and audio, which you'd later have to combine to get a playable file.
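To give an idea of what parsing such a meta file involves, here is a small sketch for the HLS case. It assumes a simple media playlist, where every line that does not start with # is a segment URI. The playlist content, the example.com address, and the function name listHlsSegments are invented for illustration.

```javascript
// List the segment URLs named in an HLS media playlist (.m3u8).
// Lines starting with "#" are tags or comments; every other non-empty
// line is a URI, resolved here against the playlist's own URL.
function listHlsSegments(playlistText, playlistUrl) {
  return playlistText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((uri) => new URL(uri, playlistUrl).href);
}

// Made-up playlist for illustration only.
const playlist = [
  "#EXTM3U",
  "#EXT-X-VERSION:3",
  "#EXT-X-TARGETDURATION:10",
  "#EXTINF:10.0,",
  "segment0.ts",
  "#EXTINF:10.0,",
  "segment1.ts",
  "#EXT-X-ENDLIST",
].join("\n");

console.log(listHlsSegments(playlist, "https://example.com/video/index.m3u8"));
```

A master playlist would list variant playlists instead of segments, so a real downloader would first pick a variant and then repeat this step on it before fetching and joining the segments.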

There's no general solution for this. It requires careful inspection and debugging of the target site.

YouTube Bookmarklet

This is how I did it with JavaScript.

Start with the ytplayer.config.args object. This contains all URLs for the video. It is broken up into

url_encoded_fmt_stream_map // traditional: contains video and audio stream
adaptive_fmts              // DASH: contains video or audio stream

Each of these is a comma-separated list of what I would call "stream objects". Each "stream object" will contain values like this

url  // direct HTTP link to a video
itag // code specifying the quality
s    // signature, security measure to counter downloading
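A sketch of how such a map could be split into stream objects is below. It assumes each comma-separated entry is a URL-encoded query string, and that commas inside values are percent-encoded. The sample value and the function name parseStreamMap are invented; real maps carry more fields and much longer URLs.

```javascript
// Split a comma-separated stream map into plain objects.
// Each entry is assumed to be a query string such as
// "itag=18&url=http%3A%2F%2F...", which URLSearchParams also decodes.
function parseStreamMap(streamMap) {
  return streamMap
    .split(",")
    .filter((entry) => entry !== "")
    .map((entry) => Object.fromEntries(new URLSearchParams(entry)));
}

// Invented sample value for illustration only.
const sampleMap =
  "itag=18&url=http%3A%2F%2Fexample.com%2Fvideoplayback%3Fid%3Dabc" +
  ",itag=22&url=http%3A%2F%2Fexample.com%2Fvideoplayback%3Fid%3Ddef&s=AA5D.0F42";

for (const stream of parseStreamMap(sampleMap)) {
  console.log(stream.itag, stream.url, stream.s ? "(needs signature)" : "");
}
```

In the browser, the input would come from ytplayer.config.args.url_encoded_fmt_stream_map or ytplayer.config.args.adaptive_fmts.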

Each URL is encoded, so you will need to decode it. Now the tricky part.

YouTube has at least three security levels for its videos

unsecured // as expected, you can download these with just the unencoded URL
s         // see below
RTMPE     // uses "rtmpe://" protocol, no known method for these

The RTMPE videos are typically used on official full-length movies, and are protected with SWF Verification Type 2. This has been around since 2011 and has yet to be reverse engineered.

The type "s" videos are the hardest ones that can actually be downloaded. You will typically see these on VEVO videos and the like. They start with a signature such as

AA5D05FA7771AD4868BA4C977C3DEAAC620DE020E.0F421820F42978A1F8EAFCDAC4EF507DB5

Then the signature is scrambled with a function like this

function mo(a) {
  a = a.split("");
  a = lo.rw(a, 1);
  a = lo.rw(a, 32);
  a = lo.IC(a, 1);
  a = lo.wS(a, 77);
  a = lo.IC(a, 3);
  a = lo.wS(a, 77);
  a = lo.IC(a, 3);
  a = lo.wS(a, 44);
  return a.join("")
}
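The helper object the function calls into (lo in the snippet above) is not shown. As a sketch only, assuming its methods are simple array operations such as reversing, dropping leading elements, and swapping two positions, a scrambler of the same shape could look like this. Every name and every step below is invented; this is not YouTube's code.

```javascript
// Hypothetical stand-ins for the helper object; names and behavior are assumptions.
const helpers = {
  // Return a reversed copy of the character array.
  reverse: (chars) => chars.slice().reverse(),
  // Return a copy without the first `count` characters.
  drop: (chars, count) => chars.slice(count),
  // Return a copy with the first character and the one at `index` exchanged.
  swap: (chars, index) => {
    const copy = chars.slice();
    if (copy.length === 0) return copy;
    const position = index % copy.length;
    [copy[0], copy[position]] = [copy[position], copy[0]];
    return copy;
  },
};

// Invented sequence of steps, in the same shape as the function above.
function scramble(signature) {
  let chars = signature.split("");
  chars = helpers.drop(chars, 1);
  chars = helpers.swap(chars, 3);
  chars = helpers.reverse(chars);
  return chars.join("");
}

console.log(scramble("ABCDEFGH"));
```

The point is that the scrambling is just a fixed sequence of cheap array operations, so once the sequence is known it can be replayed anywhere.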

The scrambling function is dynamic; it typically changes every day. To make things more difficult, the function is hosted at a URL such as

http://s.ytimg.com/yts/jsbin/html5player-en_US-vflycBCEX.js

This introduces the problem of the same-origin policy. Essentially, a script running on www.youtube.com cannot download this file, because s.ytimg.com is a different domain. One workaround for this problem is CORS. With CORS, s.ytimg.com could add this header

Access-Control-Allow-Origin: http://www.youtube.com

and it would allow JavaScript running on www.youtube.com to download the file. Of course, they do not do this. A workaround for this workaround is to use a CORS proxy. This is a proxy that responds with the following header to all requests

Access-Control-Allow-Origin: *
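Fetching the player script through such a proxy might look roughly like this. The proxy address cors-proxy.example is hypothetical, and the sketch assumes the proxy expects the target URL appended to its own address, which differs from one proxy to another.

```javascript
// Hypothetical proxy address; substitute a CORS proxy you control or trust.
const CORS_PROXY = "https://cors-proxy.example/";

// Build the URL to request. This assumes the proxy takes the target URL
// appended to its own address; other proxies use a query parameter instead.
function proxiedUrl(targetUrl, proxyBase = CORS_PROXY) {
  return proxyBase + targetUrl;
}

// Fetch the player script as text through the proxy.
async function fetchPlayerScript(scriptUrl) {
  const response = await fetch(proxiedUrl(scriptUrl));
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  return response.text();
}
```

Calling fetchPlayerScript with the script URL shown earlier would return a promise for the script source, from which the scrambling function still has to be located.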

So, now that you have proxied the JS file and used the function to scramble the signature, you can add the result to the query string of the URL to download the video.
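Put together, the last step could be sketched as follows. The stream object is assumed to have the url and s fields described earlier, the scrambling function is passed in as an argument, and the query parameter name signature is an assumption rather than something stated above.

```javascript
// Build a download URL from a stream object.
// `stream` is assumed to look like { url, itag, s }, where `s` is optional.
// `scrambleFn` is whatever function was recovered from the player script.
// The query parameter name "signature" is an assumption.
function buildDownloadUrl(stream, scrambleFn) {
  if (!stream.s) {
    return stream.url; // unsecured stream: the decoded URL works as-is
  }
  const url = new URL(stream.url);
  url.searchParams.set("signature", scrambleFn(stream.s));
  return url.href;
}

// Stand-in scrambler for the example: simply reverses the text.
const reverseText = (text) => text.split("").reverse().join("");

console.log(
  buildDownloadUrl({ url: "http://example.com/videoplayback?id=abc", s: "ABC" }, reverseText)
);
```

For an unsecured stream the function returns the URL unchanged, which matches the first security level listed above.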
