Content hashes to help protect resources being fetched from a CDN

Update: There is more information on Subresource Integrity at MDN, which (as of 12/12/16) shows support in Chrome 45+ and Firefox (Gecko) 43+.

Update: There is a W3C draft called Subresource Integrity describing a feature like this.

It's already implemented in Chromium.

For example:

<script src="file.js" integrity="ni://sha256;BpfBw7ivV8q2jLiT13…"></script>

The basic approach is sound IMO, but there are a few details to take care of:

  • You should support several hashes on a single tag. The browser doesn't need to validate all of them; validating one collision-resistant hash is enough.
  • Being able to specify the size seems useful to avoid a DoS attack where your site is fed a huge resource.
  • Unless you're using a tree hash, you can't verify incomplete files. That's not an issue for a 100 kB JavaScript file, but it is for a 5 GB video, so support for tree hashes should be added later on.
  • I'd use algorithm identifiers matching NI (Naming Things with Hashes, RFC 6920), and URL-safe Base64 without padding (the first sketch after this list shows this encoding).
  • I'd specify SHA-256 as the standard algorithm every browser should support, but allow browsers to add other algorithms. SHA-256 is:

    • collision resistant at a 128-bit level
    • a NIST standard; implementations are widely available (unlike SHA-3)
    • not the fastest hash, but still fast enough to keep up with typical network speeds on mobile devices

    IMO SHA-256 is the ideal choice for the default/mandatory algorithm.

  • You should support all embedded resources (CSS, images, videos, etc.), not just scripts.

  • One could consider using NI URLs instead, but I prefer the attribute-based approach here: an attribute is more flexible and doesn't require the cooperation of the target host to implement, and NI can only specify a single hash per URL.
  • You could disable mixed-content warnings for securely hashed content that was fetched via HTTP.
  • It's a great way to see if your cache is still valid: it's valid if and only if the hash matches, so there is no need for rechecking, dates, etc. This also works if you downloaded the resource from a different URL. For example, if you already have jQuery from Google in your cache, you don't need to reload it from another URL, since the same hash guarantees (assuming collision resistance) that they'll be the same. The second sketch after this list illustrates this.
  • There are probably some issues related to authenticating HTTP headers, since those influence the interpretation of the resource; MIME type and charset/encoding are examples of such headers.
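
To make these points concrete, here is a sketch of the check a browser (or a script-based polyfill) could perform, assuming the Web Crypto API is available; the function name, the parameters, and the error handling are illustrative, not part of any spec:

// Fetch a resource, enforce the declared maximum size, then compare its
// SHA-256 digest (URL-safe Base64, no padding) against the declared hash.
async function verifyResource(url, expectedHash, maxSize) {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();

  // Size check first: reject oversized responses before hashing them.
  if (maxSize !== undefined && bytes.byteLength > maxSize) {
    throw new Error("resource exceeds declared size of " + maxSize + " bytes");
  }

  const digest = await crypto.subtle.digest("SHA-256", bytes);

  // Standard Base64, translated to the URL-safe alphabet, padding stripped.
  const encoded = btoa(String.fromCharCode(...new Uint8Array(digest)))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");

  if (encoded !== expectedHash) {
    throw new Error("hash mismatch - refusing to use resource");
  }
  return bytes;
}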
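
The cache point works the same way: since the key is the content's own digest, a hit is valid by construction, no matter which URL originally supplied the bytes. A minimal sketch reusing the hypothetical verifyResource above:

// digest -> bytes; shared across origins, no dates or revalidation needed
const hashCache = new Map();

async function fetchByHash(url, expectedHash, maxSize) {
  if (hashCache.has(expectedHash)) {
    return hashCache.get(expectedHash); // e.g. jQuery already cached from Google
  }
  const bytes = await verifyResource(url, expectedHash, maxSize);
  hashCache.set(expectedHash, bytes);
  return bytes;
}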

So an example might look like this:

<script src="http://example.cdn/jq/jquery-1.2.3.js"
     hash="sha-256:UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q; size:103457;
           other-hash: abc..." />
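
For completeness, the hash and size in that attribute could be generated at publish time like this; a sketch assuming a recent Node.js, whose built-in base64url digest encoding matches the unpadded URL-safe Base64 suggested above:

// Compute the hypothetical attribute value for a script file.
const crypto = require("crypto");
const fs = require("fs");

const data = fs.readFileSync("jquery-1.2.3.js");
const digest = crypto.createHash("sha256").update(data).digest("base64url");

console.log('hash="sha-256:' + digest + '; size:' + data.length + '"');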

To add to the good points from @CodesInChaos: there was a much older mechanism to support signed JavaScript. It comes from the days of Netscape 4 and is still documented, but it is unclear whether it is still supported in Firefox. Internet Explorer never supported it, although the people at Microsoft toyed with the idea. The system piggybacked on the JAR file format, which came from the Java world.

Your method has the appeal of looking simple to implement, and it could be done in JavaScript directly (the order of magnitude of JavaScript performance for hashing is roughly 1 MB/s, which should be sufficient for scripts).

One drawback of your system to be aware of: if you modify the script, you must alter all the pages which reference it with an explicit hash; this can be quite inconvenient on a big site (ready for a search-and-replace over 10,000 static files? A sketch of that chore follows below). This is where signatures can offer more flexibility.
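
For illustration, that chore looks roughly like this; a sketch assuming Node.js, a site/ directory of static pages, and the hypothetical hash attribute from the example above:

const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Recompute the digest and size of the changed script...
const script = fs.readFileSync("jquery-1.2.3.js");
const digest = crypto.createHash("sha256").update(script).digest("base64url");

// ...then rewrite every page that pins the old values. A real tool would
// match each hash to its specific src instead of replacing them blindly.
for (const file of fs.readdirSync("site")) {
  if (!file.endsWith(".html")) continue;
  const page = path.join("site", file);
  const html = fs.readFileSync(page, "utf8");
  const updated = html.replace(
    /hash="sha-256:[^;"]*; size:\d+/g,
    'hash="sha-256:' + digest + '; size:' + script.length
  );
  if (updated !== html) fs.writeFileSync(page, updated);
}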