How do search engines deal with AngularJS applications?

Update May 2014

Google crawlers now executes javascript - you can use the Google Webmaster Tools to better understand how your sites are rendered by Google.

Original answer
If you want to optimize your app for search engines there is unfortunately no way around serving a pre-rendered version to the crawler. You can read more about Google's recommendations for ajax and javascript-heavy sites here.

If this is an option I'd recommend reading this article about how to do SEO for Angular with server-side rendering.

I’m not sure what the crawler does when it encounters custom tags.


Let's get definitive about AngularJS and SEO

Google, Yahoo, Bing, and other search engines crawl the web in traditional ways using traditional crawlers. They run robots that crawl the HTML on web pages, collecting information along the way. They keep interesting words and look for other links to other pages (these links, the amount of them and the number of them come into play with SEO).

So why don't search engines deal with javascript sites?

The answer has to do with the fact that the search engine robots work through headless browsers and they most often do not have a javascript rendering engine to render the javascript of a page. This works for most pages as most static pages don't care about JavaScript rendering their page, as their content is already available.

What can be done about it?

Luckily, crawlers of the larger sites have started to implement a mechanism that allows us to make our JavaScript sites crawlable, but it requires us to implement a change to our site.

If we change our hashPrefix to be #! instead of simply #, then modern search engines will change the request to use _escaped_fragment_ instead of #!. (With HTML5 mode, i.e. where we have links without the hash prefix, we can implement this same feature by looking at the User Agent header in our backend).

That is to say, instead of a request from a normal browser that looks like:

http://www.ng-newsletter.com/#!/signup/page

A search engine will search the page with:

http://www.ng-newsletter.com/?_escaped_fragment_=/signup/page

We can set the hash prefix of our Angular apps using a built-in method from ngRoute:

angular.module('myApp', [])
.config(['$location', function($location) {
  $location.hashPrefix('!');
}]);

And, if we're using html5Mode, we will need to implement this using the meta tag:

<meta name="fragment" content="!">

Reminder, we can set the html5Mode() with the $location service:

angular.module('myApp', [])
.config(['$location', 
function($location) {
  $location.html5Mode(true);
}]);

Handling the search engine

We have a lot of opportunities to determine how we'll deal with actually delivering content to search engines as static HTML. We can host a backend ourselves, we can use a service to host a back-end for us, we can use a proxy to deliver the content, etc. Let's look at a few options:

Self-hosted

We can write a service to handle dealing with crawling our own site using a headless browser, like phantomjs or zombiejs, taking a snapshot of the page with rendered data and storing it as HTML. Whenever we see the query string ?_escaped_fragment_ in a search request, we can deliver the static HTML snapshot we took of the page instead of the pre-rendered page through only JS. This requires us to have a backend that delivers our pages with conditional logic in the middle. We can use something like prerender.io's backend as a starting point to run this ourselves. Of course, we still need to handle the proxying and the snippet handling, but it's a good start.

With a paid service

The easiest and the fastest way to get content into search engine is to use a service Brombone, seo.js, seo4ajax, and prerender.io are good examples of these that will host the above content rendering for you. This is a good option for the times when we don't want to deal with running a server/proxy. Also, it's usually super quick.

For more information about Angular and SEO, we wrote an extensive tutorial on it at http://www.ng-newsletter.com/posts/serious-angular-seo.html and we detailed it even more in our book ng-book: The Complete Book on AngularJS. Check it out at ng-book.com.


(2022) Use Server Side Rendering if possible, and generate URLs with Pushstate

Google can and will run JavaScript now so it is very possible to build a site using only JavaScript provided you create a sensible URL structure. However, pagespeed has become a progressively more important ranking factor and typically pages built clientside perform poorly on initial render.

Serverside rendering (SSR) can help by allowing your pages to be pre-generated on the server. Your html containst the div that will be used as the page root, but this is not an empty div, it contains the html that the JavaScript would have generated if it were allowed to run.

The client downloads the HTML and renders it giving a very fast initial load, then it executes the JavaScript replacing the content of the root div with generated content in a process known as hydration.

Many newer frameworks come with SSR built in, notably NextJS.

(2015) Use PushState and Precomposition

The current (2015) way to do this is using the JavaScript pushState method.

PushState changes the URL in the top browser bar without reloading the page. Say you have a page containing tabs. The tabs hide and show content, and the content is inserted dynamically, either using AJAX or by simply setting display:none and display:block to hide and show the correct tab content.

When the tabs are clicked, use pushState to update the URL in the address bar. When the page is rendered, use the value in the address bar to determine which tab to show. Angular routing will do this for you automatically.

Precomposition

There are two ways to hit a PushState Single Page App (SPA)

  1. Via PushState, where the user clicks a PushState link and the content is AJAXed in.
  2. By hitting the URL directly.

The initial hit on the site will involve hitting the URL directly. Subsequent hits will simply AJAX in content as the PushState updates the URL.

Crawlers harvest links from a page then add them to a queue for later processing. This means that for a crawler, every hit on the server is a direct hit, they don't navigate via Pushstate.

Precomposition bundles the initial payload into the first response from the server, possibly as a JSON object. This allows the Search Engine to render the page without executing the AJAX call.

There is some evidence to suggest that Google might not execute AJAX requests. More on this here:

https://web.archive.org/web/20160318211223/http://www.analog-ni.co/precomposing-a-spa-may-become-the-holy-grail-to-seo

Search Engines can read and execute JavaScript

Google has been able to parse JavaScript for some time now, it's why they originally developed Chrome, to act as a full featured headless browser for the Google spider. If a link has a valid href attribute, the new URL can be indexed. There's nothing more to do.

If clicking a link in addition triggers a pushState call, the site can be navigated by the user via PushState.

Search Engine Support for PushState URLs

PushState is currently supported by Google and Bing.

Google

Here's Matt Cutts responding to Paul Irish's question about PushState for SEO:

http://youtu.be/yiAF9VdvRPw

Here is Google announcing full JavaScript support for the spider:

http://googlewebmastercentral.blogspot.de/2014/05/understanding-web-pages-better.html

The upshot is that Google supports PushState and will index PushState URLs.

See also Google webmaster tools' fetch as Googlebot. You will see your JavaScript (including Angular) is executed.

Bing

Here is Bing's announcement of support for pretty PushState URLs dated March 2013:

http://blogs.bing.com/webmaster/2013/03/21/search-engine-optimization-best-practices-for-ajax-urls/

Don't use HashBangs #!

Hashbang URLs were an ugly stopgap requiring the developer to provide a pre-rendered version of the site at a special location. They still work, but you don't need to use them.

Hashbang URLs look like this:

domain.example/#!path/to/resource

This would be paired with a metatag like this:

<meta name="fragment" content="!">

Google will not index them in this form, but will instead pull a static version of the site from the escaped_fragments URL and index that.

Pushstate URLs look like any ordinary URL:

domain.example/path/to/resource

The difference is that Angular handles them for you by intercepting the change to document.location dealing with it in JavaScript.

If you want to use PushState URLs (and you probably do) take out all the old hash style URLs and metatags and simply enable HTML5 mode in your config block.

Testing your site

Google Webmaster tools now contains a tool which will allow you to fetch a URL as Google, and render JavaScript as Google renders it.

https://www.google.com/webmasters/tools/googlebot-fetch

Generating PushState URLs in Angular

To generate real URLs in Angular, rather than # prefixed ones, set HTML5 mode on your $locationProvider object.

$locationProvider.html5Mode(true);

Server Side

Since you are using real URLs, you will need to ensure the same template (plus some precomposed content) gets shipped by your server for all valid URLs. How you do this will vary depending on your server architecture.

Sitemap

Your app may use unusual forms of navigation, for example hover or scroll. To ensure Google is able to drive your app, I would probably suggest creating a sitemap, a simple list of all the URLs your app responds to. You can place this at the default location (/sitemap or /sitemap.xml), or tell Google about it using webmaster tools.

It's a good idea to have a sitemap anyway.

Browser support

Pushstate works in IE10. In older browsers, Angular will automatically fall back to hash style URLs

A demo page

The following content is rendered using a pushstate URL with precomposition:

http://html5.gingerhost.com/london

As can be verified, at this link, the content is indexed and is appearing in Google.

Serving 404 and 301 Header status codes

Because the search engine will always hit your server for every request, you can serve header status codes from your server and expect Google to see them.