How should I structure my URLs for both SEO and localization?

There are many acceptable ways to structure your site for both SEO and internationalization. Each has advantages and disadvantages.

Top Level Domains

Buy the same domain name under multiple top level domains, such as example.com, example.es and example.de.

Advantages

  • Fully supported by Google. You can add the sites to Google Webmaster Tools, where there are options to tell Google how they are targeted.
  • Often preferred by users, who tend to like content published on their own country's TLD.
  • The domain name itself can be localized. Many international users may react badly to English words or an English-sounding domain name. This can be especially important for languages that do not use a Latin alphabet.
  • Supports localization by country. You can have separate sites like example.co.uk and example.com.au targeted at audiences in different countries. The sites may have duplicate content with slight spelling differences and still rank well. In fact, multiple well localized sites in the same language may rank better than a single site in that language.
  • Hosting can be localized by pointing DNS to a web server in the country being targeted.

Disadvantages

  • Buying many domains is expensive and time-consuming, especially if you have to deal with squatters.
  • Cookies cannot be shared across multiple locales, meaning that users have to log in separately to each site.
  • There is no good option for localizing by language alone: many languages are spoken in multiple countries, and for many languages no country TLD matches the language code (there is no .en, for example). Even where the TLD does match the language code, as with es, search engines may assume that the site is only appropriate for users from Spain, not for all Spanish speakers.

Sub-domains

Buy a single domain and use sub-domains such as en.example.com and es.example.com.

Advantages

  • Fully supported by Google.
  • Supports localization by country or by language.
  • Hosting can be localized by pointing DNS to a web server located close to users.
  • Easy and cheap to implement compared to buying multiple domains.
  • Cookies can be shared across all locales, enabling single sign-on for a more seamless user experience.
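Single sign-on across sub-domains works because a cookie whose Domain attribute is the parent domain is sent to every sub-domain beneath it. Below is a minimal Python sketch using the standard library's http.cookies module; the cookie name "session" and the helper session_cookie_header are hypothetical. The same trick cannot span example.de and example.es, which is why the top level domain layout loses this advantage.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str, parent_domain: str) -> str:
    """Build a Set-Cookie value shared by all sub-domains of parent_domain.

    Hypothetical helper: with Domain=example.com, browsers send the cookie
    to en.example.com, es.example.com and any other sub-domain.
    """
    cookie = SimpleCookie()
    cookie["session"] = session_id  # "session" is an illustrative cookie name
    morsel = cookie["session"]
    morsel["domain"] = parent_domain  # parent domain, not a single sub-domain
    morsel["path"] = "/"
    morsel["secure"] = True  # only sent over HTTPS
    morsel["httponly"] = True  # not readable from page scripts
    # OutputString() returns the header value without the "Set-Cookie:" prefix
    return morsel.OutputString()
```

A server on en.example.com would send this value in a Set-Cookie header after login, and es.example.com would then receive the same session cookie.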

Disadvantages

  • No opportunity to localize the domain name itself.
  • May look less local to users compared to a top level domain.

Sub-directories

Buy a single domain and use sub-directories such as example.com/en/ and example.com/es/.

Advantages and Disadvantages

  • The same as sub-domains, except that there is only one DNS entry, which precludes hosting the different locales of your site in different countries.
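To make the three layouts concrete, here is a small Python sketch that extracts the locale from a request URL under each of them. The layout labels, the TLD_LOCALES mapping and the SUPPORTED set are hypothetical configuration invented for illustration; a real site would list its own domains and locales.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical configuration; a real site would list its own domains and locales.
TLD_LOCALES = {
    "example.co.uk": "en-gb",
    "example.com.au": "en-au",
    "example.de": "de",
    "example.es": "es",
}
SUPPORTED = {"en", "de", "es", "es-es", "es-mx"}


def locale_from_url(url: str, layout: str) -> Optional[str]:
    """Return the locale encoded in url, or None if it carries none.

    layout is "tld", "subdomain" or "subdirectory" (labels invented for this sketch).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if layout == "tld":
        # example.de -> "de": the whole host name identifies the locale
        return TLD_LOCALES.get(host)
    if layout == "subdomain":
        # es-mx.example.com -> "es-mx"; www.example.com -> None
        label = host.split(".")[0]
        return label if label in SUPPORTED else None
    if layout == "subdirectory":
        # example.com/es/page -> "es"; example.com/about -> None
        segments = [s for s in parts.path.split("/") if s]
        first = segments[0].lower() if segments else ""
        return first if first in SUPPORTED else None
    raise ValueError(f"unknown layout: {layout}")
```

Only the tld branch needs a lookup table, which mirrors the extra bookkeeping of owning many domains.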

Techniques that are NOT recommended

  • File Names: Using different file names such as index_en.html and index_de.html. This technique is not fully supported by Google. For example, there is no way to set targeting in webmaster tools.
  • URL Parameters: Using URL parameters such as lang=en. It is not recommended for the same reason that different file names are not recommended.
  • Accept Language Header: Automatically switching the language based on the Accept-Language header.
    • Many users do not have this header set correctly. This is especially true for users traveling abroad who may be using a friend's computer or an internet cafe. It is also often true for international users who install an English web browser and know enough English to get around, but would prefer content in a different language.
    • Google has announced that Googlebot will send the Accept-Language header and crawl from different geographic locations. However, Google still recommends that you have separate URLs for content in different languages.
    • You may use the Accept-Language header to suggest that users might prefer a different version of the site by displaying a message when the site they are visiting does not match the Accept-Language header.
  • Geographic IP Addresses: Automatically switching the language based on where the IP address is geographically located.
    • Geo-ip databases are inaccurate. Up to 10% of visitors may be assigned to the incorrect country.
    • Some countries (like Canada) use more than one language.
    • You may use the IP address country to suggest a language or languages that a user may be interested in.
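Both lists above end with the same idea: use the signal to suggest a version, not to switch automatically. A minimal Python sketch of the Accept-Language variant follows. The function names are hypothetical, and the parser only handles the common "tag;q=value" form rather than every detail of the header grammar.

```python
from typing import List, Optional, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    result = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without q= has the maximum weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        result.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def suggest_language(header: str, page_lang: str, available: Set[str]) -> Optional[str]:
    """Return a language to suggest in a message, or None if the page already fits.

    This never redirects; the caller decides whether to show something like
    "This page is also available in German".
    """
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        primary = tag.split("-")[0]  # "de-DE" -> "de"
        if primary == page_lang:
            return None  # the user's preferred available language is this page's
        if primary in available:
            return primary
    return None
```

For a German browser visiting the English page, suggest_language("de-DE,de;q=0.9,en;q=0.8", "en", {"en", "de"}) returns "de", and the page can show a link to the German version while still rendering in English.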

On-page Markup

When supporting multiple languages, you should clearly mark up each page with language metadata.

Use the lang attribute in the html tag:

<html lang="en">

As suggested by Google, use rel="alternate" links that point to the same page in other languages:

<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" /> 
<link rel="alternate" hreflang="en" href="http://en.example.com/" />
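Tags like these can be generated from a single mapping of language codes to URLs, so that every version of a page emits exactly the same set. Below is a small Python sketch; hreflang_links is a hypothetical helper. Google's documentation asks that each language version list all versions, including itself, which a shared mapping makes easy.

```python
from html import escape
from typing import Dict, List


def hreflang_links(alternates: Dict[str, str]) -> List[str]:
    """Render one <link rel="alternate"> tag per language or locale version.

    alternates maps an hreflang value ("es", "es-MX", ...) to that version's URL.
    Dicts keep insertion order, so the tags come out in the order given.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in alternates.items()
    ]
```

Passing the four language/URL pairs from the example above reproduces those four tags, one per line when joined with newlines.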

Alternatively, this information can be put into sitemap files.
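Below is a sketch of the sitemap alternative using Python's standard xml.etree.ElementTree. It follows the sitemap format Google documents for hreflang, in which each url entry repeats every language version as an xhtml:link element. The helper name and the sample URLs are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def hreflang_sitemap(alternates: Dict[str, str]) -> str:
    """Build a sitemap in which every <url> lists all language versions."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in alternates.values():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Each entry repeats the full set of alternates, including itself
        for lang, alt_url in alternates.items():
            ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=alt_url)
    return ET.tostring(urlset, encoding="unicode")
```

With two language versions this produces two url entries, each containing two xhtml:link elements; the markup grows quadratically with the number of languages, which is worth keeping in mind for large sites.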

Tell Google About Your Site

You should add each language (or locale) of your site to Google Webmaster Tools. This can be done for top level domains, for sub-domains, or for sub-directories.

If your site is targeted by country, you should use Webmaster Tools to set the site targeting. Navigate to "Configuration" -> "Settings" -> "Geographic target" and choose the correct country from the drop-down list.


Answering a question similar to yours on his blog, Matt Cutts suggests:

If you have sites with say French and German versions for a business, my preferences would be:

  1. ccTLDS such as example.fr or example.de
  2. After that, subdomains such as fr.example.com or de.example.com.
  3. If that’s not possible, I’d use subdirectories such as example.com/fr/ or example.com/de/

As a German user, I hate it when a website won't let me reach the English page because it thinks it knows better than I do what I want. It might be hard for Americans to understand, but there are actually people who speak more than one language.

Sometimes I might want to view the German websites and sometimes I might want to view the English one.

Simply parsing the Accept-Language header might drive me mad.

That is especially true if your German page is a cheap translation of your English page.

To make it easy for your users, the English version should also have the language in its URL, such as domain.com/en/ or en.domain.com.

When I type domain.com, you get one guess, based on my Accept-Language header, at whether to show me the English or the German page. If I don't like your choice, however, I should be able to simply swap the language in the domain name.

Extra hint: if you put the language in front of the domain name, typing either ger.domain.com or de.domain.com should bring me to the German website.
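The last two suggestions, a single guess for the bare domain plus hand-typed language prefixes, could be combined roughly as follows. This is a Python sketch under assumptions: the lang.domain.com host layout, the ALIASES table (ger, deu, eng) and the helper names are all hypothetical, and a real server would answer with an HTTP redirect to the host returned here.

```python
from typing import Dict, Optional, Set

# Hypothetical aliases: extra sub-domain labels a visitor might type by hand.
ALIASES: Dict[str, str] = {"ger": "de", "deu": "de", "eng": "en"}
SUPPORTED: Set[str] = {"en", "de"}


def canonical_language(label: str) -> Optional[str]:
    """Map a typed label (de, ger, ...) to a supported language code, or None."""
    label = label.lower()
    label = ALIASES.get(label, label)
    return label if label in SUPPORTED else None


def first_guess(accept_language: str, default: str = "en") -> str:
    """Pick the one-time guess for the bare domain from Accept-Language."""
    best, best_q = default, 0.0
    for part in accept_language.split(","):
        tag, _, params = part.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed weight
        lang = canonical_language(tag.strip().split("-")[0])
        # strict ">" keeps the earliest tag when weights tie; q=0 never wins
        if lang and q > best_q:
            best, best_q = lang, q
    return best


def redirect_host(host: str, accept_language: str, parent: str = "domain.com") -> Optional[str]:
    """Return the host to redirect to, or None if no redirect is needed."""
    host = host.lower()
    if host == parent:
        # Bare domain: one guess, which the visitor can override via the host name
        return f"{first_guess(accept_language)}.{parent}"
    if not host.endswith("." + parent):
        return None
    label = host[: -len(parent) - 1]
    lang = canonical_language(label)
    if lang is None or lang == label:
        return None  # unknown label, or already the canonical sub-domain
    return f"{lang}.{parent}"
```

Note that an explicit language in the host always wins over the header: ger.domain.com goes to de.domain.com no matter what the browser sends, which is exactly the override the answer asks for.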