Stop search engines to index specific parts of the page

As I understood he issue is that you have same content for many urls. Like:

www.my-awesome-domain.com/my-book/page/42

www.my-awesome-domain.com//my-book/page/7

And the visible content of the page is adjustable by JavaScript, that User Execute when he clicks some elements on your site.

In This case you need to do 2 things:

  1. Mark your URL's as Canonical pages in any of the ways described in this google document: https://support.google.com/webmasters/answer/139066?hl=en
  2. You need add a feature that each page will load to the same state after full page refresh, for example you can use hash parameter when navigating as desiribed in the article here: or here is the overview of the technique

Today google bot is executing JavaScript as announced in their official blog: https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html

So if you achieve proper page behavior when hitting Refresh (F5) and Will specify the canonical pages property, pages will be correctly crawled, and when you will follow the link you will get to the linked page.

If you need more guidance how to do it in url.js Please post another question (so it's will be proper documented for others) and I will be glad to help.


The answere is really simple: you can't do it. There is no technical possibility to keep the same content under different URLs and ask search engines to index only part of it.

If you are OK with having only one page indexed you can use, as suggested before, canonical URLs. You place the canonical URL that links to the main page on every sub-page.

You may find a "hack" that uses special tags used for Google Search Appliance: googleon and googleoff.

https://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_crawl/preparing.html

The only issue is this will most likely not work with Google Bot (at least no one will guarantee it will) or any other search engine.


I dont think you will be able to achieve what you are looking for.

I cant see how robots.txt would have any affect. Canonical tags dont work on divs.

Google has spoken about sites like these in the past and made some suggestions for indexing, here are a couple of links that may help :

https://www.seroundtable.com/seo-single-page-12964.html

https://www.seroundtable.com/google-on-crawling-javascript-sites-progressive-web-apps-21737.html