Do search engines still crawl a noindex page?

Yes, Google still crawls webpages that have a noindex tag.

But if you have the same content on two different webpages and one URL contains a noindex tag while the other does not, then you should not worry about it, because out of all the duplicate content only one webpage is indexed by Google. The rest of the webpages are crawlable but not indexed in Google's search results, so that is fine.


As Goyllo has already stated, search engine bots will crawl pages that have a noindex meta tag. If you think about it, they need to crawl the page in order to see the noindex meta tag in the first place. (You could use an X-Robots-Tag HTTP response header instead and, in theory, a bot would only need to do a HEAD request in order to see the noindex directive - but that's not how Google rolls.)
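To illustrate why the page has to be fetched first, here is a minimal Python sketch of the check a crawler would need to make. The names RobotsMetaParser and is_noindex are made up for this example, and it ignores details such as user-agent-specific X-Robots-Tag values, so treat it as an assumption-laden illustration rather than how any real bot works:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def is_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    # The header is visible without downloading the body...
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    # ...but the meta tag can only be seen by reading the HTML itself.
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

Either way, the bot has to request the URL before it can learn that the page is noindex.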

A noindex page can still be follow (which it is by default, unless you explicitly state nofollow as well), so the page obviously needs to be crawled in order to discover any links to follow.
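The default behaviour can be sketched in a few lines of Python. The function name effective_directives is hypothetical, and the sketch only handles noindex, nofollow and none (which is shorthand for both); anything not stated falls back to index, follow:

```python
def effective_directives(content):
    """Resolve a robots meta content string to index/follow flags.

    Anything not explicitly switched off stays on, which is why a
    plain "noindex" still leaves the links on the page followable.
    """
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    return {
        "index": not ({"noindex", "none"} & tokens),
        "follow": not ({"nofollow", "none"} & tokens),
    }
```

So effective_directives("noindex") leaves follow switched on, whereas "noindex, nofollow" switches both off.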

Do I have to add a 'nofollow' attribute to the link whilst we make these pages unique?

That simply discounts that particular link from the ranking algorithm. So, that particular link will not be used as a ranking factor for the target URL. I assume it's highly likely that there are other inbound links to that page as well?

...pages have stated 'noindex' and I was wondering if these pages would still be detected as duplicates?

Duplicate of what? A page can only be considered a duplicate (in the eyes of the search engine index) if it is indexed. If it's not indexed then it can't be a duplicate.

The duplicate content "problem" arises when two (or more) duplicate pages have been crawled and indexed: the search engine must then decide which page to return in the SERPs. Unless you resolve the duplication yourself (with a redirect, a canonical tag or simply by making the content unique), it's out of your control - the search engine makes the decision for you. You are also potentially diluting your search ranking, as users discover different pages and link back to one or the other.

To prevent a page from being crawled (i.e. not even requested), you can include an entry in your robots.txt file. However, this means the search engines will be unable to see your noindex meta tag. Whilst this should prevent the page appearing in normal search results, it doesn't necessarily prevent the page from appearing as a link-only result in the SERPs (i.e. "indexed") if it is linked to. However, it still can't be considered "duplicate" because its content won't have been read and indexed.
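If you want to check what a robots.txt entry actually blocks, Python's standard urllib.robotparser module can evaluate the rules locally. The /duplicates/ path, the example.com URLs and the is_crawlable helper below are placeholders I've assumed for illustration, not anything from the question:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every bot from one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicates/
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/duplicates/page.html")
allowed = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/unique/page.html")
```

Here blocked is False and allowed is True. Bear in mind that a blocked URL is never requested, so any noindex meta tag on it goes unseen.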