Disallow robots.txt from being accessed in a browser but still accessible by spiders?

If you don't want something visible to the web, then don't make it visible. In your case, listing those directories in robots.txt is the opposite of obscurity: you're publicly announcing exactly where they are. Instead of saying "Hey, here's the place with all the precious jewels and valuable metals, don't go in there!", say nothing and simply don't advertise the presence of those directories at all.

This takes some discipline on your part: make sure that none of those directories are referred to in any way on your publicly accessible pages.

Beyond that, there's no way to do what you're asking via robots.txt. If you generate the file dynamically based on who's requesting it, what's to stop someone from setting their browser's user agent string to "Googlebot" and reading the full list of your excluded directories? There's no 100% reliable method of detecting who's at the other end of a connection; at best you can guess, and guessing just doesn't cut it when it comes to security.
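
To see how trivial that spoofing is, here's a minimal sketch (the URL is a placeholder): any HTTP client can claim to be Googlebot just by sending its user agent string.

    # Minimal sketch: fetch robots.txt while claiming to be Googlebot.
    # The URL is a placeholder; any HTTP client can do the same thing.
    import urllib.request

    req = urllib.request.Request(
        "https://example.com/robots.txt",
        headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    )
    with urllib.request.urlopen(req) as response:
        print(response.read().decode("utf-8"))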

Password-protected pages/directories can't be crawled, so those would be safe even if they're not explicitly listed in robots.txt for exclusion. Googlebot and its brethren won't have login credentials, so they can't get into the pages to spider them.
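
If you want that kind of protection, HTTP Basic authentication in a .htaccess file is one common way to set it up. A sketch, assuming Apache's auth modules are enabled; the realm name and the path to the .htpasswd file are placeholders you'd change:

    AuthType Basic
    AuthName "Private area"
    AuthUserFile /path/to/.htpasswd
    Require valid-user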


If you don't want robots indexing those directories but don't want to announce them in your robots.txt file, use the X-Robots-Tag HTTP header to block them.

Put this in a .htaccess file in any directory you don't want indexed:

Header set X-Robots-Tag "noindex"

That tells robots not to index any of the files in that directory. This way no robots.txt entry is necessary, and you keep your obscurity while still having a legitimate way of telling the search engines to stay out.
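
If you'd rather not cover a whole directory, the same header can be scoped to particular file types with a FilesMatch block. A sketch (the .pdf pattern is just an example, and mod_headers must be enabled either way):

    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>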