Block Website Scraping by Google Docs

Blocking on User-Agent is a great solution because there doesn't appear to be a way to set a different User-Agent while still using the IMPORTHTML function -- and since you're happy to ban all usage from Google Docs/Sheets, that's perfect.
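As a minimal sketch of the User-Agent check: the `GoogleDocs` and `apps-spreadsheets` tokens below are assumptions based on commonly reported User-Agent strings from spreadsheet import fetches -- verify the exact string against your own access logs before deploying.

```python
# Reject requests whose User-Agent suggests a Google Docs/Sheets import
# function (IMPORTHTML / IMPORTXML). Token list is an assumption; check
# your logs for the real string your traffic carries.
BLOCKED_UA_TOKENS = ("GoogleDocs", "apps-spreadsheets")

def should_block(user_agent):
    """Return True if the request looks like a spreadsheet import fetch."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)

# Example of a User-Agent reportedly sent by spreadsheet import functions:
print(should_block("Mozilla/5.0 (compatible; GoogleDocs; apps-spreadsheets)"))  # True
print(should_block("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))                # False
```

Your web framework or server config would call this check before serving the page and return a 403 (or one of the alternatives below) when it matches.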

Some additional thoughts, though, if a full-on ban seems unpleasant:

  1. Rate limit it: as you say, you've noticed the traffic mostly comes from two IPs and always with the same User-Agent, so just slow down your responses. As long as the requests are serial, you can still provide the data, but at a pace that may be enough to discourage scraping. Delay your responses (to suspected scrapers) by 20 or 30 seconds.
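The delay idea can be sketched as below. The IP addresses and User-Agent token are hypothetical placeholders; substitute the two IPs and the string you actually see in your logs.

```python
import time

SUSPECT_IPS = {"203.0.113.10", "203.0.113.11"}  # hypothetical: your two observed IPs
SCRAPER_UA_TOKEN = "GoogleDocs"                 # assumed User-Agent token
DELAY_SECONDS = 20

def response_delay(ip, user_agent):
    """Seconds to wait before answering a suspected scraper; 0 otherwise."""
    if ip in SUSPECT_IPS or SCRAPER_UA_TOKEN.lower() in (user_agent or "").lower():
        return DELAY_SECONDS
    return 0

def serve(ip, user_agent, payload):
    # Because the scraper's requests are serial, each delayed response
    # throttles the whole scrape; normal visitors are unaffected.
    time.sleep(response_delay(ip, user_agent))
    return payload
```

Ordinary visitors pay no penalty, while a serial scraper's run stretches from seconds to hours.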

  2. Redirect to a "You're blocked" page, or to a page with "default" data (i.e., still scrapable, but not containing the current data). This is better than a plain 403 because it tells a human reader that the site isn't meant for scraping, and you can then direct them to purchasing access (or at least requesting a key from you).
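A sketch of that second option: serve a friendly placeholder (still HTTP 200, so the spreadsheet renders *something*) instead of the live data. The User-Agent token and the `/pricing` path are hypothetical.

```python
def handle_request(user_agent, real_data):
    """Return (status, body): real data for normal clients, a placeholder
    with a pointer to paid access for suspected scrapers."""
    if "googledocs" in (user_agent or "").lower():  # assumed UA token
        placeholder = ("This data is not available for automated import. "
                       "See /pricing to purchase access.")  # hypothetical URL
        return 200, placeholder
    return 200, real_data

status, body = handle_request("Mozilla/5.0 (compatible; GoogleDocs)", "live figures")
print(status, body)  # 200 with the placeholder message
```

Returning 200 rather than 403 is deliberate: the import function succeeds, so the person behind the sheet sees your message instead of a cryptic error.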