Scrapy: how to set the Referer URL

You should do exactly as @warwaruk indicated. Below is my elaboration of that answer as an example for a crawl spider:

from scrapy.spiders import CrawlSpider
from scrapy import Request

class MySpider(CrawlSpider):
  name = "myspider"
  allowed_domains = ["example.com"]
  start_urls = [
      'http://example.com/foo',
      'http://example.com/bar',
      'http://example.com/baz',
      ]
  rules = [(...)]

  def start_requests(self):
    # Send every start URL with an explicit Referer header.
    for url in self.start_urls:
      yield Request(url=url, headers={'Referer': 'http://www.example.com/'})

  def parse_me(self, response):
    (...)

This should produce the following log lines in your terminal:

(...)
[myspider] DEBUG: Crawled (200) <GET http://example.com/foo> (referer: http://www.example.com/)
(...)
[myspider] DEBUG: Crawled (200) <GET http://example.com/bar> (referer: http://www.example.com/)
(...)
[myspider] DEBUG: Crawled (200) <GET http://example.com/baz> (referer: http://www.example.com/)
(...)

The same works with BaseSpider (named scrapy.Spider in newer Scrapy releases), because start_requests is a BaseSpider method that CrawlSpider inherits.

The documentation describes more options that can be set on a Request apart from headers, such as cookies, the callback function, and the priority of the request.


Override BaseSpider.start_requests and create your custom Request there, passing it your Referer header.


Just set the Referer URL in the Request headers:

class scrapy.http.Request(url[, method='GET', body, headers, ...

headers (dict) – the headers of this request. The dict values can be strings (for single valued headers) or lists (for multi-valued headers).

Example:

return Request(url=your_url, headers={'Referer':'http://your_referer_url'})


If you want to change the Referer for all of your spider's requests, you can set it in DEFAULT_REQUEST_HEADERS in the settings.py file:

DEFAULT_REQUEST_HEADERS = {
    'Referer': 'http://www.google.com' 
}
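One thing to keep in mind: as far as I know, Scrapy applies these defaults only to headers that a request does not already carry, so a Referer set explicitly on a Request takes precedence. The function below is a simplified plain-Python model of that behaviour, not Scrapy's own code; apply_default_headers is a hypothetical helper, and unlike Scrapy it treats header names as case-sensitive.

```python
DEFAULT_REQUEST_HEADERS = {
    "Referer": "http://www.google.com",
}


def apply_default_headers(request_headers, defaults):
    """Return a copy of request_headers with missing defaults filled in."""
    merged = dict(request_headers)
    for name, value in defaults.items():
        # setdefault leaves a header alone if the request already set it.
        merged.setdefault(name, value)
    return merged


# Request without a Referer: the default is used.
plain = apply_default_headers({}, DEFAULT_REQUEST_HEADERS)

# Request with its own Referer: the explicit value is kept.
explicit = apply_default_headers(
    {"Referer": "http://www.example.com/"}, DEFAULT_REQUEST_HEADERS
)
```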