Python Scrapy: What is the difference between "runspider" and "crawl" commands?

In the command:

scrapy crawl [options] <spider>

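<spider> is the name of the spider, that is, the value of the name attribute on the spider class. It is not the project name (BOT_NAME in settings.py). The command has to be run from inside a Scrapy project.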

And in the command:

scrapy runspider [options] <spider_file>

<spider_file> is the path to the file that contains the spider.

Otherwise, the options are the same:

Options
=======
--help, -h              show this help message and exit
-a NAME=VALUE           set spider argument (may be repeated)
--output=FILE, -o FILE  dump scraped items into FILE (use - for stdout)
--output-format=FORMAT, -t FORMAT
                        format to use for dumping items with -o

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--lsprof=FILE           write lsprof profiling stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure
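As a rough illustration of the -a NAME=VALUE option listed above: each pair ends up as a keyword argument to the spider's constructor, and the values arrive as strings. The sketch below imitates that with plain Python. It is not Scrapy's code, and parse_spider_args and ExampleSpider are hypothetical names used only for this example.

```python
def parse_spider_args(pairs):
    """Turn ["category=books", "page=2"] into {"category": "books", "page": "2"}."""
    args = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        args[name] = value
    return args


class ExampleSpider:
    """Hypothetical stand-in for a spider class (does not use Scrapy)."""

    name = "example"

    def __init__(self, **kwargs):
        # Store every argument as an attribute on the instance.
        for key, value in kwargs.items():
            setattr(self, key, value)


# Roughly what "-a category=books -a page=2" would hand to the spider.
spider = ExampleSpider(**parse_spider_args(["category=books", "page=2"]))
print(spider.category, spider.page)
```

Note that page is the string "2", not an integer, so a spider that needs a number would have to convert it itself.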

Since runspider does not need a project, and so does not depend on project settings such as BOT_NAME, you may find it more flexible, depending on how you customise your scrapers.


A short explanation and the syntax of both:

runspider

Syntax: scrapy runspider <spider_file.py>

Requires project: no

Run a spider self-contained in a Python file, without having to create a project.

Example usage:

$ scrapy runspider myspider.py

crawl

Syntax: scrapy crawl <spider>

Requires project: yes

Start crawling using a spider with the corresponding name.

Usage examples:

$ scrapy crawl myspider

The main difference is that runspider does not need a project. That is, you can write a spider in a myspider.py file and call scrapy runspider myspider.py.
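To make the "spider in a file" idea concrete, here is a minimal sketch of loading a Python file by its path and collecting the classes defined in it, which is the general kind of work a file-based command has to do. It uses only the standard library and is not Scrapy's implementation. load_classes_from_file and the throwaway MySpider class are assumptions made for illustration.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def load_classes_from_file(path):
    """Import a Python file by path and return the classes it defines."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Keep only classes defined in this file, not ones it merely imported.
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__
    ]


# Write a throwaway file standing in for myspider.py.
with tempfile.TemporaryDirectory() as tmp:
    spider_file = Path(tmp) / "myspider.py"
    spider_file.write_text('class MySpider:\n    name = "myspider"\n')
    classes = load_classes_from_file(spider_file)

print([cls.name for cls in classes])
```

No project directory or settings module is involved here: the file path alone is enough to find the class.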

The crawl command requires a project in order to find the project's settings, load the available spiders from the modules listed in the SPIDER_MODULES setting, and look up the spider by name.
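The lookup by name can be pictured as a registry keyed on each spider's name attribute. The following is a conceptual sketch in plain Python, not Scrapy's loader. QuotesSpider, BooksSpider, build_registry and find_spider are hypothetical names.

```python
class QuotesSpider:
    """Hypothetical spider class; only the name attribute matters here."""

    name = "quotes"


class BooksSpider:
    """A second hypothetical spider class."""

    name = "books"


def build_registry(spider_classes):
    """Map each spider's name attribute to its class."""
    return {cls.name: cls for cls in spider_classes}


def find_spider(registry, spider_name):
    """Return the class registered under spider_name, or raise KeyError."""
    try:
        return registry[spider_name]
    except KeyError:
        raise KeyError(f"Spider not found: {spider_name}") from None


# In a project, the classes would be discovered in the configured modules;
# here they are listed by hand.
registry = build_registry([QuotesSpider, BooksSpider])
spider_cls = find_spider(registry, "quotes")
print(spider_cls.__name__)
```

The argument is matched against the name attribute ("quotes"), not against the class name or the file name, which is why crawl needs the project to know where to search.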

If you need a quick spider for a short task, runspider requires less boilerplate.