Running scrapy from script not including pipeline

The solution from @Pawel and the docs did not work for me. After reading Scrapy's source code, I realized that in some cases it does not identify the settings module correctly. The pipelines were not being used because the script never found them in the first place.

As the docs and Pawel state, I was using:

from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
crawler = Crawler(settings)

but, when calling:

print("these are the pipelines:")
print(crawler.settings.attributes['ITEM_PIPELINES'])

I got:

these are the pipelines:
<SettingsAttribute value={} priority=0>

The settings object was not being populated with the project's settings.
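
One way to see why the lookup fails is to check which scrapy.cfg, if any, is visible from the directory the script runs in. The sketch below assumes that get_project_settings finds the project through a scrapy.cfg file in the working directory or one of its parents, and reads the settings module from its [settings] section. That is my understanding of Scrapy's behaviour, not something confirmed above. The helper names are hypothetical, it uses only the standard library, and it is written for Python 3.

```python
import configparser
from pathlib import Path


def closest_scrapy_cfg(start):
    """Return the nearest scrapy.cfg at or above `start`, or None."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


def settings_module_from_cfg(cfg_path, project="default"):
    """Read the dotted settings path from the [settings] section, if present."""
    parser = configparser.ConfigParser()
    parser.read(str(cfg_path))
    # fallback=None covers both a missing section and a missing option
    return parser.get("settings", project, fallback=None)


if __name__ == "__main__":
    cfg = closest_scrapy_cfg(Path.cwd())
    if cfg is None:
        print("No scrapy.cfg found from here; project settings may not load.")
    else:
        print("Found", cfg, "->", settings_module_from_cfg(cfg))
```

If nothing is found when the script is started from its usual location, that would be consistent with ITEM_PIPELINES coming back empty.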

What is required is the dotted path to the project's settings module, relative to the module containing the script that calls Scrapy, e.g. scrapy.myproject.settings. I then created the Settings() object as follows:

import os

from scrapy.settings import Settings

settings = Settings()
# Dotted path to the project's settings module
os.environ['SCRAPY_SETTINGS_MODULE'] = 'scrapy.myproject.settings'
settings_module_path = os.environ['SCRAPY_SETTINGS_MODULE']
settings.setmodule(settings_module_path, priority='project')
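
Since setmodule has to import the dotted path, it can help to confirm that the path is importable from the script before handing it to Scrapy. The following is a minimal sketch under that assumption. The function name point_scrapy_at and the project_root parameter are hypothetical, it uses only the standard library, and it is written for Python 3.

```python
import importlib
import os
import sys


def point_scrapy_at(settings_module, project_root=None):
    """Set SCRAPY_SETTINGS_MODULE and report whether the module imports.

    `project_root` (hypothetical parameter) is an optional directory added
    to sys.path so that the dotted path can be resolved from the script.
    """
    if project_root is not None and project_root not in sys.path:
        sys.path.insert(0, project_root)
    os.environ["SCRAPY_SETTINGS_MODULE"] = settings_module
    try:
        importlib.import_module(settings_module)
    except ImportError:
        # The dotted path does not resolve from this script's sys.path
        return False
    return True
```

A False result suggests the dotted path is wrong relative to the script, which is the situation described above where the pipelines never load.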

The complete code I used, which successfully loaded the pipelines, is:

import os

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.settings import Settings
from scrapy.myproject.spiders.first_spider import FirstSpider

spider = FirstSpider()

# Load the project settings explicitly from the dotted module path
settings = Settings()
os.environ['SCRAPY_SETTINGS_MODULE'] = 'scrapy.myproject.settings'
settings_module_path = os.environ['SCRAPY_SETTINGS_MODULE']
settings.setmodule(settings_module_path, priority='project')
crawler = Crawler(settings)

# Stop the reactor once the spider has finished
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

# Configure and start the crawl (pre-1.0 Scrapy API, as in the docs' example)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()  # blocks until the spider_closed signal is sent

You need to actually call get_project_settings. The Settings object that you are passing to your crawler in your posted code gives you the defaults, not your project's settings. Write something like this:

from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
crawler = Crawler(settings)