Suppress Scrapy Item printed in logs after pipeline

If you want to exclude only some attributes from the output, you can extend the answer given by @dino:

    from scrapy.item import Item, Field
    import json

    class MyItem(Item):
        attr1 = Field()
        attr2 = Field()
        attr1ToExclude = Field()
        attr2ToExclude = Field()
        # ...
        attrN = Field()

        def __repr__(self):
            """Serialize to JSON, leaving out the fields we don't want logged."""
            # Item is a MutableMapping, so iterate its fields directly
            # (self.items() replaces the Python 2-only iteritems() call).
            r = {attr: value for attr, value in self.items()
                 if attr not in ('attr1ToExclude', 'attr2ToExclude')}
            return json.dumps(r, sort_keys=True, indent=4, separators=(',', ': '))
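
A quick sanity check outside of a crawl (the field values here are made up):

    item = MyItem(attr1='a', attr2='b', attr1ToExclude='hidden')
    print(repr(item))  # the JSON shows attr1 and attr2 only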

Having read through the documentation and conducted a (brief) search through the source code, I can't see a straightforward way of achieving this aim.

The hammer approach is to raise the logging level to INFO in the settings (i.e., add the following line to settings.py):

    LOG_LEVEL = 'INFO'

This will strip out a lot of other information about the URLs/page that are being crawled, but it will definitely suppress data about processed items.
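
If you only need this for a single spider rather than the whole project, the same setting can be applied per spider via `custom_settings` (a minimal sketch; the spider name is made up):

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'my_spider'
        # Applies to this spider only; all DEBUG messages, including the
        # per-item "Scraped from ..." dumps, are suppressed.
        custom_settings = {'LOG_LEVEL': 'INFO'}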


I tried the `__repr__` approach mentioned by @dino, but it didn't work well for me. Building on his idea, I tried overriding `__str__` instead, and that works.

Here's how I do it; it's very simple:

    def __str__(self):
        return ""  # nothing is printed for the item in the crawl log

Another approach is to override the __repr__ method of your Item subclasses to choose which attributes (if any) get printed at the end of the pipeline:

    from scrapy.item import Item, Field

    class MyItem(Item):
        attr1 = Field()
        attr2 = Field()
        # ...
        attrN = Field()

        def __repr__(self):
            """Only print out attr1 after exiting the pipeline."""
            # Scrapy item fields are read with dict-style access, not
            # attribute access, so use self.get("attr1"), not self.attr1.
            return repr({"attr1": self.get("attr1")})

This way, you can keep the log level at DEBUG and show only the attributes that you want to see coming out of the pipeline (to check attr1, for example).
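
If you'd rather not touch your Item classes at all, a custom log formatter is another option (a sketch: the `LOG_FORMATTER` setting and the `scraped` hook are standard Scrapy, but the module path here is hypothetical, and returning None to skip a message needs a reasonably recent Scrapy version):

    # myproject/logformatter.py
    from scrapy import logformatter

    class QuietItemLogFormatter(logformatter.LogFormatter):
        def scraped(self, item, response, spider):
            # Returning None tells Scrapy to skip the "Scraped from ..."
            # message entirely, item dump included.
            return None

    # In settings.py:
    # LOG_FORMATTER = 'myproject.logformatter.QuietItemLogFormatter'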

Tags:

Python

Scrapy