Scrapy components
Item pipelines
- class zyte_common_items.pipelines.AEPipeline
Replaces standard items with matching items that use the old Zyte Automatic Extraction schema.
This item pipeline is intended to help in the migration from Zyte Automatic Extraction to Zyte API automatic extraction.
In the simplest scenarios, it can be added to the ITEM_PIPELINES setting in migrated code to ensure that the schema of output items matches the old schema.
In scenarios where page object classes were being used to fix, extend or customize extraction, it is recommended to migrate page object classes to the new schemas, or move page object class code to the corresponding spider callback.
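For the simplest scenario mentioned above, a minimal settings sketch could look like the following. It assumes a standard Scrapy project settings module, and the priority value 100 is an arbitrary illustration, not a value prescribed by the library; choose one that fits the order of your other pipelines.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # The priority (100) is an assumed, illustrative value.
    "zyte_common_items.pipelines.AEPipeline": 100,
}
```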
If you have callbacks with custom code based on the old schema, you can either migrate that code, and ideally move it to a page object class, or use zyte_common_items.ae.downgrade at the beginning of the callback, e.g.:
    from zyte_common_items import ae

    ...

    def parse_product(self, response: DummyResponse, product: Product):
        product = ae.downgrade(product)
        ...
- class zyte_common_items.pipelines.DropLowProbabilityItemPipeline(crawler)
Item pipeline that drops items that have a low probability.
The ITEM_PROBABILITY_THRESHOLDS setting determines the probability thresholds. By default, items with probability < 0.1 are dropped.
dict objects with items as values are supported. For those, the probability of each item is evaluated, and items with a low probability are removed from the dict. If the dict ends up empty, it is dropped entirely.
ITEM_PROBABILITY_THRESHOLDS
Default: {"default": 0.1}
Allows defining a threshold for each item class and a default threshold for any other item class.
Thresholds for item classes can be defined using either an import path of the item class or directly using the item class itself.
For example:
    from zyte_common_items import Article

    ITEM_PROBABILITY_THRESHOLDS = {
        Article: 0.2,
        "zyte_common_items.Product": 0.3,
        "default": 0.15,
    }
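The threshold lookup and dropping behavior described above can be sketched in plain Python. This is not the pipeline's actual implementation: the item classes are hypothetical stand-ins with a flat probability field, the helper names (import_path, resolve_threshold, should_drop, filter_item_dict) are invented for illustration, and both the lookup order (class key, then import path, then "default") and the choice to keep items without a probability are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-ins for item classes. The flat "probability" field is a
# simplification for this sketch, not the real item layout.
@dataclass
class Article:
    probability: Optional[float] = None


@dataclass
class Product:
    probability: Optional[float] = None


@dataclass
class JobPosting:
    probability: Optional[float] = None


def import_path(cls: type) -> str:
    """Return the dotted import path of a class, e.g. "package.module.Name"."""
    return f"{cls.__module__}.{cls.__qualname__}"


def resolve_threshold(item: Any, thresholds: Dict[Any, float]) -> float:
    """Assumed lookup order: class key, then import path, then "default"."""
    cls = type(item)
    if cls in thresholds:
        return thresholds[cls]
    path = import_path(cls)
    if path in thresholds:
        return thresholds[path]
    return thresholds.get("default", 0.1)


def should_drop(item: Any, thresholds: Dict[Any, float]) -> bool:
    """Return True if the item's probability is below its threshold."""
    probability = getattr(item, "probability", None)
    if probability is None:
        # Assumption: items without a probability are kept.
        return False
    return probability < resolve_threshold(item, thresholds)


def filter_item_dict(items: Dict[str, Any], thresholds: Dict[Any, float]):
    """Remove low-probability items from a dict; None stands for "dropped"."""
    kept = {k: v for k, v in items.items() if not should_drop(v, thresholds)}
    return kept or None


# Mirrors the example setting: one class key, one import-path key, a default.
THRESHOLDS = {
    Article: 0.2,
    import_path(Product): 0.3,
    "default": 0.15,
}
```

With these thresholds, Article(probability=0.18) falls below its class threshold of 0.2 and would be dropped, while JobPosting(probability=0.15) meets the default threshold of 0.15 and would be kept, since only probabilities strictly below the threshold count as low.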
Log formatters
- class zyte_common_items.log_formatters.ZyteLogFormatter
Log formatter that implements support for InfoDropItem.
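A log formatter is normally selected through Scrapy's LOG_FORMATTER setting, which takes the import path of a log formatter class. Assuming this formatter is enabled the same way, a settings sketch might be:

```python
# settings.py (sketch): assumes the formatter is enabled via Scrapy's
# LOG_FORMATTER setting, using the class path shown in this reference.
LOG_FORMATTER = "zyte_common_items.log_formatters.ZyteLogFormatter"
```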