Field processor API

API reference for the built-in field processors.

Built-in field processors

zyte_common_items.processors.brand_processor(value: Any, page: Any) -> Any

Convert the data into a brand name if possible.

If the input is a Selector, SelectorList or HtmlElement, attempts to extract brand data from it.

If value is a string, uses it to create a Brand instance.

Other inputs are returned unchanged.
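
As a rough illustration of the string and pass-through rules above, the sketch below uses a hypothetical Brand stand-in and a hypothetical brand_processor_sketch function. It is not the library's implementation, and it leaves out the selector branch, which needs an HTML parser.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Brand:
    """Hypothetical stand-in for the library's Brand item (name only)."""

    name: str


def brand_processor_sketch(value: Any) -> Any:
    """Assumed re-statement of the documented rules, not the real processor."""
    # A string becomes a Brand instance.
    if isinstance(value, str):
        return Brand(name=value)
    # Anything else (selector inputs are out of scope here) is returned unchanged.
    return value
```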

zyte_common_items.processors.breadcrumbs_processor(value: Any, page: Any) -> Any

Convert the data into a list of Breadcrumb objects if possible.

Supported inputs are Selector, SelectorList, HtmlElement and an iterable of zyte_parsers.Breadcrumb objects. Other inputs are returned as is.

zyte_common_items.processors.description_processor(value: Any, page: Any) -> Any

Convert the data into cleaned-up text if possible.

Uses the clear-html library.

Supported inputs are Selector, SelectorList and HtmlElement. Other inputs are returned as is.

Puts the cleaned HtmlElement object into page._description_node and the cleaned text into page._description_str.

zyte_common_items.processors.description_html_processor(value: Selector | HtmlElement, page: Any) -> Any

Convert the data into cleaned-up HTML if possible.

Uses the clear-html library.

Supported inputs are Selector, SelectorList and HtmlElement. Other inputs are returned as is.

Puts the cleaned HtmlElement object into page._descriptionHtml_node.

zyte_common_items.processors.gtin_processor(value: SelectorList | Selector | HtmlElement | str, page: Any) -> Any

Convert the data into a list of Gtin objects if possible.

Supported inputs are str, Selector, SelectorList, HtmlElement, an iterable of str and an iterable of zyte_parsers.Gtin objects. Other inputs are returned as is.

zyte_common_items.processors.images_processor(value: Any, page: Any) -> Any

Convert the data into a list of Image objects if possible.

If the input is a string, it is used as the URL of a single Image object.

If the input is an iterable of strings or of mappings with a “url” key, each item is used to populate an Image object.

Other inputs are returned unchanged.
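
The input rules above can be pictured with a small dispatch function. This is only a sketch: Image here is a hypothetical stand-in with a single url field, images_processor_sketch is a hypothetical name, and skipping items that are neither strings nor mappings with a "url" key is an assumption, since this reference does not say how such items are handled.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class Image:
    """Hypothetical stand-in for the library's Image item (url only)."""

    url: str


def images_processor_sketch(value: Any) -> Any:
    """Assumed re-statement of the documented rules, not the real processor."""
    # A single string is used as the URL of one image.
    if isinstance(value, str):
        return [Image(url=value)]
    # An iterable of strings or of mappings with a "url" key yields one Image
    # per item. Skipping other items is an assumption made for this sketch.
    if isinstance(value, Iterable) and not isinstance(value, (bytes, Mapping)):
        images = []
        for item in value:
            if isinstance(item, str):
                images.append(Image(url=item))
            elif isinstance(item, Mapping) and "url" in item:
                images.append(Image(url=item["url"]))
        return images
    # Other inputs are returned unchanged.
    return value
```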

zyte_common_items.processors.metadata_processor(metadata: BaseMetadata | None, page)

Processor for a metadata field that ensures that the output metadata object uses the metadata class declared by page.

zyte_common_items.processors.price_processor(value: Any, page: Any) -> Any

Convert the data into a price string if possible.

Uses the price-parser library.

Supported inputs are Selector, SelectorList, HtmlElement and numeric values.

Other inputs are returned as is.

Puts the parsed Price object into page._parsed_price.
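
For numeric inputs, the conversion to a price string can be pictured as plain number formatting. The sketch below assumes two decimal places, which this reference does not state, and price_string_sketch is a hypothetical name. Selector inputs, which the library parses with price-parser, are not covered.

```python
from numbers import Real
from typing import Any


def price_string_sketch(value: Any) -> Any:
    """Assumed numeric branch only; not the library's implementation."""
    # Numeric values are rendered as a decimal string.
    # Two decimal places is an assumption made for this sketch.
    if isinstance(value, Real):
        return f"{value:.2f}"
    # Other inputs are returned as is.
    return value
```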

zyte_common_items.processors.rating_processor(value: Any, page: Any) -> Any

Convert the data into an AggregateRating object if possible.

Supported inputs are selector-like objects (Selector, SelectorList, or HtmlElement).

The input can also be a dictionary with one or more of the AggregateRating fields as keys. The values for those keys can be either final values, to be assigned to the corresponding fields, or selector-like objects.

If the returned dictionary is missing the bestRating field and ratingValue is a selector-like object, bestRating may be extracted from that object as well.

For example, for the following input HTML:

<span class="rating">3.8 out of 5 stars</span>
<a class="reviews">See all 7 reviews</a>

You can use:

@field
def aggregateRating(self):
    return {
        "ratingValue": self.css(".rating"),
        "reviewCount": self.css(".reviews"),
    }

To get:

AggregateRating(
    bestRating=5.0,
    ratingValue=3.8,
    reviewCount=7,
)
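
To make the example concrete, here is a minimal sketch of how the two text fragments could map to those values. The regular expressions and the parse_rating_text and parse_review_count helpers are hypothetical, written for this illustration only; the library's own extraction logic may differ.

```python
import re
from typing import Optional, Tuple

NUMBER = r"\d+(?:\.\d+)?"


def parse_rating_text(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (ratingValue, bestRating) from text such as '3.8 out of 5 stars'."""
    # "X out of Y" or "X / Y" gives both the rating and the best rating.
    match = re.search(rf"({NUMBER})\s*(?:out of|/)\s*({NUMBER})", text)
    if match:
        return float(match.group(1)), float(match.group(2))
    # Otherwise fall back to the first number, with no best rating.
    match = re.search(NUMBER, text)
    if match:
        return float(match.group(0)), None
    return None, None


def parse_review_count(text: str) -> Optional[int]:
    """Return the first integer in text such as 'See all 7 reviews'."""
    match = re.search(r"\d+", text)
    return int(match.group(0)) if match else None


rating_value, best_rating = parse_rating_text("3.8 out of 5 stars")
review_count = parse_review_count("See all 7 reviews")
print(rating_value, best_rating, review_count)  # 3.8 5.0 7
```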

zyte_common_items.processors.simple_price_processor(value: Any, page: Any) -> Any

Convert the data into a price string if possible.

Uses the price-parser library.

Supported inputs are Selector, SelectorList, HtmlElement and numeric values.

Other inputs are returned as is.