Page objects

Built-in page object classes are good base classes for custom, website-specific page object classes.

They provide the following baseline:

  • They declare the item class that they return, which allows their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.

  • They provide a default implementation for their metadata and url fields.

  • They also provide a default implementation for some item-specific fields in pages that have those (except for description in the pages for Article, which has different requirements).

The following code shows a ProductPage subclass whose to_item method returns an instance of Product with metadata, a name, and a url:

from web_poet import field
from zyte_common_items import ProductPage


class CustomProductPage(ProductPage):
    @field
    def name(self):
        return self.css("h1::text").get()
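The way a declared item class and decorated methods can combine into an item may be easier to see in a simplified, standard-library-only model. The sketch below is not how web-poet or zyte-common-items implement it: ToyProduct, toy_field, ToyPage and CustomToyPage are hypothetical names made up for illustration, and the real to_item is asynchronous and does considerably more.

```python
import dataclasses


@dataclasses.dataclass
class ToyProduct:
    # Hypothetical stand-in for an item class such as Product.
    url: str
    name: str


def toy_field(method):
    # Hypothetical stand-in for @field: it only tags the method.
    method.is_toy_field = True
    return method


class ToyPage:
    # The item class that this page declares it returns.
    item_cls = ToyProduct

    def __init__(self, url, title):
        self._url = url
        self._title = title

    @toy_field
    def url(self):
        # Default implementation supplied by the base class.
        return self._url

    def to_item(self):
        # Call every tagged method and pass the results to the item class.
        values = {}
        for attr in dir(type(self)):
            member = getattr(type(self), attr)
            if getattr(member, "is_toy_field", False):
                values[attr] = getattr(self, attr)()
        return self.item_cls(**values)


class CustomToyPage(ToyPage):
    # The subclass only adds the website-specific field.
    @toy_field
    def name(self):
        return self._title.strip()


item = CustomToyPage("https://example.com/p/1", "  Blue chair ").to_item()
print(item)
```

The subclass writes only the name method; url comes from the base class, which mirrors how the built-in page object classes supply defaults for fields such as url.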

Extractors

For some nested fields (ProductFromList, ProductVariant), base extractors exist that you can subclass to write your own extractors.

They provide the following baseline:

  • They declare the item class that they return, which allows their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.

  • They also provide default processors for some item-specific fields.

See Extractor API.

Auto page object classes

Page object classes with the Auto prefix make it easy to define page object classes that get an item as a dependency from another page object class. By default they generate an identical item, but you can override specific fields of the item, or even return a new item with extra fields. For example:

import attrs
from web_poet import Returns, field
from zyte_common_items import AutoProductPage, Product


@attrs.define
class ExtendedProduct(Product):
    foo: str


class ExtendedProductPage(AutoProductPage, Returns[ExtendedProduct]):
    @field
    def name(self):
        return f"{self.product.brand.name} {self.product.name}"

    @field
    def foo(self):
        return "bar"

Fields of these classes have auto_field set to True in their field metadata, so you can use is_auto_field() to check whether a page object subclass overrides a field:

zyte_common_items.fields.is_auto_field(cls: ItemPage, field: str)

Return True if the field named field of the cls page object class has auto_field set to True in its field metadata.

All fields defined in auto page object classes meet this condition.

from zyte_common_items.fields import is_auto_field

print(is_auto_field(ExtendedProductPage, "name"))  # Prints False
print(is_auto_field(ExtendedProductPage, "foo"))  # Prints False
print(is_auto_field(ExtendedProductPage, "brand"))  # Prints True
print(is_auto_field(ExtendedProductPage, "bar"))  # Raises KeyError

If you are overriding a field method but the method continues to return the value straight from the Auto-prefixed class, you should also set auto_field to True. Instead of setting it manually in the field meta, you can replace the field() decorator with auto_field():

zyte_common_items.fields.auto_field(method=None, *, cached: bool = False, meta: dict | None = None, out: List[Callable] | None = None)

Decorator that works like web_poet.fields.field() but sets auto_field to True by default in meta.

from zyte_common_items import AutoProductPage
from zyte_common_items.fields import auto_field


class ProductPage(AutoProductPage):
    @auto_field
    def name(self):
        return super().name
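To illustrate how an auto_field flag in field metadata can separate overridden fields from inherited ones, here is a minimal standard-library-only model. It is a sketch under assumptions, not the library's implementation: toy_field, toy_auto_field, toy_is_auto_field and the Toy* classes are hypothetical names, and storing metadata as a plain attribute on the method is a simplification chosen for this example.

```python
def toy_field(method=None, *, meta=None):
    """Hypothetical field decorator that stores a metadata dict on the method."""

    def decorate(func):
        func.field_meta = dict(meta or {})
        return func

    # Support both @toy_field and @toy_field(meta={...}).
    return decorate(method) if method is not None else decorate


def toy_auto_field(method=None, *, meta=None):
    """Like toy_field, but sets auto_field to True by default in meta."""
    merged = {"auto_field": True, **(meta or {})}
    return toy_field(method, meta=merged)


def toy_is_auto_field(cls, name):
    """Return True if the field called name on cls has auto_field set in its meta."""
    member = getattr(cls, name, None)
    if not hasattr(member, "field_meta"):
        raise KeyError(name)  # not a field of this class
    return member.field_meta.get("auto_field", False)


class ToyAutoPage:
    # Every field of the toy "auto" base class carries the flag.
    @toy_auto_field
    def name(self):
        return "name from dependency"

    @toy_auto_field
    def brand(self):
        return "brand from dependency"


class ToyExtendedPage(ToyAutoPage):
    @toy_field  # a real override: the flag is dropped
    def name(self):
        return "custom name"

    @toy_field  # a new field: never flagged
    def foo(self):
        return "bar"


class ToyPassthroughPage(ToyAutoPage):
    @toy_auto_field  # overridden, but still returns the inherited value
    def name(self):
        return ToyAutoPage.name(self)


print(toy_is_auto_field(ToyExtendedPage, "name"))
print(toy_is_auto_field(ToyExtendedPage, "brand"))
print(toy_is_auto_field(ToyPassthroughPage, "name"))
```

In this model a plain override loses the flag because the new method gets fresh metadata, so a pass-through override has to opt back in with the auto variant of the decorator.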