Page objects

Built-in page object classes are good base classes for custom page object classes that implement website-specific page objects.

They provide the following baseline:

  • They declare the item class that they return, so that their to_item method can automatically build an instance of it from @field-decorated methods. See Fields.

  • They provide a default implementation for their metadata and url fields.

  • They also provide a default implementation for some item-specific fields in pages that have those fields (except for description in the pages for Article, which has different requirements).
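The first two points above can be pictured with a small stand-alone sketch. It uses only the standard library and is a simplified model, not the web-poet or zyte-common-items implementation: the field decorator, ToyProduct, ToyProductPage, and item_cls below are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields as dataclass_fields
from typing import Optional


def field(method):
    """Toy stand-in for a field decorator: mark a method as an item field."""
    method._is_field = True
    return method


@dataclass
class ToyProduct:
    url: str
    name: Optional[str] = None


class ToyProductPage:
    # The base class declares the item class that to_item() returns.
    item_cls = ToyProduct

    def __init__(self, url, title):
        self._url = url
        self._title = title

    @field
    def url(self):
        # Default implementation provided by the base class.
        return self._url

    def to_item(self):
        # Collect a value from every marked method that matches an item attribute.
        values = {}
        for attribute in dataclass_fields(self.item_cls):
            method = getattr(type(self), attribute.name, None)
            if method is not None and getattr(method, "_is_field", False):
                values[attribute.name] = method(self)
        return self.item_cls(**values)


class CustomToyProductPage(ToyProductPage):
    # A subclass only adds the fields that are specific to the website.
    @field
    def name(self):
        return self._title.strip()


item = CustomToyProductPage("https://example.com/p/1", "  Blue mug ").to_item()
print(item)
```

In this model the subclass never touches to_item or url; it inherits both and contributes only name, which mirrors the division of labor the list above describes.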

The following code shows a ProductPage subclass whose to_item method returns an instance of Product with metadata, a name, and a url:

from web_poet import field
from zyte_common_items import ProductPage


class CustomProductPage(ProductPage):
    @field
    def name(self):
        return self.css("h1::text").get()

Page object classes with the Auto prefix make it easy to define page object classes that get an item as a dependency from another page object class. By default they generate an identical item, but you can override specific fields of the item, or even return a new item with extra fields. For example:

import attrs
from web_poet import Returns, field
from zyte_common_items import AutoProductPage, Product


@attrs.define
class ExtendedProduct(Product):
    foo: str


class ExtendedProductPage(AutoProductPage, Returns[ExtendedProduct]):
    @field
    def name(self):
        return f"{self.product.brand.name} {self.product.name}"

    @field
    def foo(self):
        return "bar"

Extractors

For some nested fields (ProductFromList, ProductVariant), base extractors exist that you can subclass to write your own extractors.

They provide the following baseline:

  • They declare the item class that they return, so that their to_item method can automatically build an instance of it from @field-decorated methods. See Fields.

  • They also provide default processors for some item-specific fields.
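As a rough illustration of what a default processor does, the sketch below passes the raw value returned by a field method through a list of functions before storing it on the item. It is a stand-alone model built on the standard library only, not the actual extractor classes: ToyVariant, ToyVariantExtractor, clean_price, and the processors mapping are hypothetical names, and the real default processors may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyVariant:
    name: Optional[str] = None
    price: Optional[str] = None


def clean_price(value):
    """Hypothetical processor: trim whitespace and a leading dollar sign."""
    if value is None:
        return None
    return value.strip().lstrip("$")


class ToyVariantExtractor:
    # The base extractor declares the item class it returns.
    item_cls = ToyVariant
    field_names = ("name", "price")
    # Default processors, keyed by field name and applied in order.
    processors = {"price": [clean_price]}

    def __init__(self, row):
        self.row = row

    def to_item(self):
        values = {}
        for field_name in self.field_names:
            method = getattr(self, field_name, None)
            if method is None:
                continue  # Field not implemented; keep the item default.
            value = method()
            for processor in self.processors.get(field_name, []):
                value = processor(value)
            values[field_name] = value
        return self.item_cls(**values)


class CustomVariantExtractor(ToyVariantExtractor):
    # The subclass returns raw values; the base class cleans them up.
    def name(self):
        return self.row["title"]

    def price(self):
        return self.row["price_text"]


variant = CustomVariantExtractor({"title": "Large", "price_text": " $12.50 "}).to_item()
print(variant)
```

The point of the design is that the subclass can return whatever text it finds on the page, while normalization lives in one place in the base class.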

See Extractor API.