Page objects
The built-in page object classes are good base classes for custom, website-specific page object classes.
They provide the following baseline:
- They declare the item class that they return, allowing their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.
- They provide a default implementation for their metadata and url fields.
- They also provide a default implementation for some item-specific fields in pages that have those (except for description in the pages for Article, which has different requirements).
The following code shows a ProductPage subclass whose to_item method returns an instance of Product with metadata, a name, and a url:
from web_poet import field
from zyte_common_items import ProductPage


class CustomProductPage(ProductPage):
    # metadata and url come from the base class; only name is custom.
    @field
    def name(self):
        return self.css("h1::text").get()
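To make the mechanism more concrete (a declared item class, default fields in the base class, and decorated methods feeding to_item), here is a library-free sketch. It is a simplified stand-in and not how web-poet or zyte-common-items implement it. Every name in it (ToyProduct, toy_field, ToyPage, ToyCustomPage) is hypothetical, and unlike the real to_item it is synchronous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyProduct:
    """Hypothetical item class; stands in for Product."""
    url: Optional[str] = None
    name: Optional[str] = None


def toy_field(method):
    """Hypothetical marker; stands in for the @field decorator."""
    method._is_toy_field = True
    return method


class ToyPage:
    """Hypothetical base page: declares its item class and a default url field."""
    item_cls = ToyProduct

    def __init__(self, url, title_text):
        self._url = url
        self._title_text = title_text

    @toy_field
    def url(self):
        return self._url

    def to_item(self):
        # Call every marked method and pass its result as a keyword argument.
        values = {}
        for attr_name in dir(type(self)):
            member = getattr(type(self), attr_name, None)
            if getattr(member, "_is_toy_field", False):
                values[attr_name] = member(self)
        return self.item_cls(**values)


class ToyCustomPage(ToyPage):
    # Only the website-specific field is written by hand.
    @toy_field
    def name(self):
        return self._title_text.strip()


item = ToyCustomPage("https://example.com/p/1", "  Blue T-shirt ").to_item()
print(item)
```

The subclass defines only name, yet the resulting item also carries url, because the base class contributes it.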
Page object classes with the Auto prefix make it easy to define page object classes that receive an item as a dependency from another page object class. By default they return an identical item, but they can also override specific fields of the item, or even return a new item with extra fields. For example:
import attrs
from web_poet import Returns, field
from zyte_common_items import AutoProductPage, Product


@attrs.define
class ExtendedProduct(Product):
    foo: str


class ExtendedProductPage(AutoProductPage, Returns[ExtendedProduct]):
    @field
    def name(self):
        return f"{self.product.brand.name} {self.product.name}"

    @field
    def foo(self):
        return "bar"
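The behaviour described for Auto-prefixed classes (pass the input item through unchanged, override selected fields, optionally add new ones) can be modelled with the standard library alone. This is only a sketch of the idea under simplifying assumptions: ToyProduct, ToyExtendedProduct, ToyAutoPage, and ToyExtendedPage are hypothetical names, brand is a plain string here, and the real classes rely on @field and asynchronous to_item methods instead.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyProduct:
    name: Optional[str] = None
    brand: Optional[str] = None


@dataclass
class ToyExtendedProduct(ToyProduct):
    foo: Optional[str] = None


class ToyAutoPage:
    """Hypothetical stand-in for an Auto-prefixed page object class."""
    item_cls = ToyProduct

    def __init__(self, product):
        self.product = product  # item received as a dependency

    def to_item(self):
        values = {}
        for f in fields(self.item_cls):
            override = getattr(self, f.name, None)
            if callable(override):
                # A method named after the field overrides the input value.
                values[f.name] = override()
            else:
                # Otherwise the value is copied from the input item.
                values[f.name] = getattr(self.product, f.name, None)
        return self.item_cls(**values)


class ToyExtendedPage(ToyAutoPage):
    item_cls = ToyExtendedProduct

    def name(self):
        return f"{self.product.brand} {self.product.name}"

    def foo(self):
        return "bar"


source = ToyProduct(name="T-shirt", brand="Acme")
unchanged = ToyAutoPage(source).to_item()
extended = ToyExtendedPage(source).to_item()
print(unchanged)
print(extended)
```

Without overrides the output equals the input item. With overrides, only name changes and foo is added, while brand is still copied from the input.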
Extractors
For some nested fields (ProductFromList, ProductVariant), base extractors exist that you can subclass to write your own extractors.
They provide the following baseline:
- They declare the item class that they return, allowing their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.
- They also provide default processors for some item-specific fields.
See Extractor API.
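As a rough illustration of why extractors suit nested fields, the sketch below builds one nested item per input row, with each extractor instance responsible for a single element. It uses only the standard library. ToyVariant and ToyVariantExtractor are hypothetical names, the rows are plain dictionaries rather than selectors, and the real base extractors use @field and processors instead of this name-matching shortcut.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyVariant:
    """Hypothetical nested item; stands in for ProductVariant."""
    sku: Optional[str] = None
    color: Optional[str] = None


class ToyVariantExtractor:
    """Hypothetical extractor: turns one row of data into one ToyVariant."""
    item_cls = ToyVariant

    def __init__(self, row):
        self.row = row

    def sku(self):
        return self.row["sku"].strip()

    def color(self):
        return self.row.get("color")

    def to_item(self):
        # Fill each item field from the method of the same name, if any.
        values = {
            f.name: getattr(self, f.name)()
            for f in fields(self.item_cls)
            if callable(getattr(self, f.name, None))
        }
        return self.item_cls(**values)


rows = [{"sku": " A1 ", "color": "red"}, {"sku": "A2"}]
variants = [ToyVariantExtractor(row).to_item() for row in rows]
print(variants)
```

A page object field that returns a list of nested items could delegate to such an extractor once per element, keeping the per-element parsing logic in one place.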