Page objects
Built-in page object classes are good base classes for custom page object classes that implement website-specific page objects.
They provide the following baseline:
- They declare the item class that they return, allowing their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.
- They provide a default implementation for their metadata and url fields.
- They also provide a default implementation for some item-specific fields in pages that have those (except for description in the pages for Article, which has different requirements).
The following code shows a ProductPage subclass whose to_item method returns an instance of Product with metadata, a name, and a url:
from web_poet import field
from zyte_common_items import ProductPage


class CustomProductPage(ProductPage):
    @field
    def name(self):
        return self.css("h1::text").get()
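To make the idea of to_item building an item from @field-decorated methods more concrete, here is a rough, standard-library-only sketch of the mechanism. It is a toy model, not the actual web_poet or zyte_common_items implementation: the field descriptor, the ItemPage base class, the item_cls attribute, and the Product dataclass below are all hypothetical stand-ins written for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    # Hypothetical stand-in for an item class, not the library's Product.
    url: str
    name: Optional[str] = None


class field:
    """Toy descriptor that marks a method as an item field."""

    def __init__(self, method):
        self.method = method

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.method(instance)


class ItemPage:
    item_cls = None  # assumed attribute: the item class a page returns

    def to_item(self):
        # Gather every field declared on the class or its bases,
        # then pass the resulting values to the declared item class.
        values = {}
        for klass in reversed(type(self).__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, field):
                    values[name] = getattr(self, name)
        return self.item_cls(**values)


class ProductPage(ItemPage):
    # The base class declares the item class and a default url field.
    item_cls = Product

    def __init__(self, url, title):
        self._url = url
        self._title = title

    @field
    def url(self):
        return self._url


class CustomProductPage(ProductPage):
    # The subclass only adds the website-specific field.
    @field
    def name(self):
        return self._title.strip()


item = CustomProductPage("https://example.com/p/1", "  Blue Mug ").to_item()
print(item)
```

In this sketch the subclass never builds the item itself: declaring the item class once in the base class is enough for to_item to combine inherited and custom fields.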
Extractors
For some nested fields (ProductFromList, ProductVariant),
base extractors exist that you can subclass
to write your own extractors.
They provide the following baseline:
- They declare the item class that they return, allowing their to_item method to automatically build an instance of it from @field-decorated methods. See Fields.
- They also provide default processors for some item-specific fields.
See Extractor API.
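As a loose illustration of what a field processor does, the sketch below models a field decorator that passes the method's return value through a list of callables. It uses only the standard library and is not the library's implementation: the toy field decorator, the clean_text processor, and the VariantExtractor class are hypothetical names; only the idea of an out list of callables is taken from the decorator signature documented in this section.

```python
from typing import Callable, List, Optional


def field(method=None, *, out: Optional[List[Callable]] = None):
    """Toy field decorator: run the method, then each processor in order."""
    processors = out or []

    def decorate(func):
        def getter(self):
            value = func(self)
            for processor in processors:
                value = processor(value)
            return value

        return property(getter)

    # Support both @field and @field(out=[...]).
    return decorate(method) if method is not None else decorate


def clean_text(value):
    # Hypothetical processor: collapse whitespace, pass None through.
    if value is None:
        return None
    return " ".join(value.split())


class VariantExtractor:
    # Hypothetical extractor for a nested item; it holds raw input only.
    def __init__(self, raw_name):
        self.raw_name = raw_name

    @field(out=[clean_text])
    def name(self):
        return self.raw_name


variant = VariantExtractor("  Red \n  XL ")
print(variant.name)  # Red XL
```

A default processor in a base extractor plays the same role as clean_text here: subclasses return raw values and the cleanup is applied for them.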
Auto page object classes
Page object classes with the Auto prefix can be used to easily define page
object classes that get an item as a dependency from another
page object class, can generate an identical item by default, and can also
easily override specific fields of the item, or even return a new item with
extra fields. For example:
import attrs
from web_poet import Returns, field
from zyte_common_items import AutoProductPage, Product


@attrs.define
class ExtendedProduct(Product):
    foo: str


class ExtendedProductPage(AutoProductPage, Returns[ExtendedProduct]):
    @field
    def name(self):
        return f"{self.product.brand.name} {self.product.name}"

    @field
    def foo(self):
        return "bar"
Fields of these classes have auto_field set to True in their field
metadata, so that you can check if a page object subclass is overriding a field
using is_auto_field():
- zyte_common_items.fields.is_auto_field(cls: ItemPage, field: str)
  Return True if the field named field of the cls page object class has auto_field set to True in its field metadata.
  All fields defined in auto page object classes meet this condition.
from zyte_common_items.fields import is_auto_field

print(is_auto_field(ExtendedProductPage, "name"))  # Returns False
print(is_auto_field(ExtendedProductPage, "foo"))  # Returns False
print(is_auto_field(ExtendedProductPage, "brand"))  # Returns True
print(is_auto_field(ExtendedProductPage, "bar"))  # Raises KeyError
If you are overriding a field method but the method continues to return the
value straight from the Auto-prefixed class, you should also set
auto_field to True. Instead of setting it manually in the field meta,
you can replace the field() decorator with
auto_field():
- zyte_common_items.fields.auto_field(method=None, *, cached: bool = False, meta: dict | None = None, out: List[Callable] | None = None)
  Decorator that works like web_poet.fields.field() but sets auto_field to True by default in meta.

from zyte_common_items import AutoProductPage
from zyte_common_items.fields import auto_field


class ProductPage(AutoProductPage):
    @auto_field
    def name(self):
        return super().name