Items

The provided item classes can be used to map data extracted from web pages, e.g. using page objects.

Creating items from dictionaries

You can create an item from any dict-like object via the from_dict() method.

For example, to create a Product:

>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': {
...         'url': 'https://example.com/image.png',
...     },
...     'gtin': [
...         {'type': 'gtin13', 'value': '9504000059446'},
...     ],
... }
>>> product = Product.from_dict(data)

from_dict() applies the right classes to nested data: for the input above, mainImage becomes an Image and each entry in gtin becomes a Gtin.

>>> product.url
'https://example.com/'
>>> product.mainImage
Image(url='https://example.com/image.png')
>>> product.canonicalUrl
>>> product.gtin
[Gtin(type='gtin13', value='9504000059446')]
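To illustrate the idea of applying the right class to nested data, here is a minimal sketch built only on the standard library. It is not how zyte_common_items implements from_dict(). It assumes dataclass stand-ins named Image, Gtin and Product, plus hypothetical helpers convert() and from_dict() that read each field's type annotation and recurse into nested dicts and lists.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional, Union, get_args, get_origin, get_type_hints


# Hypothetical stand-ins for zyte_common_items' Image, Gtin and Product.
@dataclass
class Image:
    url: str


@dataclass
class Gtin:
    type: str
    value: str


@dataclass
class Product:
    url: str
    mainImage: Optional[Image] = None
    gtin: Optional[List[Gtin]] = None
    canonicalUrl: Optional[str] = None


def convert(tp: Any, value: Any) -> Any:
    """Convert value so that it matches the type annotation tp."""
    if value is None:
        return None
    origin = get_origin(tp)
    if origin is Union:
        # Optional[X] is Union[X, None]: convert against X.
        (inner,) = [arg for arg in get_args(tp) if arg is not type(None)]
        return convert(inner, value)
    if origin is list:
        # List[X]: convert every element against X.
        (item_type,) = get_args(tp)
        return [convert(item_type, item) for item in value]
    if is_dataclass(tp) and isinstance(value, dict):
        # A nested component: build it recursively from its own dict.
        return from_dict(tp, value)
    return value


def from_dict(cls: Any, data: dict) -> Any:
    """Build cls from data, converting nested values; extra keys are ignored."""
    hints = get_type_hints(cls)
    kwargs = {
        f.name: convert(hints[f.name], data[f.name])
        for f in fields(cls)
        if f.name in data
    }
    return cls(**kwargs)


product = from_dict(Product, {
    "url": "https://example.com/",
    "mainImage": {"url": "https://example.com/image.png"},
    "gtin": [{"type": "gtin13", "value": "9504000059446"}],
})
print(product.mainImage)  # Image(url='https://example.com/image.png')
print(product.gtin)  # [Gtin(type='gtin13', value='9504000059446')]
```

Driving the conversion from type annotations keeps the nesting rules in one place: adding a new component field only requires annotating it.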

Creating items from lists

You can create items in bulk using the from_list() method:

>>> from zyte_common_items import Product
>>> data_list = [
...     {'url': 'https://example.com/1', 'name': 'Product 1'},
...     {'url': 'https://example.com/2', 'name': 'Product 2'},
...     {'url': 'https://example.com/3', 'name': 'Product 3'},
...     {'url': 'https://example.com/4', 'name': 'Product 4'}
... ]
>>> products = Product.from_list(data_list)
>>> len(products)
4
>>> products[0].url
'https://example.com/1'
>>> products[3].name
'Product 4'

This is especially useful when processing many items at once, for example records loaded from an API, a file, or a database.
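As an example of the file case, the sketch below assumes the records are stored as JSON Lines (one JSON object per line) and uses a hypothetical helper, load_json_lines(), to turn them into the list of dicts that from_list() expects. io.StringIO stands in for a real file so the snippet runs on its own; the final from_list() call is left commented out because it needs zyte-common-items installed.

```python
import io
import json
from typing import IO, Any, Dict, List


def load_json_lines(stream: IO[str]) -> List[Dict[str, Any]]:
    """Parse one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# io.StringIO stands in for a real file, e.g. open("products.jl") (hypothetical name).
stream = io.StringIO(
    '{"url": "https://example.com/1", "name": "Product 1"}\n'
    "\n"
    '{"url": "https://example.com/2", "name": "Product 2"}\n'
)
data_list = load_json_lines(stream)
print(len(data_list))  # 2

# With zyte-common-items installed, the dicts can then be converted in one call:
# from zyte_common_items import Product
# products = Product.from_list(data_list)
```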

Handling unknown fields

Items and components do not allow attributes beyond those they define:

>>> from zyte_common_items import Product
>>> product = Product(url="https://example.com", foo="bar")
Traceback (most recent call last):
...
TypeError: ... got an unexpected keyword argument 'foo'
>>> product = Product(url="https://example.com")
>>> product.foo = "bar"
Traceback (most recent call last):
...
AttributeError: 'Product' object has no attribute 'foo'

However, when using from_dict() and from_list(), unknown fields in the input do not cause an error. Instead, they are stored in the _unknown_fields_dict attribute of the resulting item or component, and ZyteItemAdapter lets you access them the same way as known fields:

>>> from zyte_common_items import Product, ZyteItemAdapter
>>> data = {
...     'url': 'https://example.com/',
...     'unknown_field': True,
... }
>>> product = Product.from_dict(data)
>>> product._unknown_fields_dict
{'unknown_field': True}
>>> adapter = ZyteItemAdapter(product)
>>> adapter['unknown_field']
True
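The sketch below shows one way the known/unknown split can work, using a dataclass stand-in rather than the library's Product. The helpers from_dict_keeping_unknown() and get_field() are hypothetical: the first routes keys that match no public field into a _unknown_fields_dict attribute, and the second looks up a name among known fields first and then falls back to that dict, loosely mirroring what ZyteItemAdapter offers.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class Product:  # hypothetical stand-in, not zyte_common_items.Product
    url: str
    name: Optional[str] = None
    # Extra input keys are kept here, mirroring the role of _unknown_fields_dict.
    _unknown_fields_dict: Dict[str, Any] = field(default_factory=dict, repr=False)


def known_field_names(cls_or_item: Any) -> set:
    # Public (non-underscore) dataclass fields count as "known" fields.
    return {f.name for f in fields(cls_or_item) if not f.name.startswith("_")}


def from_dict_keeping_unknown(cls: Any, data: Dict[str, Any]) -> Any:
    known = known_field_names(cls)
    kwargs = {k: v for k, v in data.items() if k in known}
    unknown = {k: v for k, v in data.items() if k not in known}
    return cls(**kwargs, _unknown_fields_dict=unknown)


def get_field(item: Any, name: str) -> Any:
    """Read a known field, falling back to the unknown-fields dict (KeyError if absent)."""
    if name in known_field_names(item):
        return getattr(item, name)
    return item._unknown_fields_dict[name]


product = from_dict_keeping_unknown(
    Product, {"url": "https://example.com/", "unknown_field": True}
)
print(product._unknown_fields_dict)  # {'unknown_field': True}
print(get_field(product, "unknown_field"))  # True
```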

This keeps your code forward compatible: if new fields are added to the input data in the future, creating items from that data will not fail.

Note, however, that unknown fields are only supported within items and components. Other unexpected input, such as a value of the wrong type, can still make input processing fail:

>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': 'not a dictionary',
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected mainImage to be a dict with fields from zyte_common_items.components.media.Image, got 'not a dictionary'.
>>> data = {
...     'url': 'https://example.com/',
...     'breadcrumbs': 3,
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected breadcrumbs to be a list, got 3.
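When creating many items from untrusted input, it can help to catch these ValueError exceptions per record instead of letting one bad record stop the whole batch. The sketch below assumes a hypothetical parse_all() helper and a hypothetical check_product_shape() validator that mimics the two failures shown above, so it runs without the library; with zyte-common-items installed, Product.from_dict could be passed as the parse argument instead.

```python
from typing import Any, Callable, Dict, List, Tuple


def check_product_shape(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-check mirroring the two failures shown above."""
    image = data.get("mainImage")
    if image is not None and not isinstance(image, dict):
        raise ValueError(f"Expected mainImage to be a dict, got {image!r}.")
    breadcrumbs = data.get("breadcrumbs")
    if breadcrumbs is not None and not isinstance(breadcrumbs, list):
        raise ValueError(f"Expected breadcrumbs to be a list, got {breadcrumbs!r}.")
    return data


def parse_all(
    records: List[Dict[str, Any]], parse: Callable[[Dict[str, Any]], Any]
) -> Tuple[List[Any], List[Tuple[int, str]]]:
    """Parse each record, collecting (index, message) for records that raise ValueError."""
    parsed: List[Any] = []
    errors: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        try:
            parsed.append(parse(record))
        except ValueError as exc:
            errors.append((index, str(exc)))
    return parsed, errors


records = [
    {"url": "https://example.com/"},
    {"url": "https://example.com/", "mainImage": "not a dictionary"},
    {"url": "https://example.com/", "breadcrumbs": 3},
]
parsed, errors = parse_all(records, check_product_shape)
print(len(parsed))  # 1
print(errors[1])  # (2, 'Expected breadcrumbs to be a list, got 3.')
```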

Defining custom items

You can subclass Item or any item subclass to define your own item.

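Item is a slotted attrs class. To keep the benefits of slots, such as lower memory usage and rejecting attributes that are not defined, subclasses should also be slotted attrs classes, which is what @attrs.define produces by default. For example: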

>>> import attrs
>>> from zyte_common_items import Item
>>> @attrs.define
... class CustomItem(Item):
...     foo: str

Note that slotted attrs classes do not support multiple inheritance.
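Slotted attrs classes are built on Python's __slots__ mechanism, so both the rejection of unknown attributes and the multiple-inheritance restriction can be seen with plain classes. The sketch below uses hypothetical classes SlottedA and SlottedB instead of attrs so it runs with the standard library only: assigning an undefined attribute raises AttributeError, and combining two bases that each define non-empty __slots__ raises TypeError when the class is created.

```python
class SlottedA:
    __slots__ = ("a",)


class SlottedB:
    __slots__ = ("b",)


# Slotted instances have no __dict__, so undefined attributes are rejected.
item = SlottedA()
item.a = 1
try:
    item.foo = "bar"
    rejected_unknown = False
except AttributeError:
    rejected_unknown = True

# Two bases that both define non-empty __slots__ cannot be combined.
try:
    class Combined(SlottedA, SlottedB):
        __slots__ = ()
    combined_error = None
except TypeError as exc:
    # In CPython the message mentions an instance lay-out conflict.
    combined_error = str(exc)

print(rejected_unknown)  # True
print(combined_error)
```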