Items
The provided item classes can be used to map data extracted from web pages, e.g. using page objects.
Creating items from dictionaries
You can create an item from any dict-like object via the from_dict() method.
For example, to create a Product:
>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': {
...         'url': 'https://example.com/image.png',
...     },
...     'gtin': [
...         {'type': 'gtin13', 'value': '9504000059446'},
...     ],
... }
>>> product = Product.from_dict(data)
from_dict() applies the right classes to nested data, such as Image and Gtin for the input above.
>>> product.url
'https://example.com/'
>>> product.mainImage
Image(url='https://example.com/image.png')
>>> product.canonicalUrl
>>> product.gtin
[Gtin(type='gtin13', value='9504000059446')]
Creating items from lists
You can create items in bulk using the from_list() method:
>>> from zyte_common_items import Product
>>> data_list = [
...     {'url': 'https://example.com/1', 'name': 'Product 1'},
...     {'url': 'https://example.com/2', 'name': 'Product 2'},
...     {'url': 'https://example.com/3', 'name': 'Product 3'},
...     {'url': 'https://example.com/4', 'name': 'Product 4'},
... ]
>>> products = Product.from_list(data_list)
>>> len(products)
4
>>> products[0].url
'https://example.com/1'
>>> products[3].name
'Product 4'
This can be especially useful if you’re processing lots of items from an API, file, database, etc.
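As a rough sketch of the file case, the snippet below parses JSON Lines text into a list of dicts using only the standard library. The load_records helper and the sample data are hypothetical and not part of zyte_common_items.

```python
import json


def load_records(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample input: one JSON object per line.
sample = """
{"url": "https://example.com/1", "name": "Product 1"}
{"url": "https://example.com/2", "name": "Product 2"}
"""

records = load_records(sample)
```

The resulting list has the same shape as data_list above, so it could be passed to Product.from_list(records).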
Handling unknown fields
Items and components do not allow attributes beyond those they define:
>>> from zyte_common_items import Product
>>> product = Product(url="https://example.com", foo="bar")
Traceback (most recent call last):
...
TypeError: ... got an unexpected keyword argument 'foo'
>>> product = Product(url="https://example.com")
>>> product.foo = "bar"
Traceback (most recent call last):
...
AttributeError: 'Product' object has no attribute 'foo'
However, when using from_dict() and from_list(), unknown fields assigned to items and components won’t cause an error. Instead, they are placed inside the _unknown_fields_dict attribute, and can be accessed the same way as known fields using ZyteItemAdapter:
>>> from zyte_common_items import Product, ZyteItemAdapter
>>> data = {
...     'url': 'https://example.com/',
...     'unknown_field': True,
... }
>>> product = Product.from_dict(data)
>>> product._unknown_fields_dict
{'unknown_field': True}
>>> adapter = ZyteItemAdapter(product)
>>> adapter['unknown_field']
True
This keeps your code compatible with future field changes in the input data, which could otherwise cause backward-compatibility issues.
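To illustrate the idea, here is a minimal sketch of how a from_dict-style constructor might separate known keys from unknown ones. It uses a plain dataclass named SketchItem with an unknown attribute, both hypothetical; it is not how zyte_common_items implements this.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class SketchItem:
    """Hypothetical stand-in for an item; not the library's implementation."""

    url: str
    name: Optional[str] = None
    # Keys that do not match a declared field end up here.
    unknown: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # Every declared field except the catch-all counts as known.
        known_names = {f.name for f in fields(cls)} - {"unknown"}
        known = {k: v for k, v in data.items() if k in known_names}
        unknown = {k: v for k, v in data.items() if k not in known_names}
        return cls(**known, unknown=unknown)


item = SketchItem.from_dict({"url": "https://example.com/", "unknown_field": True})
```

Because unrecognized keys are stored rather than rejected, input that gains a new field later still parses, and the extra data is not lost.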
Note, however, that unknown fields are only supported within items and components. Input processing can still fail for other types of unexpected input:
>>> from zyte_common_items import Product
>>> data = {
...     'url': 'https://example.com/',
...     'mainImage': 'not a dictionary',
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected mainImage to be a dict with fields from zyte_common_items.components.media.Image, got 'not a dictionary'.
>>> data = {
...     'url': 'https://example.com/',
...     'breadcrumbs': 3,
... }
>>> product = Product.from_dict(data)
Traceback (most recent call last):
...
ValueError: Expected breadcrumbs to be a list, got 3.
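When processing many records, one bad record need not stop the whole batch. The sketch below shows one possible way to collect failures instead of aborting. Both parse_all and strict_factory are hypothetical helpers; strict_factory merely imitates the ValueError shown above so that the example runs without the library.

```python
def parse_all(factory, records):
    """Apply factory to each record, separating successes from failures."""
    items, errors = [], []
    for index, record in enumerate(records):
        try:
            items.append(factory(record))
        except ValueError as exc:
            # Keep the position and message so bad input can be inspected later.
            errors.append((index, str(exc)))
    return items, errors


def strict_factory(record):
    """Hypothetical stand-in that rejects non-list breadcrumbs."""
    if not isinstance(record.get("breadcrumbs", []), list):
        raise ValueError("Expected breadcrumbs to be a list")
    return record


items, errors = parse_all(
    strict_factory,
    [
        {"url": "https://example.com/1"},
        {"url": "https://example.com/2", "breadcrumbs": 3},
    ],
)
```

With the real library, Product.from_dict could presumably be passed as factory, since it raises ValueError for this kind of input.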
Defining custom items
You can subclass Item or any item subclass to define your own item.
Item is a slotted attrs class and, to enjoy the benefits of that, subclasses should also be slotted attrs classes. For example:
>>> import attrs
>>> from zyte_common_items import Item
>>> @attrs.define
... class CustomItem(Item):
...     foo: str
Keep in mind that slotted attrs classes do not support multiple inheritance.
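This restriction most likely stems from how Python itself handles __slots__: a class cannot inherit from two bases that each define non-empty slots. The sketch below demonstrates this with plain classes rather than attrs ones; the class names are hypothetical.

```python
class WithA:
    __slots__ = ("a",)


class WithB:
    __slots__ = ("b",)


try:
    # Two bases with non-empty __slots__ cannot be combined.
    class Both(WithA, WithB):
        __slots__ = ()
except TypeError as exc:
    conflict = exc
else:
    conflict = None
```

Here conflict holds the TypeError that Python raises while creating the class, so a custom item should extend a single slotted item class.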