Field processors
Overview
This library provides field processors (see the web-poet documentation on field processors) and complementary mixins. Built-in page object classes and extractor classes use them by default for the corresponding fields.
By design, the processors enabled by default are “transparent”: they don’t change the output of a field if the result is already of the expected final type. For example, if the item has a str attribute and the field returns a str value, the default processor returns the value as-is.
Usually, to engage a built-in field processor, a field must return a Selector, SelectorList, or HtmlElement object. Then the field processor takes care of extracting the right data.
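The pass-through behavior described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `FakeNode` and `transparent_text_processor` are hypothetical names standing in for a selector-like object and a default processor.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a Selector or HtmlElement (assumption)."""

    text: str


def transparent_text_processor(value):
    """Hypothetical processor: extract text from nodes, pass other values through."""
    if isinstance(value, FakeNode):
        # Node-like input: collapse whitespace and return the text.
        return " ".join(value.text.split())
    # Already the final type (for example, str): return it unchanged.
    return value


already_final = transparent_text_processor("Some Brand")
extracted = transparent_text_processor(FakeNode("  Some\n  Brand "))
```

In this sketch both calls yield the same string, which is the point of a transparent processor: a field may return either the finished value or a node, and the item ends up with the same data.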
Field mapping
The following table indicates which fields use which processors by default in built-in page object classes and extractor classes:
Field | Default processor |
---|---|
Examples
Here are examples of inputs and matching field implementations that work on built-in page object and extractor classes:
Input HTML fragment |
Field implementation and output |
<span class="reviews">
  3.8 (7 reviews)
</span>
|
@field
def aggregateRating(self):
    return self.css(".reviews")

Product(
    aggregateRating=AggregateRating(
        bestRating=None,
        ratingValue=3.8,
        reviewCount=7,
    ),
)

Supports separate selectors per field. See rating_processor(). |
<p class="brand">
  <img alt='Some Brand'>
</p>
|
@field
def brand(self):
    return self.css(".brand")

Product(
    brand="Some Brand",
)
|
<div class="nav">
  <ul>
    <li>
      <a href="/home">Home</a>
    </li>
    <li>
      <a href="/about">About</a>
    </li>
  </ul>
</div>
|
@field
def breadcrumbs(self):
    return self.css(".nav")

Product(
    breadcrumbs=[
        Breadcrumb(
            name="Home",
            url="https://example.com/home",
        ),
        Breadcrumb(
            name="About",
            url="https://example.com/about",
        ),
    ],
)
|
<div class="desc">
  <p>Ideal for <b>scraping</b> glass.</p>
  <p>Durable and reusable.</p>
</div>
|
@field
def descriptionHtml(self):
    return self.css(".desc")

Product(
    description=(
        "Ideal for scraping glass.\n"
        "\n"
        "Durable and reusable."
    ),
    descriptionHtml=(
        "<article>\n"
        "\n"
        "<p>Ideal for "
        "<strong>scraping</strong> "
        "glass.</p>\n"
        "\n"
        "<p>Durable and reusable.</p>\n"
        "\n"
        "</article>"
    ),
)
|
<span class="gtin">
  978-1-933624-34-1
</span>
|
@field
def gtin(self):
    return self.css(".gtin")

Product(
    gtin=[
        ("isbn13", "9781933624341"),
    ],
)
|
<div class="price">
  <del>13,2 €</del>
  <b>10,2 €</b>
</div>
|
@field
def price(self):
    return self.css(".price b")

@field
def regularPrice(self):
    return self.css(".price del")

Product(
    currencyRaw="€",
    price="10.20",
    regularPrice="13.20",
)
|
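As a rough illustration of the price row above, the sketch below turns raw price text such as `10,2 €` into a two-decimal string and a raw currency symbol using only the standard library. `parse_price_text` is a hypothetical helper, not the library's price processor, and it assumes a single number with an optional decimal comma or point and no thousands separators.

```python
import re
from decimal import Decimal


def parse_price_text(text):
    """Hypothetical helper: return (amount, currency_raw) parsed from price text."""
    # Assumption: one number, optional decimal comma or point, no thousands separators.
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        return None, None
    # Normalize the decimal comma, then format with exactly two decimals.
    amount = Decimal(match.group().replace(",", "."))
    # Whatever surrounds the number is treated as the raw currency string.
    currency_raw = (text[: match.start()] + text[match.end():]).strip() or None
    return f"{amount:.2f}", currency_raw


price, currency_raw = parse_price_text("10,2 €")
regular_price, _ = parse_price_text("13,2 €")
```

Under these assumptions the two inputs map to the `price`, `regularPrice`, and `currencyRaw` values shown in the table; real-world price text is far more varied, which is why delegating this work to the built-in processor is preferable to a hand-written helper.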