Components

These classes are used to map data within items, and are not tied to any specific item type.

class zyte_common_items.AdditionalProperty(**kwargs)

A name-value pair.

See Product.additionalProperties.

name: str

Name.

value: str

Value.

class zyte_common_items.Address(**kwargs)

Address item.

addressCity: str | None

The city the place is located in.

addressCountry: str | None

The country the place is located in.

The country name or the ISO 3166-1 alpha-2 country code.

addressLocality: str | None

The locality to which the place belongs.

addressRaw: str | None

The raw address information, as it appears on the website.

addressRegion: str | None

The region of the place.

latitude: float | None

Geographical latitude of the place.

longitude: float | None

Geographical longitude of the place.

postalCode: str | None

The postal code of the address.

postalCodeAux: str | None

The auxiliary part of the postal code.

It may include a state abbreviation or town name, depending on local standards.

streetAddress: str | None

The street address of the place.

class zyte_common_items.AggregateRating(**kwargs)

Aggregate data about reviews and ratings.

At least one of ratingValue or reviewCount is required.

See Product.aggregateRating.

bestRating: float | None

Maximum value of the rating system.

ratingValue: float | None

Average value of all ratings.

reviewCount: int | None

Review count.
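The "at least one of ratingValue or reviewCount" rule can be illustrated with a small stand-in class. This is a minimal sketch built on a plain dataclass, not the library's own implementation: AggregateRatingSketch and its validation logic are hypothetical, and how zyte_common_items itself enforces the rule is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregateRatingSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    bestRating: Optional[float] = None
    ratingValue: Optional[float] = None
    reviewCount: Optional[int] = None

    def __post_init__(self) -> None:
        # Documented rule: at least one of ratingValue or reviewCount.
        if self.ratingValue is None and self.reviewCount is None:
            raise ValueError("ratingValue or reviewCount is required")


# A rating with an average value but no review count is acceptable.
rating = AggregateRatingSketch(bestRating=5.0, ratingValue=4.3)
```

Creating AggregateRatingSketch(bestRating=5.0) alone raises ValueError in this sketch, because neither of the two required attributes is set.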

class zyte_common_items.Amenity(**kwargs)

An amenity that a business place has.

name: str

Name of amenity.

value: bool

Availability of the amenity.

class zyte_common_items.Audio(**kwargs)

Audio.

See Article.audios.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.

class zyte_common_items.Author(**kwargs)

Author of an article.

See Article.authors.

email: str | None

Email.

name: str | None

Full name.

nameRaw: str | None

Text from which name was extracted.

url: str | None

URL of the details page of the author.

class zyte_common_items.BaseSalary(**kwargs)

Base salary of a job offer.

currency: str | None

Currency associated with the salary amount.

currencyRaw: str | None

Currency associated with the salary amount, without normalization.

rateType: str | None

The type of rate associated with the salary, e.g. monthly, annual, daily.

raw: str | None

Salary amount as it appears on the website.

valueMax: str | None

The maximum value of the base salary as a number string.

valueMin: str | None

The minimum value of the base salary as a number string.
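Because valueMin and valueMax are number strings rather than numbers, consumers usually need to convert them before doing arithmetic. The following is a minimal sketch, assuming the strings use plain decimal notation such as "45000" or "60000.50"; parse_salary_value is a hypothetical helper, not part of the library.

```python
from decimal import Decimal
from typing import Optional


def parse_salary_value(value: Optional[str]) -> Optional[Decimal]:
    """Turn a number string into a Decimal, passing None through."""
    if value is None:
        return None
    # Decimal avoids the rounding surprises of float for money amounts.
    return Decimal(value)


value_min = parse_salary_value("45000")
value_max = parse_salary_value("60000.50")
midpoint = (value_min + value_max) / 2
```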

class zyte_common_items.Brand(**kwargs)

Brand.

See Product.brand.

name: str

Name as it appears on the source webpage (no post-processing).

class zyte_common_items.Breadcrumb(**kwargs)

A breadcrumb from the breadcrumb trail of a webpage.

See Product.breadcrumbs.

name: str | None

Displayed name.

url: str | None

Target URL.

class zyte_common_items.Gtin(**kwargs)

GTIN type-value pair.

See Product.gtin.

type: str

Identifier of the GTIN format of value.

One of: "gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc".

value: str

Value.

It should only contain digits.
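The two constraints above (a known type identifier and a digits-only value) can be checked with a few lines of code. This is a sketch only: is_plausible_gtin is a hypothetical helper, and it does not verify lengths or check digits.

```python
# Type identifiers listed in the documentation for Gtin.type.
GTIN_TYPES = {
    "gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc",
}


def is_plausible_gtin(gtin_type: str, value: str) -> bool:
    """Check that the type is known and the value contains only ASCII digits."""
    # isascii() excludes non-ASCII characters that str.isdigit() would accept.
    return gtin_type in GTIN_TYPES and value.isascii() and value.isdigit()
```

For example, is_plausible_gtin("gtin13", "4006381333931") is true in this sketch, while a value containing a hyphen or an unknown type such as "ean" is rejected.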

class zyte_common_items.Header(**kwargs)

An HTTP header.

name: str

Name of the header.

value: str

Value of the header.

class zyte_common_items.HiringOrganization(**kwargs)

Organization that is hiring for a job offer.

id: str | None

Identifier of the organization used by the job posting website.

name: str | None

Name of the hiring organization.

nameRaw: str | None

Organization information as available on the website.

class zyte_common_items.Image(**kwargs)

Image.

See for example Product.images and Product.mainImage.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.

class zyte_common_items.JobLocation(**kwargs)

Location of a job offer.

raw: str | None

Job location, as it appears on the website.

class zyte_common_items.Link(**kwargs)

A link from a webpage to another webpage.

text: str | None

Displayed text.

url: str | None

Target URL.

class zyte_common_items.NamedLink(**kwargs)

A link from a webpage to another webpage.

name: str | None

The name of the link.

url: str | None

Target URL.

class zyte_common_items.OpeningHoursItem(**kwargs)

Specification of opening hours of a business place.

closes: str | None

Closing time in ISO 8601 format, local time.

dayOfWeek: str | None

English weekday name.

opens: str | None

Opening time in ISO 8601 format, local time.

rawCloses: str | None

Closing time, as it appears on the page, without processing.

rawDayOfWeek: str | None

Day of the week, as it appears on the page, without processing.

rawOpens: str | None

Opening time, as it appears on the page, without processing.
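Since opens and closes hold ISO 8601 local times, they can be parsed with the standard library. The sketch below assumes values of the form "HH:MM" (the exact precision used by real data is an assumption) and a closing time later on the same day; is_open_at is a hypothetical helper.

```python
from datetime import time

# Example values in the shape assumed for opens and closes.
opens = time.fromisoformat("09:00")
closes = time.fromisoformat("17:30")


def is_open_at(moment: time) -> bool:
    """Return True if moment falls within [opens, closes)."""
    # Does not handle ranges that cross midnight.
    return opens <= moment < closes
```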

class zyte_common_items.ParentPlace(**kwargs)

If the place is located inside another place, these are the details of the parent place.

name: str

Name of the parent place.

placeId: str

Identifier of the parent place.

class zyte_common_items.ProbabilityRequest(**kwargs)

A Request that includes a probability value.

metadata: ProbabilityMetadata | None

Data extraction process metadata.

class zyte_common_items.Reactions(**kwargs)

Details of reactions to a post.

dislikes: int | None

Number of dislikes or other negative reactions to the post.

likes: int | None

Number of likes or other positive reactions to the post.

replies: int | None

Number of times the post received a reply.

reposts: int | None

Number of times the post has been shared.

class zyte_common_items.RealEstateArea(**kwargs)

Area of a place, with type, units, value and raw value.

areaType: str | None

Type of area, one of: LOT, FLOOR.

raw: str

Area in the raw format, as it appears on the website.

unitCode: str

Unit of the value field, one of: SQMT (square meters), SQFT (square feet), ACRE (acres).

value: float

Area value, expressed in the unit given by unitCode.
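To compare areas reported in different units, value can be normalized using unitCode. This is a minimal sketch: to_square_meters is a hypothetical helper, and the conversion factors are the standard definitions of the international square foot and acre.

```python
# Square meters per unit, keyed by the documented unitCode values.
SQUARE_METERS_PER_UNIT = {
    "SQMT": 1.0,
    "SQFT": 0.09290304,
    "ACRE": 4046.8564224,
}


def to_square_meters(value: float, unit_code: str) -> float:
    """Convert an area to square meters; raises KeyError for unknown units."""
    return value * SQUARE_METERS_PER_UNIT[unit_code]
```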

class zyte_common_items.Request(**kwargs)

Describes a web request to load a page.

cast(cls: Type[RequestT]) RequestT

Convert this instance of Request or a subclass into cls, a different class that is also either Request or a subclass.

to_scrapy(callback, **kwargs)

Convert a request to scrapy.Request. All kwargs are passed to scrapy.Request as-is.

body: str | None

HTTP request body, Base64-encoded.

property body_bytes: bytes | None

Request.body as bytes.

headers: List[Header] | None

HTTP headers.

method: str

HTTP method.

name: str | None

Name of the page being requested.

url: str

HTTP URL.
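The relationship between body (a Base64-encoded string) and body_bytes (the raw bytes) can be reproduced with the standard library. This is a sketch of the decoding step only, not the library's implementation; body_to_bytes is a hypothetical helper and the JSON payload is made-up example data.

```python
import base64
from typing import Optional


def body_to_bytes(body: Optional[str]) -> Optional[bytes]:
    """Decode a Base64-encoded body, passing None through."""
    if body is None:
        return None
    return base64.b64decode(body)


# Encode an example payload the way a body value would be stored.
encoded = base64.b64encode(b'{"query": "shoes"}').decode("ascii")
decoded = body_to_bytes(encoded)
```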

class zyte_common_items.SocialMediaPostAuthor(**kwargs)

Details of the author of a social media post.

dateAccountCreated: str | None

The date of the creation of the author’s account.

isVerified: bool | None

Whether the author’s account is verified.

location: str | None

The location of the author, if it’s available in the author profile. Country or city location only.

numberOfFollowers: int | None

The number of followers that the author has.

numberOfFollowing: int | None

The number of users that the author follows.

class zyte_common_items.StarRating(**kwargs)

Official star rating of a place.

ratingValue: float | None

Star rating value of the place.

raw: str | None

Star rating of the place, as it appears on the page, without processing.

class zyte_common_items.Topic(**kwargs)

Topic that is discussed on the page.

name: str

Name of the topic.

class zyte_common_items.Url(**kwargs)

A URL.

class zyte_common_items.Video(**kwargs)

Video.

See Article.videos.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.

Item metadata components

class zyte_common_items.Metadata(**kwargs)

Bases: SearchMetadata

Generic metadata class.

It defines all attributes of metadata classes for specific item types, so that it can be used during extraction instead of a more specific class, and later converted to the corresponding, more specific metadata class.

get_date_downloaded_parsed() datetime | None

Return dateDownloaded as a timezone-aware datetime object.

dateDownloaded: str | None

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

displayedQuery: str | None

Search query as seen in the webpage.

probability: float | None

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

searchText: str | None

The search text used to find the item.

searchedQuery: str | None

Search query as specified in the input URL.

totalOrganicResults: int | None

Total number of organic results reported by the search engine.

validationMessages: Dict[str, List[str]] | None

Contains paths to fields with the description of issues found with their values.
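The documented dateDownloaded format, YYYY-MM-DDThh:mm:ssZ in UTC, maps directly onto a timezone-aware datetime. The sketch below shows one way to do that conversion with the standard library; parse_date_downloaded is a hypothetical helper and is not claimed to match how get_date_downloaded_parsed() is implemented.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_date_downloaded(value: Optional[str]) -> Optional[datetime]:
    """Parse a YYYY-MM-DDThh:mm:ssZ string into a UTC-aware datetime."""
    if value is None:
        return None
    # The trailing "Z" is matched literally; the result is then marked as UTC.
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


parsed = parse_date_downloaded("2024-05-01T12:30:00Z")
```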

class zyte_common_items.ProbabilityMetadata(**kwargs)

Bases: BaseMetadata

Data extraction process metadata.

probability: float | None

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
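A common use of probability is to discard results that are unlikely to be of the requested data type. The following is a minimal sketch: is_confident is a hypothetical helper, and the default threshold of 0.5 is an arbitrary assumption, not a value recommended by the library.

```python
from typing import Optional


def is_confident(probability: Optional[float], threshold: float = 0.5) -> bool:
    """Return True if probability is set and at or above threshold."""
    # A missing probability is treated as "not confident" in this sketch.
    return probability is not None and probability >= threshold
```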

class zyte_common_items.ListMetadata(**kwargs)

Bases: BaseMetadata

Minimal metadata for list item classes, such as ProductList or ArticleList.

See ArticleList.metadata.

get_date_downloaded_parsed() datetime | None

Return dateDownloaded as a timezone-aware datetime object.

dateDownloaded: str | None

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

validationMessages: Dict[str, List[str]] | None

Contains paths to fields with the description of issues found with their values.

class zyte_common_items.DetailsMetadata(**kwargs)

Bases: ListMetadata

Minimal metadata for details item classes, such as Product or Article.

get_date_downloaded_parsed() datetime | None

Return dateDownloaded as a timezone-aware datetime object.

dateDownloaded: str | None

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

probability: float | None

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

validationMessages: Dict[str, List[str]] | None

Contains paths to fields with the description of issues found with their values.

class zyte_common_items.SearchMetadata(**kwargs)

Bases: DetailsMetadata

Minimal metadata for classes of items that can declare search metadata.

get_date_downloaded_parsed() datetime | None

Return dateDownloaded as a timezone-aware datetime object.

dateDownloaded: str | None

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

probability: float | None

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

searchText: str | None

The search text used to find the item.

validationMessages: Dict[str, List[str]] | None

Contains paths to fields with the description of issues found with their values.

class zyte_common_items.BaseMetadata(**kwargs)

Bases: Item

Base metadata class.

cast(cls: Type[MetadataT]) MetadataT

Convert this metadata instance into a different metadata class, cls.
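The idea of converting a generic metadata object into a narrower metadata class can be sketched with plain dataclasses. Everything here is hypothetical: the two stand-in classes, the cast_sketch helper, and in particular the assumption that fields unknown to the target class are simply dropped. It illustrates the concept rather than the behavior of BaseMetadata.cast itself.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class GenericMetadataSketch:
    """Stand-in for a generic metadata class with many attributes."""

    dateDownloaded: Optional[str] = None
    probability: Optional[float] = None
    searchText: Optional[str] = None


@dataclass
class ListMetadataSketch:
    """Stand-in for a narrower metadata class."""

    dateDownloaded: Optional[str] = None


def cast_sketch(value: object, cls: Type[T]) -> T:
    """Build cls from value, keeping only the fields that cls defines."""
    allowed = {f.name for f in fields(cls)}
    kept = {k: v for k, v in asdict(value).items() if k in allowed}
    return cls(**kept)


generic = GenericMetadataSketch(
    dateDownloaded="2024-05-01T12:30:00Z", probability=0.96
)
narrowed = cast_sketch(generic, ListMetadataSketch)
```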

Typing

class zyte_common_items.components.metadata.MetadataT

TypeVar for BaseMetadata.

alias of TypeVar('MetadataT', bound=BaseMetadata)

class zyte_common_items.components.request.RequestT

TypeVar for Request.

alias of TypeVar('RequestT', bound=Request)