Components

These classes are used to map data within items, and are not tied to any specific item type.

class zyte_common_items.AdditionalProperty(**kwargs)

A name-value pair.

See Product.additionalProperties.

name: str

Name.

value: str

Value.

class zyte_common_items.Address(**kwargs)

Address item.

addressCity: Optional[str]

The city the place is located in.

addressCountry: Optional[str]

The country the place is located in.

The country name or the ISO 3166-1 alpha-2 country code.

addressLocality: Optional[str]

The locality to which the place belongs.

addressRaw: Optional[str]

The raw address information, as it appears on the website.

addressRegion: Optional[str]

The region of the place.

latitude: Optional[float]

Geographical latitude of the place.

longitude: Optional[float]

Geographical longitude of the place.

postalCode: Optional[str]

The postal code of the address.

postalCodeAux: Optional[str]

The auxiliary part of the postal code.

It may include a state abbreviation or town name, depending on local standards.

streetAddress: Optional[str]

The street address of the place.

class zyte_common_items.AggregateRating(**kwargs)

Aggregate data about reviews and ratings.

At least one of ratingValue or reviewCount is required.

See Product.aggregateRating.

bestRating: Optional[float]

Maximum value of the rating system.

ratingValue: Optional[float]

Average value of all ratings.

reviewCount: Optional[int]

Review count.
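
The "at least one of ratingValue or reviewCount" rule can be checked before an item is built. The following is a minimal sketch that uses a plain dictionary as a stand-in for the item; the helper name has_required_rating_data is hypothetical and is not part of zyte_common_items.

```python
from typing import Any, Mapping


def has_required_rating_data(rating: Mapping[str, Any]) -> bool:
    """Return True if ratingValue or reviewCount is present (not None)."""
    return (
        rating.get("ratingValue") is not None
        or rating.get("reviewCount") is not None
    )


print(has_required_rating_data({"ratingValue": 4.5, "bestRating": 5.0}))  # True
print(has_required_rating_data({"bestRating": 5.0}))  # False
```

Note that a reviewCount of 0 still counts as present in this sketch, since only None is treated as missing.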

class zyte_common_items.Amenity(**kwargs)

An amenity that a business place has.

name: str

Name of amenity.

value: bool

Availability of the amenity.

class zyte_common_items.Audio(**kwargs)

Audio.

See Article.audios.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.
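
Because data URIs are not allowed in url, extraction code may want to filter them out before building Audio, Image or Video components. A minimal sketch follows; is_data_uri is a hypothetical helper, not a function provided by zyte_common_items.

```python
def is_data_uri(url: str) -> bool:
    """Return True if url uses the data: scheme (case-insensitive)."""
    return url.strip().lower().startswith("data:")


candidates = [
    "data:audio/mpeg;base64,AAAA",
    "https://example.com/media/track.mp3",
]
# Keep only URLs that may be used in the url attribute.
allowed = [url for url in candidates if not is_data_uri(url)]
print(allowed)  # ['https://example.com/media/track.mp3']
```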

class zyte_common_items.Author(**kwargs)

Author of an article.

See Article.authors.

email: Optional[str]

Email.

name: Optional[str]

Full name.

nameRaw: Optional[str]

Text from which name was extracted.

url: Optional[str]

URL of the details page of the author.

class zyte_common_items.BaseSalary(**kwargs)

Base salary of a job offer.

currency: Optional[str]

Currency associated with the salary amount.

currencyRaw: Optional[str]

Currency associated with the salary amount, without normalization.

rateType: Optional[str]

The type of rate associated with the salary, e.g. monthly, annual, daily.

raw: Optional[str]

Salary amount as it appears on the website.

valueMax: Optional[str]

The maximum value of the base salary as a number string.

valueMin: Optional[str]

The minimum value of the base salary as a number string.

class zyte_common_items.Brand(**kwargs)

Brand.

See Product.brand.

name: str

Name as it appears on the source webpage (no post-processing).

class zyte_common_items.Breadcrumb(**kwargs)

A breadcrumb from the breadcrumb trail of a webpage.

See Product.breadcrumbs.

name: Optional[str]

Displayed name.

url: Optional[str]

Target URL.

class zyte_common_items.Gtin(**kwargs)

GTIN type-value pair.

See Product.gtin.

type: str

Identifier of the GTIN format of value.

One of: "gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc".

value: str

Value.

It should only contain digits.
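
The two constraints above (a known type and a digits-only value) can be checked with a few lines of standard-library code. This is only a sketch: looks_like_gtin is a hypothetical helper, and it does not verify lengths or check digits.

```python
GTIN_TYPES = {
    "gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc",
}


def looks_like_gtin(gtin_type: str, value: str) -> bool:
    """Check the documented constraints: known type, ASCII digits only."""
    return gtin_type in GTIN_TYPES and value.isascii() and value.isdigit()


print(looks_like_gtin("isbn13", "9780306406157"))  # True
print(looks_like_gtin("issn", "0378-5955"))  # False, contains a hyphen
```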

class zyte_common_items.Header(**kwargs)

An HTTP header.

name: str

Name of the header.

value: str

Value of the header.

class zyte_common_items.HiringOrganization(**kwargs)

Organization that is hiring for a job offer.

id: Optional[str]

Identifier of the organization used by the job posting website.

name: Optional[str]

Name of the hiring organization.

nameRaw: Optional[str]

Organization information as available on the website.

class zyte_common_items.Image(**kwargs)

Image.

See for example Product.images and Product.mainImage.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.

class zyte_common_items.JobLocation(**kwargs)

Location of a job offer.

raw: Optional[str]

Job location, as it appears on the website.

class zyte_common_items.Link(**kwargs)

A link from a webpage to another webpage.

text: Optional[str]

Displayed text.

url: Optional[str]

Target URL.

class zyte_common_items.NamedLink(**kwargs)

A link from a webpage to another webpage.

name: Optional[str]

The name of the link.

url: Optional[str]

Target URL.

class zyte_common_items.OpeningHoursItem(**kwargs)

Specification of opening hours of a business place.

closes: Optional[str]

Closing time in ISO 8601 format, local time.

dayOfWeek: Optional[str]

English weekday name.

opens: Optional[str]

Opening time in ISO 8601 format, local time.

rawCloses: Optional[str]

Closing time, as it appears on the page, without processing.

rawDayOfWeek: Optional[str]

Day of the week, as it appears on the page, without processing.

rawOpens: Optional[str]

Opening time, as it appears on the page, without processing.
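
Since opens and closes hold ISO 8601 local times, they can be turned into datetime.time objects for comparison. The sketch below assumes values such as "09:00" or "17:30:00"; the exact strings found in real data are an assumption, and parse_local_time is a hypothetical helper.

```python
from datetime import time
from typing import Optional


def parse_local_time(value: Optional[str]) -> Optional[time]:
    """Parse an ISO 8601 time string, passing None through unchanged."""
    return None if value is None else time.fromisoformat(value)


opens = parse_local_time("09:00")
closes = parse_local_time("17:30:00")
print(opens < closes)  # True
```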

class zyte_common_items.ParentPlace(**kwargs)

If the place is located inside another place, these are the details of the parent place.

name: str

Name of the parent place.

placeId: str

Identifier of the parent place.

class zyte_common_items.ProbabilityMetadata(**kwargs)

Data extraction process metadata.

probability: Optional[float]

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
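
A common way to use probability is to compare it against a cutoff before accepting an extracted item. The following sketch illustrates the idea; the helper name is_expected_type and the default threshold of 0.5 are assumptions for illustration, not values defined by zyte_common_items.

```python
from typing import Optional


def is_expected_type(probability: Optional[float], threshold: float = 0.5) -> bool:
    """Accept only when a probability is known and reaches the threshold."""
    return probability is not None and probability >= threshold


print(is_expected_type(0.96))  # True
print(is_expected_type(0.0))  # False
print(is_expected_type(None))  # False, probability unknown
```

Treating a missing probability as a rejection is a design choice of this sketch; a caller could just as well accept such items.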

class zyte_common_items.ProbabilityRequest(**kwargs)

A Request that includes a probability value.

metadata: Optional[ProbabilityMetadata]

Data extraction process metadata.

class zyte_common_items.Reactions(**kwargs)

Details of reactions to a post.

dislikes: Optional[int]

Number of dislikes or other negative reactions to the post.

likes: Optional[int]

Number of likes or other positive reactions to the post.

reposts: Optional[int]

Number of times the post has been shared.

class zyte_common_items.RealEstateArea(**kwargs)

Area of a place, with type, units, value and raw value.

areaType: Optional[str]

Type of area, one of: LOT, FLOOR.

raw: str

Area in the raw format, as it appears on the website.

unitCode: str

Unit of the value field, one of: SQMT (square meters), SQFT (square feet), ACRE (acres).

value: float

Area value, expressed in the unit given by unitCode.
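
To compare areas reported in different units, value can be normalized using unitCode. This is a minimal sketch with a hypothetical helper, to_square_meters; the conversion factors are the standard definitions of the square foot and the acre.

```python
SQUARE_METERS_PER_UNIT = {
    "SQMT": 1.0,
    "SQFT": 0.09290304,  # 1 ft is exactly 0.3048 m
    "ACRE": 4046.8564224,  # 1 acre is 43,560 square feet
}


def to_square_meters(value: float, unit_code: str) -> float:
    """Convert an area to square meters; raises KeyError for unknown units."""
    return value * SQUARE_METERS_PER_UNIT[unit_code]


print(to_square_meters(100.0, "SQMT"))  # 100.0
```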

class zyte_common_items.Request(**kwargs)

Describes a web request to load a page.

cast(cls: Type[RequestT]) -> RequestT

Convert this request, an instance of Request or a subclass, into cls, a different class that is also either Request or a subclass.

to_scrapy(callback, **kwargs)

Convert a request to scrapy.Request. All kwargs are passed to scrapy.Request as-is.

body: Optional[str]

HTTP request body, Base64-encoded.

property body_bytes: Optional[bytes]

Request.body as bytes.
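
The relationship between body (a Base64-encoded string) and body_bytes (the raw bytes) can be illustrated with the standard base64 module. This is a sketch of the encoding only, not the library's implementation; encode_body and decode_body are hypothetical helpers.

```python
import base64


def encode_body(raw: bytes) -> str:
    """Produce a Base64 string suitable for a body-style attribute."""
    return base64.b64encode(raw).decode("ascii")


def decode_body(body: str) -> bytes:
    """Recover the raw bytes from a Base64-encoded body string."""
    return base64.b64decode(body)


payload = b'{"query": "shoes"}'
body = encode_body(payload)
print(decode_body(body) == payload)  # True
```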

headers: Optional[List[Header]]

HTTP headers.

method: str

HTTP method.

name: Optional[str]

Name of the page being requested.

url: str

HTTP URL.

class zyte_common_items.SocialMediaPostAuthor(**kwargs)

Details of the author of a social media post.

dateAccountCreated: Optional[str]

The date of the creation of the author’s account.

isVerified: Optional[bool]

Whether the author’s account is verified.

location: Optional[str]

The location of the author, if it’s available in the author profile. Country or city location only.

numberOfFollowers: Optional[int]

The number of followers that the author has.

numberOfFollowing: Optional[int]

The number of users that the author follows.

class zyte_common_items.StarRating(**kwargs)

Official star rating of a place.

ratingValue: Optional[float]

Star rating value of the place.

raw: Optional[str]

Star rating of the place, as it appears on the page, without processing.

class zyte_common_items.Url(**kwargs)

A URL.

class zyte_common_items.Video(**kwargs)

Video.

See Article.videos.

url: str

URL.

When multiple URLs exist for a given media element, pointing to different-quality versions, the highest-quality URL should be used.

Data URIs are not allowed in this attribute.

Item metadata components

class zyte_common_items.Metadata(**kwargs)

Bases: DetailsMetadata

Generic metadata class.

It defines all attributes of metadata classes for specific item types, so that it can be used during extraction instead of a more specific class, and later converted to the corresponding, more specific metadata class.

dateDownloaded: Optional[str]

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

probability: Optional[float]

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

searchText: Optional[str]

The search text used to find the item.

validationMessages: Optional[Dict[str, List[str]]]

Maps paths to fields to descriptions of the issues found with their values.
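
Given its Dict[str, List[str]] type, validationMessages can be flattened into one line per issue for logging. The sketch below uses made-up field paths and messages; the actual paths and wording produced by validation are not specified here, and summarize_validation_messages is a hypothetical helper.

```python
from typing import Dict, List


def summarize_validation_messages(messages: Dict[str, List[str]]) -> List[str]:
    """Return one 'path: issue' line per reported issue, sorted by path."""
    return [
        f"{path}: {issue}"
        for path, issues in sorted(messages.items())
        for issue in issues
    ]


# Hypothetical example data, for illustration only.
example = {"price": ["not a number"], "name": ["empty value", "too long"]}
for line in summarize_validation_messages(example):
    print(line)
```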

class zyte_common_items.components.metadata.ProbabilityMetadata(**kwargs)

Bases: BaseMetadata

Data extraction process metadata.

probability: Optional[float]

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

class zyte_common_items.components.metadata.ListMetadata(**kwargs)

Bases: BaseMetadata

Minimal metadata for list item classes, such as ProductList or ArticleList.

See ArticleList.metadata.

get_date_downloaded_parsed() -> Optional[datetime]

Return dateDownloaded as a TZ-aware datetime object.

dateDownloaded: Optional[str]

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
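
The YYYY-MM-DDThh:mm:ssZ format maps directly onto a strptime pattern. The following sketch shows one way to convert between that string format and a timezone-aware datetime using only the standard library; it is not the implementation of get_date_downloaded_parsed(), and both helper names are hypothetical.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_date_downloaded(value: str) -> datetime:
    """Parse a YYYY-MM-DDThh:mm:ssZ string into a UTC-aware datetime."""
    return datetime.strptime(value, DATE_FORMAT).replace(tzinfo=timezone.utc)


def format_date_downloaded(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime(DATE_FORMAT)


parsed = parse_date_downloaded("2024-01-15T08:30:00Z")
print(format_date_downloaded(parsed))  # 2024-01-15T08:30:00Z
```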

validationMessages: Optional[Dict[str, List[str]]]

Maps paths to fields to descriptions of the issues found with their values.

class zyte_common_items.components.metadata.DetailsMetadata(**kwargs)

Bases: ListMetadata

Minimal metadata for details item classes, such as Product or Article.

get_date_downloaded_parsed() -> Optional[datetime]

Return dateDownloaded as a TZ-aware datetime object.

dateDownloaded: Optional[str]

Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.

probability: Optional[float]

The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.

For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).

validationMessages: Optional[Dict[str, List[str]]]

Maps paths to fields to descriptions of the issues found with their values.

class zyte_common_items.components.metadata.BaseMetadata(**kwargs)

Bases: Item

Base metadata class.

cast(cls: Type[MetadataT]) -> MetadataT

Convert this metadata instance into a different metadata class, cls.

Typing

class zyte_common_items.components.metadata.MetadataT

TypeVar for BaseMetadata.

class zyte_common_items.components.request.RequestT

TypeVar for Request.