Item API
Product
- class zyte_common_items.Product(**kwargs)
Product from an e-commerce website.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
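The idea of reading items from a list of dictionaries can be pictured with a toy class. This is a minimal sketch, not the library's implementation: ToyItem and its fields are hypothetical, the trail parameter is left out, and treating None as an empty list is an assumption based only on the signature accepting None.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyItem:
    """Hypothetical stand-in for an item class; not part of zyte_common_items."""

    url: str
    name: Optional[str] = None

    @classmethod
    def from_list(cls, items: Optional[List[Dict]]) -> List["ToyItem"]:
        # Assumption: None is treated like an empty list.
        return [cls(**item) for item in (items or [])]


products = ToyItem.from_list([{"url": "https://example.com/p/1", "name": "Robot"}])
```

Each dictionary becomes one item, in the same order as the input list.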
- additionalProperties: List[AdditionalProperty] | None
List of name-value pairs of product data.
Additional properties usually appear in product pages in the form of a specification table or a free-form specification list that can be easily turned into key-value pairs, where keys indicate the name of a property and values indicate the value of that property.
Additional properties that require 1 or more extra requests may not be extracted.
See also: features.
- aggregateRating: AggregateRating | None
Aggregate data about reviews and ratings.
- availability: str | None
Product availability status.
The value is expected to be one of: "InStock", "OutOfStock".
- breadcrumbs: List[Breadcrumb] | None
Webpage breadcrumb trail.
- color: str | None
Color of the product.
It is extracted as displayed (e.g. "white").
See also: size, style.
- currencyRaw: str | None
Price currency as it appears on the webpage (no post-processing).
This is usually the currency that appears next to the price visually on the webpage. It is commonly a symbol but can also appear normalized already next to the price. For example, both “$” and “USD” are possible values.
Non-currencies, such as "-", should not be extracted as currencyRaw.
See also: currency.
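One way to picture the rule about non-currencies is a small filter applied to the text found next to the price. This is only a heuristic sketch under stated assumptions: clean_currency_raw is a hypothetical helper, and "contains a Unicode currency symbol or a letter" is an assumed test for what counts as a currency.

```python
import unicodedata
from typing import Optional


def clean_currency_raw(text: str) -> Optional[str]:
    """Return the text as displayed, or None when it does not look like a currency."""
    candidate = text.strip()
    # Unicode category "Sc" covers currency symbols such as "$" or "€".
    has_symbol = any(unicodedata.category(ch) == "Sc" for ch in candidate)
    # Letters cover already-normalized codes such as "USD".
    has_letters = any(ch.isalpha() for ch in candidate)
    return candidate if (has_symbol or has_letters) else None
```

With this heuristic, "$" and "USD" are kept as they appear, while "-" yields None.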
- description: str | None
Plain-text, complete product description.
If the description is split across different parts of the source webpage, only the main part, containing the most useful pieces of information, should be extracted into this attribute.
It may contain data found in other attributes (features, additionalProperties).
Format-wise:
Line breaks and non-ASCII characters are allowed.
There is no length limit for this attribute; the content should not be truncated.
There should be no whitespace at the beginning or end.
See also: descriptionHtml.
- descriptionHtml: str | None
HTML containing the complete product description.
See description for extraction details.
The format is not the raw HTML from the source webpage. See the HTML normalization specification for details.
- features: List[str] | None
List of product features.
They are usually listed as bullet points in product webpages.
See also: additionalProperties.
- gtin: List[Gtin] | None
List of standardized GTIN product identifiers associated with the product, which are unique for the product across different sellers.
See also: mpn, productId, sku.
- images: List[Image] | None
All product images.
The main image (see mainImage) should be first in the list.
Images only displayed as part of the product description are excluded.
- metadata: ProductMetadata | None
Data extraction process metadata.
- mpn: str | None
Manufacturer part number (MPN) of the product.
The MPN is issued by the manufacturer, so a product should have the same MPN across different e-commerce websites.
See also: gtin, productId, sku.
- price: str | None
Price at which the product is being offered at the moment.
It must be formatted with a full stop as decimal separator and no thousands separator or currency, e.g. "10500.99".
If there are any discounts, this is the price with discounts applied.
If the price is indicated with and without value-added tax (VAT), this is the price with VAT.
See also: regularPrice, currency, currencyRaw.
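The price format can be illustrated with a small normalization helper. This is a sketch rather than the library's own logic: normalize_price and its decimal_sep and thousands_sep parameters are hypothetical, and the caller is assumed to know which separators the source webpage uses.

```python
from decimal import Decimal
from typing import Optional


def normalize_price(
    raw: str, *, decimal_sep: str = ".", thousands_sep: str = ","
) -> Optional[str]:
    """Turn a displayed price into a string such as "10500.99"."""
    # Drop thousands separators first, then map the decimal separator to ".".
    cleaned = raw.replace(thousands_sep, "").replace(decimal_sep, ".")
    # Keep only digits and the decimal point, which discards currency text.
    digits = "".join(ch for ch in cleaned if ch.isdigit() or ch == ".")
    if not any(ch.isdigit() for ch in digits):
        return None
    return str(Decimal(digits))
```

For instance, normalize_price("$10,500.99") and normalize_price("10.500,99 €", decimal_sep=",", thousands_sep=".") both produce "10500.99".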
- productId: str | None
Product identifier, unique within an e-commerce website.
It may come in the form of an SKU or any other identifier, a hash, or even a URL.
See also: gtin, mpn, sku.
- regularPrice: str | None
Price shown on the webpage as a price at which the product has been offered in the past by the same retailer, presented as a reference next to the current price.
It may be labeled as the original price, the price before discount, the list price, or the maximum retail price for which the product is sold.
It must be formatted with a full stop as decimal separator and no thousands separator or currency, e.g. "15000.99".
regularPrice must be None if price is None. If not None, regularPrice must be higher than price.
If price is extracted with value-added tax (VAT), regularPrice must be extracted with VAT. If price is extracted without VAT, regularPrice must be extracted without VAT.
See also: price, currency, currencyRaw.
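The relationship between price and regularPrice can be expressed as a small check. This is a minimal sketch: regular_price_is_consistent is a hypothetical helper, and both values are assumed to already follow the string format described above.

```python
from decimal import Decimal
from typing import Optional


def regular_price_is_consistent(
    price: Optional[str], regular_price: Optional[str]
) -> bool:
    """Check the two rules: no regularPrice without price, and regularPrice > price."""
    if regular_price is None:
        return True
    if price is None:
        return False
    # Decimal avoids the rounding surprises of float comparisons.
    return Decimal(regular_price) > Decimal(price)
```

Comparing as Decimal rather than as strings matters because "9.99" sorts after "10500.99" lexicographically.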
- size: str | None
Size, dimensions or volume of the product.
It is extracted as displayed (e.g. "XL", "32Wx34L", "Large", "750x450x800", "10m", "Height: 48cm - 86cm, Width: 204cm, Depth: 93cm").
See also: color, style.
- sku: str | None
Stock keeping unit (SKU) identifier, i.e. a merchant-specific product identifier.
See also: gtin, mpn, productId.
- style: str | None
Style, pattern or finish of the product.
It is extracted as displayed (e.g. "polka dots", "Striped", "Nickel finish with Translucent glass").
See also: color, size.
- variants: List[ProductVariant] | None
List of product variants.
When slightly different versions of a product are displayed on the same product page, allowing you to choose a specific product version from a selection, each of those product versions is considered a product variant.
Product variants usually differ in color or size.
The following items are not considered product variants:
Other products.
Recommended products.
Different products within the same bundle of products.
Product add-ons, e.g. premium upgrades of a base product.
If only one “variant” is shown in the page, it is not considered a product variant.
Only variant-specific data is extracted as product variant details. For example, if variant-specific versions of the product description do not exist in the source webpage, the description attributes of the product variant are not filled with the base product description.
Extracted product variants may not include those that are not visible in the source webpage.
Product variant details may not include those that require multiple additional requests (e.g. 1 or more requests per variant).
There must not be duplicate variants.
- class zyte_common_items.ProductVariant(**kwargs)
Product variant.
See Product.variants, ProductVariantExtractor, ProductVariantSelectorExtractor.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- additionalProperties: List[AdditionalProperty] | None
List of name-value pairs of data about a specific, otherwise unmapped feature.
Additional properties usually appear in product pages in the form of a specification table or a free-form specification list.
Additional properties that require 1 or more extra requests may not be extracted.
See also: features.
- availability: str | None
Availability status.
The value is expected to be one of: "InStock", "OutOfStock".
- currencyRaw: str | None
Price currency as it appears on the webpage (no post-processing), e.g. "$".
See also: currency.
- gtin: List[Gtin] | None
List of standardized GTIN product identifiers associated with the product, which are unique for the product across different sellers.
See also: mpn, productId, sku.
- images: List[Image] | None
All product images.
The main image (see mainImage) should be first in the list.
Images only displayed as part of the product description are excluded.
- mpn: str | None
Manufacturer part number (MPN).
A product should have the same MPN across different e-commerce websites.
See also: gtin, productId, sku.
- price: str | None
Price at which the product is being offered.
It is a string with the price amount, with a full stop as decimal separator, and no thousands separator or currency (see currency and currencyRaw), e.g. "10500.99".
If regularPrice is not None, price should always be lower than regularPrice.
- productId: str | None
Product identifier, unique within an e-commerce website.
It may come in the form of an SKU or any other identifier, a hash, or even a URL.
See also: gtin, mpn, sku.
- regularPrice: str | None
Price at which the product was being offered in the past, and which is presented as a reference next to the current price.
It may be labeled as the original price, the list price, or the maximum retail price for which the product is sold.
See price for format details.
If regularPrice is not None, it should always be higher than price.
- size: str | None
Size or dimensions.
Pertinent to products such as garments, shoes, accessories, etc.
It is extracted as displayed (e.g. "XL").
See also: color, style.
- sku: str | None
Stock keeping unit (SKU) identifier, i.e. a merchant-specific product identifier.
See also: gtin, mpn, productId.
- class zyte_common_items.ProductMetadata(**kwargs)
Metadata class for zyte_common_items.Product.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
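A timestamp in this format can be produced with the standard library. The sketch below assumes a timezone-aware datetime as input; format_date_downloaded is a hypothetical helper, not a library function.

```python
from datetime import datetime, timedelta, timezone


def format_date_downloaded(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ."""
    # Convert to UTC first so that the trailing "Z" is accurate.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 14:30:05 at UTC+02:00 is 12:30:05 in UTC.
local = datetime(2024, 1, 31, 14, 30, 5, tzinfo=timezone(timedelta(hours=2)))
stamp = format_date_downloaded(local)
```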
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
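A common use of this value is to discard items that are unlikely to be of the expected type. The following is a sketch over plain dictionaries shaped like serialized items; keep_confident, the 0.5 default threshold, and the choice to keep items that have no probability are all assumptions made for illustration.

```python
from typing import Dict, List


def keep_confident(items: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep items whose metadata probability is at least the threshold."""
    kept = []
    for item in items:
        probability = (item.get("metadata") or {}).get("probability")
        # Assumption: a missing probability is not treated as a low probability.
        if probability is None or probability >= threshold:
            kept.append(item)
    return kept


items = [
    {"url": "https://example.com/a", "metadata": {"probability": 0.96}},
    {"url": "https://example.com/b", "metadata": {"probability": 0.0}},
    {"url": "https://example.com/c"},
]
confident = keep_confident(items)
```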
Product list
- class zyte_common_items.ProductList(**kwargs)
Product list from a product listing page of an e-commerce webpage.
It represents, for example, a single page from a category.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- breadcrumbs: List[Breadcrumb] | None
Webpage breadcrumb trail.
- categoryName: str | None
Name of the product listing as it appears on the webpage (no post-processing).
For example, if the webpage is one of the pages of the Robots category, categoryName is 'Robots'.
- metadata: ProductListMetadata | None
Data extraction process metadata.
- pageNumber: int | None
Current page number, if displayed explicitly on the list page.
Numeration starts with 1.
- products: List[ProductFromList] | None
List of products.
It only includes product information found in the product listing page itself. Product information that requires visiting each product URL is not meant to be covered.
The order of the products reflects their position on the rendered page. Product order is top-to-bottom, and left-to-right or right-to-left depending on the webpage locale.
- class zyte_common_items.ProductFromList(**kwargs)
Product from a product list from a product listing page of an e-commerce webpage.
See ProductList, ProductFromListExtractor, ProductFromListSelectorExtractor.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- currencyRaw: str | None
Price currency as it appears on the webpage (no post-processing), e.g. "$".
See also: currency.
- metadata: ProbabilityMetadata | None
Data extraction process metadata.
- price: str | None
Price at which the product is being offered.
It is a string with the price amount, with a full stop as decimal separator, and no thousands separator or currency (see currency and currencyRaw), e.g. "10500.99".
If regularPrice is not None, price should always be lower than regularPrice.
- productId: str | None
Product identifier, unique within an e-commerce website.
It may come in the form of an SKU or any other identifier, a hash, or even a URL.
- regularPrice: str | None
Price at which the product was being offered in the past, and which is presented as a reference next to the current price.
It may be labeled as the original price, the list price, or the maximum retail price for which the product is sold.
See price for format details.
If regularPrice is not None, it should always be higher than price.
- class zyte_common_items.ProductListMetadata(**kwargs)
Metadata class for zyte_common_items.ProductList.metadata.
Article
- class zyte_common_items.Article(**kwargs)
Article, typically seen on online news websites, blogs, or announcement sections.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- articleBody: str | None
Clean text of the article, including sub-headings, with newline separators.
Format:
trimmed (no whitespace at the beginning or the end of the body string),
line breaks included,
no length limit,
no normalization of Unicode characters.
- articleBodyHtml: str | None
Simplified and standardized HTML of the article, including sub-headings, image captions and embedded content (videos, tweets, etc.).
Format: HTML string normalized in a consistent way.
- breadcrumbs: List[Breadcrumb] | None
Webpage breadcrumb trail.
- dateModified: str | None
Date when the article was most recently modified.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ” or “YYYY-MM-DDThh:mm:ss±zz:zz”.
With timezone, if available.
- dateModifiedRaw: str | None
Same date as dateModified, but before parsing/normalization, i.e. as it appears on the website.
- datePublished: str | None
Publication date of the article.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ” or “YYYY-MM-DDThh:mm:ss±zz:zz”.
With timezone, if available.
If the actual publication date is not found, the value of dateModified is used instead.
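Both date formats listed above can be parsed with a single strptime pattern, and the fallback rule is a one-line choice. This is a sketch under stated assumptions: parse_article_date and effective_date_published are hypothetical helpers, and the inputs are assumed to carry a timezone designator.

```python
from datetime import datetime
from typing import Optional

# "%z" accepts both "Z" and offsets such as "+02:00" (Python 3.7+).
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_article_date(value: str) -> datetime:
    """Parse a date with a timezone designator into an aware datetime."""
    return datetime.strptime(value, ISO_FORMAT)


def effective_date_published(
    date_published: Optional[str], date_modified: Optional[str]
) -> Optional[str]:
    """Fall back to the modification date when no publication date was found."""
    return date_published if date_published is not None else date_modified
```

Because the parsed values are timezone-aware, "2024-01-31T14:30:05+02:00" and "2024-01-31T12:30:05Z" compare as the same instant.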
- datePublishedRaw: str | None
Same date as datePublished, but before parsing/normalization, i.e. as it appears on the website.
- description: str | None
A short summary of the article.
It can be either human-provided (if available), or auto-generated.
- inLanguage: str | None
Language of the article, as an ISO 639-1 language code.
Sometimes the article language is not the same as the web page overall language.
- metadata: ArticleMetadata | None
Data extraction process metadata.
- class zyte_common_items.ArticleMetadata(**kwargs)
Metadata class for zyte_common_items.Article.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
Article list
- class zyte_common_items.ArticleList(**kwargs)
Article list from an article listing page.
The url attribute is the only required attribute; all other fields are optional.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- articles: List[ArticleFromList] | None
List of article details found on the page.
The order of the articles reflects their position on the page.
- breadcrumbs: List[Breadcrumb] | None
Webpage breadcrumb trail.
- metadata: ArticleListMetadata | None
Data extraction process metadata.
- class zyte_common_items.ArticleFromList(**kwargs)
Article from an article list from an article listing page.
See ArticleList.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- articleBody: str | None
Clean text of the article, including sub-headings, with newline separators.
Format:
trimmed (no whitespace at the beginning or the end of the body string),
line breaks included,
no length limit,
no normalization of Unicode characters.
- datePublished: str | None
Publication date of the article.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ” or “YYYY-MM-DDThh:mm:ss±zz:zz”.
With timezone, if available.
If the actual publication date is not found, the date of the last modification is used instead.
- datePublishedRaw: str | None
Same date as datePublished, but before parsing/normalization, i.e. as it appears on the website.
- inLanguage: str | None
Language of the article, as an ISO 639-1 language code.
Sometimes the article language is not the same as the web page overall language.
- metadata: ProbabilityMetadata | None
Data extraction process metadata.
- class zyte_common_items.ArticleListMetadata(**kwargs)
Metadata class for zyte_common_items.ArticleList.metadata.
Business place
- class zyte_common_items.BusinessPlace(**kwargs)
Business place, with properties typically seen on maps or business listings.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- actions: List[NamedLink] | None
List of actions that can be performed directly from the URLs on the place page, including URLs.
- additionalProperties: List[AdditionalProperty] | None
List of name-value pairs of any unmapped additional properties specific to the place.
- aggregateRating: AggregateRating | None
The overall rating, based on a collection of reviews or ratings.
- containedInPlace: ParentPlace | None
If the place is located inside another place, these are the details of the parent place.
- metadata: BusinessPlaceMetadata | None
Data extraction process metadata.
- openingHours: List[OpeningHoursItem] | None
Ordered specification of opening hours, including data for opening and closing time for each day of the week.
- priceRange: str | None
How customers perceive the price range of the place (from z to zzzz).
- reservationAction: NamedLink | None
The details of the reservation action, e.g. table reservation in case of restaurants or room reservation in case of hotels.
- starRating: StarRating | None
Official star rating of the place.
- timezone: str | None
Timezone in which the place is situated.
Standard: Name compliant with IANA tz database (tzdata).
- class zyte_common_items.BusinessPlaceMetadata(**kwargs)
Metadata class for zyte_common_items.BusinessPlace.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
Real estate
- class zyte_common_items.RealEstate(**kwargs)
Real estate offer, typically seen on real estate offer aggregator websites.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- additionalProperties: List[AdditionalProperty] | None
List of name-value pairs holding information about specific features, usually presented as a specification table or a free-form specification list.
- area: RealEstateArea | None
Real estate area details.
- breadcrumbs: List[Breadcrumb] | None
Webpage breadcrumb trail.
- currencyRaw: str | None
Currency associated with the price, as it appears on the page (no post-processing).
- datePublished: str | None
Publication date of the real estate offer.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ”
With timezone, if available.
- datePublishedRaw: str | None
Same date as datePublished, but before parsing/normalization, i.e. as it appears on the website.
- description: str | None
The description of the real estate.
Format:
trimmed (no whitespace at the beginning or the end of the description string),
line breaks included,
no length limit,
no normalization of Unicode characters,
no concatenation of description from different parts of the page.
- metadata: RealEstateMetadata | None
Contains metadata about the data extraction process.
- numberOfRooms: int | None
The number of rooms (excluding bathrooms and closets) of the real estate.
- realEstateId: str | None
The identifier of the real estate, usually assigned by the seller and unique within a website, similar to product SKU.
- class zyte_common_items.RealEstateMetadata(**kwargs)
Metadata class for zyte_common_items.RealEstate.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
Job posting
- class zyte_common_items.JobPosting(**kwargs)
A job posting, typically seen on job posting websites or websites of companies that are hiring.
url is the only required attribute.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- baseSalary: BaseSalary | None
The base salary of the job or of an employee in the proposed role.
- dateModified: str | None
The date when the job posting was most recently modified.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ”
With timezone, if available.
- dateModifiedRaw: str | None
Same date as dateModified, but before parsing/normalization, i.e. as it appears on the website.
- datePublished: str | None
Publication date of the job posting.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ”
With timezone, if available.
- datePublishedRaw: str | None
Same date as datePublished, but before parsing/normalization, i.e. as it appears on the website.
- description: str | None
A description of the job posting including sub-headings, with newline separators.
Format:
trimmed (no whitespace at the beginning or the end of the description string),
line breaks included,
no length limit,
no normalization of Unicode characters.
- descriptionHtml: str | None
Simplified HTML of the description, including sub-headings, image captions and embedded content.
- employmentType: str | None
Type of employment (e.g. full-time, part-time, contract, temporary, seasonal, internship).
- hiringOrganization: HiringOrganization | None
Information about the organization offering the job position.
- jobLocation: JobLocation | None
A (typically single) geographic location associated with the job position.
- jobStartDate: str | None
Job start date.
Format: ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ”
With timezone, if available.
- jobStartDateRaw: str | None
Same date as jobStartDate, but before parsing/normalization, i.e. as it appears on the website.
- metadata: JobPostingMetadata | None
Contains metadata about the data extraction process.
- class zyte_common_items.JobPostingMetadata(**kwargs)
Metadata class for zyte_common_items.JobPosting.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
Search engine result
- class zyte_common_items.SerpOrganicResult(**kwargs)
Data from a non-paid result of a search engine results page.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- displayedUrlText: str | None
Text displayed to represent url.
It may not be an actual URL, but some stylized or simplified representation of it. For example, if url is https://en.wikipedia.org/wiki/Foobar, displayedUrlText could be something like "https://en.wikipedia.org › wiki › Foobar".
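To make the difference between the two values concrete, the sketch below derives a breadcrumb-style text from a URL. It only illustrates the kind of stylization a search engine might apply; breadcrumb_style is a hypothetical helper, and real search engines choose their own representation.

```python
from urllib.parse import urlsplit


def breadcrumb_style(url: str) -> str:
    """Render a URL as origin plus path segments separated by " › "."""
    parts = urlsplit(url)
    # Ignore empty segments produced by leading or trailing slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    return " › ".join([f"{parts.scheme}://{parts.netloc}", *segments])


displayed = breadcrumb_style("https://en.wikipedia.org/wiki/Foobar")
```

The result reads like a URL but cannot be requested as one, which is why url is kept as a separate attribute.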
Search engine results
- class zyte_common_items.Serp(**kwargs)
Data from a search engine results page.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- metadata: SerpMetadata | None
Contains metadata about the data extraction process.
- organicResults: List[SerpOrganicResult] | None
List of search results excluding paid results.
- class zyte_common_items.SerpMetadata(**kwargs)
Metadata class for zyte_common_items.Serp.metadata.
Forum thread
- class zyte_common_items.ForumThread(**kwargs)
Represents a forum thread page.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- metadata: ForumThreadMetadata | None
Contains metadata about the data extraction process.
- posts: List[SocialMediaPost] | None
List of posts available on the page, including the first or top post.
- class zyte_common_items.ForumThreadMetadata(**kwargs)
Metadata class for zyte_common_items.ForumThread.metadata.
Search Request templates
- class zyte_common_items.SearchRequestTemplate(**kwargs)
Request template to build a search Request.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- request(*, query: str | ~typing.Any = <object object>, keyword: str | ~typing.Any = <object object>) Request
Return a Request to search for keyword.
- body: str | None
Jinja template for Request.body.
It must be a plain str, not bytes or a Base64-encoded str. Base64-encoding is done by request() after rendering this value as a Jinja template.
Defining a non-UTF-8 body is not supported.
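The order of operations (render first, then Base64-encode the UTF-8 bytes) can be sketched with the standard library. Note the assumptions: string.Template with $query placeholders stands in for Jinja here only to keep the example dependency-free, and render_body is a hypothetical helper, not the request() method itself.

```python
import base64
from string import Template


def render_body(body_template: str, query: str) -> str:
    """Render a plain-str template, then Base64-encode the UTF-8 result."""
    # Stand-in for Jinja rendering; real templates use Jinja syntax.
    rendered = Template(body_template).substitute(query=query)
    return base64.b64encode(rendered.encode("utf-8")).decode("ascii")


encoded = render_body('{"q": "$query"}', "robot vacuum")
decoded = base64.b64decode(encoded).decode("utf-8")
```

Decoding the result gives back the rendered text, which shows why the template itself must stay a plain, unencoded str.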
- headers: List[Header] | None
List of Header, for Request.headers, where every name and value is a Jinja template.
When a header name template renders into an empty string (after stripping spacing), that header is removed from the resulting list of headers.
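The header-removal rule can be sketched as follows. As an assumption for illustration, string.Template stands in for Jinja, headers are plain dictionaries rather than Header objects, and render_headers is a hypothetical helper.

```python
from string import Template
from typing import Dict, List


def render_headers(
    header_templates: List[Dict[str, str]], **context: str
) -> List[Dict[str, str]]:
    """Render name and value templates, dropping headers with a blank name."""
    rendered = []
    for header in header_templates:
        name = Template(header["name"]).substitute(context)
        value = Template(header["value"]).substitute(context)
        # A name that is empty after stripping whitespace removes the header.
        if not name.strip():
            continue
        rendered.append({"name": name, "value": value})
    return rendered


headers = render_headers(
    [
        {"name": "Accept-Language", "value": "$lang"},
        {"name": "$optional_name", "value": "ignored"},
    ],
    lang="en",
    optional_name="  ",
)
```

This makes a header conditional: a template can decide at render time whether the header exists at all.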
- metadata: SearchRequestTemplateMetadata | None
Data extraction process metadata.
- class zyte_common_items.SearchRequestTemplateMetadata(**kwargs)
Metadata class for zyte_common_items.SearchRequestTemplate.metadata.
- dateDownloaded: str | None
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
- probability: float | None
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
Custom attributes
- class zyte_common_items.CustomAttributes(**kwargs)
Extracted custom attribute values and metadata.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- metadata: CustomAttributesMetadata
Custom attribute extraction metadata.
- values: CustomAttributesValues
Custom attribute values.
- class zyte_common_items.CustomAttributesValues(**kwargs)
Container for custom attribute values.
- class zyte_common_items.CustomAttributesMetadata(**kwargs)
Custom attribute extraction metadata.
- classmethod from_list(items: List[Dict] | None, *, trail: str | None = None) List
Read items from a list.
- error: str | None
Error message, if any.
The extraction/unparsable-response error is given when the LLM response could not be parsed or recovered. If this error happens, we suggest simplifying the task or reducing the number of attributes.
The extraction/schema-size-exceeded error is given when the schema did not fit into the input limits, leaving no space for the input text, and therefore the LLM could not be used. If this error happens, we suggest either making the schema smaller (fewer attributes and/or shorter descriptions), or increasing maxInputTokens.
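Client code could map these two errors to the suggested remedies. This is a sketch only: suggest_fix is a hypothetical helper, and it assumes the error code appears as a substring of the error value, which the text above does not guarantee.

```python
from typing import Optional

# Remedies paraphrased from the error descriptions.
SUGGESTIONS = {
    "extraction/unparsable-response": (
        "Simplify the task or reduce the number of attributes."
    ),
    "extraction/schema-size-exceeded": (
        "Make the schema smaller or increase maxInputTokens."
    ),
}


def suggest_fix(error: Optional[str]) -> Optional[str]:
    """Return a remedy for a known error code, or None otherwise."""
    if error is None:
        return None
    for code, suggestion in SUGGESTIONS.items():
        # Assumption: the code is contained in the error string.
        if code in error:
            return suggestion
    return None
```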
- excludedPIIAttributes: List[str] | None
A list of all attributes dropped from the output due to a risk of PII (Personally Identifiable Information) extraction.
- inputTokens: int | None
Total number of used input tokens, excluding our internal fixed prompt with the LLM instruction, when using the “generate” method.
- maxInputTokens: int | None
Maximum number of allowed input tokens for the model, when using the “generate” method.
Custom items
Subclass Item to create your own item classes.
- class zyte_common_items.base.ProbabilityMixin(**kwargs)
Provides get_probability() to make it easier to access the probability of an item or item component that is nested under its metadata attribute.
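The pattern behind such a mixin can be pictured with toy classes. This is not the library's code: ToyMetadata, ToyProbabilityMixin and ToyProduct are hypothetical, and returning None when there is no metadata or no probability is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMetadata:
    """Hypothetical metadata holder."""

    probability: Optional[float] = None


class ToyProbabilityMixin:
    """Hypothetical mixin that reads the probability nested under metadata."""

    def get_probability(self) -> Optional[float]:
        metadata = getattr(self, "metadata", None)
        if metadata is None:
            return None
        return getattr(metadata, "probability", None)


@dataclass
class ToyProduct(ToyProbabilityMixin):
    url: str
    metadata: Optional[ToyMetadata] = None


product = ToyProduct("https://example.com/p/1", ToyMetadata(probability=0.96))
```

The benefit is that callers write product.get_probability() instead of checking product.metadata for None before every access.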
Social media post
Represents a single social media post.
Read an item from a dictionary.
Read items from a list.
Returns the item probability if available, otherwise None.
Details of the author of the post.
It must not contain easily identifiable information, such as usernames.
The timestamp at which the post was created.
Format: Timezone: UTC. ISO 8601 format: “YYYY-MM-DDThh:mm:ssZ”
The list of hashtags contained in the post.
The list of URLs of media files (images, videos, etc.) linked from the post.
Contains metadata about the data extraction process.
The identifier of the post.
Details of reactions to the post.
The text content of the post.
The URL of the final response, after any redirects.
Metadata class for zyte_common_items.SocialMediaPost.metadata.
Date and time when the product data was downloaded, in UTC timezone and the following format: YYYY-MM-DDThh:mm:ssZ.
The probability (0 for 0%, 1 for 100%) that the resource features the expected data type.
For example, if the extraction of a product from a given URL is requested, and that URL points to the webpage of a product with complete certainty, the value should be 1. If with complete certainty the webpage features a job listing instead of a product, the value should be 0. When there is no complete certainty, the value could be anything in between (e.g. 0.96).
The search text used to find the item.
Contains paths to fields with the description of issues found with their values.