Request templates
Request templates are items that
allow writing reusable code that creates Request
objects from parameters.
Using request templates
After you write a request template page object for a website, you can get a request template
item for that website and call its request
method to build a request with
specific parameters. For example:
from scrapy import Request, Spider
from scrapy_poet import DummyResponse
from zyte_common_items import SearchRequestTemplate


class ExampleComSpider(Spider):
    name = "example_com"

    def start_requests(self):
        yield Request("https://example.com", callback=self.start_search)

    def start_search(
        self, response: DummyResponse, search_request_template: SearchRequestTemplate
    ):
        yield search_request_template.request(query="foo bar").to_scrapy(
            callback=self.parse_result
        )

    def parse_result(self, response): ...
search_request_template.request(query="foo bar") builds a Request object, e.g. with URL https://example.com/search?q=foo+bar.
Writing a request template page object
To enable building a request template for a given website, build a page object for that website that returns the corresponding request template item class. For example:
from web_poet import field, handle_urls
from zyte_common_items import BaseSearchRequestTemplatePage


@handle_urls("example.com")
class ExampleComSearchRequestTemplatePage(BaseSearchRequestTemplatePage):
    @field
    def url(self):
        # Jinja template; "query" is filled in when request() is called.
        return "https://example.com/search?q={{ query|quote_plus }}"
Strings returned by request template page object fields are Jinja
templates, and may use the keyword arguments (e.g. query) passed to the
request method of the corresponding request template item class.
Often, you only need to build a URL template by figuring out where request
parameters go and using the right URL-encoding filter, urlencode() or
quote_plus(), depending on how spaces are encoded:
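Example search URL for “foo bar” | URL template |
---|---|
https://example.com/search?q=foo%20bar | https://example.com/search?q={{ query\|urlencode }} |
https://example.com/search?q=foo+bar | https://example.com/search?q={{ query\|quote_plus }} |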
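The difference between the two encodings can be checked outside of a template. The sketch below uses only Python's standard library, assuming that for plain strings the urlencode and quote_plus filters behave like urllib.parse.quote and urllib.parse.quote_plus. It does not call Jinja or zyte-common-items.

```python
from urllib.parse import quote, quote_plus

query = "foo bar"

# Percent-encoding: a space becomes %20.
percent_encoded = quote(query)

# Form-style encoding: a space becomes +.
plus_encoded = quote_plus(query)

print(f"https://example.com/search?q={percent_encoded}")
print(f"https://example.com/search?q={plus_encoded}")
```

Comparing these two URLs with the one the website produces for a search with spaces shows which filter the URL template needs.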
You can use any of Jinja’s built-in filters, plus quote_plus(), and all other
Jinja features. Jinja enables very complex scenarios:
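from web_poet import field
from zyte_common_items import BaseSearchRequestTemplatePage, Header


class ComplexSearchRequestTemplatePage(BaseSearchRequestTemplatePage):
    # Every field repeats the same condition: the query looks like a product
    # ID, i.e. "p" or "P" followed by a number.

    @field
    def url(self):
        # Product IDs go to the product page, anything else to the search.
        return """
            {%-
                if query|length > 1
                and query[0]|lower == 'p'
                and query[1:]|int(-1) != -1
            -%}
                https://example.com/p/{{ query|upper }}
            {%- else -%}
                https://example.com/search
            {%- endif -%}
        """

    @field
    def method(self):
        # GET for product pages, POST for searches.
        return """
            {%-
                if query|length > 1
                and query[0]|lower == 'p'
                and query[1:]|int(-1) != -1
            -%}
                GET
            {%- else -%}
                POST
            {%- endif -%}
        """

    @field
    def body(self):
        # Empty body for product pages, JSON body for searches.
        return """
            {%-
                if query|length > 1
                and query[0]|lower == 'p'
                and query[1:]|int(-1) != -1
            -%}
            {%- else -%}
                {"query": {{ query|tojson }}}
            {%- endif -%}
        """

    @field
    def headers(self):
        # The header name renders as empty for product pages.
        return [
            Header(
                name=(
                    """
                        {%-
                            if query|length > 1
                            and query[0]|lower == 'p'
                            and query[1:]|int(-1) != -1
                        -%}
                        {%- else -%}
                            Query
                        {%- endif -%}
                    """
                ),
                value="{{ query }}",
            ),
        ]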
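The condition repeated in each template above may be easier to follow as plain Python. The sketch below mirrors the branching only: to_int, is_product_id and describe_request are hypothetical helpers, to_int merely approximates Jinja's int filter, and json.dumps stands in for tojson. Headers are left out.

```python
import json


def to_int(text: str, default: int) -> int:
    """Approximate Jinja's int filter: return default when parsing fails."""
    try:
        return int(text)
    except ValueError:
        try:
            return int(float(text))
        except (ValueError, OverflowError):
            return default


def is_product_id(query: str) -> bool:
    """Mirror the template condition: 'p' or 'P' followed by a number."""
    return (
        len(query) > 1
        and query[0].lower() == "p"
        and to_int(query[1:], -1) != -1
    )


def describe_request(query: str) -> dict:
    """Hypothetical helper: the values the templates are expected to render."""
    if is_product_id(query):
        return {
            "url": f"https://example.com/p/{query.upper()}",
            "method": "GET",
            "body": "",
        }
    return {
        "url": "https://example.com/search",
        "method": "POST",
        "body": '{"query": ' + json.dumps(query) + "}",
    }


print(describe_request("p123"))
print(describe_request("foo bar"))
```

Under these assumptions, a query such as "p123" is treated as a product ID and fetched with GET, while any other query is sent as a POST search with a JSON body.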