Request templates

Request templates are items that let you write reusable code to build Request objects from parameters.

Using request templates

After you write a request template page object for a website, you can get a request template item for that website and call its request method to build a request with specific parameters. For example:

from scrapy import Request, Spider
from scrapy_poet import DummyResponse
from zyte_common_items import SearchRequestTemplate


class ExampleComSpider(Spider):
    name = "example_com"

    def start_requests(self):
        yield Request("https://example.com", callback=self.start_search)

    def start_search(
        self, response: DummyResponse, search_request_template: SearchRequestTemplate
    ):
        yield search_request_template.request(keyword="foo bar").to_scrapy(
            callback=self.parse_result
        )

    def parse_result(self, response): ...

search_request_template.request(keyword="foo bar") builds a Request object, e.g. with URL https://example.com/search?q=foo+bar.

Writing a request template page object

To enable building a request template for a given website, build a page object for that website that returns the corresponding request template item class. For example:

from web_poet import field, handle_urls
from zyte_common_items import SearchRequestTemplatePage


@handle_urls("example.com")
class ExampleComSearchRequestTemplatePage(SearchRequestTemplatePage):
    @field
    def url(self):
        return "https://example.com/search?q={{ keyword|quote_plus }}"

Strings returned by request template page object fields are Jinja templates, and may use the keyword arguments of the request method of the corresponding request template item class.
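To see how such a template string turns into a concrete value, the following minimal sketch renders the URL template from the example above with plain Jinja. It assumes that quote_plus behaves like urllib.parse.quote_plus and registers it by hand; how the library itself sets up its Jinja environment is not shown here.

```python
from urllib.parse import quote_plus

from jinja2 import Environment

# Assumption: a quote_plus filter equivalent to urllib.parse.quote_plus.
# It is registered manually here because plain Jinja does not ship it.
env = Environment()
env.filters["quote_plus"] = quote_plus

# The same template string that the url field returns in the example above.
url_template = env.from_string(
    "https://example.com/search?q={{ keyword|quote_plus }}"
)

# The keyword argument plays the role of request(keyword="foo bar").
url = url_template.render(keyword="foo bar")
print(url)
```

The rendered URL matches the one mentioned earlier for search_request_template.request(keyword="foo bar").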

Often, you only need to build a URL template by figuring out where request parameters go and using the right URL-encoding filter, urlencode() or quote_plus(), depending on how spaces are encoded:

Example search URL for “foo bar”     URL template
---------------------------------    -----------------------------------------------
https://example.com/?q=foo%20bar     https://example.com/?q={{ keyword|urlencode }}
https://example.com/?q=foo+bar       https://example.com/?q={{ keyword|quote_plus }}
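The difference between the two encodings can be checked with the Python standard library. For a string, Jinja's urlencode filter is a basic wrapper around urllib.parse.quote(); the sketch below assumes that quote_plus() corresponds to urllib.parse.quote_plus().

```python
from urllib.parse import quote, quote_plus

keyword = "foo bar"

# Percent-encoding: spaces become %20 (what the urlencode filter produces).
percent_encoded = quote(keyword)

# Form-style encoding: spaces become + (assumed behavior of quote_plus).
plus_encoded = quote_plus(keyword)

print(f"https://example.com/?q={percent_encoded}")
print(f"https://example.com/?q={plus_encoded}")

# The two functions also differ on slashes: quote() keeps "/" by default,
# while quote_plus() encodes it.
print(quote("a&b/c"), quote_plus("a&b/c"))
```

To pick the right filter, run a search for a multi-word keyword on the target website and check which form appears in the resulting URL.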

You can use any of Jinja’s built-in filters, plus quote_plus(), and all other Jinja features. Jinja enables very complex scenarios:

from web_poet import field
from zyte_common_items import Header, SearchRequestTemplatePage


class ComplexSearchRequestTemplatePage(SearchRequestTemplatePage):
    @field
    def url(self):
        return """
            {%-
                if keyword|length > 1
                and keyword[0]|lower == 'p'
                and keyword[1:]|int(-1) != -1
            -%}
                https://example.com/p/{{ keyword|upper }}
            {%- else -%}
                https://example.com/search
            {%- endif -%}
        """

    @field
    def method(self):
        return """
            {%-
                if keyword|length > 1
                and keyword[0]|lower == 'p'
                and keyword[1:]|int(-1) != -1
            -%}
                GET
            {%- else -%}
                POST
            {%- endif -%}
        """

    @field
    def body(self):
        return """
            {%-
                if keyword|length > 1
                and keyword[0]|lower == 'p'
                and keyword[1:]|int(-1) != -1
            -%}
            {%- else -%}
                {"query": {{ keyword|tojson }}}
            {%- endif -%}
        """

    @field
    def headers(self):
        return [
            Header(
                name=(
                    """
                        {%-
                            if keyword|length > 1
                            and keyword[0]|lower == 'p'
                            and keyword[1:]|int(-1) != -1
                        -%}
                        {%- else -%}
                            Query
                        {%- endif -%}
                    """
                ),
                value="{{ keyword }}",
            ),
        ]
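The condition repeated in every field above treats keywords such as "p123" (a "p" followed by a number) as product IDs and everything else as a search query. Because the blocks use whitespace control ({%- and -%}), the surrounding indentation is stripped from the rendered values. The following plain-Python sketch mirrors that logic to show what each field is expected to render to. The helper names are hypothetical, jinja_like_int only approximates Jinja's int filter, and json.dumps stands in for tojson, which additionally escapes some HTML-sensitive characters.

```python
import json


def jinja_like_int(value: str, default: int = -1) -> int:
    """Approximate Jinja's int filter: return default when conversion fails."""
    try:
        return int(value)
    except ValueError:
        try:
            # Jinja also accepts float-like strings such as "42.23".
            return int(float(value))
        except (ValueError, OverflowError):
            return default


def is_product_id(keyword: str) -> bool:
    """Mirror the condition repeated in each template field."""
    return (
        len(keyword) > 1
        and keyword[0].lower() == "p"
        and jinja_like_int(keyword[1:]) != -1
    )


def expected_request(keyword: str) -> dict:
    """Hypothetical helper: the values the url, method and body fields yield."""
    if is_product_id(keyword):
        return {
            "url": f"https://example.com/p/{keyword.upper()}",
            "method": "GET",
            "body": "",
        }
    return {
        "url": "https://example.com/search",
        "method": "POST",
        "body": '{"query": ' + json.dumps(keyword) + "}",
    }


print(expected_request("p123"))
print(expected_request("foo bar"))
```

In the product ID branch, the header name template also renders to an empty string, while the header value is always the keyword.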