elm.web.html_pw.load_html_with_pw

async load_html_with_pw(url, browser_semaphore=None, timeout=90000, use_scrapling_stealth=False, load_state='networkidle', **pw_launch_kwargs)

Extract HTML from URL using Playwright.

Parameters:
  • url (str) – URL to pull HTML for.

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently. If None, no limit is applied. By default, None.

  • timeout (int, optional) – Maximum time, in milliseconds, to wait for the page to reach the requested load state. Pass 0 to disable the timeout. By default, 90,000 (90 seconds).

  • use_scrapling_stealth (bool, default=False) – If True, use Scrapling stealth scripts instead of tf-playwright-stealth. By default, False.

  • load_state (str, default="networkidle") –

    The load state to wait for. One of:

    • "load" - consider navigation to be finished when the load event is fired.

    • "domcontentloaded" - consider navigation to be finished when the DOMContentLoaded event is fired.

    • "networkidle" - consider navigation to be finished when there are no network connections for at least 500 ms.

    By default, "networkidle".

  • **pw_launch_kwargs – Additional keyword arguments passed through to async_playwright.chromium.launch().

Returns:

str – HTML from page.
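The browser_semaphore parameter lets many concurrent calls share one asyncio.Semaphore so that only a bounded number of browsers are open at the same time. The sketch below illustrates that pattern under stated assumptions: fake_load_html is a hypothetical stand-in that only sleeps instead of launching Playwright, so the example runs without a browser or network access. With the real function, the same shared semaphore would be passed as browser_semaphore= to each load_html_with_pw call.

```python
import asyncio
from typing import Optional

# Counters used only to observe how many "browsers" are open at once.
open_now = 0
peak_open = 0


async def _open_browser_and_read(url: str) -> str:
    """Simulate launching a browser, loading a page, and closing it."""
    global open_now, peak_open
    open_now += 1
    peak_open = max(peak_open, open_now)
    try:
        await asyncio.sleep(0.01)  # pretend the page is loading
        return f"<html><!-- {url} --></html>"
    finally:
        open_now -= 1


async def fake_load_html(url: str, browser_semaphore: Optional[asyncio.Semaphore] = None) -> str:
    """Hypothetical stand-in for load_html_with_pw (no real browser)."""
    if browser_semaphore is None:
        # No semaphore: no limit on concurrent "browsers".
        return await _open_browser_and_read(url)
    # Semaphore given: wait for a free slot before "opening a browser".
    async with browser_semaphore:
        return await _open_browser_and_read(url)


async def main(urls: list, limit: int) -> list:
    # One semaphore shared by every call caps concurrency at `limit`.
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fake_load_html(u, browser_semaphore=sem) for u in urls))


urls = [f"https://example.com/page{i}" for i in range(10)]
pages = asyncio.run(main(urls, limit=3))
print(len(pages), peak_open)  # prints: 10 3
```

Because asyncio.gather returns results in the order of its inputs, pages[i] corresponds to urls[i] even though the loads overlap in time.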