
DOWNLOAD A WEBPAGE
Using Selenium & PHP (including JS-rendered text)

Need to download a webpage with PHP when the site relies on JavaScript? This tutorial shows how to control Chrome with Selenium WebDriver in PHP, navigate to a URL, and extract visible text (including content generated after load).

  • Run Chrome via Selenium
  • Load JS-heavy pages
  • Extract visible page text
  • Ship a clean starter script

What you’ll build

You’ll create a small PHP script that connects to a Selenium server, launches Chrome, navigates to a target URL, and runs a JavaScript helper that returns all visible text on the page (including content rendered by JavaScript).

When to use Selenium: If the page content is incomplete when fetched via cURL because it’s rendered client-side (React/Vue/Angular, infinite scroll, “load more”, etc.), Selenium is often the fastest path to a reliable first version.

Quickstart (copy/paste)

If you already have Selenium running, this is the simplest working script. Then we’ll break down each piece below.

<?php
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

require __DIR__ . '/vendor/autoload.php';

// ===== Settings =====
$seleniumHost = 'http://127.0.0.1:4444/wd/hub'; // common default (Grid/Standalone)
$windowSize   = '1280,1024';
$userAgent    = 'Selenium PHP Crawler';
$connTimeout  = 10 * 1000;  // ms
$reqTimeout   = 60 * 1000;  // ms

// ===== Chrome args =====
$args = [
  '--window-size=' . $windowSize,
  '--user-agent='  . $userAgent,
  // Optional hardening flags (often helpful in servers/containers):
  // '--headless=new',
  // '--disable-gpu',
  // '--no-sandbox',
  // '--disable-dev-shm-usage',
];

// ===== Options + capabilities =====
$chromeOptions = new ChromeOptions();
// Note: older guides set the experimental 'w3c' => false option here, but
// modern ChromeDriver and Selenium 4 only support the W3C protocol, so we
// keep the default.
$chromeOptions->addArguments($args);

$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $chromeOptions);

// ===== Run =====
$driver = null;

try {
  $driver = RemoteWebDriver::create($seleniumHost, $capabilities, $connTimeout, $reqTimeout);

  $url = 'https://potentpages.com/';
  $driver->navigate()->to($url);

  // Wait a beat if needed (simple approach). For production, prefer explicit waits.
  usleep(350 * 1000);

  $script = file_get_contents(__DIR__ . '/getPageText.js');
  $text   = $driver->executeScript($script);

  echo $text . PHP_EOL;

} finally {
  if ($driver) {
    $driver->quit();
  }
}

Next you’ll create getPageText.js (provided below) and run the script from the CLI.

Requirements

  • PHP CLI (recommended). Running via the web server can time out on slow pages.
  • Composer for dependency installation.
  • Selenium (Grid or standalone) running locally or on a VPS.
  • Chrome available to Selenium (usually via Selenium’s Docker images).
Need help installing Selenium? Potent Pages has a guide for installing Selenium & Docker on a VPS. See the Selenium + Docker install tutorial.

Step 1 — Install php-webdriver

The main library we’ll use is php-webdriver, which lets PHP connect to Selenium and control Chrome. In your project folder:

composer require php-webdriver/webdriver

This creates a vendor/ directory and an autoloader you’ll include in your script.
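If you want to confirm the install, Composer can show the resolved package and version:

composer show php-webdriver/webdriver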

Step 2 — Start Selenium

You can run Selenium locally or on a server. For quick local testing, Selenium’s Docker images are the simplest path. Here’s a common example (choose an approach that matches your environment):

# Example (Docker): Selenium standalone Chrome
# docker run -d --rm -p 4444:4444 --shm-size="2g" selenium/standalone-chrome
Tip: If you see random crashes on bigger pages, increasing Docker’s shared memory (--shm-size) usually fixes them.
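Once the container is up, you can confirm Selenium is ready before pointing PHP at it. Selenium 4’s standalone server exposes a status endpoint (adjust the host if you’re running it remotely):

curl http://127.0.0.1:4444/status

The response is JSON that includes a "ready" flag.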

Step 3 — Create the PHP script

  1. Create a file: create script.php in your project folder.
  2. Add imports + autoload: include Composer’s autoloader so the webdriver classes resolve.
  3. Set Chrome options: window size + user agent make debugging easier and reduce “it works on my machine” differences.
  4. Create RemoteWebDriver: connect to Selenium at http://host:4444/wd/hub (typical default).
  5. Navigate to a URL: use $driver->navigate()->to($url) to load the page in Chrome.


Step 4 — Extract visible text (including JavaScript-rendered content)

Once the page is loaded, Selenium can run JavaScript in the browser context. We’ll use a small helper that selects the body and returns the browser’s “visible text” representation.

Create getPageText.js

function getVisibleText(element) {
  // Clear any existing selection, then select the entire element.
  window.getSelection().removeAllRanges();

  const range = document.createRange();
  range.selectNode(element);
  window.getSelection().addRange(range);

  // The selection's string form is what the browser treats as visible,
  // selectable text, including content rendered by JavaScript.
  const visibleText = window.getSelection().toString().trim();
  window.getSelection().removeAllRanges();

  return visibleText;
}

return getVisibleText(document.body);

Run it from PHP

$script = file_get_contents(__DIR__ . '/getPageText.js');
$text   = $driver->executeScript($script);
echo $text . PHP_EOL;
Why this works: You’re extracting what Chrome considers selectable, visible page text—so it includes content injected after load, not just the initial HTML response.
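To see the difference yourself, compare the rendered DOM with the extracted text; getPageSource() returns the DOM serialized after JavaScript has run, not the original HTTP response:

$html = $driver->getPageSource(); // the DOM after JavaScript execution
printf("HTML length: %d, visible text length: %d\n", strlen($html), strlen($text));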

Step 5 — Run the crawler

From your project folder:

php script.php

You should see the page’s visible text printed in your terminal.

Troubleshooting

  • “Could not connect to Selenium”: confirm Selenium is reachable on port 4444 and that your host URL is correct.
  • Blank/partial text: the page may still be rendering; add an explicit wait or a short delay before executing JS.
  • Docker Chrome crashes: increase --shm-size or add --disable-dev-shm-usage.
  • Hanging sessions: ensure $driver->quit() runs in a finally block (as in the quickstart).

Next steps (turn this into a real crawler)

This tutorial is a “first win.” Production crawlers usually add:

  • Explicit waits (wait for specific DOM elements instead of sleeping)
  • Retries + error capture (screenshots/HTML dumps on failure; see the sketch after this list)
  • Extraction (parse specific fields instead of full text)
  • Scheduling (daily/weekly runs) + monitoring/alerts
  • Compliance + safety (robots, rate limits, polite behavior)
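For the retries + error capture bullet, here’s a minimal sketch using php-webdriver’s built-in helpers. It assumes the $driver and $url variables from the quickstart:

try {
  $driver->navigate()->to($url);
  // ... run your extraction here ...
} catch (\Throwable $e) {
  // Capture a screenshot and the current DOM so the failure can be diagnosed later.
  $driver->takeScreenshot(__DIR__ . '/failure.png');
  file_put_contents(__DIR__ . '/failure.html', $driver->getPageSource());
  throw $e; // re-throw so the run still reports as failed
}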

Need a maintained Selenium pipeline?

If your team needs repeatable collection (finance/law/enterprise), Potent Pages builds monitored crawlers that deliver structured outputs on schedule.

FAQ: Selenium + PHP Web Crawling

Common questions about downloading pages with Selenium in PHP, especially when content is rendered with JavaScript.

Why use Selenium instead of cURL for PHP web crawling?

Use cURL when the HTML response contains the data you need. Use Selenium when the page renders content in the browser after load (client-side apps, “load more”, dynamic pricing/availability, authenticated flows).
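A quick way to decide: fetch the page with cURL and check whether the text you need appears in the raw HTML. If it doesn’t, the content is likely rendered client-side. (The URL and search string below are placeholders.)

$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// If your target text is missing from the raw response, it's probably JS-rendered.
var_dump(strpos($html, 'text you expect') !== false);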

Can Selenium extract text created by JavaScript?

Yes. Selenium drives a real browser, so it can read DOM content after JavaScript runs. In this tutorial, we run a small JavaScript snippet to return the page’s visible text.
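If you don’t need the selection-based helper, the DOM’s innerText property is a lighter alternative that also approximates visible text (whitespace handling differs slightly between the two approaches):

$text = $driver->executeScript('return document.body.innerText;');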

What’s the easiest way to run Selenium for this tutorial?

For most developers, Docker + a Selenium standalone Chrome image is the quickest path for local testing. On a VPS, you’ll typically run Selenium/Grid and point your PHP script at that host.

How do I wait for the page to finish rendering?

The best approach is an explicit wait: wait until a specific element exists or until a known selector contains text. For a quick demo, a short delay can work, but explicit waits are much more reliable.
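In php-webdriver, an explicit wait looks like the sketch below; the #main selector is a placeholder, so wait on an element you know appears once your target page has rendered:

use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Wait up to 10 seconds for a known element to appear before extracting text.
$driver->wait(10)->until(
  WebDriverExpectedCondition::presenceOfElementLocated(
    WebDriverBy::cssSelector('#main') // placeholder: use a selector from your page
  )
);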

Can Potent Pages build this as a maintained data pipeline?

Yes—especially when you need scheduled runs, monitoring, alerting, and structured outputs (CSV/DB/API). That’s where “tutorial code” becomes production infrastructure.

David Selden-Treiman, Director of Operations at Potent Pages.

David Selden-Treiman is Director of Operations and a project manager at Potent Pages. He specializes in custom web crawler development, website optimization, server management, web application development, and custom programming. Working at Potent Pages since 2012 and programming since 2003, David has solved problems with custom code for dozens of clients, and he manages and optimizes dozens of servers for Potent Pages and other clients.

