Anthony Chu

Running headless Chromium in Azure Functions with Puppeteer and Playwright

Monday, August 17, 2020

With a recent update to Azure Functions, it is now possible to run headless Chromium in the Linux Consumption plan. This enables some serverless browser automation scenarios using popular frameworks such as Puppeteer and Playwright.

Browser automation with Puppeteer and Playwright

Browser automation has been around for a long time. Selenium WebDriver was a pioneer in this space. More recently, Puppeteer and Playwright have gained in popularity. The two frameworks are very similar. Google maintains Puppeteer and Microsoft maintains Playwright. It's interesting to note that some of the folks who worked on Puppeteer are now working on Playwright.

Puppeteer and Playwright each support a different set of browsers. Both of them can automate Chromium. They automatically install Chromium and can use it without extra configuration.

Azure Functions support for headless Chromium

It's been a challenge to run headless Chromium on Azure Functions, especially in the Consumption (serverless) plan. Until now, the only way to run it has been by using a custom Docker image on the Premium plan.

Very recently, the necessary dependencies to run headless Chromium were added to the Azure Functions Linux Consumption environment. This means that we can simply npm install Puppeteer or Playwright in a Node.js function app to start using one of those frameworks to interact with Chromium.

Use Puppeteer and Playwright in Azure Functions

It's pretty straightforward to run either Puppeteer or Playwright in Azure Functions. We use npm to install the framework. Because the package is needed at run-time, we should install it as a production dependency (under dependencies, not devDependencies, in package.json). In the examples below, we use Puppeteer/Playwright with headless Chromium in an HTTP triggered function to open a web page and return a screenshot.

Puppeteer

# also installs and uses Chromium by default
npm install puppeteer

const puppeteer = require("puppeteer");

module.exports = async function (context, req) {
    const url = req.query.url || "https://google.com/";
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    const screenshotBuffer = await page.screenshot({ fullPage: true });
    await browser.close();

    context.res = {
        body: screenshotBuffer,
        headers: {
            "content-type": "image/png"
        }
    };
};
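
The handler above passes req.query.url to the browser as-is. A value without a scheme (for example bing.com) is likely to make navigation fail, and non-web schemes are probably not something we want the function to open. The following is a minimal sketch of one way to check the value first. parseTargetUrl is a hypothetical helper name, not part of Puppeteer or Azure Functions, and it relies only on the built-in URL class.

```javascript
// Hypothetical helper (an assumption for illustration, not from the original post).
// Returns a normalized http(s) URL string, or null if the input is not usable.
function parseTargetUrl(rawUrl, fallbackUrl) {
    try {
        // new URL() throws for anything that is not an absolute URL.
        const parsed = new URL(rawUrl || fallbackUrl);
        const isWeb = parsed.protocol === "http:" || parsed.protocol === "https:";
        return isWeb ? parsed.href : null;
    } catch (err) {
        return null;
    }
}
```

In the handler, we could call parseTargetUrl(req.query.url, "https://google.com/") and set context.res to a 400 response when it returns null, before launching the browser at all.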

Playwright

Note: Playwright 1.4.0 requires some dependencies that are not installed in the Linux Consumption plan. Use 1.3.0 until this issue is resolved.

# the default playwright package installs Chromium, Firefox, and WebKit
# use playwright-chromium if we only need Chromium
npm install playwright-chromium@1.3.0

const { chromium } = require("playwright-chromium");

module.exports = async function (context, req) {
    const url = req.query.url || "https://google.com/";
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto(url);
    const screenshotBuffer = await page.screenshot({ fullPage: true });
    await browser.close();

    context.res = {
        body: screenshotBuffer,
        headers: {
            "content-type": "image/png"
        }
    };
};
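
Both handlers close the browser only when every step succeeds. If page.goto or page.screenshot throws, browser.close() is never reached and the Chromium process may be left running. Here is a rough sketch of one way to guard against that with try/finally. withBrowser and takeScreenshot are hypothetical helper names, and the launch argument is a function we pass in so the same code works with either framework.

```javascript
// Hypothetical helper (an assumption, not part of Puppeteer or Playwright):
// launches a browser, runs `work` with it, and always closes it afterwards.
async function withBrowser(launch, work) {
    const browser = await launch();
    try {
        return await work(browser);
    } finally {
        // Runs on success and on failure, so Chromium is not left running.
        await browser.close();
    }
}

// Hypothetical helper: opens `url` and resolves to a full-page PNG buffer.
async function takeScreenshot(launch, url) {
    return withBrowser(launch, async (browser) => {
        const page = await browser.newPage();
        await page.goto(url);
        return page.screenshot({ fullPage: true });
    });
}
```

With Puppeteer the call would look like takeScreenshot(() => puppeteer.launch(), url), and with Playwright takeScreenshot(() => chromium.launch(), url). Any error still propagates to the caller, so the function host reports the failure as before.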

For the full source, take a look at this repo. When we run the function app locally and visit http://localhost:7071/api/screenshot?url=https://bing.com/, we get back a screenshot of the page.
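
The request above passes the target address unencoded, which works for a simple URL. If the target has its own query string, its & characters would be read as separate parameters of the function URL. As a small sketch, the request URL can be built with the built-in URL class so the target is percent-encoded. buildScreenshotUrl is a hypothetical name, and the /api/screenshot route is taken from the example above.

```javascript
// Hypothetical helper (an assumption for illustration).
// Builds the function URL with the target page encoded into the `url` parameter.
function buildScreenshotUrl(functionBaseUrl, targetUrl) {
    const requestUrl = new URL("/api/screenshot", functionBaseUrl);
    // searchParams.set percent-encodes characters such as ":", "/", "?" and "&".
    requestUrl.searchParams.set("url", targetUrl);
    return requestUrl.toString();
}

console.log(buildScreenshotUrl("http://localhost:7071", "https://bing.com/search?q=azure&form=demo"));
```

The function side needs no change, because req.query.url already contains the decoded value.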

[Image: puppeteer screenshot]

Deploy to Azure

Since we're deploying to a Linux environment, we have to make sure we run npm install in Linux so it downloads a version of Chromium that matches the deployment target. Thankfully, Azure Functions supports remote build so that the app is built in the correct Linux environment during deployment, even though we might be developing locally in macOS or Windows.

Configuring VS Code for remote build

If we are deploying using Azure Functions Core Tools, we can skip this step.

By default, the Azure Functions VS Code extension will deploy the app using local build, which means it'll run npm install locally and deploy the app package. For remote build, we update the app's .vscode/settings.json to enable scmDoBuildDuringDeployment.

{
    "azureFunctions.deploySubpath": ".",
    "azureFunctions.projectLanguage": "JavaScript",
    "azureFunctions.projectRuntime": "~3",
    "debug.internalConsoleOptions": "neverOpen",
    "azureFunctions.scmDoBuildDuringDeployment": true
}

We can also remove the postDeployTask and preDeployTask settings that run npm commands before and after the deployment; they're not needed because we're running the build remotely.

And because we're running npm install remotely, we can add node_modules to .funcignore. This excludes the node_modules folder from the deployment package to make the upload as small as possible.

Creating a Linux Consumption function app

We can use any tool, such as the Azure Portal or VS Code, to create the Node.js 12 Linux Consumption function app in Azure that we'll deploy to.

Configuring Chromium download location (Playwright only)

By default, Playwright downloads Chromium to a location outside the function app's folder. In order to include Chromium in the build artifacts, we need to instruct Playwright to install Chromium in the app's node_modules folder. To do this, create an app setting named PLAYWRIGHT_BROWSERS_PATH with a value of 0 in the function app in Azure. This setting is also used by Playwright at run-time to locate Chromium in node_modules.
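
Because app settings are exposed to the function as environment variables, a missing or mistyped PLAYWRIGHT_BROWSERS_PATH can be caught early instead of surfacing later as a failed browser launch. The snippet below is only a sketch of such a check. checkBrowsersPath is a hypothetical helper, not something Playwright provides.

```javascript
// Hypothetical startup check (an assumption, not part of Playwright's API).
// Reports whether PLAYWRIGHT_BROWSERS_PATH has the value described above.
function checkBrowsersPath(env) {
    const value = env.PLAYWRIGHT_BROWSERS_PATH;
    if (value === "0") {
        return { ok: true, message: "Playwright will look for Chromium in node_modules." };
    }
    const shown = value === undefined ? "not set" : `set to "${value}"`;
    return {
        ok: false,
        message: `PLAYWRIGHT_BROWSERS_PATH is ${shown}; expected "0" so Chromium is resolved from node_modules.`,
    };
}

// Example: warn once when the module is loaded.
const browsersPathCheck = checkBrowsersPath(process.env);
if (!browsersPathCheck.ok) {
    console.warn(browsersPathCheck.message);
}
```

Running the same check locally can also help, since the setting has to be present wherever npm install runs for Chromium to land in node_modules.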

Publishing the app

If using VS Code, we can use the Azure Functions: Deploy to Function App... command to publish the app. It'll recognize the settings we configured earlier and use remote build.

If using Azure Functions Core Tools, we need to run the command with the --build remote flag:

func azure functionapp publish $appName --build remote

And that's it! We've deployed an Azure Functions app on the Linux Consumption plan that uses Puppeteer or Playwright to interact with Chromium!

Resources