Headless Browsing Recipe

Setup Selenium + Chromium + Chromedriver + Docker

This article shows how to quickly set up Selenium with Chromium in a Docker container. Chromium 73+ supports headless mode, which makes it a good fit for Docker because no virtual X server (such as Xvfb) is needed.

Dockerfile

We use a Python base image so that the container comes bundled with python and pip, but we could just as easily have used any Ubuntu or Debian image.

# Define the base image
FROM python:3.6-stretch

# Install jupyter with pip so we can serve notebooks
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install jupyter

# Install Chromium
RUN apt-get update && \
    apt-get --assume-yes install chromium chromium-l10n

# Create a notebooks directory within the container and make it our working directory
WORKDIR /notebooks

# Create a chrome directory to host our chromedriver
RUN mkdir ../chrome

# Copy the chromedriver file from host to our container
COPY chromedriver ../chrome/

# Start Jupyter Notebook; --ip=0.0.0.0 lets the server accept connections from outside the container
CMD jupyter notebook --ip=0.0.0.0 --port=8080 --allow-root

There are a couple of things to note here:

Chromium is installed by the apt install step in the Dockerfile. I believe the version that this Dockerfile installs is 73. For other Chrome/Chromium binaries, see: https://chromium.woolyss.com/
Selenium needs a browser-specific WebDriver to drive the browser. For Chrome/Chromium that is chromedriver, and its version should match the version of Chromium installed. The chromedriver binary for version 73 can be downloaded from: https://chromedriver.storage.googleapis.com/73.0.3683.68/chromedriver_linux64.zip
More chromedrivers: https://sites.google.com/a/chromium.org/chromedriver/downloads
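
One way to confirm that the two versions line up is to compare the major version reported by each binary. The sketch below is a minimal example that parses the text printed by chromium --version and chromedriver --version. The helper names and the sample version strings are illustrative assumptions and are not taken from the repo.

```python
import re

# Matches a four-part version number such as 73.0.3683.68.
VERSION_PATTERN = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output):
    """Return the major version found in a --version string, or None."""
    match = VERSION_PATTERN.search(version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output, driver_output):
    """Return True when both strings report the same major version."""
    browser_major = major_version(browser_output)
    driver_major = major_version(driver_output)
    return browser_major is not None and browser_major == driver_major


# Illustrative sample strings; paste the real output from the container instead.
browser_output = "Chromium 73.0.3683.75 built on Debian 9.8"
driver_output = "ChromeDriver 73.0.3683.68"
print(versions_match(browser_output, driver_output))
```

If the check fails, pick the chromedriver release whose major version matches the installed Chromium from the download pages listed above.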

Troubleshoot

Verify that Chromium was installed properly by opening a bash shell in the running container and launching Chromium.

$ docker exec -it <container name/hash> bash
$ chromium --headless --no-sandbox --disable-gpu

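Inside a container there is no display, so Chromium needs the --headless flag; when it runs as root it also needs --no-sandbox. The --disable-gpu flag is commonly added as well.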
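
The same check can be scripted from Python, for example from a notebook cell inside the container. This is a minimal sketch: it assumes the binary is on the PATH as chromium (as in the command above) and uses Chromium's --dump-dom flag so that the process prints the page and exits. The function names are hypothetical.

```python
import shutil
import subprocess


def headless_command(url, binary="chromium"):
    """Build the headless Chromium command used for the smoke test."""
    return [binary, "--headless", "--no-sandbox", "--disable-gpu",
            "--dump-dom", url]


def chromium_works(url="about:blank", binary="chromium", timeout=30):
    """Return True if Chromium starts headless and exits cleanly."""
    if shutil.which(binary) is None:
        return False  # binary is not installed or not on the PATH
    try:
        result = subprocess.run(headless_command(url, binary),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False  # Chromium hung or could not be started
    return result.returncode == 0


if __name__ == "__main__":
    print(chromium_works())
```

A result of False suggests that Chromium is missing from the image or cannot start with these flags, which is worth resolving before involving Selenium.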

Automation with Selenium

# Install selenium
! pip install selenium

notebook.py

import os  
import time
import re
from selenium import webdriver  
from selenium.webdriver.common.keys import Keys  
from selenium.webdriver.chrome.options import Options  
from selenium.webdriver.common.by import By

# Context
context = 'linux' # or mac

chrome_options = Options()

### CONFIG - SETUP CHROME

# Default configuration for headless chrome on linux/container
chromedriver = './chromedriver-linux'
chrome_options.add_argument("--headless") 
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--remote-debugging-port=9222")

# Override the defaults when running non-headless on a Mac host
if context == 'mac':
    chromedriver = '../chrome/chromedriver'
    chrome_options.arguments.remove("--headless")
    chrome_options.binary_location = '../chrome/Chromium.app/Contents/MacOS/Chromium'
    

# Selenium 3 deprecates the chrome_options keyword in favor of options
driver = webdriver.Chrome(executable_path=chromedriver, options=chrome_options)
driver.get("https://test-mohammed-2-1162831764.us-east-2.elb.amazonaws.com")

In most browser automation work, we want to develop non-headless on the host machine (Mac) and then port the recipe over to the headless Docker context. The notebook configures the chrome_options object to use a different Chrome binary and chromedriver depending on the context.
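
Rather than editing the context variable by hand, the context could be derived from the operating system. The following is a minimal sketch using the standard library's platform.system(), which returns 'Darwin' on macOS and 'Linux' inside the container. detect_context and chrome_settings are hypothetical helpers, and the paths and flags simply mirror the ones used in the notebook above.

```python
import platform


def detect_context(system_name=None):
    """Map the OS name to the notebook's 'mac' / 'linux' contexts."""
    if system_name is None:
        system_name = platform.system()
    return "mac" if system_name == "Darwin" else "linux"


def chrome_settings(context):
    """Return the driver path, browser binary and flags for a context."""
    if context == "mac":
        return {
            "chromedriver": "../chrome/chromedriver",
            "binary_location": "../chrome/Chromium.app/Contents/MacOS/Chromium",
            # No --headless on the host, so the browser window stays visible.
            "arguments": ["--no-sandbox", "--disable-gpu",
                          "--remote-debugging-port=9222"],
        }
    return {
        "chromedriver": "./chromedriver-linux",
        "binary_location": None,  # leave the default browser lookup in place
        "arguments": ["--headless", "--no-sandbox", "--disable-gpu",
                      "--remote-debugging-port=9222"],
    }


settings = chrome_settings(detect_context())
print(settings["chromedriver"], settings["arguments"])
```

Each entry in settings["arguments"] can then be passed to chrome_options.add_argument, and settings["binary_location"] assigned to chrome_options.binary_location when it is not None.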

Github Repo

To see a ready-to-go example of this, clone the following repo and run the commands listed under Usage:

https://github.com/mohammedabdulwahhab/chrome-automation-notebooks


Last updated 6 years ago
