Headless Browsing Recipe
Setup Selenium + Chromium + Chromedriver + Docker
This article shows how to quickly set up Selenium with Chromium in a Docker container. Chromium 73 and later support headless mode, which makes them a good fit for Docker (no need for a virtual X server).
Dockerfile
We use the python image as our base so that the container comes bundled with pip and Python, but we could just as easily have used any Ubuntu or Debian image.
# Define the base image
FROM python:3.6-stretch
# Install jupyter with pip so we can serve notebooks
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install jupyter
# Install Chromium (update and install in the same layer so a cached package index is never reused)
RUN apt-get update && apt-get --assume-yes install chromium chromium-l10n
# Create a notebooks directory within the container and make it our working directory
WORKDIR /notebooks
# Create a chrome directory to host our chromedriver
RUN mkdir ../chrome
# Copy the chromedriver file from host to our container
COPY chromedriver ../chrome
# Start the Jupyter notebook server. --ip=0.0.0.0 makes it listen on all interfaces so it is reachable from outside the container
CMD jupyter notebook --ip=0.0.0.0 --port=8080 --allow-root
There are a couple of things to note here:
The Dockerfile installs Chromium from the Debian package repository (the step under "Install Chromium"). I believe the version installed this way is 73. For other Chrome/Chromium binaries, see: https://chromium.woolyss.com/
Selenium needs a browser-specific webdriver to drive the browser. For Chrome/Chromium that is chromedriver, and its version should match the version of Chromium installed. The chromedriver binary for Chromium 73 is available at: https://chromedriver.storage.googleapis.com/73.0.3683.68/chromedriver_linux64.zip
More chromedrivers: https://sites.google.com/a/chromium.org/chromedriver/downloads
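The version-matching rule above can also be checked in code. The sketch below uses two hypothetical helpers, major_version and versions_match, which are not part of the original recipe. They pull the major version out of the text printed by chromium --version and chromedriver --version. The sample strings are illustrative only, and comparing major versions alone is a simplification.

```python
import re


def major_version(version_output):
    """Return the major version found in a `--version` string, or None."""
    # Chromium and chromedriver both report four-part versions, e.g. 73.0.3683.68
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output, driver_output):
    """True when browser and driver report the same major version."""
    browser = major_version(browser_output)
    driver = major_version(driver_output)
    return browser is not None and browser == driver


# Illustrative strings; real output comes from `chromium --version`
# and `chromedriver --version` inside the container.
browser_output = "Chromium 73.0.3683.75 built on Debian 9.8"
driver_output = "ChromeDriver 73.0.3683.68"
print(versions_match(browser_output, driver_output))
```

If the check fails, download the chromedriver release whose version is closest to the installed Chromium and rebuild the image.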
Troubleshoot
Verify that chromium was properly installed by running bash in the container and launching chromium.
$ docker exec -it <container name/hash> bash
$ chromium --headless --no-sandbox --disable-gpu
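The container has no display, so the --headless flag is required. --disable-gpu avoids GPU-related startup problems, and --no-sandbox is needed when running as root, since Chromium will not start as root with its sandbox enabled.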
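The same check can be run from a notebook cell. This is a minimal sketch, assuming the browser binary is on the PATH under the name chromium (as installed by the Debian package) or one of a few common alternatives. The helper name find_browser and the candidate list are illustrative, not part of the original recipe.

```python
import shutil
import subprocess


def find_browser(candidates=("chromium", "chromium-browser", "google-chrome")):
    """Return the path of the first candidate binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


browser = find_browser()
if browser is None:
    print("No Chromium/Chrome binary found on PATH")
else:
    # --version prints the version string and exits without opening a window
    result = subprocess.run(
        [browser, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print(result.stdout.strip())
```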
Automation with Selenium
# Install selenium
! pip install selenium
import os
import time
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
# Context
context = 'linux' # or mac
chrome_options = Options()
### CONFIG - SETUP CHROME
# Default configuration for headless chrome on linux/container
chromedriver = './chromedriver-linux'
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--remote-debugging-port=9222")
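# On the mac host, run non-headless with a local Chromium binary and chromedriver
if context == 'mac':  # compare strings with ==, not 'is'
    chromedriver = '../chrome/chromedriver'
    chrome_options.arguments.remove("--headless")
    chrome_options.binary_location = '../chrome/Chromium.app/Contents/MacOS/Chromium'
# Selenium 3 style call; newer Selenium 4 releases take a Service object instead of executable_path
driver = webdriver.Chrome(executable_path=chromedriver, options=chrome_options)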
driver.get("https://test-mohammed-2-1162831764.us-east-2.elb.amazonaws.com")
In most browser automation work, we want to run non-headless on the host machine (Mac) and then port the recipe over to our headless Docker context. The notebook configures the chrome_options object to use a different Chrome binary and chromedriver depending on the context.
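One way to keep the two contexts from drifting apart is to collect the per-context settings in a single function. The sketch below is only an illustration: chrome_settings is a hypothetical helper, and the paths and flags are simply the ones used in the notebook code above.

```python
def chrome_settings(context):
    """Return chromedriver path, browser binary and flags for a context.

    `context` is 'linux' (headless, inside the container) or 'mac'
    (non-headless, on the host). Paths mirror the notebook code.
    """
    flags = ["--no-sandbox", "--disable-gpu", "--remote-debugging-port=9222"]
    if context == "linux":
        return {
            "chromedriver": "./chromedriver-linux",
            "binary_location": None,  # fall back to the system Chromium
            "arguments": ["--headless"] + flags,
        }
    if context == "mac":
        return {
            "chromedriver": "../chrome/chromedriver",
            "binary_location": "../chrome/Chromium.app/Contents/MacOS/Chromium",
            "arguments": flags,
        }
    raise ValueError("unknown context: %r" % (context,))


settings = chrome_settings("linux")
print(settings["arguments"])
```

Each entry in arguments can then be passed to chrome_options.add_argument, with binary_location assigned only when it is not None.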
GitHub Repo
To see a ready-to-go example of this, clone the following repo and run the commands listed under Usage:
https://github.com/mohammedabdulwahhab/chrome-automation-notebooks