Web scraping and automated interaction with websites in Ruby using Watir WebDriver - a tutorial with exercises and solutions
This post is based on a Learning Hour I ran at FutureLearn
Repository with instructions, exercises and tutorials
Watir WebDriver can be used to give instructions to your browser in Ruby.
Watir WebDriver is popular for testing. In this post and the exercises we focus on using Watir for interacting with websites to browse through them and read data. Watir WebDriver uses Selenium-WebDriver under the hood.
The Watir syntax is very idiomatic. Once you create an instance of a Watir browser by calling
Watir::Browser.new you can send it commands, such as
browser.goto 'http://www.google.com'. If not run headless, which is the default, you can see a browser window and follow the executed instructions.
You can combine manual browsing with commands in Ruby. Verify how the current page, even after manual browsing is tracked by Watir. You can verify this by manually browsing to a different page from your Watir Browser window and calling
The hardest part is often finding how to uniquely identify and access elements of interest on a site. Elements can be found by their HTML attributes, such as
browser.button(:name => 'submit'), or by using xPath. Often there is more than one way to find information. Note that you can also ask Watir to return all types of an element on a page, e.g.
browser.h1s will return all
h1 headers on the present page. You can iterate over the returned results with Ruby's built in methods, such as
A caveat when working with Watir WebDriver are compatibility issues with Firefox or Chrome. Firefox updates constantly and new versions might no longer work with your version of Watir WebDriver. This is usually to the underlying version of Selenium WebDriver. Constant updates from Selenium are usually quick to remedy the compatibility issues. When in doubt use an older version of Firefox with the latest version of WebDriver.
New WebDriver browser instances will open new Firefox windows as separately running programs in your dock. They have the process name
firefox-bin and they can all be closed with the command $ killall firefox-bin` to avoid closing them manually.
Get started with Watir WebDriver
- install the gem
watir-webdriver. I created a repository with exercises and instructions for how to get started.
- include the library:
- create an instance of a new browser:
browser = Watir::Browser.new
- direct the browser to a url:
- Fill out a simple form:
require 'watir-webdriver' browser = Watir::Browser.new browser.goto 'bit.ly/watir-webdriver-demo' browser.text_field(:id => 'entry_1000000').set 'your name' browser.select_list(:id => 'entry_1000001').select 'Ruby' browser.select_list(:id => 'entry_1000001').selected? 'Ruby' browser.button(:name => 'submit').click browser.text.include? 'Thank you' # An example from https://watirwebdriver.com/
- Get the names of Coursera's partners and write them to a file:
require 'watir-webdriver' browser = Watir::Browser.new browser.goto 'https://www.coursera.org/about/partners' # Coursera lists they have 145 partners. browser.divs(class: 'partner-box-wrapper card-one-clicker flex-1').count # => 145 indeed # Get all partners names scraped_list = browser.divs(class: 'partner-box-wrapper card-one-clicker flex-1').map(&:text) class FileMaker require 'csv' attr_reader :scraped_list def initialize(scraped_list) @scraped_list = scraped_list end def save_file File.open("scraped_list.txt", "w+") do |file| file.puts scraped_list end end end FileMaker.new(scraped_list).save_file
- Download this git repository
$ git clone [email protected]:neudabei/watir_learning_hour.git
$ cd watir_learning_hourand run
$ bundle install
- Install Firefox: https://download.mozilla.org/?product=firefox-47.0.1-SSL&os=osx&lang=en-US (alternatively see these instructions for using Watir WebDriver with Chrome: https://watirwebdriver.com/chrome/ . It requires the download of a binary which has to be put in your path.)
- Watir documentation: https://watir.com/documentation/ and https://watirwebdriver.com/
- Watir source code: https://github.com/watir/watir-webdriver/
- A tutorial for interacting with Watir: https://www.sitepoint.com/watir-webdriver-control-browser/
- A freemium ebook on Watir: https://leanpub.com/watirways/read
- An ebook on Watir: https://leanpub.com/watir-recipes