how to

Scrape the web: using Google Sheets to import data

When web pages are properly formatted, you can easily use Google spreadsheets to import (“scrape”) their data. Here’s how.

I wanted to rank National Cancer Institute research expenditures by cancer type; on the website, the data are presented alphabetically.

NCI data

 

How to scape data into your spreadsheet for analysis

Log in to Google Drive and pick create new spreadsheet.

In cell A1 (upper-left hand corner), paste this:

=IMPORTHTML(“https://www.cancer.gov/about-nci/budget/fact-book/data/research-funding”,”table”, 1)

Wait a moment, and you should see something that looks like this:

scrape with google sheets import

 

Now that you have the data, you can make the comparisons – and charts – that interest you.

Be sure to freeze the header row before you begin sorting data:

  • Highlight the row (click on the row number, in this case, it’s 1)
  • Tap “View” and then select “Freeze”
  • I chose “2 rows” so that the total budget would remain fixed at the top

scrape data then freeze row google

 

How do you scrape data off a page that you’re interested in?

You need three things:

  • The page URL
  • The HTML container that holds the data (in this example, “table”)
  • And a number reflecting which table you want (in this example, it’s the first table)

To identify the HTML container, you can use the “inspect element” feature of your browser (Chrome, Safari*) by right-clicking the page near the title of the data. In well-formatted HTML, the container will be a table.

To identify which table (or div) contains the information I’m looking for, I copy

  • the page source HTML;
  • paste it into BBEdit;
  • then search for the HTML tag (in this case, I used <table — I don’t close the tag on search because sometimes there are CSS rules included in the tag space).

Here are additional resources:

 

How to enable “inspect element” in Safari

  1. Go to Safari > Preferences > Advanced
  2. Ensure that Show Develop menu in menu bar is enabled

    Safari Advanced Preferences

    Safari Advanced Preferences

  3. You now have a new menu tab and an advanced set of tools at your disposal.

    Safari's developer tools

    Safari’s developer tools

  4. Inspect element by right-clicking in the page

    inspect element

    Right-click to launch the inspect element dialog

  5. Move up or down the page to see the element you’re interested in.
    inspect element code

 

 

 

Leave a Reply