Google Spreadsheet “importHTML” Rocks For Quick Analytics

I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:

I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.

Enter, Google Shreadsheet’s importHTML function.

If you don’t have the forumla bar enabled in Google Spreadsheets, just go to View->Formula Bar to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.

=importHTML("http://www.internettrafficreport.com/namerica.htm","table",0)

(as you can see, it’s not case sensitive, either)

Yes, I know Excel can do this. I could have done a quick script whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse + xpathApply and Perl has HTML::TableContentParser (and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).

Official Google Help page on importHTML.

Cover image from Data-Driven Security
Amazon Author Page

2 Comments Google Spreadsheet “importHTML” Rocks For Quick Analytics

  1. Pingback: Easier HTML Table-scraping For Scripts With Google Drive | rud.is

  2. Pingback: Easier HTML Table-scraping For Scripts With Google Drive | rud.is

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.