I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:
data:image/s3,"s3://crabby-images/8d332/8d332a2c2b0cfd35a897e49e12803c3b7569b8c9" alt="Details for North America ___ Internet Traffic Report"
I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.
Enter, Google Shreadsheet’s importHTML
function.
If you don’t have the forumla bar enabled in Google Spreadsheets, just go to View->Formula Bar
to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.
=importHTML("http://www.internettrafficreport.com/namerica.htm","table",0)
data:image/s3,"s3://crabby-images/bf060/bf06088ee639d44f5b5819130ed72b9ddf57510e" alt="Internet Traffic Report"
(as you can see, it’s not case sensitive, either)
Yes, I know Excel can do this. I could have done a quick script whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse
+ xpathApply
and Perl has HTML::TableContentParser
(and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).
2 Trackbacks/Pingbacks
[…] setup a Google Drive spreadsheet to use the importHTML formula to read in the closings table on the […]
[…] setup a Google Drive spreadsheet to use the importHTML formula to read in the closings table on the […]