I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:
I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.
Enter, Google Shreadsheet’s importHTML
function.
If you don’t have the forumla bar enabled in Google Spreadsheets, just go to View->Formula Bar
to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.
=importHTML("http://www.internettrafficreport.com/namerica.htm","table",0)
(as you can see, it’s not case sensitive, either)
Yes, I know Excel can do this. I could have done a quick script whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse
+ xpathApply
and Perl has HTML::TableContentParser
(and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).