Skip navigation

Tag Archives: Google

I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:

I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.

Enter, Google Shreadsheet’s importHTML function.

If you don’t have the forumla bar enabled in Google Spreadsheets, just go to View->Formula Bar to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.

=importHTML("http://www.internettrafficreport.com/namerica.htm","table",0)

(as you can see, it’s not case sensitive, either)

Yes, I know Excel can do this. I could have done a quick script whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse + xpathApply and Perl has HTML::TableContentParser (and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).

Official Google Help page on importHTML.

Feedburner has borked the old RSS feed for the site and has completely disassociated me from it (meaning it’s no longer in my Google Feedburner admin options and they won’t let me re-claim it).

So… the new feed link is http://rud.is/b/feed/atom/.

Apologies for any inconvenience.

As you can probably tell from a previous post, I’m not a fan of paywalls—especially poorly implemented ones. Clicking on a link in an RSS feed post and having it land on a page, only to have it smothered in an HTML layer or — in the following case — promptly redirected to “Pay up, buddy!” sites is frustrating at best. I’ll gladly debate the efficacy of paywalls vs other means of generating revenue in another post (or even in the comments, if civil). I primarily wanted to write this post to both show the silliness of the implementation of Foster’s Daily Democrat’s paywall and point out a serious deficiency in Chrome.

First up, lame paywall. You get three free direct story link visits prior to be asked to register and eventually pay for content. NOTE: You could just be going to the same story three times (say, after a browser crash) and each counts as a visit. After those visits, you have to register and give up what little anonymity you have on the Internet to be able to view up to an additional ten free direct story links before then being forced to pay up. If you are a print subscriber, you do get access for “free”, but there’s that tracking thing again. Foster’s uses a service called Clickshare to handle the subscription and tracking. Just how many places do you need to have your data stored/tracked just to read a (most likely) mediocre piece of news?

The paywall setup is accomplished by a simple “Meta Refresh” tag. In its most basic form, it is an instruction that tells the browser to load a particular URL after a certain amount of time. In the case of Foster’s paywall, the HTML tag/directive looks like this:

[code lang=”html”]<meta http-equiv="refresh" content="0;url=https://home.fosters.com/clickshare/authenticateUserSubscription.do?CSAuthReq=1&CSTargetURL=…"/>[/code]

It’s telling your browser to double-check with their Clickshare code immediately after teasing you with the article content. And, it’s easy to circumvent. Mostly. The problem is, I’m a Chrome user 99% of the time and Google has not seen fit to allow control over the meta refresh directive. To jump the paywall, you’ll need to fire up Firefox. And enter “about:config” in the location bar (and click through the warning message).

Once there, filter for “refresh”, find the setting for “blockautorefresh” and set it to “true“.

Now, every time you visit a web site that attempts to auto-refresh full browser pages, you’ll see a warning (with the option to allow the action):

Why Chrome has not implemented a way to control this is beyond me. Since Safari also has no ability to control this setting, it may have something to do with the webkit core that both browsers are based on.

This doesn’t stop the frustration with the RSS-click-to-read and it doesn’t help iOS/Android users, but it does provide a means help keep a bit of anonymity (if you also use other extensions and controls) and should force these sites to kick their paywall game up a notch.

One of my subdomains is for mail and I was using an easy DNS hack to point it to my hosted Gmail setup (just create a CNAME pointing to ghs.google.com). This stopped working for some folks this week and I’ve had no time to debug exactly why so I decided to go back to a simple HTTP 301 redirect to avoid any glitches (for whatever reason) in the future – or, at least ensure the glitches were due to any ineptness on my part. Unfortunately, this created an interesting problem that I had not foreseen.

I started playing with Strict Transport Security (HSTS) a while ago and – for kicks & some enhanced WordPress & Drupal cookie security – moved a couple domains to it. I neglected to actually pay for a cert that would give me wildcard subdomain usage and only put in a couple domains for the cert request. I neglected to put the mail one in and that caused Chrome to not honor the redirect due to the certificate not being valid for the mail domain.

I tweaked theStrict-Transport-Security header setting in my nginx config to not include subdomains, but it seems Chrome had already tucked the entry into (on OS X):

[code padlinenumbers=”false” gutter=”false”]~/Library/Application Support/Google/Chrome/Default/TransportSecurity[/code]

and was ignoring the new expiration and subdomain settings I was now sending. Again, no time to research why as I really just needed to get the mail redirect working. I guessed that removing the entry would be the easiest way to bend Chrome to my will but it turns out that it’s not that simple since the browser seems to hash the host value:

[code]"wA9USN1KVIEHgBTF9j2q0wPLlLieQoLrXKheK9lkgl8=": {
"created": 1300919611.230054,
"expiry": 1303563439.443086,
"include_subdomains": true,
"mode": "strict"
},[/code]

(I have no idea which host that is, btw.)

I ended up backing up the TransportSecurity file and removing all entries from it. Any site I visit that has the cookie will re-establish itself and it cleared up the redirect issue. I still need to get a new certificate, but that can wait for another day.

Windows and Linux folk should be able to find that file pretty easily in their home directories if they are experiencing any similar issue. If you can’t find it, drop a note in the comments and I’ll dig out the locations.

If anyone is experiencing the “Are you sure…” problem when trying to upload media to your WordPress blog, it’s usually due to a wonky plugin. In my case it was “Google Analytics 3 codes for WordPress”. Hopefully this helps others who are searching for a resolution to the problem.