Skip navigation

Tag Archives: HTML

I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:

I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.

Enter, Google Shreadsheet’s importHTML function.

If you don’t have the forumla bar enabled in Google Spreadsheets, just go to View->Formula Bar to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.

=importHTML("http://www.internettrafficreport.com/namerica.htm","table",0)

(as you can see, it’s not case sensitive, either)

Yes, I know Excel can do this. I could have done a quick script whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse + xpathApply and Perl has HTML::TableContentParser (and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).

Official Google Help page on importHTML.

As you can probably tell from a previous post, I’m not a fan of paywalls—especially poorly implemented ones. Clicking on a link in an RSS feed post and having it land on a page, only to have it smothered in an HTML layer or — in the following case — promptly redirected to “Pay up, buddy!” sites is frustrating at best. I’ll gladly debate the efficacy of paywalls vs other means of generating revenue in another post (or even in the comments, if civil). I primarily wanted to write this post to both show the silliness of the implementation of Foster’s Daily Democrat’s paywall and point out a serious deficiency in Chrome.

First up, lame paywall. You get three free direct story link visits prior to be asked to register and eventually pay for content. NOTE: You could just be going to the same story three times (say, after a browser crash) and each counts as a visit. After those visits, you have to register and give up what little anonymity you have on the Internet to be able to view up to an additional ten free direct story links before then being forced to pay up. If you are a print subscriber, you do get access for “free”, but there’s that tracking thing again. Foster’s uses a service called Clickshare to handle the subscription and tracking. Just how many places do you need to have your data stored/tracked just to read a (most likely) mediocre piece of news?

The paywall setup is accomplished by a simple “Meta Refresh” tag. In its most basic form, it is an instruction that tells the browser to load a particular URL after a certain amount of time. In the case of Foster’s paywall, the HTML tag/directive looks like this:

[code lang=”html”]<meta http-equiv="refresh" content="0;url=https://home.fosters.com/clickshare/authenticateUserSubscription.do?CSAuthReq=1&CSTargetURL=…"/>[/code]

It’s telling your browser to double-check with their Clickshare code immediately after teasing you with the article content. And, it’s easy to circumvent. Mostly. The problem is, I’m a Chrome user 99% of the time and Google has not seen fit to allow control over the meta refresh directive. To jump the paywall, you’ll need to fire up Firefox. And enter “about:config” in the location bar (and click through the warning message).

Once there, filter for “refresh”, find the setting for “blockautorefresh” and set it to “true“.

Now, every time you visit a web site that attempts to auto-refresh full browser pages, you’ll see a warning (with the option to allow the action):

Why Chrome has not implemented a way to control this is beyond me. Since Safari also has no ability to control this setting, it may have something to do with the webkit core that both browsers are based on.

This doesn’t stop the frustration with the RSS-click-to-read and it doesn’t help iOS/Android users, but it does provide a means help keep a bit of anonymity (if you also use other extensions and controls) and should force these sites to kick their paywall game up a notch.

Brett Stone-Gross
Ryan Abman
Richard A. Kemmerer
Christopher Kruegel
Douglas G Steigerwald

Presentation [PDF]

Twitter transcript

#weis2011 presenting analysis of *actual* data from 21 servers from 3 multi-million $ fake a/v ops!!! < #spiffy #weis2011 showing example of fake a/v exploit that was embedded in HTML. good walkthrough. useful slides for an orgs tech ed/brown bag sessn #weis2011 good/succinct survey of techniques blackhat seo, annoying popups, preying on user naivete. #weis2011 great graphic on the flow of the money trail in fake a/v. Brett & his colleagues paid attention to detail. #weis2011 talking about affiliate programs (think amazon associates but for bad guys) & webmoney (evil bitcoins). #weis2011 189K sales; $11mil in 3mos!! 8.4m installs. conversion rate 2.4% (wow). if it had not been stopped, fy net $ wld be 45mil! #weis2011 comparing campaigns & operations. the choice in malicious hosting provider is key. downtime reduces profits. #timeforMalCloud? #weis2011 fake a/v providers actually give refunds to help avoid bank fraud detection. Refund rates between 3-9%. #weis2011 now showing their economic statistical models (and plugging real data into them) and the back-end infrastructure that runs the biz #weis2011 (me) the bad guys have better metrics, better partnerships & rely on naivete of users. the good guys don't share anything w/anyone #weis2011 the threshold for payment processors to terminate a bad account is when bad transactions (chargbacks) hit 10%. virt no incentive

One of my most popular blog posts — 24,000 reads — in the old, co-mingled site was a short snippet on how to strip HTML tags from a block of content in Objective-C. It’s been used by many-an-iOS developer (which was the original intent).

An intrepid reader & user (“Brian” – no other attribution available) found a memory leak that really rears it’s ugly head when parsing large-content blocks. The updated code is below (with the original post text) and also in the comments on the old site. If Brian reads this, please post full attribution info in the comments or to @hrbrmstr so I can give you proper credit.

I needed to strip the tags from some HTML that was embedded in an XML feed so I could display a short summary from the full content in a UITableView. Rather than go through the effort of parsing HTML on the iPhone (as I already parsed the XML file) I built this simple method from some half-finished snippets I found. It has worked in all of the cases I have needed, but your mileage may vary. It is at least a working method (which cannot be said about most of the other examples). It works both in iOS (iPhone/iPad) and in plain-old OS X code, too.

– (NSString *) stripTags:(NSString *)str {

NSMutableString *html = [NSMutableString stringWithCapacity:[str length]];

NSScanner *scanner = [NSScanner scannerWithString:str];
NSString *tempText = nil;

while (![scanner isAtEnd]) {

[scanner scanUpToString:@"<" intoString:&tempText];

if (tempText != nil)
[html appendString:tempText];

[scanner scanUpToString:@">" intoString:NULL];

if (![scanner isAtEnd])
[scanner setScanLocation:[scanner scanLocation] + 1];

tempText = nil;

}

return html ;

}

Those were the words that greeted me within five minutes of checking out the Flask microframework for Python web applications. I feel compelled to inline those four, short paragraphs:

I’m not joking. Well, maybe a little. If you write a web application, you are probably allowing users to register and leave their data on your server. The users are entrusting you with data. And even if you are the only user that might leave data in your application, you still want that data to be stored securely.

Unfortunately, there are many ways the security of a web application can be compromised. Flask protects you against one of the most common security problems of modern web applications: cross-site scripting (XSS). Unless you deliberately mark insecure HTML as secure, Flask and the underlying Jinja2 template engine have you covered. But there are many more ways to cause security problems.

The documentation will warn you about aspects of web development that require attention to security. Some of these security concerns are far more complex than one might think, and we all sometimes underestimate the likelihood that a vulnerability will be exploited, until a clever attacker figures out a way to exploit our applications. And don’t think that your application is not important enough to attract an attacker. Depending on the kind of attack, chances are that automated bots are probing for ways to fill your database with spam, links to malicious software, and the like.

So always keep security in mind when doing web development.

Let’s look at the key take-away messages…

Data Should Be Stored Securely

Interestingly enough, this is not the default mindset of one of the more popular modern database technologies [mongoDB] (and it has plenty of company [memcached], too).

Even if your app starts out without any real sensitive data, odds are you will be storing credentials, e-mail addresses, social network handles and other bits of information that you should feel some fundamental responsibility to treat with care. There are somemcached manymysql resourcesoracle tocouchdb helpsqlite that you really have no excuse.

And, it will save you time later on when you realize you actually need to have a secure storage foundation.

Watch The Input To Your Apps

Flask protects you against one of the most common security problems of modern web applications: cross-site scripting (XSS). There are many others. If you are a programmer and have never even heard of OWASP, then you need to put down your PS3/Xbox controller and do a quick read on at least their take on the top ten web app security risks (btw: there are way more than ten, but you need to start somewhere).

The thing is, unless the halls of higher education have crumbled completely since I was in school, I distinctly remember having the concept of input validation, bounds checking, etc. being rammed into my thick skull in almost every programming class (and this was way before web apps were even contemplated). You may think you’re innovating by posting a link to your functioning rapid prototype on Hacker News, but what you’re really doing is being sloppy, lazy and irresponsible. Period.

And, while it’s fine to seek out frameworks like Flask and rely on some of their inherent protections, it does not absolve you from your responsibility to deliberately & consciously build rugged software (which doesn’t just mean “secure”).

“Don’t think that your application is not important enough to attract an attacker”

I’m not sure if any amount of verbiage will convince someone of this fact if they are determined not to believe/accept it. It’s a much larger discussion (and this is already a long post). If you are inclined to have a slightly open mind, I encourage you to read So You Think Your Website Won’t Get Hacked by Joseph Schembr. It’s really slanted towards “script-kiddies,” but should pique your interest enough to keep exploring why your hacked-up personal URL shortener might be a target.

Fin

It’s impressive that the Flask authors cover security in some way, shape or form on 21 pages in the documentation [PDF]. If you’re building or contributing to other frameworks, projects or engines (hint, hint, Node.JS devs!) I would strongly encourage you to take as much time and consideration as the Flask team did to ensure you are making it as easy as possible for your users to deploy applications as securely as possible by default.

Security

  • VSR uses some high-ish profile attacks from 2010 to provide fodder for the VAR community :: Security Risk: Top Hacker Attacks of 2010. I include it as the examples they provide should make it easier for folks doing presentations where they need to show real-life attacks (without sifting through the individual entries at the various data breach web site databases). [Vertical Systems Reseller]

Windows

  • Windows 7/2008 SP1 looms large. OEMs, VLCs & MSDN/TechNet subscribers get it on February 16th and the rest of the masses can give it a go on February 22nd. It looks like it has a decidedly enterprise-y focus, but one can hope it continues Microsoft on the path to robust desktop & server experiences :: Announcing The Availability of Windows 7 and Windows Server R2 SP1 [Microsoft]
  • Autoruns – the ability to automatically perform tasks when certain devices are made available to Window systems (e.g. USB sticks) – are a boon to malware writers. While Microsoft has somewhat mitigated the threat they pose in more modern versions of their operating systems, it can be tricky to make older systems safe. With the latest round of Patch Tuesday updates, they included a way to disable Autoruns in older systems. W00t! Microsoft Update Offers an Easier Way to Turn off Autoruns [PC World]
  • Succinct and informative article by Chris Sanders on how to determine if your systems is being actively compromised. Chock full of screen shots and examples of what to look for. While not exactly aimed at the general Windows community, it does provide a solid introduction to core tools that technically-inclined users should make room for in their toolboxes :: http://www.windowsecurity.com/articles/Determining-You-Actively-Being-Compromised.html [WindowsSecurity.com]

Programming

  • Pageforest helps you ship complete web applications without having to write any server-side code. You can build your application using HTML[5], CSS & javascript and the Pageforest service provides application hosting, user authentication & data storage. You only use client-side javascript and are free to include jQuery, Prototype or any other frameworks that you need to include in your app. Hosting is currently free and the site includes a full IDE to help you get started coding :: A Pure JavaScript Web Application Platform [pageforest.com]

Security

Programming

Interesting points/counterpoints on the efficacy of Node.js being tied so closely to the V8 javascript engine:

HTML5

I wanted to play with the AwesomeChartJS library and figured an interesting way to do that was to use it to track Microsoft Security Bulletins this year. While I was drawn in by just how simple it is to craft basic charts, that simplicity really only makes it useful for simple data sets. So, while I’ve produced three diferent views of Microsoft Security Bulletins for 2011 (to-date, and in advance of February’s Patch Tuesday), it would not be a good choice to do a running comparison between past years and 20111 (per-month).  The authors self-admit that there are [deliberate] limitations and point folks to the most excellent flot library for more sophisticated analytics (which I may feature in March).

The library itself only works within an HTML5 environment (one of the reasons I chose it) and uses a separate <canvas> element to house each chart. After loading up the library iself in a script tag:

<script src="/b/js/AwesomeChartJS/awesomechart.js" type="application/javascript">

(which is ~32K un-minified) you then declare a canvas element:

<canvas id="canvas1" width="400" height="300"></canvas>


and use some pretty straighforward javascript (no dependency on jQuery or other large frameworks) to do the drawing:

var mychart = new AwesomeChart('canvas1');
mychart.title = "Microsoft Security Bulletins Raw Count By Month - 2011";
mychart.data = [2, 12];
mychart.colors = ["#0000FF","#0000FF"];
mychart.labels = ["January", "February"];
mychart.draw();

It’s definitely worth a look if you have simple charting needs.

Regrettably, it looks like February is going to be a busy month for Windows administrators.

Your web-browser does not support the HTML 5 canvas element.

Your web-browser does not support the HTML 5 canvas element.

Your web-browser does not support the HTML 5 canvas element.