One of my most popular blog posts on the old, co-mingled site (24,000 reads) was a short snippet on how to strip HTML tags from a block of content in Objective-C. Plenty of iOS developers have used it, which was the original intent.
An intrepid reader & user (“Brian”; no other attribution available) found a memory leak that really rears its ugly head when parsing large blocks of content. The updated code is below (with the original post text) and also in the comments on the old site. If Brian reads this, please post full attribution info in the comments or to @hrbrmstr so I can give you proper credit.
I needed to strip the tags from some HTML that was embedded in an XML feed so I could display a short summary from the full content in a UITableView. Rather than go through the effort of parsing HTML on the iPhone (I had already parsed the XML file), I built this simple method from some half-finished snippets I found. It has worked in all of the cases where I have needed it, but your mileage may vary. It is at least a working method (which cannot be said about most of the other examples). It works in iOS (iPhone/iPad) and in plain-old OS X code, too.
    - (NSString *)stripTags:(NSString *)str {
        // Autoreleased buffer that collects the text found between tags
        NSMutableString *html = [NSMutableString stringWithCapacity:[str length]];
        NSScanner *scanner = [NSScanner scannerWithString:str];
        NSString *tempText = nil;

        while (![scanner isAtEnd]) {
            // Grab everything up to the next opening bracket
            [scanner scanUpToString:@"<" intoString:&tempText];
            if (tempText != nil)
                [html appendString:tempText];

            // Skip the tag itself, then step past its closing bracket
            [scanner scanUpToString:@">" intoString:NULL];
            if (![scanner isAtEnd])
                [scanner setScanLocation:[scanner scanLocation] + 1];

            // Reset so a failed scan does not re-append the previous chunk
            tempText = nil;
        }

        return html;
    }
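
To see the method in action outside of an app, here is a minimal sketch of a Foundation command-line program. It assumes the same scanner loop wrapped in a plain C function; the names StripTags and SummaryFromHTML and the maxLength parameter are hypothetical and only there for illustration, and the truncation is a guess at what a "short summary" for a UITableView cell might look like.

```objc
#import <Foundation/Foundation.h>

// Hypothetical free-function version of the scanner loop from the post, so
// the example builds without a surrounding class.
static NSString *StripTags(NSString *str) {
    if (str == nil) return @"";   // assumption: treat nil input as empty
    NSMutableString *text = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    NSString *chunk = nil;

    while (![scanner isAtEnd]) {
        // Append the text that precedes the next tag, if any was scanned
        if ([scanner scanUpToString:@"<" intoString:&chunk])
            [text appendString:chunk];

        // Skip the tag body and step past the closing bracket
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation] + 1];
    }
    return text;
}

// Hypothetical helper: strip the tags, then cut the result to at most
// maxLength characters. The cut is naive and does not respect word
// boundaries or composed character sequences.
static NSString *SummaryFromHTML(NSString *html, NSUInteger maxLength) {
    NSString *plain = StripTags(html);
    if ([plain length] <= maxLength) return plain;
    return [[plain substringToIndex:maxLength] stringByAppendingString:@"..."];
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<p>Hello, <b>world</b>!</p>";
        NSLog(@"%@", StripTags(html));          // logs: Hello, world!
        NSLog(@"%@", SummaryFromHTML(html, 5)); // logs: Hello...
    }
    return 0;
}
```

One thing to watch for: NSScanner skips leading whitespace and newlines before each scan by default, so whitespace that directly follows a tag may be dropped from the output. If that matters for your content, setting the scanner's charactersToBeSkipped to nil is one option worth trying.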