(If you don’t know what XML is, you should probably [read a primer](https://en.wikipedia.org/wiki/XML) before reading this post.)

When working with data, one inevitably comes across things encoded in XML. I’m in the “anti-XML” camp, but I deal with my fair share of XML in “cyber” and help out enough people who have to work with XML that I’ve become pretty proficient at slicing & dicing it.

R has two main packages to deal with XML: the original `XML` package and the more lightweight and modern `xml2` package. If you really need all the power of `libxml2` (the C library that powers both packages) or are _creating_ XML from R, then you probably know your way around the `XML` package and are pretty self-sufficient.

Most folks can get by with the `xml2` package if their goal is to work with XML data. By “work with” I mean reading in files or API responses that come in XML format and finding the nuggets of gold in between all those `<` and `>` tags. That means using a query language called `XPath` to pinpoint the node(s) you are after. Working with `XPath` can be pretty daunting for those who went to school to ultimately cure diseases, build high-performing stock portfolios, target advertising to everyone or perform a host of other real work. Becoming an expert in `XPath` was probably not on the bucket list, but to work with XML you will need to be familiar with it.
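To give a feel for what “pinpointing nodes” looks like, here is a minimal sketch using `xml2` alone. The document and its element names (`notes`, `note`, `id`) are made up for illustration; only the `xml2` functions are real.

```r
library(xml2)

# A small inline document; element and attribute names are hypothetical.
doc <- read_xml("<notes>
  <note id='1'><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>
  <note id='2'><to>Jani</to><from>Tove</from><heading>Re: Reminder</heading></note>
</notes>")

# "//heading" matches every <heading> element anywhere in the document
headings <- xml_text(xml_find_all(doc, "//heading"))

# A predicate in [...] narrows the match: the <to> child of the note with id 2
to_second <- xml_text(xml_find_first(doc, "//note[@id='2']/to"))

# Attributes are read from the matched nodes with xml_attr()
ids <- xml_attr(xml_find_all(doc, "//note"), "id")
```

`xml_find_all()` returns every match as a node set, while `xml_find_first()` returns a single node, which is handy when you expect exactly one hit.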

The [`xmlview`](https://github.com/hrbrmstr/xmlview) package provides a way to visually inspect XML and interactively test out `XPath` expressions. It’s as simple to use as:

```r
devtools::install_github("ramnathv/htmlwidgets") # we use some bleeding edge features
devtools::install_github("hrbrmstr/xmlview")
library(xml2)
library(xmlview)

# plain text XML
xml_view("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")

# read-in XML document
doc <- read_xml("http://www.npr.org/rss/rss.php?id=1001")
xml_view(doc, add_filter=TRUE)
```

(There’s also an experimental `xml_tree_view()` in there by @timelyportfolio that we’ll be adding features to at a pretty rapid pace.)

Here’s a screenshot of it in action:

[Screenshot: the `xmlview` widget running in RStudio]

There are options to change the CSS styling for the formatted code. Yep, it will format and highlight XML for you so it’s easier to work with. There’s an animated gif of a screencast over [on github](https://github.com/hrbrmstr/xmlview) as well.

Once you perfect your `XPath` expression, hit the “R” button and it will generate the code you can copy back into RStudio. It understands namespaces but try not to stuff a huge XML document in there as browsers don’t work well with large data elements (the viewer is an `htmlwidget` and is, hence, browser-based).
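Namespaces are where many `XPath` attempts silently return nothing, so a short illustration may help. This is a sketch of how `xml2` itself handles a default namespace, not the code the “R” button produces; the Atom-style snippet is invented for the example.

```r
library(xml2)

# Hypothetical feed with a default namespace declared on the root element
doc <- read_xml('<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>First</title></entry><entry><title>Second</title></entry></feed>')

# xml_ns() lists the namespaces; xml2 assigns the default one the prefix "d1"
ns <- xml_ns(doc)

# Elements in the default namespace must be addressed with that prefix
titles <- xml_text(xml_find_all(doc, "//d1:entry/d1:title", ns))

# The same query without prefixes matches nothing
unprefixed <- xml_find_all(doc, "//entry/title")
n_unprefixed <- length(unprefixed)
```

If a query that looks right comes back empty, checking `xml_ns(doc)` for a default namespace is a reasonable first step.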

It works with plain character XML/HTML and many `xml2` data types. I have no current plans for `XML` package object support, but toss up an issue on github if you really need it (or, better yet, a PR). If there are other desired features (especially from educators), please file a github issue for those as well.

Watch for more features in the coming weeks and a CRAN release once the bleeding edge `htmlwidgets` package makes it to CRAN.
