The mass power outages on the U.S. west coast provided an opportunity to test out my R 3.4.0 installation and share a quick idiom for turning portions of an XML document into a data frame — generically — using xml2
.
Packages we’ll need:
library(xml2)
library(httr)
library(leaflet)
library(htmltools)
library(widgetframe)
library(tidyverse)
This is where the power outage data is. You can grab this on your own and make your own map whenver you want!
res <- GET("https://m.sce.com/nrc/AOC/AOC_Location_Report.xml")
doc <- content(res, as="parsed", encoding="UTF-8")
Here’s what that document looks like. Scroll around a bit to see that there are definitely groups of nodes we can capture separately but it’d be a royal pain to write extraction code for every set of nodes.
To setup the generic extraction idiom, we first setup two helper functions.
The first is just a shorthand way to extract all targeted nodes into a vector.
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }
The next one will take a document or node list and a target node then extract the first one and find the children. Then it will use those child node names to extract them each into a separate vector (which becomes a named list) and then turns those into a data frame and makes sane types.
There is a big assumption that all child nodes exist and there aren’t multiple descendents. This can be made more generic, but this idiom handles many, many use-caes.
xtrct_df <- function(doc, top) {
xml_find_first(doc, sprintf(".//%s", top)) %>%
xml_children() %>%
xml_name() %>%
map(~{
xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
list() %>%
set_names(tolower(.x))
}) %>%
flatten_df() %>%
readr::type_convert()
}
Now, we can use that to extract data frames from that XML document:
county_df <- xtrct_df(doc, "COUNTY")
city_df <- xtrct_df(doc, "CITY")
zipcode_df <- xtrct_df(doc, "ZIPCODE")
district_df <- xtrct_df(doc, "DISTRICT")
sector_df <- xtrct_df(doc, "SECTOR")
incident_df <- xtrct_df(doc, "INCIDENT")
We can peek at a couple of them:
glimpse(county_df)
## Observations: 9
## Variables: 5
## $ county_name <chr> "Kern", "Los Angeles", "Not Available", "Ora...
## $ nbr_incidents <int> 1, 19, 1, 5, 5, 8, 1, 5, 2
## $ nbr_cust_affected <int> 54, 843, 0, 314, 126, 137, 19, 88, 26
## $ centroid_x <dbl> -118.7291, -118.3511, NA, -117.7793, -115.99...
## $ centroid_y <dbl> 35.34358, 33.81972, NA, 33.21832, 33.74648, ...
glimpse(incident_df)
## Observations: 47
## Variables: 23
## $ incident_id <int> 116586404, 116552979, 116558720, 1165618...
## $ incident_type <chr> "QI", "PO", "PO", "PO", "PO", "PO", "PO"...
## $ fac_job_status_cd <chr> "W", "D", "D", "D", "D", "D", "D", "W", ...
## $ oan_no <int> 820783, 818185, 818762, 818958, 819061, ...
## $ outage_start_datetime <chr> "4/21/2017 12:13:56 PM", "4/21/2017 8:06...
## $ version_dt <chr> "4/21/2017 1:34:09 PM", "4/21/2017 1:34:...
## $ last_chng_datetime <chr> "4/21/2017 12:48:55 PM", "4/21/2017 9:00...
## $ est_clu_datetime <chr> "4/21/2017 3:00:00 PM", "4/21/2017 5:00:...
## $ memo_cause_cd <chr> "16", "07", "07", "07", "07", "07", "07"...
## $ memo_cause_cd_desc <chr> "Equipment Problems", "Upgrading Equipme...
## $ crew_status_cd <chr> "04", "04", "04", "04", "04", "04", "04"...
## $ crew_status_cd_desc <chr> "Work is in progress.", "Work is in prog...
## $ result_cd <chr> "0", "01", "01", "01", "01", "01", "01",...
## $ result_cd_desc <chr> "Not Available", "Enhancing Reliability"...
## $ nbr_cust_affected <int> 12, 124, 23, 1, 106, 66, 50, 17, 5, 21, ...
## $ zip_code <chr> "92553", "92630", "90043", "93563", "928...
## $ county_name <chr> "Riverside", "Orange", "Los Angeles", "L...
## $ city_name <chr> "Moreno Valley", "Lake Forest", "View Pa...
## $ district_no <int> 77, 43, 44, 73, 48, 42, 27, 32, 39, 39, ...
## $ sector_no <chr> "VALLEY", "ELLIS", "EL NIDO", "LUGO", "V...
## $ ert_cd <chr> "06", "06", "07", "06", "06", "07", "06"...
## $ centroid_x <dbl> -117.2741, -117.6956, -118.3328, -117.68...
## $ centroid_y <dbl> 33.92560, 33.62584, 33.99711, 34.37945, ...
And, now we’ll use Leaflet to make our own outage map:
incident_df %>%
mutate(popup_text =
sprintf("%s<br/><br/>%s / %s County<br/><br/><b>Customers impacted:</b> %s<br/><b>Cause:</b> %s<br/><b>Status:</b> %s",
htmlEscape(outage_start_datetime),
htmlEscape(city_name),
htmlEscape(county_name),
htmlEscape(nbr_cust_affected),
htmlEscape(memo_cause_cd_desc),
htmlEscape(crew_status_cd_desc))) %>%
rename(lat=centroid_y, lng=centroid_x) %>%
leaflet(height="600px") %>%
addProviderTiles(providers$Esri.WorldStreetMap) %>%
addCircleMarkers(weight=1, radius=~sqrt(nbr_cust_affected)*2, color = "#bd0026",
fillColor="#fc4e2a", opacity=1, fillOpacity=0.5, popup = ~popup_text)%>%
setView(mean(range(incident_df$centroid_x)),
mean(range(incident_df$centroid_y)), zoom=7) %>%
frameWidget()