Researching “the internet” (i.e., $DAYJOB) means having to deal with a ton of “unique” (I’m being kind) data formats. This is ultimately a tale of how I performed full-text searches across one of them. It all started off innocently enough. This past week I needed to be able to do full-text searches across metadata about…
Post Category → R
Wicked Fast, Accurate Quantiles Using ‘t-Digests’ in R with the {tdigest} Package
@ted_dunning recently updated the t-digest algorithm he created back in 2013. What is this “t-digest”? Fundamentally, it is a probabilistic data structure for estimating any percentile of distributed/streaming data. Ted explains it quite elegantly in a short video, which has a full transcript as well. T-digests have been baked into many “big data” analytics…
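The teaser describes the t-digest only in words. As a rough illustration of the idea (this is *not* the {tdigest} package's API and *not* Ted Dunning's reference implementation), here is a minimal Python sketch of a simplified *merging* t-digest, assuming the arcsine (`k1`) scale function. The `TDigest` class, its `delta` and `buffer_size` defaults, and the midpoint interpolation in `quantile()` are hypothetical simplifications chosen for readability.

```python
import math
import random


class TDigest:
    """Simplified merging t-digest: an illustrative sketch, not {tdigest}'s code.

    Centroids are [mean, weight] pairs kept sorted by mean. The k1 scale
    function keeps centroids small near the tails (q near 0 or 1) and lets
    them grow near the median, which keeps tail quantiles accurate.
    """

    def __init__(self, delta=100, buffer_size=500):
        self.delta = delta            # compression parameter (hypothetical default)
        self.buffer_size = buffer_size
        self.centroids = []           # merged [mean, weight] pairs, sorted by mean
        self.buffer = []              # incoming points not yet merged
        self.total = 0.0              # total weight held in self.centroids

    def _k(self, q):
        # k1 scale function: k(q) = delta / (2*pi) * asin(2q - 1)
        q = min(1.0, max(0.0, q))     # guard against float drift outside [0, 1]
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def add(self, x, w=1.0):
        self.buffer.append([float(x), float(w)])
        if len(self.buffer) >= self.buffer_size:
            self._compress()

    def _compress(self):
        """Sort centroids plus buffered points, then greedily merge neighbours."""
        if not self.buffer:
            return
        points = sorted(self.centroids + self.buffer, key=lambda c: c[0])
        self.buffer = []
        total = sum(w for _, w in points)
        merged = [list(points[0])]
        q_left = 0.0                   # quantile at the left edge of merged[-1]
        k_limit = self._k(q_left) + 1  # a centroid may span at most 1 unit of k
        for mean, w in points[1:]:
            cur = merged[-1]
            q_right = (q_left * total + cur[1] + w) / total
            if self._k(q_right) <= k_limit:
                # Fold the point into the current centroid (weighted mean update).
                new_w = cur[1] + w
                cur[0] += (mean - cur[0]) * w / new_w
                cur[1] = new_w
            else:
                # Close the current centroid and start a new one.
                q_left += cur[1] / total
                k_limit = self._k(q_left) + 1
                merged.append([mean, w])
        self.centroids = merged
        self.total = total

    def quantile(self, q):
        """Estimate the q-th quantile by interpolating between centroid centers."""
        self._compress()
        cs = self.centroids
        if not cs:
            raise ValueError("empty digest")
        target = q * self.total
        cum = 0.0                      # weight of all centroids before cs[i]
        for i, (mean, w) in enumerate(cs):
            center = cum + w / 2
            if target < center:
                if i == 0:
                    return mean
                prev_mean, prev_w = cs[i - 1]
                prev_center = cum - prev_w / 2
                frac = (target - prev_center) / (center - prev_center)
                return prev_mean + frac * (mean - prev_mean)
            cum += w
        return cs[-1][0]


if __name__ == "__main__":
    random.seed(42)
    td = TDigest(delta=100)
    for _ in range(100_000):
        td.add(random.random())
    print(len(td.centroids), td.quantile(0.5), td.quantile(0.99))
```

Streaming 100,000 uniform draws through it should give estimates close to the true 0.5 and 0.99 quantiles while retaining only a small number of centroids (roughly proportional to `delta`) instead of the raw data. The reference implementations add refinements that this sketch omits.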
Rome Was Not Built In A Day, But widgetcard Was!
I saw a second post on turning htmlwidgets into interactive Twitter Player cards and felt somewhat compelled to make creating said entities a bit easier, so I posited the following: “Wld this be useful packaged up, #rstats? https://t.co/sfqlWnEeJV https://t.co/troKzmzTNv (TLDR/V: Single function to turn an HTML widget into a deployable interactive Twitter card) pic.twitter.com/uahB52YfE2” — boB Rudis (@hrbrmstr)…
Assumptions Matter More Than Dependencies
There’s been a lot of talk about “dependencies” in the R universe of late. This is not really a post about that but more of a “really, don’t do this” if you decide you want to poke the dependency bear by trying to build a deeply flawed model off of CRAN package metadata. CRAN packages undergo…
Handling & Sharing PCAPs Like a Boss with PacketTotal
The fine folks over at @PacketTotal bequeathed an API token to me, so I cranked out an R package for it to enable more dynamic investigations work (RStudio makes for an amazing incident responder investigations console, given that you can script in multiple languages, code in C[++], and write documentation all at the same time…
Collecting Content Security Policy Violation Reports in S3 (‘Effortlessly’/‘Freely’)
In the previous post, I tried to explain what Content Security Policies (CSPs) are and how to work with them in R. In case you didn’t RTFPost, the TLDR is that CSPs give you control over what can be loaded along with your web content and can optionally be configured to generate a violation report…
Heads Up! Roll Your Own HTTP Headers Investigations with the ‘hdrs’ Package
I blathered a lot about HTTP headers in the last post. In the event you wanted to dig deeper, I threw together a small package that will let you grab HTTP headers from a given URL and take a look at them. The README has examples for most things, but we’ll go through a bit of…
CRAN Mirror “Security”
The “Changes on CRAN” section of the latest issue of The R Journal (Vol. 10/2, December 2018) had this short blurb entitled “CRAN mirror security”: Currently, there are 100 official CRAN mirrors, 68 of which provide both secure downloads via ‘https’ and use secure mirroring from the CRAN master (via rsync through ssh…