Exploring 2018 R-bloggers & R Weekly Posts with Feedly & the 'seymour' package

Well, 2018 has flown by and today seems like an appropriate time to take a look at the landscape of R bloggerdom as seen through the eyes of readers of R-bloggers and R Weekly. We’ll do this via a new package designed to make it easier to treat Feedly as a data source: seymour [GL | GH] (which is a pun-ified name based on a well-known phrase from Little Shop of Horrors).

The seymour package builds upon an introductory Feedly API blog post from back in April 2018 and covers most of the “getters” in the API (i.e. you won’t be adding anything to or modifying anything in Feedly through this package unless you PR into it with said functions). An impetus for finally creating the package came about when I realized that you don’t need a Feedly account to use the search or stream endpoints. You do get more data back if you have a developer token, and with one you can also access your own custom Feedly components. If you are a “knowledge worker” and do not have a Feedly account (and, really, a Feedly Pro account) you are missing out. But this isn’t a rah-rah post about Feedly, it’s a rah-rah post about R! Onwards!

Feeling Out The Feeds

There are a bunch of different ways to get Feedly metadata about an RSS feed. One easy way is to just use the RSS feed URL itself:

library(seymour) # git[la|hu]b/hrbrmstr/seymour
library(hrbrthemes) # git[la|hu]b/hrbrmstr/hrbrthemes
library(lubridate)
library(tidyverse)

r_bloggers <- feedly_feed_meta("http://feeds.feedburner.com/RBloggers")
r_weekly <- feedly_feed_meta("https://rweekly.org/atom.xml")
r_weekly_live <- feedly_feed_meta("https://feeds.feedburner.com/rweeklylive")

glimpse(r_bloggers)
## Observations: 1
## Variables: 14
## $ feedId      <chr> "feed/http://feeds.feedburner.com/RBloggers"
## $ id          <chr> "feed/http://feeds.feedburner.com/RBloggers"
## $ title       <chr> "R-bloggers"
## $ subscribers <int> 24518
## $ updated     <dbl> 1.546227e+12
## $ velocity    <dbl> 44.3
## $ website     <chr> "https://www.r-bloggers.com"
## $ topics      <I(list)> data sci....
## $ partial     <lgl> FALSE
## $ iconUrl     <chr> "https://storage.googleapis.com/test-site-assets/X...
## $ visualUrl   <chr> "https://storage.googleapis.com/test-site-assets/X...
## $ language    <chr> "en"
## $ contentType <chr> "longform"
## $ description <chr> "Daily news and tutorials about R, contributed by ...

glimpse(r_weekly)
## Observations: 1
## Variables: 13
## $ feedId      <chr> "feed/https://rweekly.org/atom.xml"
## $ id          <chr> "feed/https://rweekly.org/atom.xml"
## $ title       <chr> "RWeekly.org - Blogs to Learn R from the Community"
## $ subscribers <int> 876
## $ updated     <dbl> 1.546235e+12
## $ velocity    <dbl> 1.1
## $ website     <chr> "https://rweekly.org/"
## $ topics      <I(list)> data sci....
## $ partial     <lgl> FALSE
## $ iconUrl     <chr> "https://storage.googleapis.com/test-site-assets/2...
## $ visualUrl   <chr> "https://storage.googleapis.com/test-site-assets/2...
## $ contentType <chr> "longform"
## $ language    <chr> "en"

glimpse(r_weekly_live)
## Observations: 1
## Variables: 9
## $ id          <chr> "feed/https://feeds.feedburner.com/rweeklylive"
## $ feedId      <chr> "feed/https://feeds.feedburner.com/rweeklylive"
## $ title       <chr> "R Weekly Live: R Focus"
## $ subscribers <int> 1
## $ updated     <dbl> 1.5461e+12
## $ velocity    <dbl> 14.7
## $ website     <chr> "https://rweekly.org/live"
## $ language    <chr> "en"
## $ description <chr> "Live Updates from R Weekly"
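
The updated values in these glimpses look like Unix epoch timestamps in milliseconds. That is an assumption based on their magnitude, not something the output itself states. A minimal base R sketch of turning one into a readable date-time under that assumption:

```r
# `updated` value copied from the r_bloggers glimpse above; treating it as
# milliseconds since the Unix epoch is an assumption
updated_ms <- 1.546227e+12

# as.POSIXct() expects seconds, so divide by 1000 first
updated_at <- as.POSIXct(updated_ms / 1000, origin = "1970-01-01", tz = "UTC")

format(updated_at, "%Y-%m-%d %H:%M:%S")
```

If the assumption holds, the result should land on the day this post was written.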

Feedly uses some special terms, one of which (above) is velocity. “Velocity” is simply the average number of articles published weekly (Feedly’s platform updates that every few weeks for each feed). R-bloggers has over 24,000 Feedly subscribers so any post-rankings we do here should be fairly representative. I included both the “live” and the week-based R Weekly feeds as I wanted to compare post coverage between R-bloggers and R Weekly in terms of raw content.

On the other hand, R Weekly’s “weekly” RSS feed has fewer than 1,000 subscribers. WAT?! While I have mostly nothing against R-bloggers-proper I heartily encourage ardent readers to also subscribe to R Weekly and perhaps even consider switching to it (or at least adding the individual blog feeds you monitor to your own Feedly). It wasn’t until the Feedly API that I had any idea of how many folks were really viewing my R blog posts, since we must provide a full-post RSS feed to R-bloggers and get very little in return (at least in terms of data). R Weekly uses a link counter but redirects all clicks to the blog author’s site, where we can use logs or analytics platforms to measure engagement. R Weekly is also run by a group of volunteers (more eyes == more posts they catch!) and has a Patreon where the current combined weekly net is likely not enough to buy each volunteer a latte. No ads, a great team and direct engagement stats for the community of R bloggers seems like a great deal for $1.00 USD. If you weren’t persuaded by the above rant, then perhaps at least consider installing this (from source that you control).

Lastly, I believe I’m that “1” subscriber to R Weekly Live O_o. But, I digress.

We’ve got the feedIds (which can be used as “stream” ids) so let’s get cracking!
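
Judging by the feedId columns above, a feed's stream id appears to be just the feed URL with a "feed/" prefix. A tiny sketch of building one by hand, assuming that pattern holds for other feeds too:

```r
# Assumption: stream id = "feed/" + feed URL, which is the pattern visible in
# the feedId columns of the glimpses above
feed_url  <- "https://rweekly.org/atom.xml"
stream_id <- paste0("feed/", feed_url)

stream_id
```

In practice it is safer to take the id from feedly_feed_meta() as done here, since that is what Feedly itself reports.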

Binding Up The Posts

We need to use the feedId in calls to feedly_stream() to get the individual posts. The API claims there’s a temporal parameter that allows one to get posts only after a certain date but I couldn’t get it to work (PRs are welcome on any community source code portal you’re most comfortable in if you’re craftier than I am). As a result, we need to make a guess as to how many calls we need to make for two of the three feeds. Basic maths of 44 * 52 / 1000 suggests ~3 calls each should suffice for R Weekly (live) and R-bloggers, so that’s what we’ll do. We should be able to get R Weekly (weekly) in one go.
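
The back-of-the-envelope estimate can be written out as follows. The 1,000 items per call is the divisor used in the arithmetic above and is treated here as an assumed page size:

```r
velocity  <- 44.3   # R-bloggers posts per week, from the glimpse above
page_size <- 1000   # assumed maximum items returned per feedly_stream() call

est_posts    <- velocity * 52                   # rough posts per year
calls_needed <- ceiling(est_posts / page_size)  # whole calls needed to cover them

c(est_posts = est_posts, calls_needed = calls_needed)
```

Velocity is only a rough guide, so it is worth checking the date range of what comes back rather than trusting the estimate.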

r_weekly_wk <- feedly_stream(r_weekly$feedId)

range(r_weekly_wk$items$published) # my preview of this said it got back to 2016!
## [1] "2016-05-20 20:00:00 EDT" "2018-12-30 19:00:00 EST"

# NOTE: If this were more than 3 I'd use a loop/iterator
# In reality, I should make a helper function to do this for you (PRs welcome)

r_blog_1 <- feedly_stream(r_bloggers$feedId)
r_blog_2 <- feedly_stream(r_bloggers$feedId, continuation = r_blog_1$continuation)
r_blog_3 <- feedly_stream(r_bloggers$feedId, continuation = r_blog_2$continuation)

r_weekly_live_1 <- feedly_stream(r_weekly_live$feedId)
r_weekly_live_2 <- feedly_stream(r_weekly_live$feedId, continuation = r_weekly_live_1$continuation)
r_weekly_live_3 <- feedly_stream(r_weekly_live$feedId, continuation = r_weekly_live_2$continuation)

bind_rows(r_blog_1$items, r_blog_2$items, r_blog_3$items) %>% 
  filter(published >= as.Date("2018-01-01")) -> r_blog_stream

bind_rows(r_weekly_live_1$items, r_weekly_live_2$items, r_weekly_live_3$items) %>% 
  filter(published >= as.Date("2018-01-01")) -> r_weekly_live_stream

r_weekly_wk_stream <- filter(r_weekly_wk$items, published >= as.Date("2018-01-01"))
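
For more than a handful of pages, the repeated calls above could be wrapped in a loop. The helper below is a hypothetical sketch and not part of seymour: fetch_all_pages() and its arguments are made-up names, and it assumes the result of feedly_stream() carries no continuation value once the stream is exhausted.

```r
# Hypothetical helper (not in seymour): follow `continuation` tokens until
# they run out or `max_pages` is reached, returning a list of item data frames.
# `fetch` defaults to seymour::feedly_stream but can be any function with the
# same (id, continuation = ) shape, which makes the loop easy to try offline.
fetch_all_pages <- function(feed_id, max_pages = 3, fetch = seymour::feedly_stream) {
  pages <- list()
  continuation <- NULL
  for (i in seq_len(max_pages)) {
    res <- if (is.null(continuation)) {
      fetch(feed_id)
    } else {
      fetch(feed_id, continuation = continuation)
    }
    pages[[i]] <- res$items
    continuation <- res$continuation
    # Assumption: no (or an NA) continuation means there is nothing left to fetch
    if (is.null(continuation) || length(continuation) == 0 || is.na(continuation[1])) break
  }
  pages
}

# Offline stand-in for the API: two pages, then no continuation
stub_fetch <- function(id, continuation = NULL) {
  page <- if (is.null(continuation)) 1L else as.integer(continuation) + 1L
  list(
    items = data.frame(page = page),
    continuation = if (page < 2L) as.character(page) else NULL
  )
}

pages <- fetch_all_pages("feed/example", max_pages = 5, fetch = stub_fetch)
length(pages)
```

With the real API, the returned list could be passed straight to bind_rows(), as in the code above.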

Let’s take a look:

glimpse(r_weekly_wk_stream)
## Observations: 54
## Variables: 27
## $ id                  <chr> "2nIALmjjlFcpPJKakm2k8hjka0FzpApixM7HHu8B0...
## $ originid            <chr> "https://rweekly.org/2018-53", "https://rw...
## $ fingerprint         <chr> "114357f1", "199f78d0", "9adc236e", "63f99...
## $ title               <chr> "R Weekly 2018-53 vroom, Classification", ...
## $ updated             <dttm> 2018-12-30 19:00:00, 2018-12-23 19:00:00,...
## $ crawled             <dttm> 2018-12-31 00:51:39, 2018-12-23 23:46:49,...
## $ published           <dttm> 2018-12-30 19:00:00, 2018-12-23 19:00:00,...
## $ alternate           <list> [<https://rweekly.org/2018-53.html, text/...
## $ canonicalurl        <chr> "https://rweekly.org/2018-53.html", "https...
## $ unread              <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
## $ categories          <list> [<user/c45e5b02-5a96-464c-bf77-4eea75409c...
## $ engagement          <int> 1, 5, 5, 3, 2, 3, 1, 2, 3, 2, 4, 3, 2, 2, ...
## $ engagementrate      <dbl> 0.33, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ recrawled           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ tags                <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL...
## $ content_content     <chr> "<p>Hello and welcome to this new issue!</...
## $ content_direction   <chr> "ltr", "ltr", "ltr", "ltr", "ltr", "ltr", ...
## $ origin_streamid     <chr> "feed/https://rweekly.org/atom.xml", "feed...
## $ origin_title        <chr> "RWeekly.org - Blogs to Learn R from the C...
## $ origin_htmlurl      <chr> "https://rweekly.org/", "https://rweekly.o...
## $ visual_processor    <chr> "feedly-nikon-v3.1", "feedly-nikon-v3.1", ...
## $ visual_url          <chr> "https://github.com/rweekly/image/raw/mast...
## $ visual_width        <int> 372, 672, 1000, 1000, 1000, 1001, 1000, 10...
## $ visual_height       <int> 479, 480, 480, 556, 714, 624, 237, 381, 36...
## $ visual_contenttype  <chr> "image/png", "image/png", "image/gif", "im...
## $ webfeeds_icon       <chr> "https://storage.googleapis.com/test-site-...
## $ decorations_dropbox <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

glimpse(r_weekly_live_stream)
## Observations: 1,333
## Variables: 27
## $ id                  <chr> "rhkRVQ8KjjGRDQxeehIj6RRIBGntdni0ZHwPTR8B3...
## $ originid            <chr> "https://link.rweekly.org/ckb", "https://l...
## $ fingerprint         <chr> "c11a0782", "c1897fc3", "c0b36206", "7049e...
## $ title               <chr> "Top Tweets of 2018", "My #Best9of2018 twe...
## $ crawled             <dttm> 2018-12-29 11:11:52, 2018-12-28 11:24:22,...
## $ published           <dttm> 2018-12-28 19:00:00, 2018-12-27 19:00:00,...
## $ canonical           <list> [<https://link.rweekly.org/ckb, text/html...
## $ alternate           <list> [<http://feedproxy.google.com/~r/RWeeklyL...
## $ unread              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, ...
## $ categories          <list> [<user/c45e5b02-5a96-464c-bf77-4eea75409c...
## $ tags                <list> [<user/c45e5b02-5a96-464c-bf77-4eea75409c...
## $ canonicalurl        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ ampurl              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ cdnampurl           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ engagement          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ summary_content     <chr> "<p>maraaverick.rbind.io</p><img width=\"1...
## $ summary_direction   <chr> "ltr", "ltr", "ltr", "ltr", "ltr", "ltr", ...
## $ origin_streamid     <chr> "feed/https://feeds.feedburner.com/rweekly...
## $ origin_title        <chr> "R Weekly Live: R Focus", "R Weekly Live: ...
## $ origin_htmlurl      <chr> "https://rweekly.org/live", "https://rweek...
## $ visual_url          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ visual_processor    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ visual_width        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ visual_height       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ visual_contenttype  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ decorations_dropbox <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ decorations_pocket  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

glimpse(r_blog_stream)
## Observations: 2,332
## Variables: 34
## $ id                  <chr> "XGq6cYRY3hH9/vdZr0WOJiPdAe0u6dQ2ddUFEsTqP...
## $ keywords            <list> ["R bloggers", "R bloggers", "R bloggers"...
## $ originid            <chr> "https://datascienceplus.com/?p=19513", "h...
## $ fingerprint         <chr> "2f32071a", "332f9548", "2e6f8adb", "3d7ed...
## $ title               <chr> "Leaf Plant Classification: Statistical Le...
## $ crawled             <dttm> 2018-12-30 22:35:22, 2018-12-30 19:01:25,...
## $ published           <dttm> 2018-12-30 19:26:20, 2018-12-30 13:18:00,...
## $ canonical           <list> [<https://www.r-bloggers.com/leaf-plant-c...
## $ author              <chr> "Giorgio Garziano", "Sascha W.", "Economet...
## $ alternate           <list> [<http://feedproxy.google.com/~r/RBlogger...
## $ unread              <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
## $ categories          <list> [<user/c45e5b02-5a96-464c-bf77-4eea75409c...
## $ entities            <list> [<c("nlp/f/entity/en/-/leaf plant classif...
## $ engagement          <int> 50, 39, 482, 135, 33, 12, 13, 41, 50, 31, ...
## $ engagementrate      <dbl> 1.43, 0.98, 8.76, 2.45, 0.59, 0.21, 0.22, ...
## $ enclosure           <list> [NULL, NULL, NULL, NULL, <c("https://0.gr...
## $ tags                <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL...
## $ recrawled           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ updatecount         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ content_content     <chr> "<p><div><div><div><div data-show-faces=\"...
## $ content_direction   <chr> "ltr", "ltr", "ltr", "ltr", "ltr", "ltr", ...
## $ summary_content     <chr> "CategoriesAdvanced Modeling\nTags\nLinear...
## $ summary_direction   <chr> "ltr", "ltr", "ltr", "ltr", "ltr", "ltr", ...
## $ origin_streamid     <chr> "feed/http://feeds.feedburner.com/RBlogger...
## $ origin_title        <chr> "R-bloggers", "R-bloggers", "R-bloggers", ...
## $ origin_htmlurl      <chr> "https://www.r-bloggers.com", "https://www...
## $ visual_processor    <chr> "feedly-nikon-v3.1", "feedly-nikon-v3.1", ...
## $ visual_url          <chr> "https://i0.wp.com/datascienceplus.com/wp-...
## $ visual_width        <int> 383, 400, NA, 286, 456, 250, 450, 456, 397...
## $ visual_height       <int> 309, 300, NA, 490, 253, 247, 450, 253, 333...
## $ visual_contenttype  <chr> "image/png", "image/png", NA, "image/png",...
## $ webfeeds_icon       <chr> "https://storage.googleapis.com/test-site-...
## $ decorations_dropbox <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ decorations_pocket  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

Let’s also check how far into December I got for each feed as of this post. (I’ll check again after the 31st and update if needed.)

range(r_weekly_wk_stream$published)
## [1] "2018-01-07 19:00:00 EST" "2018-12-30 19:00:00 EST"

range(r_blog_stream$published)
## [1] "2018-01-01 11:00:27 EST" "2018-12-30 19:26:20 EST"

range(r_weekly_live_stream$published)
## [1] "2018-01-01 19:00:00 EST" "2018-12-28 19:00:00 EST"

Digging Into The Weeds Feeds

In the above glimpses there’s another special term, engagement. Feedly defines this as an “indicator of how popular this entry is. The higher the number, the more readers have read, saved or shared this particular entry”. We’ll use this to look at the most “engaged” content in a bit. What’s noticeable from the start is that R Weekly Live has 1,333 entries and R-bloggers has 2,332 entries (so, nearly double the number of entries). Those counts are a bit of “fake news” when it comes to overall unique posts as can be seen by:

bind_rows(
  mutate(r_weekly_live_stream, src = "R Weekly (Live)"),
  mutate(r_blog_stream, src = "R-bloggers")
) %>% 
  mutate(wk = lubridate::week(published)) -> y2018

filter(y2018, title == "RcppArmadillo 0.9.100.5.0") %>% 
  select(src, title, originid, published) %>% 
  gt::gt()

src	title	originid	published
R Weekly (Live)	RcppArmadillo 0.9.100.5.0	https://link.rweekly.org/bg6	2018-08-17 07:55:00
R Weekly (Live)	RcppArmadillo 0.9.100.5.0	https://link.rweekly.org/bfr	2018-08-16 21:20:00
R-bloggers	RcppArmadillo 0.9.100.5.0	https://www.r-bloggers.com/?guid=f8865e8a004f772bdb64e3c4763a0fe5	2018-08-17 08:00:00
R-bloggers	RcppArmadillo 0.9.100.5.0	https://www.r-bloggers.com/?guid=3046299f73344a927f787322c867233b	2018-08-16 21:20:00

Feedly has many processes going on behind the scenes to identify new entries and update entries as original sources are modified. This “duplication” (thankfully) doesn’t happen a lot:

count(y2018, src, wk, title, sort=TRUE) %>% 
  filter(n > 1) %>% 
  arrange(wk) %>% 
  gt::gt() %>% 
  gt::fmt_number(c("wk", "n"), decimals = 0)

src	wk	title	n
R-bloggers	3	conapomx data package	2
R Weekly (Live)	5	R in Latin America	2
R Weekly (Live)	12	Truncated Poisson distributions in R and Stan by @ellis2013nz	2
R Weekly (Live)	17	Regression Modeling Strategies	2
R Weekly (Live)	18	How much work is onboarding?	2
R Weekly (Live)	18	Survey books, courses and tools by @ellis2013nz	2
R-bloggers	20	Beautiful and Powerful Correlation Tables in R	2
R Weekly (Live)	24	R Consortium is soliciting your feedback on R package best practices	2
R Weekly (Live)	33	RcppArmadillo 0.9.100.5.0	2
R-bloggers	33	RcppArmadillo 0.9.100.5.0	2
R-bloggers	39	Individual level data	2
R Weekly (Live)	41	How R gets built on Windows	2
R Weekly (Live)	41	R Consortium grant applications due October 31	2
R Weekly (Live)	41	The Economist’s Big Mac Index is calculated with R	2
R Weekly (Live)	42	A small logical change with big impact	2
R Weekly (Live)	42	Maryland’s Bridge Safety, reported using R	2
R-bloggers	47	OneR – fascinating insights through simple rules	2

In fact, it happens infrequently enough that I’m going to let the “noise” stay in the data since Feedly technically is tracking some content change.
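
If you would rather drop the repeats in your own analysis, one option is to keep only the first row per source/title pair. A minimal base R sketch on toy data (the data frame below is a stand-in, not the real y2018):

```r
# Toy stand-in for y2018 with just the columns needed here
posts <- data.frame(
  src   = c("R-bloggers", "R-bloggers", "R Weekly (Live)"),
  title = c("RcppArmadillo 0.9.100.5.0", "RcppArmadillo 0.9.100.5.0",
            "RcppArmadillo 0.9.100.5.0"),
  stringsAsFactors = FALSE
)

# Keep the first row for each src/title pair; later repeats are dropped
deduped <- posts[!duplicated(posts[c("src", "title")]), ]

nrow(deduped)
```

The dplyr equivalent would be distinct(y2018, src, title, .keep_all = TRUE). Note that this treats two genuinely different posts with the same title as duplicates, which may or may not be what you want.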

Let’s look at the week-over-week curation counts (neither source publishes original content, so using the term “published” seems ill fitting) for each:

count(y2018, src, wk) %>% 
  ggplot(aes(wk, n)) +
  geom_segment(aes(xend=wk, yend=0, color = src), show.legend = FALSE) +
  facet_wrap(~src, ncol=1, scales="free_x") + 
  labs(
    x = "Week #", y = "# Posts", 
    title = "Weekly Post Curation Stats for R-bloggers & R Weekly (Live)"
  ) +
  theme_ft_rc(grid="Y")

Despite R-bloggers having curated more overall content, there’s plenty to read each week for consumers of either/both aggregators.
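
A note on the week numbers: lubridate::week() is documented as the number of complete seven-day periods since January 1st, plus one, which is not the same as an ISO week. A base R sketch of that rule (week_of_year() is a made-up name used only for this illustration):

```r
# Base R stand-in for the lubridate::week() rule: complete 7-day periods
# since January 1st, plus one (POSIXlt's yday is zero-based)
week_of_year <- function(d) as.POSIXlt(d)$yday %/% 7 + 1

week_of_year(as.Date(c("2018-08-17", "2018-12-30", "2018-12-31")))
```

Under this rule August 17 falls in week 33, matching the duplicate table above, and week 53 of 2018 contains only December 31, so the final bar in each panel covers a single day.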

Speaking of consuming, let’s look at the distribution of engagement scores for both aggregators:

group_by(y2018, src) %>% 
  summarise(v = list(broom::tidy(summary(engagement)))) %>% 
  unnest()
## # A tibble: 2 x 8
##   src             minimum    q1 median  mean    q3 maximum    na
##   <chr>             <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl> <dbl>
## 1 R Weekly (Live)       0     0    0     0       0       0  1060
## 2 R-bloggers            1    16   32.5  58.7    70    2023    NA
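
To put the na column in perspective, here is the share of R Weekly (Live) entries with no engagement value, using only the counts printed in this post:

```r
# Counts taken from the outputs above
n_entries <- 1333  # rows in r_weekly_live_stream
n_missing <- 1060  # `na` column of the summary for R Weekly (Live)

pct_missing <- round(100 * n_missing / n_entries, 1)
pct_missing
```

So roughly four in five entries carry no engagement score at all, and the remainder are all zero.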

Well, it seems that it’s more difficult for Feedly to track engagement for the link-only R Weekly (Live) feed, so we’ll have to focus on R-bloggers for engagement views. Summary values are fine, but we can get a picture of the engagement distribution (we’ll do it monthly to get a bit more granularity, too):

filter(y2018, src == "R-bloggers") %>% 
  mutate(month = lubridate::month(published, label = TRUE, abbr = TRUE)) %>% 
  ggplot(aes(month, engagement)) +
  geom_violin() +
  ggbeeswarm::geom_quasirandom(
    groupOnX = TRUE, size = 2, color = "#2b2b2b", fill = ft_cols$green,
    shape = 21, stroke = 0.25
  ) +
  scale_y_comma(trans = "log10") +
  labs(
    x = NULL, y = "Engagement Score",
    title = "Monthly Post Engagement Distributions for R-bloggers Curated Posts",
    caption = "NOTE: Y-axis log10 Scale"
  ) +
  theme_ft_rc(grid="Y")

I wasn’t expecting each month’s distribution to be so similar. There are definitely outliers in terms of positive engagement so we should be able to see what types of R-focused content piques the interest of the ~25,000 Feedly subscribers of R-bloggers.

filter(y2018, src == "R-bloggers") %>% 
  group_by(author) %>% 
  summarise(n_posts = n(), total_eng = sum(engagement), avg_eng = mean(engagement), med_eng = median(engagement)) %>% 
  arrange(desc(n_posts)) %>% 
  slice(1:20) %>% 
  gt::gt() %>% 
  gt::fmt_number(c("n_posts", "total_eng", "avg_eng", "med_eng"), decimals = 0)

author	n_posts	total_eng	avg_eng	med_eng
David Smith	116	9,791	84	47
John Mount	94	4,614	49	33
rOpenSci – open tools for open science	89	2,967	33	19
Thinking inside the box	85	1,510	18	14
R Views	60	4,142	69	47
hrbrmstr	55	1,179	21	16
Dr. Shirin Glander	54	2,747	51	25
xi’an	49	990	20	12
Mango Solutions	42	1,221	29	17
Econometrics and Free Software	33	2,858	87	60
business-science.io – Articles	31	4,484	145	70
NA	31	1,724	56	40
statcompute	29	1,329	46	33
Ryan Sheehy	25	1,271	51	45
Keith Goldfeld	24	1,305	54	43
free range statistics – R	23	440	19	12
Jakob Gepp	21	348	17	13
Tal Galili	21	1,587	76	22
Jozef’s Rblog	18	1,617	90	65
arthur charpentier	16	1,320	82	68

It is absolutely no surprise David comes in at number one in both post count and almost every engagement summary statistic since he’s a veritable blogging machine and creates + curates some super interesting content (whereas yours truly doesn’t even make the median engagement cut).

What were the most engaging posts?

filter(y2018, src == "R-bloggers") %>% 
  arrange(desc(engagement)) %>% 
  mutate(published = as.Date(published)) %>% 
  select(engagement, title, published, author) %>% 
  slice(1:50) %>% 
  gt::gt() %>% 
  gt::fmt_number(c("engagement"), decimals = 0)

engagement	title	published	author
2,023	Happy Birthday R	2018-08-27	eoda GmbH
1,132	15 Types of Regression you should know	2018-03-25	ListenData
697	R and Python: How to Integrate the Best of Both into Your Data Science Workflow	2018-10-08	business-science.io – Articles
690	Ultimate Python Cheatsheet: Data Science Workflow with Python	2018-11-18	business-science.io – Articles
639	Data Analysis with Python Course: How to read, wrangle, and analyze data	2018-10-31	Andrew Treadway
617	Machine Learning Results in R: one plot to rule them all!	2018-07-18	Bernardo Lares
614	R tip: Use Radix Sort	2018-08-21	John Mount
610	Data science courses in R (/python/etc.) for $10 at Udemy (Sitewide Sale until Aug 26th)	2018-08-24	Tal Galili
575	Why R for data science – and not Python?	2018-12-02	Learning Machines
560	Case Study: How To Build A High Performance Data Science Team	2018-09-18	business-science.io – Articles
516	R 3.5.0 is released! (major release with many new features)	2018-04-24	Tal Galili
482	R or Python? Why not both? Using Anaconda Python within R with {reticulate}	2018-12-30	Econometrics and Free Software
479	Sankey Diagram for the 2018 FIFA World Cup Forecast	2018-06-10	Achim Zeileis
477	5 amazing free tools that can help with publishing R results and blogging	2018-12-22	Jozef’s Rblog
462	What’s the difference between data science, machine learning, and artificial intelligence?	2018-01-09	David Robinson
456	XKCD “Curve Fitting”, in R	2018-09-28	David Smith
450	The prequel to the drake R package	2018-02-06	rOpenSci – open tools for open science
449	Who wrote that anonymous NYT op-ed? Text similarity analyses with R	2018-09-07	David Smith
437	Elegant regression results tables and plots in R: the finalfit package	2018-05-16	Ewen Harrison
428	How to implement neural networks in R	2018-01-12	David Smith
426	Data transformation in #tidyverse style: package sjmisc updated #rstats	2018-02-06	Daniel
413	Neural Networks Are Essentially Polynomial Regression	2018-06-20	matloff
403	Custom R charts coming to Excel	2018-05-11	David Smith
379	A perfect RStudio layout	2018-05-22	Ilya Kashnitsky
370	Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 1: Basics	2018-10-25	Mel Moreno and Mathieu Basille
368	The Financial Times and BBC use R for publication graphics	2018-06-27	David Smith
367	Dealing with The Problem of Multicollinearity in R	2018-08-16	Perceptive Analytics
367	Excel is obsolete. Here are the Top 2 alternatives from R and Python.	2018-03-13	Appsilon Data Science Blog
365	New R Cheatsheet: Data Science Workflow with R	2018-11-04	business-science.io – Articles
361	Tips for analyzing Excel data in R	2018-08-30	David Smith
360	Importing 30GB of data in R with sparklyr	2018-02-16	Econometrics and Free Software
358	Scraping a website with 5 lines of R code	2018-01-24	David Smith
356	Clustering the Bible	2018-12-27	Learning Machines
356	Finally, You Can Plot H2O Decision Trees in R	2018-12-26	Gregory Kanevsky
356	Geocomputation with R – the afterword	2018-12-12	Rstats on Jakub Nowosad’s website
347	Time Series Deep Learning: Forecasting Sunspots With Keras Stateful LSTM In R	2018-04-18	business-science.io – Articles
343	Run Python from R	2018-03-27	Deepanshu Bhalla
336	Machine Learning Results in R: one plot to rule them all! (Part 2 – Regression Models)	2018-07-24	Bernardo Lares
332	R Generation: 25 Years of R	2018-08-01	David Smith
329	How to extract data from a PDF file with R	2018-01-05	Packt Publishing
325	R or Python? Python or R? The ongoing debate.	2018-01-28	tomaztsql
322	How to perform Logistic Regression, LDA, & QDA in R	2018-01-05	Prashant Shekhar
321	Who wrote the anti-Trump New York Times op-ed? Using tidytext to find document similarity	2018-09-06	David Robinson
311	Intuition for principal component analysis (PCA)	2018-12-06	Learning Machines
310	Packages for Getting Started with Time Series Analysis in R	2018-02-18	atmathew
309	Announcing the R Markdown Book	2018-07-13	Yihui Xie
307	Automated Email Reports with R	2018-11-01	JOURNEYOFANALYTICS
304	future.apply – Parallelize Any Base R Apply Function	2018-06-23	JottR on R
298	How to build your own Neural Network from scratch in R	2018-10-09	Posts on Tychobra
293	RStudio 1.2 Preview: SQL Integration	2018-10-02	Jonathan McPherson

Weekly & monthly curated post descriptive statistic patterns haven’t changed much since the April post:

filter(y2018, src == "R-bloggers") %>% 
  mutate(wkday = lubridate::wday(published, label = TRUE, abbr = TRUE)) %>%
  count(wkday) %>% 
  ggplot(aes(wkday, n)) +
  geom_col(width = 0.5, fill = ft_cols$slate, color = NA) +
  scale_y_comma() +
  labs(
    x = NULL, y = "# Curated Posts",
    title = "Day-of-week Curated Post Count for the R-bloggers Feed"
  ) +
  theme_ft_rc(grid="Y")

filter(y2018, src == "R-bloggers") %>% 
  mutate(month = lubridate::month(published, label = TRUE, abbr = TRUE)) %>%
  count(month) %>% 
  ggplot(aes(month, n)) +
  geom_col(width = 0.5, fill = ft_cols$slate, color = NA) +
  scale_y_comma() +
  labs(
    x = NULL, y = "# Curated Posts",
    title = "Monthly Curated Post Count for the R-bloggers Feed"
  ) +
  theme_ft_rc(grid="Y")

Surprisingly, monthly post count consistency (or even posting something each month) is not a common trait amongst the top 20 (by total engagement) authors:

w20 <- scales::wrap_format(20)

filter(y2018, src == "R-bloggers") %>% 
  filter(!is.na(author)) %>% # some posts don't have author attribution
  mutate(author_t = map_chr(w20(author), paste0, collapse="\n")) %>% # we need to wrap for facet titles (below)
  count(author, author_t, wt=engagement, sort=TRUE) %>% # get total author engagement
  slice(1:20) %>% # top 20
  { .auth_ordr <<- . ; . } %>% # we use the order later
  left_join(filter(y2018, src == "R-bloggers"), "author") %>% 
  mutate(month = lubridate::month(published, label = TRUE, abbr = TRUE)) %>%
  count(month, author_t, sort = TRUE) %>% 
  mutate(author_t = factor(author_t, levels = .auth_ordr$author_t)) %>% 
  ggplot(aes(month, nn, author_t)) +
  geom_col(width = 0.5) +
  scale_x_discrete(labels=substring(month.abb, 1, 1)) +
  scale_y_comma() +
  facet_wrap(~author_t) +
  labs(
    x = NULL, y = "Curated Post Count",
    title = "Monthly Curated Post Counts-per-Author (Top 20 by Engagement)",
    subtitle = "Arranged by Total Author Engagement"
  ) +
  theme_ft_rc(grid="yY")

Overall, most authors favor shorter titles for their posts:

filter(y2018, src == "R-bloggers") %>% 
  mutate(
    `Character Count Distribution` = nchar(title), 
    `Word Count Distribution` = stringi::stri_count_boundaries(title, type = "word")
  ) %>% 
  select(id, `Character Count Distribution`, `Word Count Distribution`) %>% 
  gather(measure, value, -id) %>% 
  ggplot(aes(value)) +
  ggalt::geom_bkde(alpha=1/3, color = ft_cols$slate, fill = ft_cols$slate) +
  scale_y_continuous(expand=c(0,0)) +
  facet_wrap(~measure, scales = "free") +
  labs(
    x = NULL, y = "Density",
    title = "Title Character/Word Count Distributions",
    subtitle = "~38 characters/11 words seems to be the sweet spot for most authors",
    caption = "Note Free X/Y Scales"
  ) +
  theme_ft_rc(grid="XY")
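
One caveat on the word counts: as far as I can tell from the stringi documentation, stri_count_boundaries(type = "word") counts every segment between word boundaries, which can include runs of whitespace and punctuation unless skip_word_none = TRUE is set (stri_count_words() is the variant that skips them). If that reading is right, the word counts plotted above may run higher than a plain whitespace-delimited count. A minimal base R sketch of the latter, for comparison:

```r
# Two titles taken from the engagement table earlier in the post
titles <- c(
  "R tip: Use Radix Sort",
  "Happy Birthday R"
)

title_stats <- data.frame(
  title = titles,
  chars = nchar(titles),
  # whitespace-delimited tokens; punctuation stays attached to its word
  words = lengths(strsplit(titles, "\\s+")),
  stringsAsFactors = FALSE
)

title_stats
```

This is a deliberately crude tokenizer, so treat it as a sanity check rather than a replacement for the stringi call.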

This post is already kinda tome-length so I’ll leave it to y’all to grab the data and dig in a bit more.

A Word About Using The `content_content` Field For R-bloggers Posts

Since R-bloggers requires a full feed from contributors, they, in turn, post a “kinda” full feed back out. I say “kinda” as they still haven’t fixed a reported bug in their processing engine which causes issues in (at least) Feedly’s RSS processing engine. If you use Feedly, take a look at the R-bloggers RSS feed entry for the recent “R or Python? Why not both? Using Anaconda Python within R with {reticulate}” post. It cuts off near “Let’s check its type:”. This is due to the way the < character is processed by the R-bloggers ingestion engine: it takes the ## <class 'pandas.core.frame.DataFrame'> output in the original post, mangles it, and turns the descriptive output into an actual <class> tag: <class 'pandas.core.frame.dataframe'=""></class>. (It doesn’t even display right on the R-bloggers page.) It’s really an issue on both sides, but R-bloggers is doing the mangling and should seriously consider addressing it in 2019.

Since it is still not fixed, it forces you to go to R-bloggers (clicks FTW, which may partly explain why that example post has a 400+ engagement score) unless you scroll back up to the top of the Feedly view and go to the author’s blog page. Given that tibble output invariably has a < right up top, your best bet for getting more direct views of your own content is to get a code block with printed ## < output in it as close to the beginning as possible (perhaps start each post with a print(tbl_df(mtcars))?).

Putting post-view-hacking levity aside, this content mangling means you can’t trust the content_content column in the stream data frame to have all the content; that is, if you were planning on taking the provided data and doing some topic clustering or content-based feature extraction for other stats/ML ops you’re out of luck and need to crawl the original site URLs on your own to get the main content for such analyses.
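
If you do crawl pages yourself, a crude check for the specific mangling described above might look like the sketch below. has_mangled_class_tag() is a hypothetical helper, not part of seymour, and it only looks for a literal <class ...> tag. Whether that tag survives in Feedly's content_content, or the content is simply cut off before it, is something to verify against real data.

```r
# Hypothetical helper: flags HTML containing a spurious <class ...> tag,
# i.e. the mangled form of printed "## <class '...'>" output described above
has_mangled_class_tag <- function(html) {
  grepl("<class\\b[^>]*>", html, ignore.case = TRUE, perl = TRUE)
}

samples <- c(
  "<p>Let's check its type:</p><class 'pandas.core.frame.dataframe'=\"\"></class>",
  "<p>Hello and welcome to this new issue!</p>"
)

flags <- has_mangled_class_tag(samples)
flags
```

A hit only tells you the entry needs a closer look; it says nothing about how much content went missing.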

A Bit More About seymour

The seymour package has the following API functions:

feedly_access_token: Retrieve the Feedly Developer Token
feedly_collections: Retrieve Feedly Connections
feedly_feed_meta: Retrieve Metadata for a Feed
feedly_opml: Retrieve Your Feedly OPML File
feedly_profile: Retrieve Your Feedly Profile
feedly_search_contents: Search content of a stream
feedly_search_title: Find feeds based on title, url or ‘#topic’
feedly_stream: Retrieve contents of a Feedly “stream”
feedly_tags: Retrieve List of Tags

along with the following helper function (which we’ll introduce in a minute):

render_stream: Render a Feedly Stream Data Frame to RMarkdown

and, the following helper reference (as Feedly has some “universal” streams):

global_resource_ids: Global Resource Ids Helper Reference

The render_stream() function is semi-useful on its own but was designed as more of a “you may want to replicate this on your own” (i.e. have a look at the source code and riff off of it). “Streams” are individual feeds, collections or even “boards” you make and with this new API package and the power of R Markdown, you can make your own “newsletter” like this:

fp <- feedly_profile() # get profile to get my id

# use the id to get my "security" category feed in my feedly
fs <- feedly_stream(sprintf("user/%s/category/security", fp$id))

# keep up to 10 items whose engagement is above the upper hinge (roughly
# the third quartile) of all posts, and don't include duplicates in the report
mutate(fs$items, published = as.Date(published)) %>% 
  filter(published >= as.Date("2018-12-01")) %>%
  filter(engagement > fivenum(engagement)[4]) %>% 
  filter(!is.na(summary_content)) %>% 
  mutate(alt_url = map_chr(alternate, ~.x[[1]])) %>% 
  distinct(alt_url, .keep_all = TRUE) %>% 
  slice(1:10) -> for_report

# render the report
render_stream(
  feedly_stream = for_report, 
  title = "Cybersecurity News", 
  include_visual = TRUE,
  browse = TRUE
)
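
A small aside on the fivenum(engagement)[4] filter used above: the fourth element of fivenum() is Tukey's upper hinge, which is close to, but not always equal to, the third quartile from quantile(). A minimal sketch on toy numbers:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, NA)

upper_hinge    <- fivenum(x)[4]                         # NAs are dropped by default
third_quartile <- quantile(x, 0.75, na.rm = TRUE)[[1]]  # default (type 7) quantile

c(upper_hinge = upper_hinge, third_quartile = third_quartile)
```

For a filter like the one above the difference rarely matters, but fivenum() has the convenience of ignoring missing engagement values without extra arguments.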

Which makes the following Rmd and HTML. (So, no need to “upgrade” to “Teams” to make newsletters!).

FIN

As noted, the 2018 data for R Weekly (Live) & R-bloggers is available and you can find the seymour package on [GL | GH].

If you’re not a Feedly user I strongly encourage you to give it a go! And, if you don’t subscribe to R Weekly, you should make that your first New Year’s Resolution.

Here’s looking to another year of great R content across the R blogosphere!


rud.is