"Experiment" Hypothesis:
We can use DuckDB to wrangle data for us, let R do some "modeling", and let Observable Plot show us the results
"Experiment" parameters:
DuckDBCLientBuilding off of the previous experiment, today we will combine DuckDB data ops with WebR, letting R do some trivial modeling with glm on data we load and wrangle with DuckDB.
Let's be super clear, right up front: this data is small enough to load into R, process in R, and then model and plot in R without any other packages (save {svglite}). It is deliberately a toy example to make it easier to work with while showing the core concepts.
Here's the tables we have:
This is the schema for our tags table:
This is what's in it:
-- Setup a date range that spans the entire min/max created_at
-- We need this b/c we don't have tags every day so there are
-- gaps in the time series
WITH date_range AS (
  SELECT UNNEST(generate_series(
    (SELECT MIN(created_at) FROM tags),
    (SELECT MAX(created_at) FROM tags),
    INTERVAL '1 day'
  )) AS date
),
-- count number of tags/day
grouped_tags AS (
  SELECT
  created_at,
    COUNT(*) AS daily_count
  FROM
    tags
  GROUP BY
    created_at
),
-- join to the full range and fill in values
joined_dates_counts AS (
  SELECT
    dr.date,
    COALESCE(gt.daily_count, 0) AS filled_daily_count
  FROM
    date_range dr
  LEFT JOIN
    grouped_tags gt
  ON
    dr.date = gt.created_at
)
-- get the cumulative sum and days since the min created_at
SELECT
  date,
  filled_daily_count,
  SUM(filled_daily_count) OVER (ORDER BY date) AS running_cumulative_sum,
  DATEDIFF('day', (SELECT MIN(date) FROM joined_dates_counts), date) AS days_elapsed
FROM
  joined_dates_counts;
function(csum, days_elapsed, target_csum) {
  # saddest. model. ever.
  model <- glm(csum ~ days_elapsed, family = "poisson")
  predicted_days_elapsed <- days_elapsed
  while (TRUE) {
    predicted_days_elapsed <- max(predicted_days_elapsed) + 1
    predict(
      model, 
      newdata = data.frame(days_elapsed = predicted_days_elapsed), 
      type = "response"
    ) -> predicted_csum
    if (predicted_csum >= target_csum) break
  }
  predicted_days_elapsed
}
// call the function
const nDays = await predict(
  tagsCumSum.map(d => d.csum),
  tagsCumSum.map(d => d.days_elapsed),
  1000
)
// I hate date stuff in JS so much
function addDays(date, days) {
  const copy = new Date(Number(date))
  copy.setDate(date.getDate() + days)
  return copy
}
const minDate = ddbResToArray(await db.sql`SELECT min(created_at) AS min_date FROM tags`)[0].min_date
addDays(minDate, nDays.values[0]).toDateString()
We will reach 1,000 tags on or about .
You can find the source on GitHub.
Brought to you by @hrbrmstr