

{"id":12127,"date":"2019-04-07T10:31:29","date_gmt":"2019-04-07T15:31:29","guid":{"rendered":"https:\/\/rud.is\/b\/?p=12127"},"modified":"2019-04-07T10:31:29","modified_gmt":"2019-04-07T15:31:29","slug":"a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/","title":{"rendered":"A Limited-but-Functional Couchbase Free Text Search &#038; Retrieval Un-package; or, &#8220;How I Abused Couchbase &#038; R to Perform Bulk IP Whois Full-text Searches&#8221; (a Cobbler&#8217;s Tale)"},"content":{"rendered":"<p>Researching &#8220;the internet&#8221; (i.e. $DAYJOB) means having to deal with a ton of &#8220;unique&#8221; (I&#8217;m being kind) data formats. This is ultimately a tale of how I performed full-text searches across one of them.<\/p>\n<p>It all started off innocently enough. This past week I needed to be able to do full-text searches across metadata about who is using which parts of the internet. Normally I don&#8217;t need to do that at scale and can just go to <a href=\"https:\/\/apps.db.ripe.net\/db-web-ui\/#\/fulltextsearch\">RIPE&#8217;s excellent resource<\/a> and find what I need on the first page. However, this time I needed <em>all<\/em> the results and noticed an interesting foible in that full-text search interface. To reproduce it, enter something like &#8220;<code>domino's<\/code>&#8221; (for the record, I&#8217;m not researching Domino&#8217;s Pizza \u2014 nor would I ever consume it \u2014 but a Twitter ad for Domino&#8217;s happened to fly by and I typed it in for kicks) into the field and page around, keeping an eye on the results. 
I suspect they still use Solr for indexing\/searching and aren&#8217;t passing along everything needed to maintain session context (or something along those lines). Anyway, suffice it to say it was fairly useless (I filed a bug report, so I&#8217;m not just complaining, and I wish more sites had the same easy bug-report filing capability the RIPE folks do).<\/p>\n<p>If I only needed to search for precise data in one field, that wouldn&#8217;t really be an issue, since we have ALL THE WHOIS IP THINGS in Parquet. But:<\/p>\n<ul>\n<li>I really <em>hate<\/em> giving Amazon money (even if it&#8217;s $WORK money) for Athena queries<\/li>\n<li>Full-text search across all columns is not one of Parquet&#8217;s strengths<\/li>\n<li>This is a third bullet b\/c I feel compelled to have a minimum of three points in bullet lists, likely thanks to an overbearing middle-school English teacher<\/li>\n<\/ul>\n<p>Since I have a modest analytics server setup at home, I figured I&#8217;d take the opportunity to brush up (again) on either Elasticsearch or Couchbase, since <a href=\"https:\/\/dzone.com\/articles\/searching-json-comparing-text-search-in-couchbase-3\">both are pretty great at free-text searching JSON data<\/a>. 
Except\u2026this isn&#8217;t JSON data. It&#8217;s records formatted like this:<\/p>\n<pre><code class=\"language-plain\">#\n# The contents of this file are subject to \n# RIPE Database Terms and Conditions\n#\n# http:\/\/www.ripe.net\/db\/support\/db-terms-conditions.pdf\n#\n\nas-block:       AS7 - AS7\ndescr:          RIPE NCC ASN block\nremarks:        These AS Numbers are assigned to network operators in the RIPE NCC service region.\nmnt-by:         RIPE-NCC-HM-MNT\ncreated:        2018-11-22T15:27:05Z\nlast-modified:  2018-11-22T15:27:05Z\nsource:         RIPE\nremarks:        ****************************\nremarks:        * THIS OBJECT IS MODIFIED\nremarks:        * Please note that all data that is generally regarded as personal\nremarks:        * data has been removed from this object.\nremarks:        * To view the original object, please query the RIPE Database at:\nremarks:        * http:\/\/www.ripe.net\/whois\nremarks:        ****************************\n\nas-block:       AS28 - AS28\ndescr:          RIPE NCC ASN block\nremarks:        These AS Numbers are assigned to network operators in the RIPE NCC service region.\nmnt-by:         RIPE-NCC-HM-MNT\ncreated:        2018-11-22T15:27:05Z\nlast-modified:  2018-11-22T15:27:05Z\nsource:         RIPE\nremarks:        ****************************\nremarks:        * THIS OBJECT IS MODIFIED\nremarks:        * Please note that all data that is generally regarded as personal\nremarks:        * data has been removed from this object.\nremarks:        * To view the original object, please query the RIPE Database at:\nremarks:        * http:\/\/www.ripe.net\/whois\nremarks:        ****************************\n<\/code><\/pre>\n<p>The &#8220;keys&#8221; (the colon-ified line prefixes) vary, there are other record types (which I don&#8217;t need) with their own prefixes, and those <code>#<\/code>-prefixed comments aren&#8217;t necessarily confined to the top of the file. 
But, after judicious use of <code>stringi::stri_enc_toutf8()<\/code>, <code>stringi::stri_split_regex()<\/code>, and some vectorized record targeting, they&#8217;re pretty easily converted to lovely ndjson data like this (a random selection from further into the conversion):<\/p>\n<pre><code class=\"language-plain\">{\"descr\":\"Reseau Teleinformatique de l'Education Nationale Educational and research network for Luxembourg\",\"admin_c\":\"DUMY-RIPE\",\"as_set\":\"AS-RESTENA\",\"members\":\"AS2602, AS42909, AS51966, AS49624\",\"mnt_by\":\"AS2602-MNT\",\"notify\":\"noc@restena.lu\",\"tech_c\":\"DUMY-RIPE\"}\n{\"descr\":\"CWIX ASes announced to EBONE\",\"admin_c\":\"DUMY-RIPE\",\"as_set\":\"AS-TMPEBONECWIX\",\"members\":\"AS3727, AS4445, AS4610, AS4624, AS4637, AS4654, AS4655, AS4656, AS4659 AS4681, AS4696, AS4714, AS4849, AS5089, AS5090, AS5532, AS5551, AS5559 AS5655, AS6081, AS6255, AS6292, AS6618, AS6639\",\"mnt_by\":\"EBONE-MNT\",\"notify\":\"staff@ebone.net\",\"tech_c\":\"DUMY-RIPE\"}\n{\"descr\":\"ASs accepted by DFN from the University of Cologne\",\"admin_c\":\"DUMY-RIPE\",\"as_set\":\"AS-DFNFROMCOLOGNE\",\"members\":\"AS5520 AS6733\",\"mnt_by\":\"DFN-MNT\",\"tech_c\":\"DUMY-RIPE\"}\n{\"descr\":\"NetMatters UK\",\"admin_c\":\"DUMY-RIPE\",\"as_set\":\"AS-NETMATTERS\",\"members\":\"AS6765 AS3344\",\"mnt_by\":\"AS8407-MNT\",\"tech_c\":\"DUMY-RIPE\"}\n<\/code><\/pre>\n<p>I went with Couchbase since it <a href=\"https:\/\/docs.couchbase.com\/server\/6.0\/tools\/cbimport-json.html\">handles ndjson import out of the box<\/a> and \u2014 as you know since you read the comparison in the aforelinked article \u2014 it can <a href=\"https:\/\/docs.couchbase.com\/server\/6.0\/fts\/fts-searching-from-the-ui.html\">index all fields by default with virtually no effort on your part<\/a>. Plus, Couchbase has been around long enough that it generally installs without pain and has a fairly decent web admin panel. 
Here&#8217;s a snapshot of the final import:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"12130\" data-permalink=\"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/cb-screen\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?fit=2880%2C1639&amp;ssl=1\" data-orig-size=\"2880,1639\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cb-screen\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?fit=510%2C290&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?resize=510%2C290&#038;ssl=1\" alt=\"\" width=\"510\" height=\"290\" class=\"aligncenter size-full wp-image-12130\" \/><\/a><\/p>\n<p>and here&#8217;s the config for the &#8220;<code>all<\/code>&#8221; full text index:<\/p>\n<pre><code class=\"language-plain\">{\n  \"type\": \"fulltext-index\",\n  \"name\": \"all\",\n  \"uuid\": \"481bc7ed642dddfb\",\n  \"sourceType\": \"couchbase\",\n  \"sourceName\": \"ripe\",\n  \"sourceUUID\": \"3ffbbe0c0923f233ffe0fc96c652262d\",\n  \"planParams\": {\n    \"maxPartitionsPerPIndex\": 171\n  },\n  \"params\": {\n    \"doc_config\": {\n      \"docid_prefix_delim\": \"\",\n      \"docid_regexp\": \"\",\n      \"mode\": 
\"type_field\",\n      \"type_field\": \"type\"\n    },\n    \"mapping\": {\n      \"analysis\": {},\n      \"default_analyzer\": \"standard\",\n      \"default_datetime_parser\": \"dateTimeOptional\",\n      \"default_field\": \"_all\",\n      \"default_mapping\": {\n        \"dynamic\": true,\n        \"enabled\": true\n      },\n      \"default_type\": \"_default\",\n      \"docvalues_dynamic\": true,\n      \"index_dynamic\": true,\n      \"store_dynamic\": false,\n      \"type_field\": \"_type\"\n    },\n    \"store\": {\n      \"indexType\": \"scorch\",\n      \"kvStoreName\": \"\"\n    }\n  },\n  \"sourceParams\": {}\n}\n<\/code><\/pre>\n<h3>You Said This Is A Post With R Code<\/h3>\n<p>Very true! We&#8217;ll get to that in a minute.<\/p>\n<p>Going with Couchbase introduced a different problem: there&#8217;s almost no R support for Couchbase. Sure, Couchbase has a gnarly, two-year-old, raw <code>httr::<\/code>-prefixed bit of a <a href=\"https:\/\/blog.couchbase.com\/using-couchbase-r\/\">tutorial post<\/a>, but that&#8217;s not nearly as cool as a proper <code>library(couchbase)<\/code> would be. I mean, you can check <a href=\"https:\/\/github.com\/search?l=r&amp;q=couchbase&amp;type=Repositories\">GitUgh<\/a> or <a href=\"https:\/\/www.google.com\/search?q=site%3Acran+couchbase&amp;oq=site%3Acran+couchbase&amp;aqs=chrome..69i57j69i58.3351j0j4&amp;sourceid=chrome&amp;ie=UTF-8\">CRAN<\/a> or <a href=\"http:\/\/bfy.tw\/N6wW\">a more general search<\/a> yourself if you&#8217;d like, but it&#8217;s going to turn up bupkis.<\/p>\n<p>If you were expecting a big reveal right now that I&#8217;ve got a feature-packed, full R Couchbase package ready to roll\u2026you didn&#8217;t actually read the title of the post. 
What I do have is <a href=\"https:\/\/paste.sr.ht\/~hrbrmstr\/051f5d5400644952a3ad2cf8664b84e2cbb9ac6b\">a set of functions<\/a> that \u2014 given server\/connection metadata, a bucket, a full text index, and a query \u2014 will return all matching documents (I still do not like that term for &#8220;record&#8221;) for said set of parameters:<\/p>\n<pre><code class=\"language-r\"># function code is in: https:\/\/paste.sr.ht\/~hrbrmstr\/051f5d5400644952a3ad2cf8664b84e2cbb9ac6b\n\ncb_fts(\"domino's\", \"all\", \"ripe\")\n## # A tibble: 120 x 9\n##    admin_c   country descr                      inetnum                  mnt_by      netname  status    tech_c  notify         \n##    &lt;chr&gt;     &lt;chr&gt;   &lt;chr&gt;                      &lt;chr&gt;                    &lt;chr&gt;       &lt;chr&gt;    &lt;chr&gt;     &lt;chr&gt;   &lt;chr&gt;          \n##  1 DUMY-RIPE FR      OPEN IP DOMINO'S PIZZA     79.141.8.44 - 79.141.8.\u2026 ALPHALINK-\u2026 OPEN-IP  ASSIGNED\u2026 DUMY-R\u2026 NA             \n##  2 DUMY-RIPE NL      Domino's Pizza TILBURG     62.21.176.160 - 62.21.1\u2026 AS286-MNT   OTS2634\u2026 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  3 DUMY-RIPE NL      Domino's Pizza EINDHOVEN   62.132.252.168 - 62.132\u2026 AS286-MNT   OTS2270\u2026 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  4 DUMY-RIPE NL      Domino's Pizza SPYKENISSE  194.123.233.232 - 194.1\u2026 AS286-MNT   OTS69259 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  5 DUMY-RIPE NL      Domino's AMSTERDAM         37.74.38.188 - 37.74.38\u2026 AS286-MNT   OTS6103\u2026 ASSIGNED\u2026 DUMY-R\u2026 kpn-ip-office@\u2026\n##  6 DUMY-RIPE NL      Domino's Pizza VOORSCHOTEN 92.66.116.136 - 92.66.1\u2026 AS286-MNT   OTS1914\u2026 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  7 DUMY-RIPE NL      Domino's Pizza Doetinchem\u2026 212.241.42.136 - 212.24\u2026 AS286-MNT   OTS2301\u2026 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  8 DUMY-RIPE NL      Domino's Pizza AMSTERDAM   
194.120.45.224 - 194.12\u2026 AS286-MNT   OTS82906 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n##  9 DUMY-RIPE NL      Domino's Pizza [Woerden] \u2026 62.41.228.80 - 62.41.22\u2026 AS286-MNT   OTS2024\u2026 ASSIGNED\u2026 DUMY-R\u2026 ip-reg@kpn.net \n## 10 DUMY-RIPE NL      Domino's Pizza GRONINGEN   188.203.128.0 - 188.203\u2026 AS286-MNT   OTS3767\u2026 ASSIGNED\u2026 DUMY-R\u2026 kpn-ip-office@\u2026\n## # \u2026 with 110 more rows\n<\/code><\/pre>\n<p>It&#8217;s not fancy.<\/p>\n<p>It meets the needs of a narrow use case.<\/p>\n<p>It&#8217;s not in a standalone package (which is triggering my R code OCD something fierce).<\/p>\n<p>But it&#8217;s seriously fast, it got me back to &#8220;work mode&#8221; with a minimum of hassle, and now there&#8217;s some Google-able Couchbase R code that isn&#8217;t just bare <code>httr<\/code> calls, which may help someone else who&#8217;s on a quest to work with Couchbase in R.<\/p>\n<p>The first primary function, <code>cb_fts()<\/code>, uses the <code>\/api\/index\/{index-name}\/query<\/code> API endpoint to paginate through the results of the full-text search and collect the document id keys of every matching record. It then calls the other primary function, <code>cb_get_records_from_keys()<\/code>, which uses the <code>\/query\/service<\/code> API endpoint to issue a <code>SELECT * FROM {bucket} USE KEYS {keys}<\/code> query with all of the found document (record) keys and returns the result set. Nothing fancier than that.<\/p>\n<h3>FIN<\/h3>\n<p>While I do not have these functions in a standalone, Couchbase-focused package, I <em>do<\/em> have them in the package associated with this particular project. If you know of a Couchbase R package (please don&#8217;t link to JDBC\/ODBC drivers, as I&#8217;m not going to buy one), please link to it in the comments.<\/p>\n<p>If you have other strategies for dealing with these &#8220;un-packages&#8221;, please blog about them and post a link as well! 
I&#8217;m curious how others balance the package\/not-a-package\/un-package tension, especially when you need to depend on the same set of functions across multiple projects.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researching &#8220;the internet&#8221; (i.e. $DAYJOB) means having to deal with a ton of &#8220;unique&#8221; (I&#8217;m being kind) data formats. This is ultimately a tale of how I performed full-text searches across one of them. It all started off innocently enough. This past week I needed to be able to do full-text searches across metadata about [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91],"tags":[],"class_list":["post-12127","post","type-post","status-publish","format-standard","hentry","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A Limited-but-Functional Couchbase Free Text Search &amp; Retrieval Un-package; or, &quot;How I Abused Couchbase &amp; R to Perform Bulk IP Whois Full-text Searches&quot; (a Cobbler&#039;s Tale) - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/\" 
\/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Limited-but-Functional Couchbase Free Text Search &amp; Retrieval Un-package; or, &quot;How I Abused Couchbase &amp; R to Perform Bulk IP Whois Full-text Searches&quot; (a Cobbler&#039;s Tale) - rud.is\" \/>\n<meta property=\"og:description\" content=\"Researching &#8220;the internet&#8221; (i.e. $DAYJOB) means having to deal with a ton of &#8220;unique&#8221; (I&#8217;m being kind) data formats. This is ultimately a tale of how I performed full-text searches across one of them. It all started off innocently enough. This past week I need to be able to do full-text searches across metadata about [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2019-04-07T15:31:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"A Limited-but-Functional Couchbase Free Text Search &#038; Retrieval Un-package; or, &#8220;How I Abused Couchbase &#038; R to Perform Bulk IP Whois Full-text Searches&#8221; (a Cobbler&#8217;s Tale)\",\"datePublished\":\"2019-04-07T15:31:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/\"},\"wordCount\":921,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/cb-screen.png\",\"articleSection\":[\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-s
earch-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/\",\"name\":\"A Limited-but-Functional Couchbase Free Text Search & Retrieval Un-package; or, \\\"How I Abused Couchbase & R to Perform Bulk IP Whois Full-text Searches\\\" (a Cobbler's Tale) - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/cb-screen.png\",\"datePublished\":\"2019-04-07T15:31:29+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-U
S\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/cb-screen.png?fit=2880%2C1639&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/04\\\/cb-screen.png?fit=2880%2C1639&ssl=1\",\"width\":2880,\"height\":1639},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/04\\\/07\\\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A Limited-but-Functional Couchbase Free Text Search &#038; Retrieval Un-package; or, &#8220;How I Abused Couchbase &#038; R to Perform Bulk IP Whois Full-text Searches&#8221; (a Cobbler&#8217;s Tale)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"A Limited-but-Functional Couchbase Free Text Search & Retrieval Un-package; or, \"How I Abused Couchbase & R to Perform Bulk IP Whois Full-text Searches\" (a Cobbler's Tale) - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/","og_locale":"en_US","og_type":"article","og_title":"A Limited-but-Functional Couchbase Free Text Search & Retrieval Un-package; or, \"How I Abused Couchbase & R to Perform Bulk IP Whois Full-text Searches\" (a Cobbler's Tale) - rud.is","og_description":"Researching &#8220;the internet&#8221; (i.e. $DAYJOB) means having to deal with a ton of &#8220;unique&#8221; (I&#8217;m being kind) data formats. This is ultimately a tale of how I performed full-text searches across one of them. It all started off innocently enough. This past week I need to be able to do full-text searches across metadata about [&hellip;]","og_url":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/","og_site_name":"rud.is","article_published_time":"2019-04-07T15:31:29+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"A Limited-but-Functional Couchbase Free Text Search &#038; Retrieval Un-package; or, &#8220;How I Abused Couchbase &#038; R to Perform Bulk IP Whois Full-text Searches&#8221; (a Cobbler&#8217;s Tale)","datePublished":"2019-04-07T15:31:29+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/"},"wordCount":921,"commentCount":1,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png","articleSection":["R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/","url":"
https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/","name":"A Limited-but-Functional Couchbase Free Text Search & Retrieval Un-package; or, \"How I Abused Couchbase & R to Perform Bulk IP Whois Full-text Searches\" (a Cobbler's Tale) - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png","datePublished":"2019-04-07T15:31:29+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2019\/04\/07\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?fit=2880%2C1639&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/cb-screen.png?fit=2880%2C1639&ssl=1","width":2880,"height":1639},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2019\/04\/07
\/a-limited-but-functional-couchbase-free-text-search-or-how-i-abused-couchbase-r-to-perform-bulk-ip-whois-full-text-searches-a-cobblers-tale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"A Limited-but-Functional Couchbase Free Text Search &#038; Retrieval Un-package; or, &#8220;How I Abused Couchbase &#038; R to Perform Bulk IP Whois Full-text Searches&#8221; (a Cobbler&#8217;s Tale)"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-39B","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=12127"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12127\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=12127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=12127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=12127"}],"curies":[{"name":"wp","href":"
https:\/\/api.w.org\/{rel}","templated":true}]}}