

{"id":2717,"date":"2013-09-25T11:57:16","date_gmt":"2013-09-25T16:57:16","guid":{"rendered":"http:\/\/rud.is\/b\/?p=2717"},"modified":"2013-09-25T11:57:16","modified_gmt":"2013-09-25T16:57:16","slug":"scraping-content-from-google-groups","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","title":{"rendered":"Scraping Content From Google Groups"},"content":{"rendered":"<p>I was helping a friend out who wanted to build a word cloud from the text in Google Groups posts. If you&#8217;ve made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the group posts via e-mail, then extract all those messages. If you don&#8217;t e-mail subscribe to a group, there really is no way to create an archive of the content.<\/p>\n<p>After hacking around a bit and failing, I pulled up the mobile version of the group. You can do that for any Google Group by using the following URL and filling in <code>GROUPNAME<\/code> for the group you&#8217;re interested in: <code>https:\/\/groups.google.com\/forum\/m\/#!topic\/<b>GROUPNAME<\/b><\/code>.<\/p>\n<p><center><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"2721\" data-permalink=\"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/input_text_not_updated_-_google_groups\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?fit=670%2C297&amp;ssl=1\" data-orig-size=\"670,297\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" 
data-image-title=\"input_text_not_updated_-_Google_Groups\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?fit=510%2C225&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?resize=510%2C225&#038;ssl=1\" alt=\"input_text_not_updated_-_Google_Groups\" width=\"510\" height=\"225\" class=\"aligncenter size-large wp-image-2721\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?resize=530%2C234&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?resize=150%2C66&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?resize=300%2C132&amp;ssl=1 300w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?resize=535%2C237&amp;ssl=1 535w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/input_text_not_updated_-_Google_Groups.png?w=670&amp;ssl=1 670w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/center><\/p>\n<p>Then, you&#8217;ll need to navigate to a thread, use the double-down arrow <img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/arrows2.png?resize=17%2C18&#038;ssl=1\" width=\"17\" height=\"18\" \/> to expand all the items in the thread, open up the JavaScript inspector on one of the posts and look for <code>&lt;div dir=\"ltr\"&gt;<\/code>. If that surrounds the post, the following hack will work. 
Google only seems to add this left-to-right attribute on newer groups, so if you have an older group you need to work with, you&#8217;ll need to inspect a post and figure out a different selector.<\/p>\n<p>With all of the posts expanded, paste the following code into the JavaScript console:<\/p>\n<pre lang=\"javascript\">\/* Collect every element that carries the dir=ltr attribute (the post bodies) *\/\r\nvar nl = document.querySelectorAll('[dir=ltr]');\r\nvar s = \"\";\r\nfor (var i = 0; i < nl.length; i++) {\r\n  \/* Append the text of each post, separated by two line breaks *\/\r\n  s = s + nl[i].textContent + \"<br\/><br\/>\";\r\n}\r\n\/* Write the collected text into a new window (popups must be allowed) *\/\r\nvar nw = window.open();\r\nvar nd = nw.document;\r\nnd.write(s);\r\nnd.close();<\/pre>\n<p>and hit return (I have it spaced out in the code above just for clarity; it will all fit on one line, which makes it easier to execute in the console).<\/p>\n<p><center><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"2722\" data-permalink=\"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/untitled_and_input_text_not_updated_-_google_groups\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&amp;ssl=1\" data-orig-size=\"535,356\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"Untitled_and_input_text_not_updated_-_Google_Groups\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=510%2C339&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?resize=510%2C339&#038;ssl=1\" alt=\"Untitled_and_input_text_not_updated_-_Google_Groups\" width=\"510\" height=\"339\" class=\"aligncenter size-large wp-image-2722\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?resize=530%2C352&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?resize=150%2C99&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?resize=300%2C199&amp;ssl=1 300w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?w=535&amp;ssl=1 535w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/center><\/p>\n<p>You should get a new browser window with the text of all the posts in it (you may need to temporarily enable popups on Google Groups for this to work). I only put the double <code>&lt;br\/&gt;<\/code> tags in there for the purposes of this example. I just needed the raw text, but you can mark the posts any way you&#8217;d like.<\/p>\n<p>You can tweak this hack in many ways to pull as much post metadata as you need, since it&#8217;s all wrapped in heavily marked-up &lt;div&gt;s. The base technique should also work in a Greasemonkey or Tampermonkey userscript for those of you with time to code one up.<\/p>\n<p>This hack only lessens the tedium a small amount. You still need to go topic by topic in the group if you want all the content. There&#8217;s probably a way to automate that navigation in the script as well. 
Thankfully, I didn&#8217;t need to do that this time around.<\/p>\n<p>If you have other ways to free Google Groups content, drop a note in the comments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was helping a friend out who wanted to build a word cloud from the text in Google Groups posts. If you&#8217;ve made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the group posts via e-mail, then [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2722,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[63,704,15],"tags":[],"class_list":["post-2717","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development","category-hacks","category-javascript"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Scraping Content From Google Groups - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scraping Content From Google Groups - rud.is\" \/>\n<meta property=\"og:description\" content=\"I was helping a friend out who 
wanted to build a word cloud from the text in Google Groups posts. If you&#8217;ve made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the group posts via e-mail, then [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2013-09-25T16:57:16+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"535\" \/>\n\t<meta property=\"og:image:height\" content=\"356\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Scraping Content From Google Groups\",\"datePublished\":\"2013-09-25T16:57:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/\"},\"wordCount\":435,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2013\\\/09\\\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1\",\"articleSection\":[\"Development\",\"hacks\",\"Javascript\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/\",\"name\":\"Scraping Content From Google Groups - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2013\\\/09\\\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1\",\"datePublished\":\"2013-09-25T16:57:16+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2013\\\/09\\\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2013\\\/09\\\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1\",\"width\":535,\"height\":356},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2013\\\/09\\\/25\\\/scraping-content-from-google-groups\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scraping Content From Google Groups\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Scraping Content From Google Groups - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","og_locale":"en_US","og_type":"article","og_title":"Scraping Content From Google Groups - rud.is","og_description":"I was helping a friend out who wanted to build a word cloud from the text in Google Groups posts. If you&#8217;ve made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the group posts via e-mail, then [&hellip;]","og_url":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","og_site_name":"rud.is","article_published_time":"2013-09-25T16:57:16+00:00","og_image":[{"width":535,"height":356,"url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","type":"image\/png"}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Scraping Content From Google Groups","datePublished":"2013-09-25T16:57:16+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/"},"wordCount":435,"commentCount":1,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","articleSection":["Development","hacks","Javascript"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","url":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","name":"Scraping Content From Google Groups - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","datePublished":"2013-09-25T16:57:16+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","width":535,"height":356},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Scraping Content From Google Groups"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p23idr-HP","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":400,"url":"https:\/\/rud.is\/b\/2011\/03\/24\/repairing-strict-transport-security-in-chrome-on-os-x\/","url_meta":{"origin":2717,"position":0},"title":"&#8220;Repairing&#8221; Strict Transport Security in Chrome on OS X","author":"hrbrmstr","date":"2011-03-24","format":false,"excerpt":"One of my subdomains is for mail and I was using an easy DNS hack to point it to my hosted Gmail setup (just create a CNAME pointing to ghs.google.com). This stopped working for some folks this week and I've had no time to debug exactly why so I decided\u2026","rel":"","context":"In &quot;Certificates&quot;","block_context":{"text":"Certificates","link":"https:\/\/rud.is\/b\/category\/certificates\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10104,"url":"https:\/\/rud.is\/b\/2018\/04\/18\/access-your-saved-for-later-feedly-items-by-hooking-up-dropbox-to-feedly\/","url_meta":{"origin":2717,"position":1},"title":"Access Your &#8220;Saved for Later&#8221; Feedly Items By Hooking Up Dropbox to Feedly","author":"hrbrmstr","date":"2018-04-18","format":false,"excerpt":"If you come here often you've noticed that I've been writing a semi-frequent series on using the Feedly API with R. A recent post was created to help someone use the API. 
It worked for them but \u2014 as you can see in the comment \u2014 an assertion was made\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2871,"url":"https:\/\/rud.is\/b\/2014\/01\/06\/announcing-the-launch-of-the-data-driven-security-blogpodcast\/","url_meta":{"origin":2717,"position":2},"title":"Announcing The Launch Of The Data Driven Security [Blog|Podcast]","author":"hrbrmstr","date":"2014-01-06","format":false,"excerpt":"While you're waiting for the [book](http:\/\/amzn.to\/ddsec) by @jayjacobs & @hrbrmstr to hit the shelves, why not head on over to the inaugural post of the [Data Driven Security Blog](http:\/\/datadrivensecurity.info\/blog) & give a listen to the first episode of the [Data Driven Security Podcast](http:\/\/datadrivensecurity.info\/podcast). The Data Driven Security Blog aspires to\u2026","rel":"","context":"In &quot;Data Analysis&quot;","block_context":{"text":"Data Analysis","link":"https:\/\/rud.is\/b\/category\/data-analysis-2\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2123,"url":"https:\/\/rud.is\/b\/2013\/02\/20\/rnetintel-cross-check-apt-1s-ip-list-with-alienvault-reputation-db-some-graphsanalysis\/","url_meta":{"origin":2717,"position":3},"title":"R\/netintel : Cross-check APT-1&#8217;s IP list with AlienVault Reputation DB (+ some graphs\/analysis)","author":"hrbrmstr","date":"2013-02-20","format":false,"excerpt":"Here's a quick example of couple additional ways to use the netintel R package I've been tinkering with. 
This could easily be done on the command line with other tools, but if you're already doing scripting\/analysis with R, this provides a quick way to tell if a list of IPs\u2026","rel":"","context":"In &quot;Charts &amp; Graphs&quot;","block_context":{"text":"Charts &amp; Graphs","link":"https:\/\/rud.is\/b\/category\/charts-graphs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4886,"url":"https:\/\/rud.is\/b\/2017\/01\/16\/the-devils-in-the-davos-details-a-quick-look-at-this-years-wef-global-risks-report\/","url_meta":{"origin":2717,"position":4},"title":"The Devil&#8217;s in the [Davos] Details \u2014 A quick look at this year&#8217;s WEF Global Risks Report","author":"hrbrmstr","date":"2017-01-16","format":false,"excerpt":"It's Davos time again. Each year the World Economic Forum (WEF) gathers the global elite together to discuss how they're going to shape our collective future. WEF also releases their annual Global Risks Report at the same time. I read it every year and have, in the past, borrowed some\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/Cursor_and___Development_devils_in_the_davos_-_RStudio-4.png?fit=1200%2C536&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/Cursor_and___Development_devils_in_the_davos_-_RStudio-4.png?fit=1200%2C536&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/Cursor_and___Development_devils_in_the_davos_-_RStudio-4.png?fit=1200%2C536&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/Cursor_and___Development_devils_in_the_davos_-_RStudio-4.png?fit=1200%2C536&ssl=1&resize=700%2C400 2x, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/Cursor_and___Development_devils_in_the_davos_-_RStudio-4.png?fit=1200%2C536&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":2902,"url":"https:\/\/rud.is\/b\/2014\/02\/11\/live-google-spreadsheet-for-keeping-track-of-sochi-medals\/","url_meta":{"origin":2717,"position":5},"title":"Live Google Spreadsheet For Keeping Track Of Sochi Medals","author":"hrbrmstr","date":"2014-02-11","format":false,"excerpt":"The \"medals\" R post by [TRInker](http:\/\/trinkerrstuff.wordpress.com\/2014\/02\/09\/sochi-olympic-medals-2\/) and re-blogged by [Revolutions](http:\/\/blog.revolutionanalytics.com\/2014\/02\/winter-olympic-medal-standings-presented-by-r.html) were both spiffy and a live example why there's no point in not publishing raw data. You don't need to have R (or any other language) do the scraping, though. The \"`IMPORTHTML`\" function (yes, function names seem to be ALL\u2026","rel":"","context":"In &quot;Google Docs&quot;","block_context":{"text":"Google 
Docs","link":"https:\/\/rud.is\/b\/category\/google-docs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/2717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=2717"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/2717\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media\/2722"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=2717"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=2717"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=2717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}