More “Scraping Ethics Gone Awry” and “Why Do This When There’s a Free API?”

I can’t seem to free my infrequently-viewed email inbox from “you might like!” notices by the content-lock-in site Medium. This one made it to the iOS notification screen (otherwise I’d’ve been blissfully unaware of it and would have saved you the trouble of reading this).

Today, they sent me this gem by @JeromeDeveloper: Scrapy and Scrapyrt: how to create your own API from (almost) any website. Go ahead and click it. Give the Medium author the 👀 they so desperately crave (and get some context for the rant below).

I have no issue with @JeromeDeveloper’s coding prowess, nor Scrapy/Scrapyrt. In fact, I’m a huge fan of the folks at ScrapingHub, so much so that I wrote splashr to enable use of their Splash server from R.

My issue is with the example the author chose to use.

CoinMarketCap provides cryptocurrency prices and other cryptocurrency info. I use it to track prices for two reasons: to see which currency the attackers who pwn devices to install illegal cryptocurrency mining rigs might switch to next, and to get a feel for when they’ll stop mining and go back to just stealing data and breaking things.

CoinMarketCap has an API with a generous free tier, and their Terms & Conditions include the following text (which, in the U.S., may [soon] stupidly have to be repeated explicitly on every page where scraping is prohibited, rather than sitting behind a single site-wide link):

You may not, and shall not, copy, reproduce, download, “screen scrape”, store, transmit, broadcast, publish, modify, create a derivative work from, display, perform, distribute, redistribute, sell, license, rent, lease or otherwise use, transfer (either in printed, electronic or other format) or exploit any Content, in whole or in part, in any way that does not comply with these Terms without our prior written permission.

There is only one reason (apart from complete obliviousness) to use CoinMarketCap as an example: to show folks how clever you are at bypassing site restrictions and, eventually, at avoiding paying for an API to get data that you did absolutely nothing to help gather, curate and set up infrastructure for. There is no mention of “be sure what you are doing is legal/ethical”, just a casual caution to not abuse the Scrapyrt technology since it may get you banned.

Ethics matter across every area of “data science” (of which scraping is one component). Just because you can do something doesn’t mean you should. Just because you don’t like Terms & Conditions and want to grift the work of others for fun, profit & 👀 also doesn’t mean you should, and it definitely doesn’t mean you should be advocating that others do it as well.

Ironically, Medium itself places restrictions on what you can do:

Crawling the Services is allowed if done in accordance with the provisions of our robots.txt file, but scraping the Services is prohibited.

yet they advocated that I read and heed a post which violates similar terms of another site. So I wonder how they’d feel if I did a riff on that post and showed how to set up a hackish API to scrape all their content. O_o
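For what it’s worth, honoring a site’s crawling rules is not much work. Here is a minimal Python sketch (Python being what Scrapy is written in) that checks a URL against robots.txt rules before fetching anything, using the standard library’s `urllib.robotparser`. The robots.txt content, the `polite-example-bot` name, the `example.com` URLs and the helper function names are all made up for illustration. Passing a robots.txt check also says nothing about whether a site’s Terms & Conditions permit scraping, so you still have to read those.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
# In real use you would load the site's actual file, e.g. with
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the parsed rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# Hypothetical bot name and URLs.
for url in ("https://example.com/public/page", "https://example.com/private/data"):
    verdict = "allowed" if may_fetch(parser, "polite-example-bot", url) else "disallowed"
    print(f"{url}: {verdict}")
```

If the check comes back disallowed, or the terms prohibit scraping outright, the answer is to use the official API or ask for permission, not to get more clever.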
