More sites are turning to services like Cloudflare due to just how stupid-easy it is to DDoS — perform a (possibly Distributed) Denial of Service attack on — a site. Sometimes the DDoS is intentional (malicious). Sometimes it’s because your bot didn’t play nice (stop that, btw). Sadly, at some point, most of us with “vital” sites are going to have to pay protection money to one of these services unless law enforcement or ISPs do a better job stopping DDoS (killing the plethora of pwnd IoT devices that make up one of the largest for-rent DDoS services out there would be a good start).
Soapbox aside, sites like this one — https://www.bitmarket.pl/docs.php?file=api_public.html — (which was giving an SO poster trouble) have DDoS protection enabled but they also want you to be able to automate the downloads (this one even calls it an “API”). However, try to grab one of the files there with your browser and you’ll likely see a Cloudflare interstitial page which eventually gets you the data.
Try the same thing with download.file() or httr::GET() and you’ll run into trouble, since neither of those two functions has a way to execute the javascript challenge. The result of that challenge is ultimately posted (well, GETted in this case) to a checker endpoint, which eventually redirects to the original URL with enough cookies to ensure you won’t be bothered again.
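To make that failure mode concrete, here is a minimal sketch of how you might tell the interstitial apart from the real payload. Note that is_cf_js_challenge() is a hypothetical helper (it is not part of any package), and the three things it checks are assumptions about what the challenge page looks like: a 503 status, a Server header mentioning cloudflare, and a "jschl" marker somewhere in the body.

```r
# Hypothetical helper: guess whether a response is the Cloudflare
# javascript-challenge interstitial rather than the content you asked for.
# The 503 status, the "cloudflare" Server header and the "jschl" body marker
# are all assumptions about the challenge page, not documented guarantees.
is_cf_js_challenge <- function(status, server, body) {
  server <- if (is.null(server)) "" else server  # header may be absent
  on_cloudflare <- grepl("cloudflare", server, ignore.case = TRUE)
  has_js_challenge <- grepl("jschl", body, fixed = TRUE)
  status == 503 && on_cloudflare && has_js_challenge
}

# Made-up inputs standing in for a challenge page and for real JSON content
challenge_like <- is_cf_js_challenge(
  503, "cloudflare-nginx", "<form action='/cdn-cgi/l/chk_jschl'>"
)
content_like <- is_cf_js_challenge(
  200, "cloudflare-nginx", "[{\"time\":1512908160}]"
)
print(challenge_like)
print(content_like)
```

With a real httr response object res, the three inputs would come from httr::status_code(res), httr::headers(res)$server and httr::content(res, as = "text").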
Cloudflare has captcha and other types of interstitials, but if you happen on the 503+javascript challenge one, have I got a package for you! Meet: cfhttr.
The singular function (for now), cf_GET(), does the following:
- Makes an httr::GET() call with the initial URL
- Checks to ensure it’s both on Cloudflare and is using the javascript challenge protection scheme
- Slices the javascript and tweaks it enough to enable running it in V8
- Retrieves the challenge computation from V8
- Posts (well, httr::GET()s it since that’s what Cloudflare expects) the challenge form with the proper Referer header and hopefully passes the test so you get your content.
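The slice, evaluate and re-GET steps can be sketched roughly as follows. This is not the package's actual code: the page below is a toy stand-in for a challenge page (the real markup and script differ), and extract_script(), submit_answer(), check_url and params are hypothetical names made up for illustration. The V8 part only runs if the V8 package is installed.

```r
# Toy stand-in for a challenge page; the real markup and script differ.
page <- "<html><body><script>\nvar answer = 6 * 7;\n</script></body></html>"

# "Slice": pull the first inline script body out of the HTML.
extract_script <- function(html) {
  pattern <- "(?s)<script[^>]*>(.*?)</script>"
  m <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]
  if (length(m) < 2) stop("no inline <script> found")
  trimws(m[2])
}

js <- extract_script(page)

# "Run it in V8 and retrieve the computation": evaluate the script in a
# fresh context, then read the variable back into R.
answer <- NULL
if (requireNamespace("V8", quietly = TRUE)) {
  ctx <- V8::v8()
  ctx$eval(js)
  answer <- ctx$get("answer")
}

# "GET the form with the proper Referer": check_url and params are
# placeholders for whatever the challenge form actually specifies.
# Defined here but deliberately not called, since there is nothing to hit.
submit_answer <- function(check_url, params, referer) {
  httr::GET(check_url, query = params, httr::add_headers(Referer = referer))
}
```

Keeping the three steps as separate functions makes it easier to see which one broke if the challenge page changes.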
devtools::install_github("hrbrmstr/cfhttr")
library(cfhttr)
res <- cf_GET("https://www.bitmarket.pl/graphs/BTCPLN/90m.json")
## Waiting 5 seconds...
str(httr::content(res, as="parsed"))
## List of 90
## $ :List of 6
## ..$ time : int 1512908160
## ..$ open : chr "48000.00000000"
## ..$ high : chr "48100.00000000"
## ..$ low : chr "48000.00000000"
## ..$ close: chr "48100.00000000"
## ..$ vol : chr "0.00124821"
## $ :List of 6
## ..$ time : int 1512908220
## ..$ open : chr "48100.00000000"
## ..$ high : chr "48100.00000000"
## ..$ low : chr "48100.00000000"
## ..$ close: chr "48100.00000000"
## ..$ vol : chr "0.00000000"
## $ :List of 6
## ..$ time : int 1512908280
## ..$ open : chr "48100.00000000"
## ..$ high : chr "48100.00000000"
## ..$ low : chr "48100.00000000"
## ..$ close: chr "48100.00000000"
## ..$ vol : chr "0.00000000"
## ...
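Since the prices come back as character strings nested inside a list, you will probably want a data frame with numeric columns before doing anything with them. Here is a minimal sketch that assumes every record has the six fields shown above. The two records are copied by hand from the str() output so the snippet runs offline, and candles, df and num_cols are just illustrative names.

```r
# Two records copied from the str() output, rebuilt by hand (no network needed)
candles <- list(
  list(time = 1512908160L, open = "48000.00000000", high = "48100.00000000",
       low = "48000.00000000", close = "48100.00000000", vol = "0.00124821"),
  list(time = 1512908220L, open = "48100.00000000", high = "48100.00000000",
       low = "48100.00000000", close = "48100.00000000", vol = "0.00000000")
)

# One row per record
df <- do.call(rbind, lapply(candles, as.data.frame, stringsAsFactors = FALSE))

# Character prices and volume to numeric
num_cols <- c("open", "high", "low", "close", "vol")
df[num_cols] <- lapply(df[num_cols], as.numeric)

# Epoch seconds to a timestamp (treating them as UTC is an assumption)
df$time <- as.POSIXct(df$time, origin = "1970-01-01", tz = "UTC")

str(df)
```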
FIN
If you end up using this in workflows and run into a problem, it likely means that Cloudflare changed the challenge code page. Please file an issue so I can update the code.
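If the call sits inside an unattended workflow, it may be worth wrapping it so a changed challenge page gets flagged instead of killing the whole run. The sketch below assumes that such a failure surfaces as an R error, which is a guess about cf_GET()'s behaviour rather than something documented here. fetch_or_flag() is a hypothetical wrapper, and the two stand-in fetchers exist only so the example runs without a network.

```r
# Hypothetical wrapper: call any fetch function and capture errors
# instead of letting them abort the workflow.
fetch_or_flag <- function(url, fetch) {
  tryCatch(
    list(ok = TRUE, result = fetch(url), message = NA_character_),
    error = function(e) {
      list(ok = FALSE, result = NULL, message = conditionMessage(e))
    }
  )
}

# Stand-in fetchers so the sketch runs offline
working <- function(url) paste("payload from", url)
broken <- function(url) stop("challenge page layout changed")

ok <- fetch_or_flag("https://example.com/data.json", working)
bad <- fetch_or_flag("https://example.com/data.json", broken)

print(ok$ok)
print(bad$message)
```

In a real workflow you would pass cfhttr::cf_GET as the fetch argument and log or alert on anything that comes back with ok set to FALSE.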
2 Comments
I am sorry to bother you, but could you answer a simple question? What does DDoS stand for? I am used to a strict rule to identify the meaning of every acronym the first time it is introduced. I find the rule very helpful, but you can make your own decisions.
I generally try to do that as well and have updated the post to include a brief definition (which is “perform a (possibly Distributed) Denial of Service attack on — a site”) and linked to the wikipedia page on it. I was far too lazy and def appreciate the follow-up question.