When Homoglyphs Attack! Generating Phishing Domain Names with R

It’s likely you’ve seen the news regarding yet another researcher showing off a phishing domain attack. The technique is pretty simple:

find a target domain you want to emulate
register a homoglyph version of it
use ~~the hacker’s favorite tool,~~ Let’s Encrypt to serve it up with a nice, shiny green lock icon
deploy some content
phish someone
Profit!

The phishing works since Internationalized Domain Names (IDNs) have been “a thing” for a while (anything for the registrars to make more money) and Let’s Encrypt provides a domain-laundering service for these attackers. But why should attackers have all the fun? Let’s make some domain homoglyphs in R.

Have Glyph, Will Hack

Rob Dawson has a spiffy homoglyph generator and even has a huge glyph-alike file, but we don’t need the full list to don the hacker cap for this exercise. I’ve made a stripped-down version of it that has (mostly) glyphs that should display correctly in “western” locales. You can pull the full list and tweak the example to broaden the attack capabilities. Let’s take a look:

library(stringi)
library(urltools)
library(purrr)

URL <- "https://rud.is/dl/homoglyphs.txt" # trimmed down from https://github.com/codebox/homoglyph
fil <- basename(URL)
invisible(try(httr::GET(URL, httr::write_disk(fil)), silent = TRUE))

chars <- stri_read_lines(fil)
idx_char <- stri_sub(chars, 1,1)
stri_sub(chars, 1, 1) <-  ""
chars <- set_names(chars, idx_char)

tail(chars)
##                                         u 
##          "ʋυцս\u1d1cｕ??????????????????" 
##                                         v 
##        "νѵט\u1d20ⅴ∨⋁ｖ??????????????????" 
##                                         w 
##                                      "ｗ" 
##                                         x 
##                "×хᕁᕽ᙮ⅹ⤫⤬⨯ｘ?????????????" 
##                                         y 
## "ɣʏγуүყ\u1d8c\u1effℽｙ??????????????????" 
##                                         z 
##                   "\u1d22ｚ?????????????"

What we did there was to read in the homoglyph lines and create a lookup table for Latin characters. Now we need a transformation function.

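to_homoglyph <- function(domain) {

  # carve off the public suffix so the final domain keeps a valid TLD
  suf <- suffix_extract(domain)
  domain <- stri_replace_last_fixed(domain, sprintf(".%s", suf$suffix[1]), "")

  # split what is left into individual characters
  domain_split <- stri_split_boundaries(domain, type="character")[[1]]

  map_chr(domain_split, ~{
    found <- chars[.x]
    if (is.na(found)) {
      .x # no look-alikes on file for this character (e.g. "-"), so keep it
    } else {
      # pick one look-alike at random
      pos <- sample(stri_count_boundaries(found, type="character"), 1)
      stri_sub(found, pos, pos)
    }
  }) %>%
    c(".", suf$suffix[1]) %>%
    stri_join(collapse="")

}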

The basic idea is to:

carve out the domain suffix (we need to ensure valid TLDs/suffixes are used in the final domain)
split the input domain into separate characters
select a homoglyph of the character at random
join the separate glyphs and the TLD/suffix back together.

We can try it out with a very familiar domain:

(converted <- to_homoglyph("google.com"))
## [1] "ƍ၀໐?|?.com"

Now, that’s using all possible homoglyphs and it might not look like google.com to you, but imagine whittling down the list to ones that are really close to Latin character set matches. Or, imagine you’re in a hurry and see that version of Google’s URL with a shiny, green lock icon from Let’s Encrypt. You might not really give it a second thought if the page looked fine (or if you were on a mobile browser without a location bar showing).
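Here is a rough sketch of what that “whittling down” could look like. It assumes that “really close” means “from the Cyrillic or Greek scripts”, which is my guess at a reasonable filter and not a rule from the glyph file. The `toy_chars` table and the `keep_scripts()` helper are made up for illustration; the table only mimics the shape of `chars`.

```r
# Hypothetical mini lookup table in the same shape as `chars`:
# names are Latin letters, values are strings of look-alike glyphs.
toy_chars <- c(
  a = "\u0430\u0251\u03b1",  # Cyrillic a, Latin alpha, Greek alpha
  o = "\u043e\u03bf\u1040",  # Cyrillic o, Greek omicron, Myanmar digit zero
  e = "\u0435\u212e",        # Cyrillic ie, estimated symbol
  w = "\uff57"               # fullwidth Latin w
)

# Keep only glyphs that belong to the given Unicode scripts.
# Relies on PCRE script classes such as \p{Cyrillic} (perl = TRUE).
keep_scripts <- function(tbl, scripts = c("Cyrillic", "Greek")) {
  classes <- paste0("\\p{", scripts, "}", collapse = "")
  pattern <- paste0("[^", classes, "]")
  out <- tbl
  out[] <- gsub(pattern, "", tbl, perl = TRUE)  # `out[]` keeps the names
  out[nzchar(out)]                              # drop letters with nothing left
}

close_chars <- keep_scripts(toy_chars)
print(close_chars)
```

Swapping a filtered table like this in for `chars` trades variety for plausibility: fewer candidates per letter, but each one is much harder to spot.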

What’s the solution?

Firefox has a configuration setting to turn these IDNs into punycode in the location bar. What does that mean? We can use the urltools::puny_encode() function to find out:

puny_encode("ƍ၀໐?|?.com")
## [1] "xn--|-npa992hbmb6w79iesa.com"

Most folks will be much less likely to trust that domain name (if they bother looking in the location bar). Note that it will still have the “everything’s OK” green Let’s Encrypt lock icon, but you shouldn’t be trusting SSL/TLS anymore for integrity or authenticity anyway.
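For a feel of how the two forms relate, here is a small sketch that goes both ways with urltools. The domain is an illustrative IDN of my own choosing (not one from the homoglyph list), written with a `\u` escape so the source stays plain ASCII.

```r
library(urltools)

# Illustrative IDN (an assumption for this example): "bücher.de"
idn <- "b\u00fccher.de"

encoded <- puny_encode(idn)      # ASCII-only form that the browser can show
decoded <- puny_decode(encoded)  # back to the Unicode form

print(encoded)
print(decoded)
```

The encoded form is what actually goes over the wire in DNS; the Unicode rendering is a display nicety, which is exactly why showing punycode in the location bar takes the disguise away.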

Chrome Canary (super early bird alpha versions) expands IDNs to punycode by default today and a shorter-cycle release to stable channel is forthcoming. I’m told Edge does somewhat sane things with IDNs, and if Safari doesn’t presently handle them, Apple will likely release an interstitial security update to handle it.

FIN

See if you can generate some fun look-alikes, such as ???????.com, and drop some latte change to register an IDN and add a free hacking certificate to it to see just how easy this entire process is. Note that attackers are automating this process, so they may have beaten you to your favorite homoglyph IDN.

If you’re on Chrome, give the Punycode Alert extension a go if you’d like some extra notification/protection from these domains.
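If you would rather flag these in your own data (proxy logs, link lists), a crude check is easy to sketch in base R. To be clear, this is my own heuristic and not how the Punycode Alert extension works; `looks_like_idn()` is a made-up helper name. It treats a hostname as an IDN if any label starts with "xn--" or if it contains any non-ASCII character.

```r
# Flag hostnames that are IDNs: either a label is already punycode ("xn--")
# or the name contains characters outside the ASCII range.
looks_like_idn <- function(hosts) {
  has_puny <- grepl("(^|\\.)xn--", hosts, ignore.case = TRUE)
  has_non_ascii <- vapply(
    hosts,
    function(h) any(utf8ToInt(enc2utf8(h)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  has_puny | has_non_ascii
}

# The third host swaps both "o"s for Cyrillic look-alikes (U+043E)
hosts <- c("google.com", "xn--bcher-kva.de", "g\u043e\u043egle.com")
flags <- looks_like_idn(hosts)
print(data.frame(host = hosts, idn = flags))
```

A flag here only means “this is an IDN”, not “this is malicious”; plenty of legitimate sites use them.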

NOTE: to_homoglyph() is not vectorised (it’s an exercise left to the reader).
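If you want a head start on that exercise, one simple approach is to wrap the scalar function rather than rewrite it. The sketch below uses a stand-in, `toy_homoglyph()`, so that it runs on its own; both it and `vec_homoglyph()` are hypothetical names, and the toy swap of "l" for "1" is only there to make the output predictable.

```r
# Stand-in for to_homoglyph(): any function that maps ONE domain string
# to ONE string will do. This toy version only swaps "l" for "1".
toy_homoglyph <- function(domain) gsub("l", "1", domain, fixed = TRUE)

# Vectorised wrapper: call the scalar function once per element and
# insist that each call returns exactly one string.
vec_homoglyph <- function(domains, f = toy_homoglyph) {
  vapply(domains, f, character(1), USE.NAMES = FALSE)
}

result <- vec_homoglyph(c("google.com", "example.org"))
print(result)
```

Passing `f = to_homoglyph` should give one randomised look-alike per input domain, assuming the lookup table from earlier is loaded.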

rud.is