Skip navigation

My {cdcfluview} package started tossing erros on CRAN just over a week ago when the CDC added an extra parameter to one of the hidden API endpoints that the package wraps. After a fairly hectic set of days since said NOTE came, I had time this morning to poke at a fix. There are alot of tests, so after successful debugging session I was awaiting CRAN checks on various remotes as well as README builds and figured I’d keep up some practice with another, nascent, package of mine, {swiftr}, which makes it dead simple to build R functions from Swift code, in similar fashion to what Rcpp::cppFunction() does for C/C++ code.

macOS comes with a full set of machine learning/AI libraries/frameworks that definitely have “batteries included” (i.e. you can almost just make one function call to get 90-95% what you want without even training new models). One of which is text extraction from Apple’s computer Vision framework. I thought it’d be a fun and quick “wait mode” distraction to wrap the VNRecognizeTextRequest() function and use it from R.

To show how capable the default model is, I pulled a semi-complex random image from DDG’s image search:

Yellow street signs against clear blue sky pointing different directions. Each plate on the street sign has a specific term like unsure, muddled, coonfused and so on. Dilemma and confusion concept. horizontal composition with copy space. Clipping path is included.

Let’s build the function (you need to be on macOS for this; exposition inine):

library(swiftr) # github.com/hrbrmstr/swiftr

swift_function(
  code = '
import Foundation
import CoreImage
import Cocoa
import Vision

@_cdecl ("detect_text")
public func detect_text(path: SEXP) -> SEXP {

   // turn R string into Swift String so we can use it
   let fileName = String(cString: R_CHAR(STRING_ELT(path, 0)))

   var res: String = ""
   var out: SEXP = R_NilValue

  // get image into the right format
  if let ciImage = CIImage(contentsOf: URL(fileURLWithPath:fileName)) {

    let context = CIContext(options: nil)
    if let img = context.createCGImage(ciImage, from: ciImage.extent) {

      // setup comptuer vision request
      let requestHandler = VNImageRequestHandler(cgImage: img)

      // start recognition
      let request = VNRecognizeTextRequest()
      do {
        try requestHandler.perform([request])

        // if we have results
        if let observations = request.results as? [VNRecognizedTextObservation] {

          // paste them together
          let recognizedStrings = observations.compactMap { observation in
            observation.topCandidates(1).first?.string
          }
          res = recognizedStrings.joined(separator: "\\n")
        }
      } catch {
        debugPrint("\\(error)")
      }
    }
  }

  res.withCString { cstr in out = Rf_mkString(cstr) }

  return(out)
}
')

The detect_text() is now available in R, so let’s see how it performs on that image of signs:

detect_text(path.expand("~/Data/signs.jpeg")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "BEWILDERED" "UNCLEAR"    "nAZEU"      "UNCERTAIN"  "VISA"       "INSURE"    
##  [7] "ATED"       "MUDDLED"    "LOsT"       "DISTRACTED" "PERPLEXED"  "CONFUSED"  
## [13] "PUZZLED" 

It works super-fast and gets far more correct than I would have expected.

Toy examples aside, it also works pretty well (as one would expect) on “real” text images, such as this example from the Tesseract test suite:

tesseract project newspaper clipping example text image

detect_text(path.expand("~/Data/tesseract/news.3B/0/8200_006.3B.tif")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "Tobacco chiefs still refuse to see the truth abou"                           
##  [2] "even of America's least conscionable"                                        
##  [3] "The tobacco industry would like to promote"                                  
##  [4] "men sat together in Washington last"                                         
##  [5] "under the conditions they are used.'"                                        
##  [6] "week to do what they do best: blow"                                          
##  [7] "the specter of prohibition."                                                 
##  [8] "panel\" of toxicologists as \"not hazardous"                                 
##  [9] "smoke at the truth about cigarettes."                                        
## [10] "'If cigarettes are too dangerous to be sold,"                                
## [11] "then ban them. Some smokers will obey the"                                   
## [12] "People not paid by the tobacco companies"                                    
## [13] "aren't so sure. The list includes several"                                   
## [14] "The CEOs of the nation's largest tobacco"                                    
## [15] "firms told congressional panel that nicotine"                                
## [16] "law, but many will not. People will be selling"                              
## [17] "iS not addictive, that they are unconvinced"                                 
## [18] "cigarettes out of the trunks of cars, cigarettes"                            
## [19] "substances the government does not allow in"                                 
## [20] "foods or classifies as potentially toxic. They"                              
## [21] "that smoking causes lung cancer or any other"                                
## [22] "made by who knows who, made of who knows include ammonia, a pesticide called"
## [23] "illness, and that smoking is no more harmful"                                
## [24] "what,\" said James Johnston of R.J. Reynolds."                               
## [25] "than drinking coffee or eating Twinkies."                                    
## [26] "It's a ruse. He knows cigarettes are not"                                    
## [27] "methoprene, and ethyl furoate, which has"                                    
## [28] "They said these things with straight taces."                                 
## [29] "going to be banned, at leasi not in his lifetime."                           
## [30] "caused liver damage in rats."                                                
## [31] "The list \"begs a number of important"                                       
## [32] "They said them in the face of massive"                                       
## [33] "STEVE WILSON"                                                                
## [34] "What he really fears are new taxes, stronger"                                
## [35] "questions about the safety of these additives,\""                            
## [36] "scientific evidence that smoking is responsible"                             
## [37] "anti-smoking campaigns, further smoking"                                     
## [38] "said a joint statement from the American"                                    
## [39] "for more than 400,000 deaths every year."                                    
## [40] "restrictions, limits on secondhand smoke and"                                
## [41] "Rep. Henry Waxman, D-Calif., put that"                                       
## [42] "Republic Columnist"                                                          
## [43] "Lung, Cancer and Heart associations. The"                                    
## [44] "limits on tar and nicotine."                                                 
## [45] "statement added that substances safe to eat"                                 
## [46] "frightful statistic another way:"                                            
## [47] "Collectively, these steps can accelerate the"                                
## [48] "\"Imagine our nation's outrage if two fully"                                 
## [49] "He and the others played dumb for the"                                       
## [50] "current 5 percent annual decline in cigarette"                               
## [51] "aren't necessarily safe to inhale."                                          
## [52] "The 50-page list can be obtained free by"                                    
## [53] "loaded jumbo jets crashed each day, killing all"                             
## [54] "entire six hours, but really didn't matter."                                 
## [55] "use and turn the tobacco business from highly"                               
## [56] "calling 1-800-852-8749."                                                     
## [57] "aboard. That's the same number of Americans"                                 
## [58] "The game i nearly over, and the tobacco"                                     
## [59] "profitable to depressed."                                                    
## [60] "Johnson's comment about cigarettes \"made"                                   
## [61] "Here are just the 44 ingredients that start"                                 
## [62] "that cigarettes kill every 24 hours.'"                                       
## [63] "executives know it."                                                         
## [64] "with the letter \"A\":"                                                      
## [65] "The CEOs were not impressed."                                                
## [66] "The hearing marked a turning point in the"                                   
## [67] "of who knows what\" was comical."                                            
## [68] "Acetanisole, acetic acid, acetoin,"                                          
## [69] "\"We have looked at the data."                                               
## [70] "It does"                                                                     
## [71] "nation's growing aversion to cigarettes. No"                                 
## [72] "The day before the hearing, the tobacco"                                     
## [73] "acetophenone,6-acetoxydihydrotheaspirane,"                                   
## [74] "not convince me that smoking causes death,\""                                
## [75] "2-acetyl-3-ethylpyrazine, 2-acetyl-5-"                                       
## [76] "said Andrew Tisch of the Lorillard Tobacco"                                  
## [77] "longer hamstrung by tobacco-state seniority"                                 
## [78] "companies released a long-secret list of 599"                                
## [79] "Co."                                                                         
## [80] "and the deep-pocketed tobacco lobby,"                                        
## [81] "methylfuran, acetylpyrazine, 2-acetylpyridine,"                              
## [82] "Congress is taking aim at cigarette makers."                                 
## [83] "additives used in cigarettes. The companies"                                 
## [84] "said all are certified by an \"independent"                                  
## [85] "3-acetylpyridine, 2-acetylthiazole, aconitic"   

(You can compare that on your own with the Tesseract results.)

FIN

{cdcfluview} checks are done, and the fixed functions are back on CRAN! Just in time to close out this post.

If you’re on macOS, definitely check out the various ML/AI frameworks Apple has to offer via Swift and have some fun playing with integrating them into R (or build some small, command line utilities if you want to keep Swift and R apart).

One Trackback/Pingback

  1. […] article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.