Thoughts on Data Science, IT Operations Analytics, programming and other random topics


Example use of partially applied functions in Scala

15 Mar 2015

A colleague asked me the other day about 'partially applied functions' in Scala. He knew basically what they were but was struggling to understand where they might be used. There are many great pages out there that describe the concept in detail, and yet I remember that when I was first trying to get my head around them, I saw a number of similar comments on various sites, i.e. 'I get what they are, but why would I ever use them?'. I happened to have a handy example from something I was working on, which I'll share.

First of all, I like to keep this image on my wall as a gentle reminder of the essential (practical) distinction between partially applied and curried functions.

It just keeps me straight on the differences and on what happens when I supply fewer than the usual number of arguments. In the partially applied case, I get a new function back which has the argument I supplied (p1) fixed, leaving me to supply the other two arguments at a later time. The curried function gives me back a function that takes one argument and returns another function, and so on until every argument has been supplied. This aspect is covered well enough on Stack Overflow etc.
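
The distinction is easier to see in code, so here is a minimal sketch of it. The three-argument `volume` function and all the value names are made up for illustration; they do not come from the image or from my project.

```scala
object PartialVsCurried {
  // Hypothetical three-argument function, used only for illustration
  def volume(length: Int, width: Int, height: Int): Int = length * width * height

  // Partial application: fix the first argument and leave the other two open.
  // The result is ONE function that takes the remaining two arguments together.
  val withLength2: (Int, Int) => Int = volume(2, _: Int, _: Int)

  // Currying: turn the three-argument function into a chain of
  // one-argument functions, each returning the next function in the chain.
  val volumeFn: (Int, Int, Int) => Int = volume
  val curried: Int => Int => Int => Int = volumeFn.curried

  // Supplying one argument to the curried form gives back
  // a function that itself returns a function.
  val curriedLength2: Int => Int => Int = curried(2)
}
```

Calling `PartialVsCurried.withLength2(3, 4)` and `PartialVsCurried.curried(2)(3)(4)` both evaluate the same product; only the shape of the intermediate functions differs.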

Below, in Spark/Scala, I've distilled the essence of a usage pattern I commonly rely on partially applied functions for. The pattern is one where I fix one or more of a function's arguments earlier in the program. Using partially applied functions like this makes my code more compact and, at the same time, more readable, by highlighting the essential transformations occurring and removing variables that are non-essential at that point in the program.

My very simple example is one where I am reading log data (encapsulated in JSON messages) and have to parse that data in different ways depending on the nature/format of the data. In this simplified example, assume the only difference in processing is in how the timestamps are handled. The processing I want to do follows this basic flow:


  // ** Potential parsers
  // parseFormatX(logRecord:String,
  //   DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX"))
  // parseFormatY(logRecord:String,
  //   DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"))
  //
  // These parsers will do all kinds of format-specific processing,
  // beyond just differences in timestamp handling

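  // assume that by this point we've chosen a parser function
  import java.time.format.DateTimeFormatter

  // HH is the 24-hour clock; hh would need an AM/PM marker to parse
  val parsedLogs = sc.textFile("logFile")
    .map(logRecord =>
      parseFormatX(logRecord,
                   DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX")))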

The above ingests the file, creates the RDD, and applies parseFormatX to parse each log record according to the expected format. An RDD with appropriately parsed log data is returned. If we had other variables to supply as part of parsing, the flow would get wordy quickly, obscuring the essential transformations. As often with Scala, we can clean things up and make the code much more compact and readable.

First, create a partially applied function, parseLog, from the chosen parser:


  // again, going with parseFormatX, create a partially applied function parseLog
  val parseLog =
    parseFormatX(_: String, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX"))

  // Now we can use it directly with the map function
  val parsedLogs = sc.textFile("logFile")
    .map(parseLog)
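
To try the pattern without a Spark cluster, here is a minimal, self-contained sketch that maps over a plain List instead of an RDD. The `LogEntry` type, the body of `parseFormatX`, and the sample records are all assumptions for illustration; my real parser does considerably more.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object ParseLogSketch {
  // Hypothetical record type, standing in for whatever the real parser returns
  final case class LogEntry(timestamp: LocalDateTime, message: String)

  // Hypothetical parser: assumes "<timestamp> <message>" separated by one space
  def parseFormatX(logRecord: String, formatter: DateTimeFormatter): LogEntry = {
    val separator = logRecord.indexOf(' ')
    val (rawTimestamp, rest) = logRecord.splitAt(separator)
    LogEntry(LocalDateTime.parse(rawTimestamp, formatter), rest.trim)
  }

  // Fix the formatter argument; only the log record remains to be supplied
  val parseLog: String => LogEntry =
    parseFormatX(_: String, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX"))

  // Made-up sample records
  val rawLogs: List[String] = List(
    "2015-03-15T10:15:30.000Z service started",
    "2015-03-15T10:15:31.250Z connection accepted"
  )

  // Same shape as the RDD version: map takes the one-argument function directly
  val parsedLogs: List[LogEntry] = rawLogs.map(parseLog)
}
```

One detail worth knowing: the placeholder form expands to an anonymous function, so the `DateTimeFormatter.ofPattern(...)` expression is evaluated on each call rather than once.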

By 'eliminating' (or fixing) some of the variables, the code gets much cleaner, and the essential transformation becomes explicit. In my own case, there were further transformations required, e.g.:


  val parsedLogs = sc.textFile("logFile")
    .map(parseLog)
    .map(computeResults)
    .map(outputFormatter)

Again, the essential, high-level transformation is clear, but in each case, behind the scenes, the partially applied functions have fixed many degrees of freedom.
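
As a rough sketch of how each stage in such a chain can hide its own fixed arguments, the following runs the same three-step shape over a plain List. Only the names parseLog, computeResults and outputFormatter come from the flow above; the underlying functions (`parseFields`, `flagSlow`, `render`), their parameters, and the sample data are hypothetical.

```scala
object PipelineSketch {
  // Hypothetical full-arity functions; the real ones are not shown in this post
  def parseFields(record: String, delimiter: String): (String, Int) = {
    val fields = record.split(delimiter)
    (fields(0), fields(1).trim.toInt)
  }

  def flagSlow(entry: (String, Int), thresholdMs: Int): (String, Int, Boolean) =
    (entry._1, entry._2, entry._2 > thresholdMs)

  def render(result: (String, Int, Boolean), slowLabel: String): String = {
    val (name, millis, slow) = result
    if (slow) s"$name ${millis}ms $slowLabel" else s"$name ${millis}ms"
  }

  // Each stage fixes its configuration argument, leaving a one-argument function
  val parseLog = parseFields(_: String, ",")
  val computeResults = flagSlow(_: (String, Int), 500)
  val outputFormatter = render(_: (String, Int, Boolean), "SLOW")

  // The chain reads as three transformations, with no configuration in sight
  val output: List[String] =
    List("login,120", "search,830")
      .map(parseLog)
      .map(computeResults)
      .map(outputFormatter)
}
```

The delimiter, threshold and label are decided once, where the stages are defined, so the chain itself only has to show the order of the transformations.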
