Saturday 9 March 2019

purrr magic!

Used purrr::pmap for the first time and it's brilliant! 

With pmap_dfr you can run the same function many times, with a different set of parameters each time, and combine all the results into a single data frame. All in a single call.
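Here's a small toy example that shows the idea without any spreadsheets. The describe function and its parameters are made up for illustration. pmap treats a data frame as a list of columns. It calls the function once per row, passing each column to the argument with the same name, and pmap_dfr then row-binds the returned data frames. Note that pmap_dfr relies on dplyr's bind_rows, so dplyr needs to be installed. In purrr 1.0 and later the _dfr variants are superseded by pmap() followed by list_rbind(), although they still work.

```r
library(purrr)

# Hypothetical toy function: returns a one-row data frame per call,
# so that the results can be stacked into a single data frame.
describe <- function(name, n, multiplier) {
  data.frame(name = name, total = n * multiplier, stringsAsFactors = FALSE)
}

# One row per call; the column names match describe()'s argument names,
# because pmap() passes each column to the argument of the same name.
params <- data.frame(
  name = c("a", "b", "c"),
  n = c(1, 2, 3),
  multiplier = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Call describe() once per row of params and row-bind the results.
results <- pmap_dfr(params, describe)
print(results)
```

The result is one data frame with three rows, one per row of params, with totals of 10, 40 and 90.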

Keeping code and data separate is, of course, good practice. But it's easy to slip into mixing metadata into your code (in this case, data about where to find the data and what it is). pmap makes it really easy to keep that metadata separate from the code, too.

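In my case, I'm pulling sets of data out of spreadsheets, where the blocks start on different rows in each sheet (don't ask!). So I just create a CSV file with the parameter names on the first line and one row of parameter values per sheet on the lines below. Because pmap passes each column to the function argument with the same name, the column headers double as the loading function's argument names.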

sheet,startRow,book,line
1, 53,What Might,Classics
2, 70,Runagates,Classics
3, 43,Desire,Classics
4, 45,Vocations,Classics
etc.

Reading this metadata into a data frame and then running my 'readSales' loading function over every row takes just two lines of code, once purrr is loaded:

library(purrr)
classicSets <- read.csv(file = "data/classicSets.csv", stringsAsFactors = FALSE)
classicSales <- pmap_dfr(classicSets, readSales)

'readSales' is just a wrapper around the excellent readxl::read_xls that cleans up my data and adds some identifying columns (book, line).
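The post doesn't show readSales itself, so here is a minimal, hypothetical sketch of what such a wrapper could look like. It is not the author's code. It assumes the spreadsheet path is passed in (the `path` argument and its default are made up), and that `startRow` is the row holding each block's column headers. It also leaves out the cleaning steps, which aren't described. The only real API it relies on is readxl::read_xls, whose `sheet` argument accepts a sheet position and whose `skip` argument drops leading rows.

```r
library(readxl)

# Hypothetical sketch of a readSales()-style wrapper, not the author's code.
# Its argument names match the metadata CSV's column names so pmap() can
# supply them by name; `path` is an assumed extra argument with a made-up default.
readSales <- function(sheet, startRow, book, line, path = "data/sales.xls") {
  # Assumes startRow is the row holding the block's column headers,
  # so skip every row above it.
  sales <- read_xls(path, sheet = sheet, skip = startRow - 1)
  # (The real function's cleaning steps would go here.)
  # Tag every row with the book and line it came from.
  sales$book <- book
  sales$line <- line
  sales
}
```

To try it without the original workbook, you can point it at one of the example files bundled with readxl, for example `readSales(1, 1, "What Might", "Classics", path = readxl_example("datasets.xls"))`. Keep in mind that pmap will fail with an "unused argument" error if the CSV has a column that the function doesn't accept, so the headers and the argument list need to stay in step.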

And you're done. Thanks to Hadley Wickham and Lionel Henry!
