Saturday, 9 March 2019

purrr magic!

Used purrr::pmap for the first time and it's brilliant! 

Run the same function many times, changing a bunch of parameters each time, and (with pmap_dfr) combine all the results into a single data frame. All in a single call.
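Here's a toy illustration of the idea. The `describe` function and the parameter values are made up for the example and stand in for a real loader. Each row of the parameter data frame becomes one call, with columns matched to arguments by name, and `pmap_dfr` row-binds the results. Note that `pmap_dfr` needs dplyr installed to do the binding, and newer purrr releases mark it as superseded in favour of `pmap()` followed by `list_rbind()`.

```r
library(purrr)

# Hypothetical stand-in for a loader: returns a one-row data frame
# describing what it was asked to read
describe <- function(sheet, startRow, book) {
  data.frame(book = book,
             label = paste0(book, " (sheet ", sheet, ", row ", startRow, ")"),
             stringsAsFactors = FALSE)
}

# One row per call; column names match describe()'s argument names
params <- data.frame(sheet = c(1, 2),
                     startRow = c(53, 70),
                     book = c("What Might", "Runagates"),
                     stringsAsFactors = FALSE)

# pmap_dfr calls describe() once per row and row-binds the results
result <- pmap_dfr(params, describe)
print(result)
```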

Keeping code and data separate is, of course, good practice. But it's easy to slip into mixing your code up with metadata (in this case, information about where to find the data and what it is). pmap makes it really easy to keep the metadata separate from the code, too.

In my case, I'm pulling sets of data out of spreadsheets, where the blocks start on a different row in each sheet (don't ask!). So I just create a csv file with the parameter names on the first line, followed by one line of parameter values for each sheet:

sheet,startRow,book,line
1, 53,What Might,Classics
2, 70,Runagates,Classics
3, 43,Desire,Classics
4, 45,Vocations,Classics
etc.

Reading this metadata into a data frame and then running my 'readSales' loading function over every row takes just two lines of code:

library(purrr)
# One row per sheet; column names must match readSales' argument names
classicSets <- read.csv(file = "data/classicSets.csv", stringsAsFactors = FALSE)
classicSales <- pmap_dfr(classicSets, readSales)

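'readSales' is just a wrapper around the excellent readxl::read_xls that cleans up my data and adds some identifying columns (book, line). pmap matches the csv column names to readSales' argument names, so the function's arguments need to be called sheet, startRow, book and line.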
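The post doesn't show readSales itself, so here's a minimal sketch of what such a wrapper might look like. The workbook path is hypothetical, startRow is assumed to be the header row of each block, and "cleaning up" is assumed to mean dropping completely empty rows; the real function surely does more.

```r
# Hypothetical sketch of a readSales-style wrapper (not the original code).
# Assumptions: one workbook at a made-up path, startRow = header row of the
# block, and clean-up = dropping rows that are entirely empty.
readSales <- function(sheet, startRow, book, line,
                      path = "data/classicSales.xls") {
  # Skip everything above the block so reading starts at its header row
  sales <- readxl::read_xls(path, sheet = sheet, skip = startRow - 1)
  # Example clean-up: keep only rows with at least one non-missing value
  sales <- sales[rowSums(!is.na(sales)) > 0, ]
  # Identifying columns keep each block traceable once rows are combined
  sales$book <- book
  sales$line <- line
  sales
}
```

Because `path` has a default, pmap_dfr can still call this with just the four csv columns.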

And you're done. Thanks to Hadley Wickham and Lionel Henry!

Thursday, 17 January 2019

Sustained high speeds along the Warminster Rd, Bath

In the last few posts, I was analysing automatic number plate recognition (ANPR) data from B&NES Council and BathHacked, a data activist organisation, looking at how traffic patterns might change if a clean air zone were introduced.

I've gone off on a tangent, analysing data from just two ANPR cameras, a mile apart on the A36 Warminster Rd on the outskirts of Bath. Together they give me average speeds for some 65,000 transits over two weeks.
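For anyone new to ANPR data, here's one way an average speed can be derived from two cameras: pair each plate's sighting at the first camera with its sighting at the second, then divide the distance by the elapsed time. This is a simplified sketch on made-up readings; the column names are hypothetical, and real data would also need each sighting paired with the *next* one at the other camera (the same vehicle passes many times) and the direction of travel handled.

```r
# Hypothetical readings: one row per plate sighting at each camera
camA <- data.frame(
  plate = c("AB12CDE", "XY34ZZZ", "LM56NOP"),
  timeA = as.POSIXct(c("2018-11-08 08:00:00", "2018-11-08 08:01:00",
                       "2018-11-08 08:02:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
camB <- data.frame(
  plate = c("AB12CDE", "XY34ZZZ"),
  timeB = as.POSIXct(c("2018-11-08 08:02:00", "2018-11-08 08:02:30"),
                     tz = "UTC"),
  stringsAsFactors = FALSE
)
distanceMiles <- 1  # the cameras are roughly a mile apart

# Inner join: only plates seen at both cameras form a transit
transits <- merge(camA, camB, by = "plate")

# Average speed = distance / elapsed time in hours
hours <- as.numeric(difftime(transits$timeB, transits$timeA, units = "hours"))
transits$mph <- distanceMiles / hours
print(transits)
```

Here the first vehicle takes 2 minutes (30mph) and the second 90 seconds (40mph); the plate seen only once drops out.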

This is a residential 30mph zone, with a straight section, a blind bend and a narrow hill, though you might not think so judging from this box plot (a smaller sample of 3 days). Each dot is a vehicle along the road. The lowest speeds are where vehicles stop or detour; I'll apply a lower limit of 5mph to exclude these. November 8 & 9 saw a little queueing heading into town in the morning, but these were the only cases where speeds were not clustered around 30mph. (That's "around" 30mph, not below it: there's little sign of a limit in these data.)
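The 5mph floor and a per-day box plot can be sketched in base R like this. The transit speeds and column names are invented for illustration and are not from the BathHacked data.

```r
# Hypothetical transits: start time and average speed (mph)
transits <- data.frame(
  start = as.POSIXct(c("2018-11-08 07:45:00", "2018-11-08 08:10:00",
                       "2018-11-09 12:30:00", "2018-11-09 13:05:00",
                       "2018-11-10 09:15:00"), tz = "UTC"),
  mph = c(2.5, 31.2, 28.4, 36.9, 4.1)
)

# Drop very slow transits (stops, detours) below the 5mph floor
minMph <- 5
fast <- transits[transits$mph >= minMph, ]

# Group by calendar day and draw one box per day
fast$day <- format(fast$start, "%Y-%m-%d")
boxplot(mph ~ day, data = fast, ylab = "Average speed (mph)")
```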

From the data, we can split vehicles into cars, light commercial vehicles (LCVs), heavy commercial vehicles (HCVs: this is a trunk route with large lorries), public-service vehicles (PSVs) and a rare few others that we'll ignore. The scary thing is that there's no sign of trucks travelling any more slowly than cars. Only public-service vehicles are noticeably slower, as you'd expect with 3 bus stops on this section. (All days of data are included here.)
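A comparison by vehicle class comes down to a grouped summary. In this sketch the class labels and speeds are made up, and the median is just one reasonable choice of summary.

```r
# Hypothetical transits with a vehicle class label (labels are assumptions)
transits <- data.frame(
  vehicleClass = c("Car", "Car", "LCV", "HCV", "HCV", "PSV", "Other"),
  mph = c(31, 29, 33, 30, 32, 22, 27),
  stringsAsFactors = FALSE
)

# Keep the four main classes; the rare "Other" vehicles are ignored
main <- transits[transits$vehicleClass %in% c("Car", "LCV", "HCV", "PSV"), ]

# Median average speed per class
bySpeed <- aggregate(mph ~ vehicleClass, data = main, FUN = median)
print(bySpeed)
```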

In this large survey, at peak hours there were typically 40 vehicles per hour sustaining 33mph or more over the 1 mile route. The top sustained speed was over 60mph, and one van driver appeared 3 times in the top 20, each time averaging more than 50mph.
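Both figures can be computed along these lines: count transits at or above the threshold in each clock hour, then rank all transits by speed and tally plates in the top N. The data is invented and the column names are hypothetical; the sketch uses a top 3 where the post uses a top 20.

```r
# Hypothetical transits: plate, start time and average speed (mph)
transits <- data.frame(
  plate = c("P1", "P2", "P3", "P1", "P4", "P1", "P5"),
  start = as.POSIXct(c("2018-11-08 07:05:00", "2018-11-08 07:20:00",
                       "2018-11-08 07:40:00", "2018-11-08 08:10:00",
                       "2018-11-08 08:30:00", "2018-11-09 07:15:00",
                       "2018-11-09 07:50:00"), tz = "UTC"),
  mph = c(51.0, 34.5, 29.0, 53.2, 33.0, 55.8, 31.0),
  stringsAsFactors = FALSE
)

# Transits sustaining the threshold speed, counted per clock hour
threshold <- 33
fast <- transits[transits$mph >= threshold, ]
fast$hour <- format(fast$start, "%Y-%m-%d %H:00")
perHour <- table(fast$hour)

# Fastest N transits, and how often each plate appears among them
topN <- 3
top <- head(transits[order(transits$mph, decreasing = TRUE), ], topN)
repeats <- table(top$plate)
print(perHour)
print(repeats)
```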


Definitely some room for improving safety here.


Code is available on GitHub; data from BathHacked.org.