Sunday 21 February 2016

Handling odd data

The avia_par_xx data from Eurostat contain a number of odd values, particularly for France (missing seat counts in some years) and Greece (fewer seats than passengers, which sounds uncomfortable!).

I would like to follow Eurostat's lead and not simply throw such observations away: they are official, after all. So these numbers are still included in the total seat and total flight counts, but I've had another try at making the load factor and seats/flight calculations more robust to such issues.

Just a reminder that I use the load factor formula that is not weighted by distance: passengers/seats, not RPK/ASK (revenue passenger-kilometres over available seat-kilometres).
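To make the difference between the two formulas concrete, here is a small Python sketch. The two routes and their numbers are made up purely for illustration (they are not Eurostat values), and the function names are my own.

```python
# Hypothetical routes: (passengers, seats, distance_km). Illustrative numbers only.
routes = [
    (90, 100, 500),    # short route, 90% full
    (150, 300, 5000),  # long route, 50% full
]

def unweighted_load_factor(rows):
    """Passengers / seats, ignoring distance (the formula used on this site)."""
    passengers = sum(p for p, s, d in rows)
    seats = sum(s for p, s, d in rows)
    return passengers / seats

def distance_weighted_load_factor(rows):
    """RPK / ASK: passenger-kilometres over seat-kilometres."""
    rpk = sum(p * d for p, s, d in rows)
    ask = sum(s * d for p, s, d in rows)
    return rpk / ask

print(unweighted_load_factor(routes))         # 0.6
print(distance_weighted_load_factor(routes))  # lower, because the long route is emptier
```

The distance-weighted version lets the long, half-empty route dominate, which is why the two figures can differ noticeably for the same airports.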

I now try to trap the odd data values, setting them to NULL in the numerator and 1 in the denominator, so that they have little effect when aggregated into region averages, for example.
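As a rough sketch of that idea in Python (not the actual code behind the graphs): the test for what counts as "odd" and the sum-over-sum aggregation are my assumptions here, and the names clean and region_load_factor are hypothetical. None stands in for NULL.

```python
def clean(passengers, seats):
    """Return (numerator, denominator) for one observation.

    Assumption: an observation is 'odd' if either value is missing,
    seats is zero or negative, or there are fewer seats than passengers.
    """
    odd = (
        passengers is None
        or seats is None
        or seats <= 0
        or seats < passengers
    )
    if odd:
        return None, 1  # NULL numerator, denominator of 1
    return passengers, seats

def region_load_factor(observations):
    """Aggregate load factor over (passengers, seats) pairs for a region."""
    pairs = [clean(p, s) for p, s in observations]
    # Skip None in the numerator, as an SQL SUM would skip NULL.
    numerator = sum(n for n, d in pairs if n is not None)
    denominator = sum(d for n, d in pairs)
    return numerator / denominator if denominator else None

# Made-up example: one missing seat count, one with fewer seats than passengers.
observations = [(80, 100), (120, None), (150, 100), (60, 80)]
print(region_load_factor(observations))
```

Each odd observation adds nothing to the numerator and only 1 to the denominator, so it barely moves a regional figure built from thousands of seats. For a single country that is mostly odd data, though, the result can still look strange.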

But you'll still see France and Greece graphs doing some odd things.