Monday 31 October 2016

Cycling into the wind

So now I've merged the wind data that I was using with the thorn plots and the cycling data on the Tableau Public dashboard. It uses an average wind speed and direction based on the two METAR observations nearest to the mid-time of the ride.
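
A minimal sketch in R of that averaging step, with made-up observations. The post does not say how the two observations were combined, so the choices here are assumptions: a plain mean for speed, and a speed-weighted vector mean for direction (so that 350 and 10 degrees average to north rather than south). The names obs, ride_mid and mean_wind are hypothetical.

```r
# Made-up METAR-style observations (times in UTC, direction the wind blows from)
obs <- data.frame(
  time = as.POSIXct(c("2016-10-28 16:50", "2016-10-28 17:20",
                      "2016-10-28 17:50", "2016-10-28 18:20"), tz = "UTC"),
  speed_kmh = c(12, 14, 18, 20),
  dir_deg = c(200, 210, 240, 250)
)
ride_mid <- as.POSIXct("2016-10-28 17:40", tz = "UTC")

# Keep the two observations closest in time to the middle of the ride
gap_mins <- abs(as.numeric(difftime(obs$time, ride_mid, units = "mins")))
nearest <- obs[order(gap_mins)[1:2], ]

# Assumption: plain mean for speed, vector mean for direction
mean_wind <- function(speed, dir_deg) {
  rad <- dir_deg * pi / 180
  u <- mean(speed * sin(rad))  # east component
  v <- mean(speed * cos(rad))  # north component
  list(speed_kmh = mean(speed),
       dir_deg = (atan2(u, v) * 180 / pi) %% 360)
}

avg <- mean_wind(nearest$speed_kmh, nearest$dir_deg)
```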

I wondered how to show wind direction, but in the end a simple grouping by quadrant was easy to do and good enough to separate the data into meaningful groups.
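
The grouping itself is a one-liner. This is a sketch only: it assumes the quadrants are NE, SE, SW and NW with the boundaries on the cardinal points, which the post does not spell out, and wind_quadrant is a hypothetical name.

```r
# Assign a wind direction in degrees to one of four 90-degree quadrants.
# Assumption: 0 to <90 is NE, 90 to <180 is SE, and so on.
wind_quadrant <- function(dir_deg) {
  labels <- c("NE", "SE", "SW", "NW")
  labels[(dir_deg %% 360) %/% 90 + 1]
}

quadrants <- wind_quadrant(c(10, 135, 225, 300))
```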

The effect of wind speed on the way home, when the wind is in the SW, is pretty clear, I think.

I just wish I could get a sensible 'dates between start & end' calculated field in the title. I could not get this to work in Tableau.



Sunday 9 October 2016

Routes of A350 and B787 - updated visualisation

The latest version of the routes map has been updated to use great circle links between airports. There seem to be 2 options for this - getting Tableau or R to do the work. Getting R to do the work was easier for me (just an application of gcIntermediate()) than getting to grips with interpolation functions in Tableau. The cost is that you pass a lot of extra data. I suppose I could have used 2 files and done a join in Tableau instead, but the volumes are not huge. 
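
A sketch of what the R side might look like, assuming the geosphere package is installed. The routes table, its column names and the approximate coordinates are illustrative, not the data used for the map. It shows the cost mentioned above: each route becomes n + 2 rows.

```r
library(geosphere)

# Illustrative airport pairs (approximate lon/lat in degrees)
routes <- data.frame(
  from = c("LHR", "NRT"), to = c("JFK", "SIN"),
  from_lon = c(-0.46, 140.39), from_lat = c(51.47, 35.76),
  to_lon = c(-73.78, 103.99), to_lat = c(40.64, 1.35),
  stringsAsFactors = FALSE
)
n_pts <- 20  # intermediate points per route

# One row per point along each great circle, with an order column
# so that Tableau can draw the path in sequence
paths <- do.call(rbind, lapply(seq_len(nrow(routes)), function(i) {
  gc <- gcIntermediate(c(routes$from_lon[i], routes$from_lat[i]),
                       c(routes$to_lon[i], routes$to_lat[i]),
                       n = n_pts, addStartEnd = TRUE)
  data.frame(route = paste(routes$from[i], routes$to[i], sep = "-"),
             point_order = seq_len(nrow(gc)),
             lon = gc[, 1], lat = gc[, 2])
}))
```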

There are Tableau workbooks around to do the work, but from Tableau Public it doesn't seem to be possible to open and copy the calculations as suggested.


Saturday 8 October 2016

Tableau visualisation of routes by new aircraft types: B787 and A350

The previous post gave a glimpse of the technical, R, side of processing some data compiled by spotters of where the new aircraft, B787 and A350, are flying. The R took a while to get right, but once the data were compiled, getting them into Tableau was a breeze.

There's a beta version here. Beta, mostly in the sense that there are still 1 or 2 glitches in the geo-coding of the data (so in R), but I thought it might be easier to see them given a Tableau map. I like the way the graph on the left acts as a filter for the map - for example comparing the routes of ANA and Japan Airlines.

And beta too, probably because the viz could also be improved. Comments welcome.

 

Interpreting loosely formatted aviation text

This is a short, technical account of using the NLP package of R to interpret some 'loosely-formatted' text about services offered by airlines. In the text, airports are sometimes mentioned by IATA 3-letter code, sometimes by a text name, and not always the same name.

It took a while to get my head around how NLP (and OpenNLP) work. The manual entries are accurate, but that's with hindsight. So I hope the following helps.

I built a named entity labeller, but first had to work out how the entity annotator syntax works. This is an example of perhaps the simplest possible one, which labels everything as the entity 'all'.

identity_tokenizer <- function(s) {
  #log all words together as one single entity, type 'all'
  #the extent is measured here in words, not characters
  Annotation(1, "all", 1L, length(s))
}

identity_entity_annotator <- Simple_Entity_Annotator(identity_tokenizer)


For each line of text, e.g. "Delhi DEL – HKG Hong Kong – KIX Osaka Kansai (Commenced)", I used the sledgehammer approach of OpenNLP:

# r holds one line of text; annotate it with sentence and word spans
AnnSentWord <- list(Maxent_Sent_Token_Annotator(),
                    Maxent_Word_Token_Annotator())
el <- NLP::annotate(as.String(r), AnnSentWord)

        
to get r annotated as sentence and words. Then I invoked my new annotator, which looks for airports (3-letter IATA codes, or 4-letter ICAO codes, or 1-n word names) and labels these as 'airport', with the ICAO code as a 'feature'.

el2 <- NLP::annotate(as.String(r), AP_entity_annotator, el)

        
For some reason, adding this to the 'pipe' of sentence and word annotation always failed. The annotator itself is fairly mechanical, using lookup environments (see list2env()) since the guidance online is that these are relatively fast - and in my own tests they were about 30 times faster than simple subsetting of dataframes.
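
As an illustration of the lookup-environment idea, here is a minimal sketch in base R. The three-row airports table and the names iata_env and iata_to_icao are hypothetical; the real annotator is not shown in this post.

```r
# Small illustrative table of IATA and ICAO codes
airports <- data.frame(
  iata = c("DEL", "HKG", "KIX"),
  icao = c("VIDP", "VHHH", "RJBB"),
  stringsAsFactors = FALSE
)

# Build an environment keyed by IATA code: a hashed lookup
iata_env <- list2env(setNames(as.list(airports$icao), airports$iata))

# Look up a vector of candidate codes; unknown codes give NA
iata_to_icao <- function(codes) {
  vapply(codes, function(code) {
    get0(code, envir = iata_env, inherits = FALSE,
         ifnotfound = NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

found <- iata_to_icao(c("DEL", "XXX", "KIX"))
```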

The answers are deterministic, not probabilistic, but are based on a likelihood order: 4-letter and 3-letter upper case matches are unlikely by chance; longer name matches 'Santiago de Chile' take precedence over shorter ones 'Santiago'; airports with more traffic are more likely than quieter ones. On average, I had about 4 names per airport.
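
The 'longer names first' rule can be sketched as a greedy longest-match over the word tokens. This is an assumption about how such a rule could be coded, not the annotator itself: the name table is a two-entry toy, match_names is a hypothetical helper, and the traffic-based tie-break is left out.

```r
# Toy name table (lower case), keyed by airport name
name_env <- list2env(list(
  "santiago de chile" = "SCEL",
  "santiago" = "LEST"  # hypothetical choice for the bare name
))

# At each word, try the longest candidate name first, then shorter ones
match_names <- function(words, env, max_len = 4) {
  words <- tolower(words)
  hits <- list()
  i <- 1
  while (i <= length(words)) {
    matched <- FALSE
    for (n in seq(min(max_len, length(words) - i + 1), 1)) {
      key <- paste(words[i:(i + n - 1)], collapse = " ")
      code <- get0(key, envir = env, inherits = FALSE, ifnotfound = NULL)
      if (!is.null(code)) {
        hits[[length(hits) + 1]] <- data.frame(
          start = i, end = i + n - 1, icao = code,
          stringsAsFactors = FALSE)
        i <- i + n  # skip past the words just consumed
        matched <- TRUE
        break
      }
    }
    if (!matched) i <- i + 1
  }
  do.call(rbind, hits)
}

res <- match_names(c("Santiago", "de", "Chile", "to", "Santiago"), name_env)
```

As in the identity example, start and end are measured in words, not characters.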

It's not the most sophisticated application of NLP, but you have to start somewhere!



Sunday 25 September 2016

Thorn Plot, for visualising head winds, final version

Continuing the short series on plotting headwinds, perhaps you prefer the 'thorn plot' version, in which the relative frequency of wind from a particular direction is represented both by transparency of colour and also by the width of the triangle.


For this version, I finally remembered to normalise the number of observations between locations. There are more records for some airports than others, but this is irrelevant for the graph, so I rescale.
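
One way to do that rescaling, as a sketch: divide each count by the station's total, so every station sums to 1. The post does not say which rescaling was used, so this is an assumption; the EHAM rows are made up and relFreq is a hypothetical column name.

```r
obs <- data.frame(
  station = c("EBBR", "EBBR", "EHAM", "EHAM"),
  windHead = c(0, 45, 0, 45),
  numObs = c(7190, 7358, 3000, 1000),  # EHAM counts are made up
  stringsAsFactors = FALSE
)

# Share of each station's observations that fall in each direction
obs$relFreq <- ave(obs$numObs, obs$station, FUN = function(n) n / sum(n))
```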


Saturday 24 September 2016

Final daisy plot for visualising head wind when cycling.

Got there in the end. I know the guidance is that colour is not the best way to show variation, but this 'daisy plot' works for me, and is less untidy than the 'thorn' variation.

The curved line comes from those explicitly drawn by ggplot in the geom_polygon statement, so repeating the start point of the polygon gave me a balanced curve on both sides (which gives a clue as to how to get straight lines, but I'm sticking with this version).

Quite a lot of ggplot in the end:

ggplot(y8p , aes(x, y, group = windHead, alpha = numObs)) + 
  scale_alpha(range = c(0.3, 1), breaks = seq(0, 24000, by = 4000)) + 
  geom_polygon(fill = "blue") +
  xlab("Angle = Wind heading (deg) ") + 
  ylab("Mean wind speed (km/h)") +
  labs(alpha = "Observations") +
  coord_polar(start = -pi/8) + 
  scale_y_reverse(limits = c(maxKMH, 0)) + 
  scale_x_continuous(limits = c(-22.5,337.5), breaks = NULL) +
  geom_label(aes(x=0, y = maxKMH, label = station), colour = "black",
             show.legend = FALSE) +
  #theme(legend.position = c(1,0), legend.justification = c(1,0)) + 
  theme(strip.background = element_blank(),
        strip.text.x = element_blank()) +
  facet_wrap( ~ station)

For which, the y8p data looks like this. Note that I've created 4 lines for each real line of data - and x and y give the 4 points used in the polygon for each real point of data.

  station windHead   windKPH numObs     x         y
    <chr>   <fctr>     <dbl>  <int> <dbl>     <dbl>
1    EBBR        0  9.167982   7190   0.0  9.167982
2    EBBR        0  9.167982   7190  22.5  0.000000
3    EBBR        0  9.167982   7190 -22.5  0.000000
4    EBBR        0  9.167982   7190   0.0  9.167982
5    EBBR       45 10.177946   7358  45.0 10.177946
6    EBBR       45 10.177946   7358  67.5  0.000000
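
A sketch of how the four rows per observation could be generated, reproducing the pattern in the printout above: tip, right base corner, left base corner, tip again. The half-width of 22.5 degrees is read off the table; the starting data frame wind is hypothetical and keeps windHead numeric rather than as a factor.

```r
# Two real rows, taken from the printout above
wind <- data.frame(
  station = "EBBR",
  windHead = c(0, 45),
  windKPH = c(9.167982, 10.177946),
  numObs = c(7190L, 7358L),
  stringsAsFactors = FALSE
)

# Repeat each real row 4 times, then add the polygon corners
y8p <- wind[rep(seq_len(nrow(wind)), each = 4), ]
rownames(y8p) <- NULL
y8p$x <- y8p$windHead + rep(c(0, 22.5, -22.5, 0), times = nrow(wind))
y8p$y <- y8p$windKPH * rep(c(1, 0, 0, 1), times = nrow(wind))
```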

More Pies, Roses and Thorns

Continuing the hunt for ways to show headwind, rather prettier are these versions, using an explicit geom_polygon to construct triangles.

In the first version, I'm using both transparency and thickness to code the number of observations (frequency that the wind was in this direction). I think this looks like thorns. Haven't worked out why the axis transform should put the twist on the triangles. Suggestions welcome!

In the second version, I've stuck to fixed width. This is more like the diaphragm of a camera, or a daisy.

Thursday 22 September 2016

Pies and Roses

On a slight tangent from my usual data, I always have the impression that there is more of a headwind when I'm cycling home, than when I go to work. Is it true?

Happily someone has kindly archived weather observations from the nearby Brussels airport, so I've been looking at the last 5 years or so.

The question is, how to show it? For an R user, this is the fairly obvious answer - using coord_polar() in the lovely ggplot. To a trained eye, perhaps it's clear that going to work (heading North-East), typical winds are low whereas in the reverse direction the average wind is much stronger (the upper teens). 

But to me, the impact is back to front; it feels like it's easier, somehow, to cycle south-west. And there's no indication of how frequently the wind is in each direction.

So I want to reverse the scale. Now the length of the shape gives the strength of the wind, with the baseline on the outside, and the width of the base of the shape gives the frequency of the wind being in this direction.

Ok, so it looks like a messed up version of the dreaded pie chart, but is the impact of the message clearer?



Tuesday 2 August 2016

Eurostat air passenger data updated again

I've been having issues loading the Eurostat data with the R package RJSDMX - always timing out at work, and a different configuration error at home. 

In the meantime, the R package 'eurostat' has come along, so I've converted to using this to exploit Eurostat's newish bulk download interface. (Again time-out problems trying to use anything like a filter on this - suggestions are welcome. Could be linked to firewall issues at work but no time to investigate this.)
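
For anyone wanting to try the same route, a minimal sketch, assuming the eurostat package is installed and that the datasets follow the avia_par_xx pattern with a lower-case country code. The helper names avia_par_id and fetch_avia_par are hypothetical; fetch_avia_par downloads the whole table with no filter and needs network access, so it is defined here but not called.

```r
# Build the dataset id for a country, e.g. "BE" -> "avia_par_be"
avia_par_id <- function(country) {
  paste0("avia_par_", tolower(country))
}

# Bulk download of one country's table (no filters)
fetch_avia_par <- function(country) {
  eurostat::get_eurostat(avia_par_id(country))
}

ids <- avia_par_id(c("BE", "EL", "FR"))
```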

Anyway, the pax data are now updated (using avia_par_xx). 2015 would be complete, but Poland is missing Q4. 




Sunday 3 July 2016

Tableau URLs for aviation stats.

The Pax (& PaxTextVn) dashboards now work with URLs to pass parameters, but it has been a dull process. Basically, in spite of what I read on forums and Tableau help, it does not seem to be possible to pass a space as %20. After many different attempts, I've just had to remove spaces from some data fields, and some parameter and field names.

So now, you can access the data directly with a URL such as:

https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/LoadFactors&Year=2014&GeoDepLevel=Region&GeoDepLevelSelected=EU28&GeoDesLevel=Region

or 

https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/DeparturePatterns&Year=2012

In the latter case, it's not yet simple to work out how to pass the airport parameter. Watch this space for improvements here.
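
If you are generating these links from a script, the pattern in the two URLs above can be sketched as follows. The function name tableau_url is hypothetical, and the check for spaces simply reflects the limitation described above.

```r
# Build a URL for a view of the Pax workbook with parameters appended
tableau_url <- function(view, params = list()) {
  base <- "https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/"
  if (length(params) == 0) return(paste0(base, view))
  values <- unlist(params)
  # Spaces do not seem to survive, so refuse them up front
  stopifnot(!any(grepl(" ", c(names(params), values), fixed = TRUE)))
  query <- paste0("&", names(params), "=", values, collapse = "")
  paste0(base, view, query)
}

url_dep <- tableau_url("DeparturePatterns", list(Year = 2012))
url_lf <- tableau_url("LoadFactors",
                      list(Year = 2014, GeoDepLevel = "Region",
                           GeoDepLevelSelected = "EU28",
                           GeoDesLevel = "Region"))
```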


Thursday 5 May 2016

Air passenger data updated

I've also updated the passenger data. Again, all states seem to have provided data up to at least June 2015. See the link above.

For those of you using SDMX, there are more code changes to download these data:
  • PASS_BRD_DEP has become PAS_BRD_DEP
  • PASS_ST_DEP has become ST_PAS_DEP
  • PASS_CAF_DEP has become CAF_PAS_DEP

Cargo updated

It's been a while since I updated the cargo dashboard in Tableau.
Now it has the latest 2015 data. These are complete for 31 states from Jan 2008 to June 2015. There are already data for 20 states in December 2015, but we'll have to wait for the full 2015 data.

See the link at the top of the page for access to Tableau.

If any of you are using SDMX to access these Eurostat aviation data (avia_gor_xx), note that they've changed a code for one of the dimensions: from FRM_CAF_DEP to CAF_FRM_DEP. Obvious! Apart from that the update process was smooth (thanks to the RJSDMX package maintainers in R, and all the other R and Tableau folk).


Monday 28 March 2016

Tableau Hex Map for Europe

Solved the problem I was having with different parts of the dashboard having different colour schemes - after filtering.  I had been assigning colours per worksheet - you need to reset these if you've done it and go back to the original field in the data and set the colour there. (Hint was from the kind answer to this question.)

Obvious, really.

I also managed to get rid of the persistent axis. Right-click in the row or column shelf, choose Format, then click on the 'line' option (a brush) in the format pane; this was a case where it had not already been set to 'none' (in spite of trying!).


Sunday 27 March 2016

Tile-based maps and Service Unit Forecasts

I have been experimenting with tile-based maps in Tableau.

The use-case is showing growth of total en route service units (actual and forecast), and using the map as a way to select a subset of a large table. The data are from the EUROCONTROL/STATFOR forecast (Feb 16, Annex 5).

The place to start for hints on how to do this in Tableau is Sir Vizalot, though some of the links in this seem to be broken.

My specific steps were:
1) I'm interested in a specific set of European States. I used 'keynote' (powerpoint) to create hex tiles and move them around until I had an initial layout of European countries I thought was reasonable.
2) I converted this into a table of coordinates, using the ISO 2-letter country codes (and using IA and IC for Azores and Canaries, respectively).
3) I pulled these data into Tableau, including a translation table from ISO to EUROCONTROL's ICAO-based State codes. Tableau does a nice job of choosing the right joins.
4) Setting up the hex map follows smoothly using Sir Vizalot's advice, though it took me a while to realise I shouldn't start worrying about size & shape of the 'map' until it was embedded in a dashboard. And the alignment of the labels defaults to off the hex - fixed using the label:alignment option.
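
Step 2 above, turning a tile layout into a table of coordinates, can be sketched in R as below. The six-country layout and the row/col numbering are made up for illustration, and the offset rule (shift odd rows by half a tile, space rows by sqrt(3)/2) is one common hex-grid convention, not necessarily the one used for this map.

```r
# Made-up layout: each country gets a row and column on the hex grid
tiles <- data.frame(
  iso2 = c("IE", "GB", "NL", "FR", "BE", "DE"),
  row = c(2, 2, 2, 1, 1, 1),
  col = c(1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Odd rows are shifted half a tile so the hexes interlock
tiles$x <- tiles$col + ifelse(tiles$row %% 2 == 1, 0.5, 0)
tiles$y <- tiles$row * sqrt(3) / 2
```

The x and y columns can then be plotted in Tableau with a hex shape as the mark.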

A couple of things I'm not yet happy with: the colour scale for the growth legend doesn't seem fully coordinated between the map and bubble plot; and I'm left with an axis which should have been turned off, but is still there. Hints welcome!

Here's what it looks like:



and here's a hex you can load into your Tableau.





Sunday 21 February 2016

Handling odd data

In the avia_par_xx data from Eurostat there are a number of odd values, particularly from France (missing seat counts in some years) or Greece (fewer seats than passengers, which sounds uncomfortable!).

I would like to follow Eurostat's lead and not simply throw away such observations - they are official, after all. So these numbers are included in total seat and total flight counts, but I've had another try at making the load factor and seats/flight calculations more robust to such issues.

Just a reminder that I use the load factor formula that is not weighted by distance (passengers/seats, not RPK/ASK).

Now I've tried to trap the odd data values, and set them to NULL in the numerator, and 1 in the denominator, so that they have little effect when aggregated into region averages, for example. 
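
The same idea, sketched in R rather than as a Tableau calculation. The rows are made up, the definition of 'odd' (missing or non-positive seats, or more passengers than seats) is an assumption based on the examples above, and R's NA stands in for NULL.

```r
flights <- data.frame(
  country = c("FR", "EL", "DE", "DE"),
  pax = c(100, 150, 80, 90),
  seats = c(NA, 120, 100, 120),  # made-up values
  stringsAsFactors = FALSE
)

# Trap the odd rows
odd <- is.na(flights$pax) | is.na(flights$seats) |
  flights$seats <= 0 | flights$pax > flights$seats

# NA in the numerator, 1 in the denominator for odd rows
pax_ok <- ifelse(odd, NA, flights$pax)
seats_ok <- ifelse(odd, 1, flights$seats)

# Aggregated load factor: passengers / seats, not weighted by distance
region_lf <- sum(pax_ok, na.rm = TRUE) / sum(seats_ok)
```

With large seat totals the extra 1s in the denominator barely move the average, which is the point of the trick.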

But you'll still see France and Greece graphs doing some odd things.

Sunday 24 January 2016

Flow-control in the flight data!

In Tableau, I've not got the functionality out of geographical hierarchies that I was expecting (maybe I'm not doing it right). So instead, I've built a set of prompts that now allow you to look at flows of flights at a variety of different levels of aggregation such as airport-airport, airport-region, airport-country, or country-country.

Not as difficult as I thought it would be, but hat-tip to vizpainter for putting me on the right track.

So check out the new tab in the 'pax' dashboard, that focuses on load factors and aircraft size!

European Flight Data updated

The 'pax' dashboard of European flight data from Eurostat has been updated with 2015 - not all available yet, and I had some difficulty loading EL, FR and NO, so these countries only go to 2014.


More energy

I've updated the energy dashboard and it shows clearly the benefits of the new boiler, but also the mild winter we've had so far. Looking forward to the bill this year!