Showing posts with label Eurostat. Show all posts

Saturday, 2 September 2017

Passenger update, finally

It seems it's always October before the last calendar year is complete. 2017 seems no exception. 

I have updated the passenger data from Eurostat in the dashboard. Congratulations to the Netherlands for already providing data as far as April 2017. Six laggards are still short of data in 2016. October, maybe.


Saturday, 8 October 2016

Interpreting loosely formatted aviation text

This is a short, technical account of using the NLP package of R to interpret some 'loosely-formatted' text about services offered by airlines. In the text, airports are sometimes mentioned by IATA 3-letter code, sometimes by a text name, and not always the same name.

It took a while to get my head around how NLP (and OpenNLP) work. The manual entries are accurate, but that's with hindsight. So I hope the following helps.

I built a named-entity labeller, but first had to work out how the entity annotator syntax works. This is an example of perhaps the simplest possible one, which labels everything as the entity 'all'.

identity_tokenizer <- function(s) {
  #log all words together as one single entity, type 'all'
  #the extent is measured here in words, not characters
  Annotation(1, "all", 1L, length(s))
}

identity_entity_annotator <- Simple_Entity_Annotator(identity_tokenizer)


For each line of text, e.g. "Delhi DEL – HKG Hong Kong – KIX Osaka Kansai (Commenced)", I used the sledgehammer approach of OpenNLP:

AnnSentWord <- list(Maxent_Sent_Token_Annotator(),
                    Maxent_Word_Token_Annotator())
el <- NLP::annotate(as.String(r), AnnSentWord)

        
to get r annotated as sentences and words. Then I invoked my new annotator, which looks for airports (3-letter IATA codes, or 4-letter ICAO codes, or 1-n word names) and labels these as 'airport', with the ICAO code as a 'feature'.

el2 <- NLP::annotate(as.String(r), AP_entity_annotator, el)

        
For some reason, adding this to the 'pipe' of sentence and word annotation always failed. The annotator itself is fairly mechanical, using lookup environments (see list2env()) since the guidance on-line is that these are relatively fast - and in my own tests they were about 30 times faster than simple subsetting of dataframes.
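
To give an idea of what a lookup environment looks like, here is a minimal sketch in base R. It is not the actual annotator: the tiny name table and the names airport_names, ap_env and lookup_icao are made up for this example.

```r
# Hypothetical mini-table (an assumption for illustration):
# several names can point to the same ICAO code.
airport_names <- c(
  "DEL"          = "VIDP",
  "Delhi"        = "VIDP",
  "HKG"          = "VHHH",
  "Hong Kong"    = "VHHH",
  "KIX"          = "RJBB",
  "Osaka Kansai" = "RJBB"
)

# list2env() turns the named list into a hashed environment keyed by name.
ap_env <- list2env(as.list(airport_names), parent = emptyenv(), hash = TRUE)

# Return the ICAO code for a name, or NA if the name is unknown.
lookup_icao <- function(name, env = ap_env) {
  if (exists(name, envir = env, inherits = FALSE)) {
    get(name, envir = env, inherits = FALSE)
  } else {
    NA_character_
  }
}

lookup_icao("Hong Kong")
lookup_icao("XYZ")
```

The same table could be held in a dataframe and subset by name; the environment simply replaces that search with a hashed lookup.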

The answers are deterministic, not probabilistic, but are based on a likelihood order: 4-letter and 3-letter upper case matches are unlikely by chance; longer name matches (such as 'Santiago de Chile') take precedence over shorter ones (such as 'Santiago'); airports with more traffic are more likely than quieter ones. On average, I had about 4 names per airport.
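
The 'longer names first' rule can be sketched as a greedy scan over the words. This is only an illustration under assumptions: the table known and the function match_airports are hypothetical, the traffic ranking is not modelled, and the airport given to a bare 'Santiago' is an arbitrary choice made so that the difference is visible.

```r
# Hypothetical name table (an assumption for illustration).
known <- c("Santiago de Chile" = "SCEL",
           "Santiago"          = "LEST",
           "SCL"               = "SCEL")

# Scan the words left to right, always trying the longest candidate first,
# so 'Santiago de Chile' wins over 'Santiago'.
match_airports <- function(words, lookup = known, max_words = 3L) {
  found <- character(0)
  i <- 1L
  while (i <= length(words)) {
    hit <- FALSE
    longest <- min(max_words, length(words) - i + 1L)
    for (n in seq(longest, 1L)) {
      candidate <- paste(words[i:(i + n - 1L)], collapse = " ")
      if (candidate %in% names(lookup)) {
        found <- c(found, lookup[[candidate]])
        i <- i + n          # skip the words just consumed
        hit <- TRUE
        break
      }
    }
    if (!hit) i <- i + 1L   # no name starts here, move on one word
  }
  found
}

match_airports(c("Santiago", "de", "Chile", "to", "Santiago"))
```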

It's not the most sophisticated application of NLP, but you have to start somewhere!



Tuesday, 2 August 2016

Eurostat air passenger data updated again

I've been having issues loading the Eurostat data with the R package RJSDMX - always timing out at work, and a different configuration error at home. 

In the meantime, the R package 'eurostat' has come along, so I've converted to using this to exploit Eurostat's newish bulk download interface. (Again time-out problems trying to use anything like a filter on this - suggestions are welcome. Could be linked to firewall issues at work but no time to investigate this.)

Anyway, the pax data are now updated (using avia_par_xx). 2015 would be complete, but Poland is missing Q4. 
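
For anyone wanting to try the same route, a minimal sketch of a bulk download with the 'eurostat' package might look like this. The country subset and the names table_ids and fetch_pax are assumptions for illustration, and the download itself is left commented out since it needs the package and a network connection.

```r
# Country codes as Eurostat writes them (EL for Greece); an illustrative subset.
countries <- c("BE", "EL", "FR", "NL", "PL")

# Table ids follow the avia_par_xx pattern, with xx the lower-case country code.
table_ids <- paste0("avia_par_", tolower(countries))

# Download one table in bulk (no filter). Assumes the 'eurostat' package is
# installed and the network is reachable; defined here but not run.
fetch_pax <- function(id) {
  eurostat::get_eurostat(id)
}

# pax_list <- lapply(table_ids, fetch_pax)
```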




Sunday, 3 July 2016

Tableau URLs for aviation stats.

The Pax (& PaxTextVn) dashboards now work with URLs to pass parameters, but it has been a dull process. Basically, in spite of what I read on forums and Tableau help, it does not seem to be possible to pass a space as %20. After many different attempts, I've just had to remove spaces from some data fields, and some parameter and field names.

So now, you can access the data directly with a URL such as:

https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/LoadFactors&Year=2014&GeoDepLevel=Region&GeoDepLevelSelected=EU28&GeoDesLevel=Region

or 

https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/DeparturePatterns&Year=2012

In the latter case, it's not yet simple to work out how to pass the airport parameter. Watch this space for improvements here.
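
If you want to generate such URLs from a script, a small helper along these lines would do it. This is a sketch, not part of the dashboard: tableau_url is a made-up function, and it simply strips spaces rather than encoding them, in line with the workaround described in this post.

```r
# Build a dashboard URL from a view name and a named list of parameters.
# Spaces are removed rather than encoded as %20.
tableau_url <- function(view, params,
                        base = "https://public.tableau.com/profile/david.marsh#!/vizhome/Pax/") {
  strip <- function(x) gsub(" ", "", x, fixed = TRUE)
  pairs <- paste0(strip(names(params)), "=", strip(unlist(params)))
  paste0(base, strip(view), "&", paste(pairs, collapse = "&"))
}

tableau_url("Departure Patterns", list(Year = 2012))
tableau_url("Load Factors", list(Year = 2014, GeoDepLevel = "Region"))
```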


Thursday, 5 May 2016

Air passenger data updated

I've also updated the passenger data. Again, all states seem to have provided data up to at least June 2015. See the link above.

For those of you using SDMX, there are more code changes to download these data:
  • PASS_BRD_DEP has become PAS_BRD_DEP
  • PASS_ST_DEP has become ST_PAS_DEP
  • PASS_CAF_DEP has become CAF_PAS_DEP
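
If you have old scripts or saved extracts that still use the old codes, a small translation step can help. A minimal sketch, where renamed and update_codes are hypothetical names and only the three renames listed above are covered:

```r
# Old SDMX codes (names) and their replacements (values), as listed above.
renamed <- c(PASS_BRD_DEP = "PAS_BRD_DEP",
             PASS_ST_DEP  = "ST_PAS_DEP",
             PASS_CAF_DEP = "CAF_PAS_DEP")

# Translate any old codes in a vector; leave everything else untouched.
update_codes <- function(codes, map = renamed) {
  old <- codes %in% names(map)
  codes[old] <- unname(map[codes[old]])
  codes
}

update_codes(c("PASS_ST_DEP", "PAS_BRD_DEP"))
```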

Cargo updated

It's been a while since I updated the cargo dashboard in Tableau.
Now it has the latest 2015 data. These are complete for 31 states from Jan 2008 to June 2015. There are already data for 20 states in December 2015, but we'll have to wait for the full 2015 data.

See the link at the top of the page for access to Tableau.

If any of you are using SDMX to access these Eurostat aviation data (avia_gor_xx), note that they've changed a code for one of the dimensions: from FRM_CAF_DEP to CAF_FRM_DEP. Obvious! Apart from that the update process was smooth (thanks to the RJSDMX package maintainers in R, and all the other R and Tableau folk).


Sunday, 21 February 2016

Handling odd data

In the avia_par_xx data from Eurostat there are a number of odd values, particularly from France (missing seat counts in some years) or Greece (fewer seats than passengers, which sounds uncomfortable!).

I would like to follow Eurostat's lead and not just throw such observations in the datasets away - they are official, after all. So these numbers are included in total seat and total flight counts, but I've had another try at making the load factor and seats/flight calculations more robust to such issues.

Just a reminder that I use the load factor formula that is not weighted by distance (passengers/seats, not RPK/ASK).

Now I've tried to trap the odd data values, and set them to NULL in the numerator, and 1 in the denominator, so that they have little effect when aggregated into region averages, for example. 
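
The dashboard does this in Tableau, but the idea can be sketched in R. The numbers below are made up, and NA stands in for Tableau's NULL; the two rules for 'odd' (no seat count, or fewer seats than passengers) are the ones mentioned in this post.

```r
# Hypothetical flows: the second row has no seat count,
# the third has fewer seats than passengers.
flows <- data.frame(
  pax   = c(150, 120, 200, 90),
  seats = c(180, NA, 160, 100)
)

# Flag observations where a load factor cannot be trusted.
odd <- is.na(flows$seats) | flows$seats < flows$pax

# Odd rows add nothing to the numerator and only 1 to the denominator.
num <- ifelse(odd, NA, flows$pax)
den <- ifelse(odd, 1, flows$seats)

# Unweighted load factor for the group: passengers / seats.
load_factor <- sum(num, na.rm = TRUE) / sum(den)
load_factor
```

The raw pax and seat counts are left untouched in flows, so totals can still include every official observation.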

But you'll still see France and Greece graphs doing some odd things.

Sunday, 24 January 2016

Flow-control in the flight data!

In Tableau, I've not got the functionality out of geographical hierarchies that I was expecting (maybe I'm not doing it right). So instead, I've built a set of prompts that now allow you to look at flows of flights at a variety of different levels of aggregation such as airport-airport, airport-region, airport-country, or country-country.
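
Outside Tableau, the same choice of aggregation level can be sketched in R. This is only an illustration with made-up flight counts: the column names and the function flow_totals are assumptions, and only the airport and country levels are shown.

```r
# Hypothetical flights between airports, with the country of each end.
flights <- data.frame(
  dep_airport = c("EKCH", "EKCH", "EGLL", "EGKK"),
  dep_country = c("DK", "DK", "UK", "UK"),
  arr_airport = c("KJFK", "EGLL", "KJFK", "KJFK"),
  arr_country = c("US", "UK", "US", "US"),
  flights     = c(30, 120, 400, 60)
)

# Sum flights at the chosen level for each end: "airport" or "country".
flow_totals <- function(data, dep_level = "airport", arr_level = "airport") {
  from <- data[[paste0("dep_", dep_level)]]
  to   <- data[[paste0("arr_", arr_level)]]
  aggregate(list(flights = data$flights),
            by = list(from = from, to = to), FUN = sum)
}

flow_totals(flights, "country", "country")
flow_totals(flights, "airport", "country")
```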

Not as difficult as I thought it would be, but hat-tip to vizpainter for putting me on the right track.

So check out the new tab in the 'pax' dashboard, that focuses on load factors and aircraft size!

European Flight Data updated

The 'pax' dashboard of European flight data from Eurostat has been updated with 2015 - not all available yet, and I had some difficulty loading EL, FR and NO, so these countries only go to 2014.


Saturday, 24 October 2015

Improved Passenger Dashboard

To celebrate the completion of the 2014 data from Eurostat, I've reorganised the passenger dashboard so departures from Europe (EU28/EFTA) and arrivals outside Europe are available in a single tab. (Previously they were in 2 separate tabs.)

So if you select Copenhagen, say, you'll see all the flights departing Copenhagen (with passenger and flight counts, seat counts and load factors). If you select JFK, say, you'll see all arrivals at JFK from European airports.

Unfortunately, this data set does not include Turkey as a departure point, so for Turkish airports you can only see arrivals from EU28/EFTA airports.

(I plan to make the same simplification for cargo too, in future.)


Sunday, 4 October 2015

2014 data are complete, finally!

The passenger and cargo data from Eurostat have finally been updated, so I've updated them in the main dashboards. 2014 data are complete, but 2015 is still very patchy so it has not been loaded at all.

I have to say, refreshing the data in Tableau is a doddle!



Saturday, 14 March 2015

European airports flight departures - 2014 updated

The passenger data in the Tableau dashboard have been updated with the latest data from Eurostat (avia_par_xx), and the cargo data too - see the cargo dashboard.

Sunday, 18 January 2015

European Cargo airports (2003-2013)

Finally, I have extended the data that's available in the cargo dashboard to cover 2003-2013, where underlying monthly data are available from Eurostat.

Check it out!

Seems to be some missing data for Sweden 2005-2007 - the data availability tab gives a clear account of which years are available for which States.




Sunday, 7 December 2014

European Cargo airports - new dashboard. (2010-2012)

Turned out to be quite easy to adapt the passenger dashboard to do the same for the freight & mail data from Eurostat (avia_gor_xx). So see the link to the new dashboard above.

  • Flights: commercial freight & mail flight departures (FRM_CAF_DEP in Eurostat terms).
  • Tonnes: tonnes of freight & mail on board on departure, including direct transit cargo that stayed on the aircraft (FRM_BRD_DEP).

Friday, 5 December 2014

European airport departures - 2013 & 2014 data updated.

A delay in new posts, while I work on extracting a new dataset and turning it into some useful dashboards. Watch this space.

But meanwhile, the good news is that Eurostat have published updated data for 2013 and 2014, sorting out some earlier problems and adding more months. Check out the data in the 'production version' - link is at the top of the page: I've renamed the two dashboards into something more meaningful (I hope):

  • Departure Patterns shows departures from the selected EU28 airport in terms of destinations, load factors, distances, aircraft sizes, pax, flights.
  • Departure Points lets you select any airport (including those outside Europe) and see details of where flights leave EU28 airports to reach that point.
I've also re-sized it so it fits on the iPad better - but discovered Tableau menus work better in Safari than Chrome on the iPad.

Tuesday, 11 November 2014

From EU to you: new flight dashboard

I've added a second dashboard 'DepFromAP' that looks at the Eurostat data. 

For destination airports (intended to be outside the EU, but I think you can still select inside) you can see which are the main EU28 departure points, the nature of the traffic (passengers, flights, load factors, aircraft size) and how these rankings have changed with time.

For example, you can see how in 2014 YTD, both Gatwick and Manchester have overtaken Frankfurt in the ranking of passenger departures to Dubai.

The new dashboard is in the development and full versions (still with a caveat on 2013 & 2014 data). See links at the top of the page.

I did work out that there's an easier way to pick the top N for the seasonality graph - but I'll explain that in a later post.


Saturday, 8 November 2014

European flight data dashboard - updated to 2014

The main dashboard is now loaded with all of the data to 2014 - as much as Eurostat currently has.

Check the 'DataAvailability' tab for a rough indication of what's available.

Beware, the 2013-2014 numbers look to be twice the size of the 2012; Eurostat are looking into this as a potential upload problem (probably safe just to halve the values, for a rough estimate - and it means load factors & seats/flight are probably still correct).
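
A quick check of why the ratios survive, using made-up numbers and assuming every count really was doubled by the same factor:

```r
# Made-up true values for one flow, and the same values doubled on upload.
true_vals    <- c(pax = 12000, seats = 15000, flights = 100)
doubled_vals <- 2 * true_vals

# Ratios are unaffected: the factor of 2 cancels.
lf_true    <- true_vals[["pax"]] / true_vals[["seats"]]
lf_doubled <- doubled_vals[["pax"]] / doubled_vals[["seats"]]

seats_per_flight_doubled <- doubled_vals[["seats"]] / doubled_vals[["flights"]]

# Counts need halving to get back to a rough estimate.
estimate <- doubled_vals / 2
```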

I'll freeze APFlows now and work on a different view.


Thursday, 6 November 2014

Eurostat - air passenger flow data completeness

Check out the latest version of the full dashboard - now with bump chart to show main destinations varying over time.

To get a sense of the completeness of the avia_par_xx data series that I've loaded from Eurostat, here's a quick chart of the counts of pax from the data that are loaded into the dashboard.

Shows quickly that it's Greece (EL) and France (FR) where delivery seems to have stopped after 2010 - and France missed data in 2004-2006. But other States are pretty complete, even if some joined the process only recently.

The issue with French data seems to be that they reported pax on board and commercial pax flights in 2005, but not departing seats - I'll need to work out how to patch up the gap and check if it's the same issue with the Greek data.






Saturday, 1 November 2014

European airport flows - now complete to 2012

So the APFlows dashboard is now complete, with all the data to 2012 at least. There do seem to be some gaps when monthly data has not been provided by some States (Greece, France for some years). There was a problem earlier with the 2013 data, so I'll come back to this at a future date, after I've linked APComp to a more complete set of data.

There are two versions of the workbook: 'Pax' has the full data in, while 'Pax Dev' uses only a subset. So if you find only a few airports, you should check which version you are looking at.

Here's the Pax. (The seasonality doesn't look as if it's airport of departure specific - I'll check).


[For links to dashboard - production and development versions - see top of page.]

European Airports dashboard - total versus flows

Done some more analysis of the data, and it's clear that the thresholds that apply when States send data to Eurostat mean that adding up the flows doesn't give very helpful totals. Which is a shame.

So with the avia_par_xx datasets, focus on showing the flows, like this. Will come back and derive the totals from another dataset.

[For links to dashboard - production and development versions - see top of page.]