Monday 6 August 2018

SWDChallenge - Dot Plot

Eurocontrol's CODA team publishes regular stats on airline punctuality in Europe, based on data provided directly by airlines.

Some of the graphics are showing their age, so the dot plot challenge was a good excuse to try something slightly different.

I picked the 'punctuality' graph. CODA doesn't name individual airlines, because this is about performance benchmarking, not pointing the finger. Airlines are told their own punctuality, then the graph allows them to compare against the other largest airlines. 


I wanted to achieve two things:
1) Make it easier to compare quarters (currently an airline can sit at a different horizontal position in each quarter).
2) Highlight the airlines with the most flights. Each airline's punctuality challenges are different, depending on where they fly, their fleet, whether they offer connecting flights, etc. Improving punctuality provides a better service to passengers, whatever the size of the airline, but larger airlines have more effect on the average performance of the whole network.

The data look something like this (with airline names anonymised), after ranks have been added with mutate_at(vars(Flt1,Punc1), funs(rank = min_rank(desc(.)))). I fiddled with the data a bit to get some good test cases (decline, improvement, and no change), but in any case they are not final or official.
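A minimal sketch of that ranking step, on made-up numbers. The column names follow the post, but the values and the tiny four-airline table are invented for illustration. It uses across(), which assumes dplyr 1.0.0 or later; it should give the same Flt1_rank and Punc1_rank columns as the mutate_at() call.

```r
library(dplyr)

# Hypothetical toy data: column names follow the post, values are invented
punc <- tibble(
  Airline = c("A", "B", "C", "D"),
  Flt1    = c(1200, 300, 800, 950),     # flights this quarter
  Punc1   = c(0.82, 0.91, 0.76, 0.88),  # punctuality this quarter
  Punc0   = c(0.80, 0.91, 0.79, 0.85)   # punctuality last quarter
)

# Rank 1 = most flights / best punctuality
punc <- punc %>%
  mutate(across(c(Flt1, Punc1),
                ~ min_rank(desc(.x)),
                .names = "{.col}_rank"))

print(punc)
```

desc() flips the order so that the largest value gets rank 1, and min_rank() gives tied airlines the same (lowest) rank.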


I tried two options. 

In both, I used triangles and tails to indicate the change since the previous quarter. But the triangles tend to point to the wrong place: the marker is centred on the value, so its tip falls beyond it. Rather than work out how to nudge the triangles, I used a dot to emphasise the actual value.

The first option used size to distinguish larger airlines from smaller ones. I tried various settings for dot size, triangle size and transparency (alpha), but in the end I felt that there was just too much going on, and the big airlines did not stand out.
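For comparison, the size-based option might be sketched roughly as below. This is not the code I actually ran: the toy data, the bigRank threshold and the particular size and alpha values are all assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data and threshold, invented for illustration
bigRank <- 2
w <- tibble(
  Punc1_rank = 1:4,
  Punc1      = c(0.91, 0.88, 0.82, 0.76),
  Punc0      = c(0.91, 0.85, 0.80, 0.79),
  Flt1_rank  = c(4, 2, 1, 3)
) %>%
  mutate(size  = if_else(Flt1_rank <= bigRank, 4, 2),    # bigger marker for bigger airlines
         alpha = if_else(Flt1_rank <= bigRank, 1, 0.4),  # fade the smaller airlines
         shape = case_when(
           Punc1 == Punc0 ~ 1,    # open circle: no change
           Punc1 >  Punc0 ~ 24,   # upward triangle: improvement
           TRUE           ~ 25))  # downward triangle: decline

p <- ggplot(w) +
  geom_segment(aes(x = Punc1_rank, y = Punc0,
                   xend = Punc1_rank, yend = Punc1, alpha = alpha)) +
  geom_point(aes(x = Punc1_rank, y = Punc1,
                 shape = shape, size = size, alpha = alpha)) +
  scale_shape_identity() +  # use the values directly as shapes
  scale_size_identity() +   # use the values directly as sizes
  scale_alpha_identity() +  # use the values directly as alpha
  theme_minimal()

print(p)
```

With size, shape and alpha all varying at once, each marker carries three encodings, which is where the "too much going on" feeling comes from.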


So I switched to using colour instead. It's still quite busy, but I think the result is better, and a step forward from the current 'legacy' graphs.


For the record, this was the code.

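#packages used below
library(dplyr)
library(ggplot2)

#assumes punc, bigRank, maxRank, q0 and q1 are already defined
#calculate some values for the plot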
w <- punc %>% 
  #edit some values to get good test
  mutate(Punc0 = if_else(Punc1_rank %in% c(14, 32,44), Punc1, Punc0)) %>% 
  #calculate colour and shape
  mutate(col = if_else(Flt1_rank <= bigRank, "black", "cyan3"),
         shape = case_when(
           Punc1 == Punc0 ~ 1,   #open circle: no change
           Punc1 >  Punc0 ~ 24,  #upward triangle: improvement
           TRUE           ~ 25)) #downward triangle: decline

#where to label: the y position of the annotation (lowest punctuality this quarter)
annoX <- min(w$Punc1)

ggplot(w) + 
  #tail: from last quarter's value to this quarter's
  geom_segment(aes(x = Punc1_rank, y=Punc0, xend= Punc1_rank, yend=Punc1, colour = col), alpha=0.5)+ 
  #this quarter - direction triangles
  geom_point(aes(x = Punc1_rank, y=Punc1, shape = shape, colour = col), size = 3) +
  #this quarter - points
  geom_point(aes(x = Punc1_rank, y=Punc1, colour = col), size = 0.7) +
  annotate("text", x = 1, y = annoX, size = 2.5, 
           hjust = 0, vjust = 0,
           label = paste("Black = Top 10 airlines by flights in",q1,
                          "\nArrow tail = punctuality in",q0)) +
  scale_shape_identity() + #use the value directly as a shape
  scale_colour_identity() + #use the value directly as a colour
  labs(x="Rank of airline by punctuality (best to worst). Top 50 airlines by flights are shown.",
       y=paste("Arrival Punctuality in",q1,"(Delay < 15 minutes)")) +
  scale_x_continuous(breaks = c(1, seq(5, maxRank, by = 5)),
                     minor_breaks = NULL) +
  scale_y_continuous(labels= scales::percent) +
  theme_minimal()

