Graphic of the Week
An irregular (probably not weekly) series of hopefully illuminating graphics relating to all things parkrun.

 


#14 previous topics revisited


 



I have been revisiting some old Graphics of the Week, and I am considering introducing the following graphs onto the live parkrun.com results pages - I need to do some work before that's possible, so in the meantime these are just snapshots.

The following graphs represent every single run from the last week.

The first is a scatter plot of runner ages by their run times - this produces a lovely 'exploding bathtub plot' - it's intriguing to note that in engineering, bathtub plots have been used to explain or describe reliability of machines after manufacture. The implications of this are depressing as I'm definitely on the upward sweep of the curve...

The second and third graphs extract the best runs by age group and the average (mean) times by age group from the first graph. The shape of these curves correlates very closely with the ideal curve used to generate the WAVA age grading tables.

And the fourth graph represents a plot of age against age grade.




Wed 02/22/2012

 
#13 The Growth of parkrun - #2 revisited


 



I have been playing with Google's excellent Chart API and have redone my second ever Crispy Graphic of the Week, but this time the underlying data is live and the graph is interactive (try clicking on the links top left, resizing or scrolling the bottom overview graph, or hovering over the graph).

Try the experimental chart here.

Note: it currently only shows regular UK events run on a Saturday (i.e. it excludes overseas runs as well as special events such as non-Saturday Christmas or New Year runs).



Wed 02/08/2012

 
#12 Comparing Events - Revisited


 



Way back in Graphic of the Week #5 I looked at a way to work out how 'fast' one parkrun is compared to another. In this edition I revisit the topic and hopefully come up with a better set of conversion factors.

First a recap

What do I mean by 'conversion factor'? - a conversion factor enables us to compare a run time at one event with a run at another. Runners often talk about how event A is quicker than event B - this could be because of relative hilliness, technicality (i.e. lots of twists and turns), surface quality (e.g. tarmac, grass or gravel), or, dare I whisper this, small differences in distance.  The conversion factor is a way to objectively quantify the combined differences. Given a run time at event A we would multiply it by the A-to-B conversion factor to give us an equivalent run time at event B.

How were the conversion factors previously calculated? - I need one conversion factor for each pair of events. For each pair, I first got a list of all the runners who have run at both events and worked out their best time at each event. Each of these runners provides one data pair: one time is divided by the other to produce that runner's own personal conversion factor for the two events. I then averaged the personal conversion factors across all of these runners to give the overall conversion factor.
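The best-times method just described can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the analysis: the `results` tuple layout and the function name are assumptions, and the ratio is taken as time at B divided by time at A so that multiplying an A time by the factor gives a B time.

```python
from collections import defaultdict
from statistics import mean


def best_time_factor(results, event_a, event_b):
    """Average of per-runner best-time ratios (best at B / best at A).

    `results` is a hypothetical list of (runner_id, event, seconds) tuples.
    Returns (factor, number_of_data_pairs), or (None, 0) when no runner
    has run at both events.
    """
    best = defaultdict(dict)  # runner_id -> {event: best time in seconds}
    for runner, event, seconds in results:
        if event in (event_a, event_b):
            previous = best[runner].get(event)
            if previous is None or seconds < previous:
                best[runner][event] = seconds

    # One personal conversion factor per runner who has a best at both events.
    ratios = [
        times[event_b] / times[event_a]
        for times in best.values()
        if event_a in times and event_b in times
    ]
    if not ratios:
        return None, 0
    return mean(ratios), len(ratios)


# Made-up sample data: r3 has only run at A, so contributes no data pair.
results = [
    ("r1", "A", 1500), ("r1", "A", 1450), ("r1", "B", 1595),
    ("r2", "A", 1200), ("r2", "B", 1260),
    ("r3", "A", 1800),
]
factor, pairs = best_time_factor(results, "A", "B")
```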

But... The main problem with this method is that it doesn't adequately handle the concept of 'form'. A runner's best time at event A might have been six years ago, when the runner was on absolutely tip-top form, but the same runner's best at event B may only have been recently when coming back from injury; in other words we're not necessarily comparing like with like - ideally we should really only compare times if the runner is on the same form.

New and improved?

To overcome the problem mentioned above I decided to only create data pairs for runners who ran at the two different events on consecutive weeks - the assumption being that it is much more likely that a runner is on the same form on consecutive weeks, and so we would be more likely to be comparing like with like. This has the downside that it eliminates many potential data pairs from our analysis because the two runs happened at completely different times. But on the other hand it adds (a few) compensating extra data pairs - any particular runner could theoretically alternate between event A and event B on a week-by-week basis, contributing one additional data pair for each run into the analysis (previously they only contributed the one data pair based on their best performance).
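A rough sketch of the consecutive-weeks pairing is shown below. It is only an illustration under stated assumptions: the `results` tuple layout and function name are hypothetical, 'consecutive weeks' is taken to mean run dates exactly seven days apart, and each runner is assumed to have at most one run per date.

```python
from datetime import date, timedelta
from statistics import mean


def consecutive_week_factor(results, event_a, event_b):
    """Average B/A time ratio over runs at A and B exactly one week apart.

    `results` is a hypothetical list of (runner_id, event, run_date, seconds).
    Returns (factor, number_of_data_pairs), or (None, 0) if there are no pairs.
    """
    by_runner = {}  # runner_id -> {run_date: (event, seconds)}
    for runner, event, run_date, seconds in results:
        if event in (event_a, event_b):
            by_runner.setdefault(runner, {})[run_date] = (event, seconds)

    week = timedelta(days=7)
    ratios = []
    for runs in by_runner.values():
        for run_date, (event, seconds) in runs.items():
            following = runs.get(run_date + week)
            if following is None or following[0] == event:
                continue  # no run the next week, or same event both weeks
            if event == event_a:
                time_a, time_b = seconds, following[1]
            else:
                time_a, time_b = following[1], seconds
            ratios.append(time_b / time_a)

    if not ratios:
        return None, 0
    return mean(ratios), len(ratios)


# Made-up sample: r1 alternates A, B, A (two data pairs); r2's runs are
# three weeks apart, so they contribute nothing.
sample = [
    ("r1", "A", date(2011, 10, 1), 1500),
    ("r1", "B", date(2011, 10, 8), 1560),
    ("r1", "A", date(2011, 10, 15), 1500),
    ("r2", "A", date(2011, 10, 1), 1200),
    ("r2", "B", date(2011, 10, 22), 1300),
]
week_factor, week_pairs = consecutive_week_factor(sample, "A", "B")
```

Note how the alternating runner contributes a data pair for every adjacent pair of weeks, which is where the compensating extra pairs come from.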

Show me the data! Sorry, it's a bit big so I've prepared a PDF available here. The most obvious question will be "why are there so many gaps in the table?" The answer is that I have only included conversion factors for event pairs where I am reasonably confident the factor at least indicates whether one run is faster than another. I have set the minimum number of data pairs for a conversion factor to be included in the table at 100 (i.e. there are at least 100 instances of a specific runner running at both events on two successive weeks). To be really confident, I would want to compare 400 data pairs, and the gold standard is 1000 data pairs. So far, Richmond Park and Bushy Park are the only two events to have enough runners attending both events on successive weeks to meet this standard.

The colours show the confidence level. Dark green indicates more than a thousand data pairs and light green indicates more than 400 (i.e. the high confidence conversion factors). At the other end of the confidence spectrum, light red indicates fewer than 150 and dark red indicates fewer than 120. Plain white represents all the moderate confidence factors with between 150 and 400 data pairs.

Empty cells indicate that not enough (if any) runners have attended both events to be at all confident about the results. If your event is missing, that indicates that not enough of your local runners have got out and about to any other runs (or that you have not had visitors from elsewhere) in significant numbers yet - at least not on successive weeks.
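The colour coding can be summarised as a small lookup. This is just a sketch of the thresholds as described; the function name is made up, and the handling of the exact boundary values (for instance whether exactly 400 pairs counts as white or light green) is an assumption read from the wording 'more than' and 'fewer than'.

```python
def confidence_band(data_pairs):
    """Map a number of data pairs to the table colour described in the text.

    Returns None for event pairs below the 100-pair inclusion threshold,
    which appear as empty cells.
    """
    if data_pairs < 100:
        return None           # not enough data: cell left empty
    if data_pairs < 120:
        return "dark red"     # lowest confidence that still makes the table
    if data_pairs < 150:
        return "light red"
    if data_pairs <= 400:
        return "white"        # moderate confidence
    if data_pairs <= 1000:
        return "light green"  # high confidence
    return "dark green"       # gold standard
```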

How to use the table - At a basic level you can tell whether one event is faster than another. The percentages indicate how much faster or slower the event at the top (FROM) is compared with the event at the left (TO).  If  the value is 100% then the two events are exactly the same speed.  If the value is greater than 100% then the FROM event is faster than the TO event, and if less than 100% it's slower.

Let's take an example - say you've just run at Strathclyde parkrun in 25:37 and want to know what the equivalent performance would be at Glasgow parkrun. It's easier if we convert the time to seconds: 25:37 is 1537 seconds. Next look up Strathclyde along the top of the table under FROM and find Glasgow down the side next to TO (we're converting from Strathclyde to Glasgow). Where the Strathclyde column and Glasgow row meet is the figure 102.8% - we multiply the 1537 by this conversion factor to give 1580 seconds or 26:20. In other words, on the same form you would expect to take an extra 43 seconds to run Glasgow parkrun (faster runners would have fewer seconds difference, slower runners more - you need to run the calculation for each run time). The pale green of the box means that more than 400 data pairs contributed to the calculation of the conversion factor and so we can be pretty confident about the conversion.
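For anyone who would rather not do the arithmetic by hand, the worked example can be reproduced with a few lines of Python. The helper names here are purely illustrative, and the result is rounded to the nearest second.

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' run time to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Format a number of seconds as 'MM:SS', rounded to the nearest second."""
    total = round(total_seconds)
    return f"{total // 60}:{total % 60:02d}"


def convert(mmss, factor_percent):
    """Apply a FROM-to-TO conversion factor (as a percentage) to a run time."""
    return to_mmss(to_seconds(mmss) * factor_percent / 100)


# Strathclyde 25:37 converted to Glasgow using the 102.8% factor.
print(convert("25:37", 102.8))  # 26:20
```

The table appears to show factors to one decimal place, so a result computed this way may differ by a second or so from one worked out with an unrounded factor.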

We could also use the factor to compare records at different events. The Strathclyde record currently stands at 14:51 whereas Glasgow stands at 15:01; however after applying the conversion factor we see that the Strathclyde record would be equivalent to 15:15 at Glasgow, or doing the opposite calculation (from Glasgow to Strathclyde the conversion factor is 97.3%), we can see that the Glasgow record would be equivalent to 14:36 at Strathclyde.

Unleash the beautiful fish

And finally, I know how much Graphic of the Week readers look forward to a proper graph, and this one's a corker even if I do say so myself.

Scatter plot of all calculated conversion factors against the number of data pairs that contributed to the calculation of each conversion factor. The data set contains both conversion factors for each event pair (i.e. A to B and B to A). The Y axis is logarithmic (base 2). The two points at the extreme right represent the Bushy to Richmond / Richmond to Bushy high confidence conversion factors.

I'll leave you to work out what this graph is saying, but hopefully you'll see why I set the threshold for inclusion in the table at 100 data pairs.  And finally, if you are not already familiar with Ben Goldacre's work, may I recommend his column in The Guardian every Saturday.  This week's column is pertinent to this analysis - even though it's on a very different topic (see The Bad Science website).



Tue 11/01/2011

 
#11 Gender Balance in Registrations and Runs


 



I was curious to know what the gender differences between parkrunners were and came up with the following graph - it's interesting to see that while registrations are broadly similar between men and women, there is a stark difference when it comes to the total number of runs: while women account for 46.5% of all registrations, they account for only 33.7% of all runs.


Graph showing the number of male and female registrations each month (thin blue and red lines) and the number of runs by male and female parkrunners each month (thick blue and red lines).



Wed 08/03/2011

 
#10 Multiple Runs - Multiple Locations


 



Lots of parkrunners have run lots of times at one event, and lots of runners have run at lots of different events, but there's another sub-species of parkrunner who likes to run lots of times at lots of different events. As you can see in the following graph, 55 parkrunners have run at 4 different locations on at least 4 occasions each. But 1 parkrunner has run at least 5 times at 7 different parkruns.


Graph showing number of runners who have run at least 4 times at 4, 5, 6 or 7 different parkrun locations.
Runners are not counted multiple times in each vertical slice (this explains the apparent tangle between the 5 and 6 location ribbons).
The graph was made from a data snapshot on 15th June 2011.
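A count like the one in this graph could be produced along the following lines. This is a guess at the approach rather than the actual query: the `runs` layout and function name are hypothetical, and each runner is placed in exactly one bucket (the number of locations at which they have enough runs), which is one reading of 'not counted multiple times'.

```python
from collections import Counter


def location_histogram(runs, min_runs=4, min_locations=4):
    """Count runners by the number of locations where they have >= min_runs runs.

    `runs` is a hypothetical list of (runner_id, event) tuples, one per run.
    Returns a Counter mapping number_of_locations -> number_of_runners,
    keeping only runners with at least `min_locations` such locations.
    """
    runs_per_runner_event = Counter(runs)  # (runner, event) -> number of runs

    qualifying_locations = Counter()  # runner -> locations with enough runs
    for (runner, _event), count in runs_per_runner_event.items():
        if count >= min_runs:
            qualifying_locations[runner] += 1

    return Counter(
        n for n in qualifying_locations.values() if n >= min_locations
    )


# Tiny made-up example using lower thresholds so it stays readable.
demo_runs = (
    [("r1", "A")] * 2 + [("r1", "B")] * 2 + [("r1", "C")]
    + [("r2", "A")] * 2 + [("r2", "B")] * 2 + [("r2", "C")] * 2
    + [("r3", "A")] * 2
)
demo = location_histogram(demo_runs, min_runs=2, min_locations=2)
```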



Wed 06/15/2011

 