Graphic of the Week
An irregular (probably not weekly) series of hopefully illuminating graphics relating to all things parkrun.

 


#12 Comparing Events - Revisited


 



Way back in Graphic of the Week #5 I looked at a way to work out how 'fast' one parkrun is compared to another. In this edition I revisit the topic and hopefully come up with a better set of conversion factors.

First a recap

What do I mean by 'conversion factor'? - a conversion factor enables us to compare a run time at one event with a run at another. Runners often talk about how event A is quicker than event B - this could be because of relative hilliness, technicality (i.e. lots of twists and turns), surface quality (e.g. tarmac, grass or gravel), or, dare I whisper this, small differences in distance.  The conversion factor is a way to objectively quantify the combined differences. Given a run time at event A we would multiply it by the A-to-B conversion factor to give us an equivalent run time at event B.

How were the conversion factors previously calculated? - For each pair of events (I need one conversion factor per pair), I first listed all the runners who had run at both events and worked out their best time at each. Each of these runners provides one data pair: dividing one time by the other gives that runner's own personal conversion factor for the two events. I then averaged the personal conversion factors of all these runners to give the overall conversion factor.
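The best-time method described above could be sketched roughly as follows. This is only an illustration: the `results` layout (runner, event, seconds) and the function name are hypothetical, and I have assumed the ratio is taken as time at B divided by time at A, so that multiplying an A time by the factor gives a B time.

```python
from collections import defaultdict
from statistics import mean

def best_time_factor(results, event_a, event_b):
    """Average each runner's best-time ratio (B / A) for two events.

    `results` is a hypothetical list of (runner, event, seconds) tuples.
    Returns None when no runner has run at both events.
    """
    best = defaultdict(dict)  # runner -> {event: best time in seconds}
    for runner, event, seconds in results:
        previous = best[runner].get(event)
        if previous is None or seconds < previous:
            best[runner][event] = seconds

    # One personal conversion factor per runner who has run at both events.
    ratios = [
        times[event_b] / times[event_a]
        for times in best.values()
        if event_a in times and event_b in times
    ]
    return mean(ratios) if ratios else None

# Made-up sample data, purely for illustration.
results = [
    ("runner1", "A", 1500), ("runner1", "A", 1450), ("runner1", "B", 1479),
    ("runner2", "A", 1800), ("runner2", "B", 1872),
    ("runner3", "A", 1300),  # only ran at A, so contributes no data pair
]
factor = best_time_factor(results, "A", "B")
print(f"A-to-B factor: {factor:.1%}")
```

Note that nothing here looks at when each best time was set.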

But... The main problem with this method is that it doesn't adequately handle the concept of 'form'. A runner's best time at event A might have been six years ago, when the runner was on absolutely tip-top form, but the same runner's best at event B may have been set only recently, when coming back from injury. In other words we're not necessarily comparing like with like - ideally we should only compare times if the runner is on the same form.

New and improved?

To overcome the problem mentioned above I decided to only create data pairs for runners who ran at the two different events on consecutive weeks - the assumption being that a runner is much more likely to be on the same form on consecutive weeks, and so we are more likely to be comparing like with like. This has the downside that it eliminates many potential data pairs from our analysis because the two runs happened at completely different times. But on the other hand it adds (a few) compensating extra data pairs - any particular runner could theoretically alternate between event A and event B on a week by week basis, contributing one additional data pair for each run (previously they only contributed the one data pair based on their best performance).
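A rough sketch of the consecutive-week pairing, under some assumptions of mine: results arrive as hypothetical (runner, event, date, seconds) tuples, each runner has at most one run per date, "consecutive weeks" means runs exactly 7 days apart, and the personal factors are averaged as in the earlier method.

```python
from datetime import date, timedelta
from statistics import mean

def consecutive_week_factor(results, event_a, event_b):
    """Return (A-to-B factor, number of data pairs) from runs 7 days apart.

    `results` is a hypothetical list of (runner, event, run_date, seconds).
    """
    by_runner_date = {}
    for runner, event, run_date, seconds in results:
        by_runner_date[(runner, run_date)] = (event, seconds)

    ratios = []
    for (runner, run_date), (event, seconds) in by_runner_date.items():
        following = by_runner_date.get((runner, run_date + timedelta(days=7)))
        if following is None:
            continue  # no run the following week, so no data pair
        next_event, next_seconds = following
        if (event, next_event) == (event_a, event_b):
            ratios.append(next_seconds / seconds)  # B time / A time
        elif (event, next_event) == (event_b, event_a):
            ratios.append(seconds / next_seconds)  # still B time / A time

    return (mean(ratios) if ratios else None), len(ratios)

# Made-up sample data: runner r1 alternates A, B, A on successive Saturdays.
results = [
    ("r1", "A", date(2011, 1, 1), 1500),
    ("r1", "B", date(2011, 1, 8), 1530),
    ("r1", "A", date(2011, 1, 15), 1500),
    ("r2", "A", date(2011, 1, 1), 1800),
    ("r2", "B", date(2011, 1, 22), 1900),  # three weeks later: ignored
]
factor, n_pairs = consecutive_week_factor(results, "A", "B")
print(f"A-to-B factor {factor:.1%} from {n_pairs} data pairs")
```

In the sample, r1 contributes two data pairs from three runs, while r2 contributes none because the two runs are three weeks apart.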

Show me the data! Sorry, it's a bit big so I've prepared a PDF available here. The most obvious question will be "why are there so many gaps in the table?" The answer is that I have only included conversion factors for event pairs where I am reasonably confident that the factor at least indicates whether one run is faster than another. I have set the minimum number of data pairs for a conversion factor to be included in the table at 100 (i.e. there are at least 100 instances of a specific runner running at both events on two successive weeks). To be really confident, I would want to compare 400 data pairs, and the gold standard is 1000 data pairs. So far, Richmond Park and Bushy Park are the only two events to have enough runners attending both events on successive weeks to meet this standard.

Dark green indicates more than a thousand data pairs and light green indicates more than 400 (i.e. the high confidence conversion factors). At the other end of the confidence spectrum, light red indicates fewer than 150 and dark red indicates fewer than 120. Plain white represents all the moderate confidence factors with between 150 and 400 data pairs.

Empty cells indicate that not enough (if any) runners have attended both events to be at all confident about the results. If your event is missing, that indicates that not enough of your local runners have got out and about to any other runs (or that you haven't had visitors from elsewhere) in significant numbers yet - at least not on successive weeks.
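For anyone wanting to reproduce the colour coding, the bands could be expressed along these lines. The function name is hypothetical and the handling of the exact boundaries (for example whether exactly 400 pairs counts as white or light green) is my reading of the description above, not something the table itself confirms.

```python
def confidence_band(n_pairs):
    """Map a number of data pairs to the table's colour band.

    Returns None for pairs below the inclusion threshold (an empty cell).
    Boundary handling is an assumption based on the wording of the post.
    """
    if n_pairs < 100:
        return None           # not enough data: cell left empty
    if n_pairs < 120:
        return "dark red"     # lowest confidence that still makes the table
    if n_pairs < 150:
        return "light red"
    if n_pairs <= 400:
        return "white"        # moderate confidence
    if n_pairs <= 1000:
        return "light green"  # high confidence
    return "dark green"       # gold standard: more than a thousand pairs

for n in (80, 110, 130, 250, 600, 1200):
    print(n, confidence_band(n))
```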

How to use the table - At a basic level you can tell whether one event is faster than another. The percentages indicate how much faster or slower the event at the top (FROM) is compared with the event at the left (TO).  If  the value is 100% then the two events are exactly the same speed.  If the value is greater than 100% then the FROM event is faster than the TO event, and if less than 100% it's slower.

Let's take an example - say you've just run at Strathclyde parkrun in 25:37 and want to know what the equivalent performance would be at Glasgow parkrun.  It's easier if we convert the time to seconds: 25:37 is 1537 seconds. Next look up Strathclyde along the top of the table under FROM, find Glasgow at the right hand side next to TO (we're converting from Strathclyde to Glasgow) and where the Strathclyde column and Glasgow row meet is the figure 102.8% - we multiply the 1537 by this conversion factor to give 1580 seconds or  26:20. In other words on the same form you would expect to take an extra 43 seconds to run Glasgow parkrun (faster runners would have fewer seconds difference, slower runners more - you need to run the calculation for each run time).  The pale green of the box means that at least 400 data pairs contributed to the calculation of the conversion factor and so we can be pretty confident about the conversion.
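The arithmetic in that example can be sketched in a few lines. The helper names are hypothetical, and I have assumed the result is rounded to the nearest second.

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total_seconds):
    """Convert a whole number of seconds back to an 'MM:SS' string."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"

def convert_time(mmss, factor_percent):
    """Apply a FROM-to-TO conversion factor (given as a percentage) to a time."""
    return to_mmss(round(to_seconds(mmss) * factor_percent / 100))

# Strathclyde to Glasgow, using the 102.8% factor from the table.
print(convert_time("25:37", 102.8))  # 26:20
```

Because the table shows factors to one decimal place, a conversion done this way may differ by a second or so from one done with an unrounded factor.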

We could also use the factor to compare records at different events. The Strathclyde record currently stands at 14:51 whereas Glasgow stands at 15:01; however after applying the conversion factor we see that the Strathclyde record would be equivalent to 15:15 at Glasgow, or doing the opposite calculation (from Glasgow to Strathclyde the conversion factor is 97.3%), we can see that the Glasgow record would be equivalent to 14:36 at Strathclyde.

Unleash the beautiful fish

And finally, I know how much Graphic of the Week readers look forward to a proper graph, and this one's a corker even if I do say so myself.

Scatter plot of all calculated conversion factors against the number of data pairs that contributed to the calculation of each conversion factor. The data set contains both conversion factors for each event pair (i.e. A to B and B to A). The Y axis is logarithmic (base 2). The two points at the extreme right represent the Bushy to Richmond / Richmond to Bushy high confidence conversion factors.

I'll leave you to work out what this graph is saying, but hopefully you'll see why I set the threshold for inclusion in the table at 100 data pairs.  And finally, if you are not already familiar with Ben Goldacre's work, may I recommend his column in The Guardian every Saturday.  This week's column is pertinent to this analysis - even though it's on a very different topic (see The Bad Science website).



Tue 11/01/2011

 
#11 Gender Balance in Registrations and Runs


 



I was curious to know what the gender differences between parkrunners were and came up with the following graph - it's interesting to see that while registrations are broadly similar between men and women, there is a stark difference when it comes to the total number of runs: while women account for 46.5% of all registrations, they account for only 33.7% of all runs.


Graph showing the number of male and female registrations each month (thin blue and red lines) and the number of runs by male and female parkrunners each month (thick blue and red lines).



Wed 08/03/2011

 
#10 Multiple Runs - Multiple Locations


 



Lots of parkrunners have run lots of times at one event, and lots of runners have run at lots of different events, but there's another sub-species of parkrunner who likes to run lots of times at lots of different events. As you can see in the following graph, 55 parkrunners have run at 4 different locations on at least 4 occasions each. But 1 parkrunner has run at least 5 times at 7 different parkruns.


Graph showing number of runners who have run at least 4 times at 4, 5, 6 or 7 different parkrun locations.
Runners are not counted multiple times in each vertical slice (this explains the apparent tangle between the 5 and 6 location ribbons).
The graph was made from a data snapshot on 15th June 2011.
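A tally like the one behind this graph could be produced roughly as below. The `runs` layout (one (runner, location) tuple per run) and the function name are hypothetical, and each runner is counted once, under the exact number of locations where they reach the minimum number of runs.

```python
from collections import Counter

def runners_by_location_count(runs, min_runs=4):
    """Count runners by the number of locations they have run at >= min_runs times.

    `runs` is a hypothetical list of (runner, location) tuples, one per run.
    Returns {number_of_locations: number_of_runners}.
    """
    runs_per_location = Counter(runs)  # (runner, location) -> number of runs
    qualifying = Counter(              # runner -> number of qualifying locations
        runner
        for (runner, _location), n in runs_per_location.items()
        if n >= min_runs
    )
    return dict(Counter(qualifying.values()))

# Made-up sample data: r1 runs 4 times at each of 4 locations,
# r2 runs 4 times at W but only 3 times at X.
runs = (
    [("r1", loc) for loc in "WXYZ" for _ in range(4)]
    + [("r2", "W")] * 4
    + [("r2", "X")] * 3
)
print(runners_by_location_count(runs))
```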



Wed 06/15/2011

 
#9 Number of Female Victories


 



Overall victories by women at parkrun events
(up to and including events on Saturday 19th February 2011)

Athlete                Event                         Number of Victories
Alexandra COOK         Swindon parkrun               1
Aly DIXON              Middlesbrough Albert parkrun  1
Aly DIXON              Sunderland parkrun            1
Amy CHALK              Swindon parkrun               1
Angela HIBBS           Newcastle parkrun             1
Ann NIXON              Forest of Dean parkrun        3
Ashley FINAUGHTY       Rolf Valley parkrun           1
Bente CHRISTENSEN      Nibe parkrun                  3
Carolyn SUMMERSGILL    Middlesbrough Albert parkrun  4
Claire HALLISSEY       Cardiff parkrun               1
Clare ELMS             Bromley parkrun               4
Eleanor MATTHEWS       Gorleston Cliffs parkrun      2
Emma RAVEN             Braunstone parkrun            1
Emma WILSON            Richmond parkrun              1
Gladys CHEMWENO        Bushy parkrun                 1
Holly GILBERT          Hackney Marshes parkrun       2
Jo EMERY               Coventry parkrun              1
Julia BLEASDALE        Bushy parkrun                 1
Justina HESLOP         Newcastle parkrun             1
Katie CLARK            Hull parkrun                  1
Katy MOORE             Brighton & Hove parkrun       1
Kim MAZZUCCA           Forest of Dean parkrun        4
Lara HAWKINGS          Greenwich parkrun             1
Lucy HASELL            Richmond parkrun              1
Lydia TURNER           Sunderland parkrun            1
Mollie WILLIAMS        Woodbank parkrun              1
Natalie HARVEY         Banstead Woods parkrun        1
Rachael ELLIOTT        Old Deer Park parkrun         1
Samantha AMEND         Black Park parkrun            1
Sarah PETERSON         Hackney Marshes parkrun       2
Sarah TUNSTALL         Leeds parkrun                 3
Sonia O'SULLIVAN       Bushy parkrun                 2
Stacey Lee Naomi WARD  Bramhall parkrun              1
Stacey Louise SMITH    Middlesbrough Albert parkrun  1
Susie BUSH             Richmond parkrun              4
Tara ANDERSON          Rolf Valley parkrun           1
Vicky GILL             Bedfont Lakes parkrun         1
Vicky GILL             Finsbury parkrun              1

 



Tue 02/22/2011

 
#8 Day of Registration


 



We know that we have most visitors to our website on a Saturday and Sunday - obviously runners are checking up on their times, but I was curious to see what day of the week parkrunners tend to register, so here we go:


Pie chart showing what day of the week parkrunners originally registered with parkrun.
This chart represents all of the 83,067 runners who have registered since 1st June 2009, when we started recording the date of registration.
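The shares behind a chart like this could be tallied along the following lines. This is only a sketch: the function name and the sample dates are made up, and it assumes each registration is available as a Python `date`.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def registrations_by_weekday(registration_dates):
    """Return {weekday name: percentage of registrations made on that day}."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(WEEKDAYS[d.weekday()] for d in registration_dates)
    total = sum(counts.values())
    return {day: 100 * n / total for day, n in counts.items()}

# Made-up sample data: two Saturday sign-ups, one Sunday, one Wednesday.
sample = [date(2011, 1, 22), date(2011, 1, 22),
          date(2011, 1, 23), date(2011, 1, 26)]
print(registrations_by_weekday(sample))
```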



Wed 01/26/2011

 