Has the NFL Combine’s 40-yard dash gotten faster?

Last week on one of my favourite podcasts, ESPN’s Football Today, Matt Williamson & Kevin Weidl discussed the standout prospects from the NFL Combine. Much of the conversation was about how 40-yard dash times have improved year on year thanks to better training techniques and training aimed specifically at the Combine drills.

I wanted to see for myself and found Combine results for all participants going back to 1999 at nflcombineresults.com, including last week’s 2013 results. The data set covers all 4,283 participants during this period and is a gold mine for analysis. The data needed a bit of cleaning up to get it into a data frame, but if you’d like a copy then leave a comment or message me via Twitter (@minimalrblog) – I haven’t spent the time to work out how to use GitHub to share datasets.

I compared the 40-yard dash times of 1999 and 2013 and initially didn’t see real improvements, as the 5 best times across the two classes were:

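| Name             | College            | Position | Draft Year | 40 Yard Time (s) |
|------------------|--------------------|----------|------------|------------------|
| Rondel Menendez  | Eastern Kentucky   | WR       | 1999       | 4.24             |
| Marquise Goodwin | Texas              | WR       | 2013       | 4.27             |
| Champ Bailey     | Georgia            | CB       | 1999       | 4.28             |
| Jay Hinton       | Morgan State (MD)  | RB       | 1999       | 4.29             |
| Karsten Bailey   | Auburn             | WR       | 1999       | 4.33             |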
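A ranking like this can be produced by filtering the two classes and sorting on the 40-yard time. The following is only a minimal sketch: the column names `Year` and `X40Yard.` are taken from the plotting code in this post, while the `Name` column and every row of the toy data frame are invented for illustration and are not real Combine results.

```r
# Toy stand-in for the Combine data frame; these rows are invented for
# illustration and are NOT real Combine results.
CombineData <- data.frame(
  Name     = c("Player A", "Player B", "Player C",
               "Player D", "Player E", "Player F"),
  Year     = c(1999, 2013, 1999, 2013, 2005, 1999),
  X40Yard. = c(4.41, 4.35, 4.62, 4.58, 4.30, NA)
)

# Keep only the two classes being compared
two_years <- CombineData[CombineData$Year %in% c(1999, 2013), ]

# order() sorts ascending and places missing times last,
# so head() returns the fastest runners first
fastest <- head(two_years[order(two_years$X40Yard.), ], 5)
print(fastest)
```

Participants who did not run the 40 (missing times) sort to the bottom rather than being dropped, so it is worth checking for `NA` rows if fewer than 5 runners have a recorded time.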

The Combine class of 1999 had 6 of the best 10 times. However, looking at the quartiles and plotting the two distributions showed a real improvement over the 14 years: while the fastest runners didn’t get faster, the rest of the field did. The first quartile, median and third quartile were all 0.06 to 0.10 seconds quicker in 2013, which fits the explanation of improved training and technique.

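| Draft Year | Fastest Time (s) | 1st Quartile (s) | Median (s) | 3rd Quartile (s) | Slowest Time (s) |
|------------|------------------|------------------|------------|------------------|------------------|
| 1999       | 4.24             | 4.61             | 4.78       | 5.09             | 5.84             |
| 2013       | 4.27             | 4.55             | 4.71       | 4.99             | 5.65             |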
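A per-year summary of this shape can be computed with base R's `quantile()`. This is a rough sketch rather than the exact code behind the table: the column names `Year` and `X40Yard.` come from the plotting code in this post, and the toy times are made up so the block runs on its own.

```r
# Toy data for illustration only; these are NOT real Combine times.
CombineData <- data.frame(
  Year     = rep(c(1999, 2013), each = 5),
  X40Yard. = c(4.40, 4.60, 4.80, 5.00, 5.20,
               4.35, 4.55, 4.70, 4.95, 5.10)
)

# Keep only the two classes being compared
two_years <- CombineData[CombineData$Year %in% c(1999, 2013), ]

# Split the times by year and take min, quartiles and max for each class;
# na.rm = TRUE skips participants without a recorded time
times_by_year <- split(two_years$X40Yard., two_years$Year)
summary_table <- t(sapply(times_by_year, quantile, na.rm = TRUE))
colnames(summary_table) <- c("Fastest", "1st Quartile", "Median",
                             "3rd Quartile", "Slowest")
print(summary_table)
```

Note that `quantile()` interpolates between observations by default, so other tools may report slightly different quartiles for the same data.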


The overlapping density plot of the two distributions was generated using the ggplot2 library.

library(ggplot2)
# Keep only the 1999 and 2013 classes; factor(Year) makes ggplot draw one density per year
CombineData19992013 <- CombineData[CombineData$Year %in% c(1999, 2013), ]
ggplot(CombineData19992013, aes(X40Yard., fill = factor(Year))) + geom_density(alpha = 0.2)

More visualisation of 2012 NFL Quarterback performance with R

In last week’s post I used R heatmaps to visualise the performance of NFL Quarterbacks in 2012. This was done in a 2-step process:

  1. Clustering QB performance based on the 12 performance metrics using hierarchical clustering
  2. Plotting the performance clusters using R’s pheatmap library

An output of step 1 is the cluster dendrogram, which represents the clusters and how far apart they are. Reading the dendrogram from the top, it first splits the 33 QBs into 2 clusters. Moving down, it then splits into 4 clusters and so on. This is useful as you can move down the diagram, stop when you have the number of clusters you want to analyse or show, and easily read off the members of each cluster.
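Reading cluster members off the dendrogram at a chosen depth can also be done in code with `cutree()`. The sketch below is an assumption-laden illustration: the QB data isn't included here, so R's built-in `mtcars` data stands in for `QBscaled`, and `hclust()`'s default complete linkage is used, which may not match the settings from last week's post.

```r
# Stand-in for the scaled QB metrics: R's built-in mtcars data, standardised.
# (The real QBscaled matrix is not available here.)
QBscaled <- scale(mtcars)

# Hierarchical clustering on Euclidean distances
# (hclust() defaults to complete linkage; the original post may differ)
QBclust <- hclust(dist(QBscaled))

# "Stop" the tree where it has 4 branches and read off each cluster's members
clusters <- cutree(QBclust, k = 4)
members  <- split(names(clusters), clusters)
print(members)
```

Changing `k` is the equivalent of moving further up or down the diagram before reading off the groups.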


An alternative way to visualise clusters is to use the distance matrix and transform it into a 2 dimensional representation using R’s multidimensional scaling function cmdscale().

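QBdist <- as.matrix(dist(QBscaled))  # pairwise Euclidean distances between QBs
QBdist.cmds <- cmdscale(QBdist, eig = TRUE, k = 2)  # k is the number of dimensions
x <- QBdist.cmds$points[, 1]  # first MDS coordinate
y <- QBdist.cmds$points[, 2]  # second MDS coordinate
plot(x, y, main = "Metric MDS", type = "n")  # type = "n" sets up empty axes
text(x, y, labels = row.names(QBscaled), cex = 0.7)  # label each position with its row name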


This works well when the clusters are visually well defined, but when they’re not, as in this case, it just raises questions about why certain data points belong to one cluster rather than another: Ben Roethlisberger and Matt Ryan above, for example. Unfortunately, Mark Sanchez is still unambiguously in a special class with Brady Quinn and Matt Cassel.
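One way to dig into those questions is to colour the MDS labels by their hierarchical cluster and check how much of the distance structure the 2 dimensions actually capture. This is just a sketch under assumptions: `mtcars` again stands in for the QB data, the choice of 4 clusters is arbitrary, and `hclust()`'s default linkage may differ from the original clustering.

```r
# Stand-in for the scaled QB metrics (the real data is not available here)
QBscaled <- scale(mtcars)
QBdist   <- dist(QBscaled)

# Metric MDS in 2 dimensions; eig = TRUE also returns goodness-of-fit values
mds <- cmdscale(QBdist, eig = TRUE, k = 2)

# Cluster membership from the same distances (4 clusters chosen arbitrarily)
clusters <- cutree(hclust(QBdist), k = 4)

# Plot labels coloured by cluster, so members that sit far apart
# in the 2D picture stand out
plot(mds$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Metric MDS coloured by cluster")
text(mds$points, labels = rownames(QBscaled), col = clusters, cex = 0.7)

# Values near 1 mean 2 dimensions represent the distances well
print(mds$GOF)
```

If the goodness-of-fit values are well below 1, points from one cluster can look scattered simply because the separation happens in dimensions the plot doesn't show.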