More visualisation of 2012 NFL Quarterback performance with R

In last week’s post I used R heatmaps to visualise the performance of NFL Quarterbacks in 2012. This was a two-step process:

  1. Clustering QB performance based on the 12 performance metrics using hierarchical clustering
  2. Plotting the performance clusters using R’s pheatmap library

One output of step 1 is the cluster dendrogram, which shows the clusters and how far apart they are. Reading the dendrogram from the top, it first splits the 33 QBs into 2 clusters. Moving down, it keeps splitting, reaching 4 clusters and so on. This is useful because you can move down the diagram, stop when you have the number of clusters you want to analyse or show, and easily read off the members of each cluster.
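Reading clusters off the tree at a chosen level can also be done in code with cutree(). The following is only a sketch: the QB data is not reproduced here, so the built-in mtcars data set stands in for it, and the names scaled, hc and membership are illustrative rather than taken from the original analysis.

```r
# Stand-in data: mtcars replaces the post's QB2012 table (an assumption
# made purely so the example runs on its own).
scaled <- scale(mtcars)
hc <- hclust(dist(scaled))   # defaults: Euclidean distance, complete linkage

# Cut the same tree at two levels: 2 clusters, then 4 clusters.
# With a vector k, cutree() returns a matrix with one column per value of k.
membership <- cutree(hc, k = c(2, 4))
head(membership)

# Read off the members of each of the 4 clusters.
split(rownames(scaled), membership[, "4"])

# Cross-tabulate the two cuts: each of the 4 clusters sits inside
# exactly one of the 2 top-level clusters.
nesting <- table(two = membership[, "2"], four = membership[, "4"])
nesting
```

Because both cuts come from the same tree, the finer clusters are always subdivisions of the coarser ones, which is what makes it safe to stop at whatever level suits the analysis.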


An alternative way to visualise clusters is to use the distance matrix and transform it into a 2 dimensional representation using R’s multidimensional scaling function cmdscale().

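# Distance matrix between QBs, based on the scaled performance metrics
QBdist <- as.matrix(dist(QBscaled))
# Classical (metric) MDS; k is the number of dimensions
QBdist.cmds <- cmdscale(QBdist, eig=TRUE, k=2)
x <- QBdist.cmds$points[,1]
y <- QBdist.cmds$points[,2]
# Set up an empty plot, then label each point with the QB's name
plot(x, y, main="Metric MDS", type="n")
text(x, y, labels=row.names(QBscaled), cex=.7)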


This works well when the clusters are visually well defined, but when they are not, as in this case, it just raises the question of why certain data points belong to one cluster rather than another: for example, Ben Roethlisberger and Matt Ryan above. Unfortunately, Mark Sanchez is still unambiguously in a special class with Brady Quinn and Matt Cassel.
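One way to make cluster membership easier to follow on the MDS plot is to colour each label by its hierarchical cluster, and to check how much of the distance structure two dimensions actually retain. This is a sketch rather than the code behind the plot above: mtcars again stands in for the QB data, and the choice of 4 clusters and the variable names are assumptions.

```r
# Stand-in data: mtcars replaces QBscaled (assumption for illustration).
scaled <- scale(mtcars)
d <- dist(scaled)

# Classical MDS in 2 dimensions; eig = TRUE also returns goodness of fit.
mds <- cmdscale(d, eig = TRUE, k = 2)

# Cluster labels from the same distances, cut at 4 clusters.
clusters <- cutree(hclust(d), k = 4)

# Empty plot, then text labels coloured by cluster number.
plot(mds$points, type = "n", main = "Metric MDS coloured by cluster",
     xlab = "Dimension 1", ylab = "Dimension 2")
text(mds$points, labels = rownames(scaled), col = clusters, cex = 0.7)

# Goodness of fit: values well below 1 mean the 2-D picture
# is discarding a lot of the original distance information.
mds$GOF
```

If the goodness-of-fit values are low, points that look close together in the plot may be far apart in the full set of metrics, which would explain neighbours ending up in different clusters.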


Visualising 2012 NFL Quarterback performance with R heat maps

With only 24 hours remaining in the 2012 NFL season, this is a good time to review how the league's QBs performed during the regular season using performance data from KFFL and the heat mapping capabilities of R.

library(pheatmap)

#QB2012 holds the performance metrics, one row per QB
#scale data to mean=0, sd=1 and convert to matrix
QBscaled <- as.matrix(scale(QB2012))

#create heatmap and don't reorder columns
pheatmap(QBscaled, cluster_cols=FALSE, legend=FALSE, fontsize_row=12, fontsize_col=12, border_color=NA)


Instead of R's default heatmap function, I've used the pheatmap function from the pheatmap library.

The analysis includes KFFL's data on Passes per Game, Passes Completed per Game, Pass Completion Rate, Pass Yards per Attempt, Pass Touchdowns per Attempt, Pass Interceptions per Attempt, Runs per Game, Run Yards per Attempt, Run Touchdowns per Attempt, 2 Point Conversions per Game, Fumbles per Game, Sacks per Game.
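These metrics are on very different scales (counts per game, rates, yards per attempt), which is why the data is standardised with scale() before clustering: otherwise the metrics with the largest numbers would dominate the distances. A quick check of what scale() does, using the built-in mtcars data as a stand-in for the QB table (an assumption for illustration only):

```r
# Stand-in data: mtcars replaces QB2012 (assumption for illustration).
raw <- mtcars
scaled <- as.matrix(scale(raw))

# Before scaling, column spreads differ by orders of magnitude.
round(apply(raw, 2, sd), 2)

# After scaling, every column has mean 0 and standard deviation 1,
# so each metric contributes comparably to the Euclidean distances.
col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)
round(col_means, 10)
round(col_sds, 10)
```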

#cluster rows
hc.rows <- hclust(dist(QBscaled))


This cluster dendrogram shows 4 broad performance clusters of QBs who started at least half the regular season (8 games) plus Colin Kaepernick (7 games). It's important to remember this analysis does not include any playoff games. Our assessment of playoff QBs is also easily biased by the results of these games – just because Joe Flacco makes Super Bowl XLVII does not mean he has consistently outperformed Tom Brady.
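The code above builds the tree but does not show the plotting step. A dendrogram with the 4 clusters outlined can be drawn along the following lines. This is a sketch, not necessarily how the original figure was produced: mtcars stands in for QBscaled, and the title and colour are arbitrary choices.

```r
# Stand-in data: mtcars replaces QBscaled (assumption for illustration).
scaled <- scale(mtcars)
hc.rows <- hclust(dist(scaled))

# Draw the dendrogram, then outline the 4 clusters on top of it.
plot(hc.rows, cex = 0.7, main = "Cluster dendrogram", xlab = "", sub = "")
rect.hclust(hc.rows, k = 4, border = "red")

# Number of members in each of the 4 clusters.
cluster_sizes <- table(cutree(hc.rows, k = 4))
cluster_sizes
```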

Cluster 1 – The top tier passers

#draw heatmap for first cluster
pheatmap(QBscaled[cutree(hc.rows,k=4)==1,], cluster_cols=FALSE, legend=FALSE, fontsize_row=12, fontsize_col=12, border_color=NA)


Pass-first QBs with good passing stats who kept out of trouble (low interceptions, sacks & fumbles). Within the group, Brees, Peyton Manning, Brady and Ryan have the best results, with Carson Palmer a surprise inclusion.

Cluster 2 – Successful run & pass QBs

#draw heatmap for second cluster
pheatmap(QBscaled[cutree(hc.rows,k=4)==2,], cluster_cols=FALSE, legend=FALSE, fontsize_row=12, fontsize_col=12, border_color=NA)


Strong outcomes in both the passing and running game, including the 3 QBs who led in run attempts per game: Newton, RG III and Kaepernick. RG III & Kaepernick also had surprisingly few interceptions per game given their propensity to throw deep aggressively.

Cluster 3 – The Middle

#draw heatmap for third cluster
pheatmap(QBscaled[cutree(hc.rows,k=4)==3,], cluster_cols=FALSE, legend=FALSE, fontsize_row=12, fontsize_col=12, border_color=NA)


Not great but not the worst either. This group includes Joe Flacco.

Cluster 4 – A year of fumbles, interceptions and sacks

#draw heatmap for fourth cluster
pheatmap(QBscaled[cutree(hc.rows,k=4)==4,], cluster_cols=FALSE, legend=FALSE, fontsize_row=12, fontsize_col=12, border_color=NA)


As a NY Jets supporter, this is painful.