Always the same issue, huh? How to start something like this… Well, I’ve wanted to start contributing to the football analytics discussion for a long time. I wrote my master’s thesis on player evaluation around 2012–2013, and received an excellent grade for it. After completing my thesis, I took something of a hiatus from independent thinking about football numbers, instead immersing myself in what was an emerging blogging scene. This scene has now reached what to me feels like something of a bottleneck – there are so many interesting things going on in the public sphere that it feels like the dam is going to have to burst at some point. There are so many intelligent thinkers and creative writers and brilliant coders and designers out there, researching and producing things that I could only have dreamt about when I last wrote about football, that it feels doubly daunting to even get started. Like I said, I’ve been wanting to write something, but I just never knew what that was going to be – and now I’m not sure if there’s space for whatever I have the capacity to produce.
Still, here we are. What this blog is going to be about is nothing particularly groundbreaking or brilliant – it’s going to be less about pushing the envelope and more about applying whatever thoughts are out there to (I think) pastures new. The majority of football writing is about the top five or six leagues in Europe. Over the past year or so, MLS has also raised its status to the extent that it is probably one of the more covered leagues in terms of analytics. This is, I feel, because of two things that equal a third: data availability + general interest = exposure. This is obviously quite natural, and far from a criticism – I would write about the Premier League as well if I had any original ideas I wanted to develop.
This does, however, leave some questions unanswered – especially when it comes to what is the hottest of potatoes at the moment: Expected Goals. Michael Caley, developer of probably the best-known and most renowned Expected Goals model, achieved great success when testing his model in the top bracket of the elite leagues, but when he tried it on Ligue 1, ExpG as a metric seemed to fade a little. The conclusion: ExpG works better as a metric at the elite end of the spectrum, but is less descriptive the closer we get to the bottom. Now, Ligue 1 is obviously still quite close to the top end, so there was no real indication of where the cut-off point for usability is. Is there any point in having an interest in ExpG in leagues that are below Ligue 1, for example? Or is Ligue 1 only an outlier? Hard to tell, and since the data is difficult to get your hands on, even harder to find out.
This was my premise when I started thinking about what I could do within analytics, what I could write about. Being a Finn, I have a strong footing in one of the weaker leagues in Europe, albeit one which seems to be showing signs of positive growth. But the problem is: there’s no data. The Finnish league – the Veikkausliiga – provides some rudimentary stats (goals, assists, cards etc.) on its website, and you can get minutes played from some other sites, but beyond that, there’s nothing publicly available. What the Veikkausliiga does have, however, is a partnership with InStat (a football data provider), which provides an online interface with video of a complete set of particular actions from every match played in the Veikkausliiga. Now, I’m guessing that the clubs in the league have access to a more advanced dataset than the public does. Based on what the interface shows visually, I’d say they probably have location data for the different actions they collect and almost definitely things like Key Passes and different shot stats. This is obviously the kind of data that could be enough to start a rudimentary discussion about football analytics at a sub-elite level, but unfortunately it’s for certain eyes only.
So what I set out to do was to put this video data into a spreadsheet, and make it workable. The project started out far less ambitious than it ended up: at first I just wanted things like shot numbers and key passes. I now sit on what I believe is the most comprehensive database of the 2015 Veikkausliiga season available outside of the actual clubs and InStat. I have some defensive numbers, I have key passes, shots, saves. I have headers and volleys, free kicks and penalties. I have a rudimentary ExpG and ExpA model based solely on shot location (because I haven’t had the energy to develop it further – collecting data can be a pain in the ass). I have team stats and player stats. So what now?
Well, the 2016 season started last weekend, so a whole lot of data collection, probably. I’ve refined my spreadsheet to make it less CPU-heavy (I’m doing this on a MacBook Air, which means that my final spreadsheet for the 2015 season barely opens anymore because it’s so cluttered) and more intuitive, and I’ve added some new stuff that I think is going to be interesting, like pre-assists and key pass locations, shot direction, corner routines, player footedness etc. I hope to get into a rhythm whereby I can do a bit of actual analysis as well. I had hoped to produce a preview for this season, but I ended up kind of drained after completing the 2015 data collection. I’ll get to it eventually; I just need to get the show on the road first.
What I hope to do is to provide some kind of balance to the discussion, a different perspective. Once I get the machinery firing, I also hope to produce some cool analysis and graphics pertaining to the Veikkausliiga. Hopefully it’ll interest someone. Either way, my personal belief is that the greatest advantage to be found in analytics is in the lower leagues, where outfits aren’t already squeezing every last drop of competitive advantage out of whatever little nectar is left. Whether it’s recruitment, preparation or tactical planning, there is almost certainly low-hanging fruit to be found.