The start of the season is right around the corner and I’ve been preparing for it by playing around with some simulations, trying to figure out who’s going to do well and who isn’t. This post is going to serve as a general view of the league as a whole, so if you’re interested in reading my thoughts on any of the individual teams, I urge you to go through my 2016 season reviews, as they contain most of what I considered interesting last season. With this post my hope is to put 2016 behind me and sharpen my focus on what lies ahead.
When trying to look into the future, all we have to go on is what happened in the past. In the Veikkausliiga, we would expect HJK to do well for a multitude of reasons, chief among them that it’s what they usually do. It’s also why we expect promoted teams to do poorly and mid-tier sides to stick to mid-table. Most of the time, however, the league table from this season isn’t a very good indicator of what’s going to happen next season. Football is much messier than that.
First of all, the season is short. 33 games in the Veikkausliiga is short compared to 46 in the lower leagues of England, which is short compared to 82 in the NHL, which is short compared to 162 in MLB. Statistically, things take time to settle, and in football, time is in short supply. In football, you can start being sort of confident in some data patterns after about 10-12 games – in baseball, no-one would use a 10-game sample for anything. Different sports have different parameters, and this is just something we’ll have to contend with.
Second, football is high in variance because it is a low-scoring game. Over a hundred games, the ‘best’ team would be likely to accumulate the most points; over 33 games the story isn’t quite as simple. It isn’t unusual to see a team win a game having scored from only a few chances while the losing team creates a tonne and scores none. Have a couple of those games within a short time span and you’re going to have a leg up on your opponents just because they’ll be in a hurry to catch up – see Burnley 16/17 as an example of the above.
Third, things change. Using numbers from the 2016 season to predict the 2017 season without taking into account what has happened in between is a fool’s errand. This is especially true of the Veikkausliiga, where player contracts are short and almost every other league is considered a step up in a player’s career. Take SJK as an example. Since last season they’ve had to replace their manager, three defenders, two midfielders (one due to a long-term injury) and two attackers in their approximate starting XI. Do their numbers from last season reflect what they’re going to do this season, at all? It’s impossible to know, but at this point we’re going to have to work with the assumption that they do.
There are obviously more reasons why predicting isn’t an exact science, but these are the primary ones that pertain to the Veikkausliiga in particular. They remain true regardless of how you go about doing your predictions, so in that sense we’re all in the same boat.
So with the caveats out of the way, let’s get down to business. Using Veikkausliiga numbers from 2016 – specifically shot counts and Expected Goals – I’ve done a Monte Carlo simulation of the 2017 league season. If you’re interested in the nitty-gritty of this form of simulation, then – and I shouldn’t have to be telling you this – I would recommend doing some googling.
Basically, what a Monte Carlo simulation entails is using a random number generator to account for the real-life randomness of a football game. Using different metrics, we can estimate the probabilities of certain events happening in a game (e.g. shot counts and goal counts) and then use the random number generator to produce outcomes from those probabilities. A simplified example would be: if the random number is 0.5 or above, then outcome Y, otherwise outcome Z. Calculate outcomes for all of the games in the season, repeat 10 000 times (with the random numbers changing every time, of course), average the results, and voilà.
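To make the recipe above a bit more concrete, here is a minimal sketch in Python. It is not the actual model behind this post. The team names and numbers are made up, and it assumes each side is described by just two hypothetical inputs, average shots per game and average xG per shot. It also ignores the opponent's defence and home advantage, and it plays a double round-robin rather than the real 33-game schedule.

```python
import math
import random
from itertools import permutations


def poisson(rng, mean):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_goals(rng, shots_per_game, xg_per_shot):
    """Sample a shot count, then let each shot score with probability xg_per_shot."""
    shots = poisson(rng, shots_per_game)
    return sum(rng.random() < xg_per_shot for _ in range(shots))


def simulate_season(teams, rng):
    """Play every ordered pairing once (home and away) and return points per team."""
    points = {name: 0 for name in teams}
    for home, away in permutations(teams, 2):
        home_goals = simulate_goals(rng, *teams[home])
        away_goals = simulate_goals(rng, *teams[away])
        if home_goals > away_goals:
            points[home] += 3
        elif away_goals > home_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points


def average_points(teams, runs=10_000, seed=1):
    """Repeat the season `runs` times and average each team's points."""
    rng = random.Random(seed)
    totals = {name: 0 for name in teams}
    for _ in range(runs):
        for name, pts in simulate_season(teams, rng).items():
            totals[name] += pts
    return {name: total / runs for name, total in totals.items()}


# Hypothetical inputs: (shots per game, xG per shot). Not real Veikkausliiga data.
EXAMPLE_TEAMS = {
    "Team A": (15.0, 0.12),
    "Team B": (12.0, 0.10),
    "Team C": (9.0, 0.08),
}

if __name__ == "__main__":
    for name, pts in average_points(EXAMPLE_TEAMS, runs=1_000).items():
        print(f"{name}: {pts:.1f} points on average")
```

Fixing the seed keeps a run reproducible, while changing it shows how much the averages still wobble at a given number of runs.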
But does it work? Well, it depends on how well it has to work to qualify. I did some test runs with the data I’ve accumulated from 2013 and 2015 and used the simulation to ‘predict’ what would happen the following seasons. The good news: both simulations were better predictors than the league table from the same season. The bad: they still weren’t very good predictors.
| League Table r-values | 0.38 | 0.71 | 0.14 | 0.43 |
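For anyone wanting to run the same kind of check, an r-value like the ones above can be computed as follows. This sketch assumes that "r-value" means the Pearson correlation between a predictor (last season's points, or simulated points) and the points actually earned the following season. The numbers in the example are invented for illustration and are not taken from my data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical example: predicted points vs. points actually earned a year later.
predicted_points = [58, 52, 47, 45, 41, 33]
actual_points = [61, 44, 50, 39, 46, 30]

if __name__ == "__main__":
    print(f"r = {pearson_r(predicted_points, actual_points):.2f}")
```

An r of 1 would mean the predictor ordered and spaced the teams perfectly, and an r near 0 would mean it told us next to nothing.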
What those r-values mean is that the predictions for 2017 shouldn’t be taken as gospel, but merely as an improved alternative to the 2016 league table. Since the simulations are based on Expected Goals, they naturally adjust for teams that have been either lucky or unlucky, levelling the playing field and offering an ‘all-things-being-equal’ version of events. Needless to say, all things are seldom equal, and therefore, even if we had a perfect model, it still wouldn’t be able to properly account for the random swings in luck that end up affecting a season. What it can do, though, is give us an approximation of how likely each outcome is – from goals to points and league positions – given that we have a sample of 10 000 for every game. Once information comes trickling in from the upcoming season, the prediction can also be adjusted with a more representative set of data.
If the format looks familiar, it’s because it’s a pretty standard Excel conditional formatting that I’ve seen used a fair bit among the more popular analytics accounts on Twitter for the same purpose (not sure whom to credit for the overall concept, but mine looks almost identical to the ones @GoalImpact produces).
As this is more of an introductory post for the upcoming season, I wouldn’t pay much attention to the exact positioning of the teams at this stage, as this simulation lacks quite a lot of context. A word of caution about JJK as well: I have practically zero data to work with from Ykkönen, so what I’ve done with them is to average all promoted team seasons in my data set and use that as a proxy for a standard season for a promoted team.
What this chart does show, however, are broader groupings. The simulation likes SJK, IFK Mariehamn and HJK – as it should – and dislikes JJK and PS Kemi. The rest of the teams are bunched in the middle with little definitive separation between them, except for maybe Inter and Lahti who create a mini-lower-mid-table-tier. Compared to 2016, HIFK and SJK are the major upward movers – but, like I outlined earlier, SJK in particular are difficult to predict due to the organisational churn they’ve gone through.
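A chart of this kind boils down to counting how often each team finishes in each position across the simulated seasons. The sketch below shows one way that tally could be done. The function name and the four toy "seasons" are hypothetical, and ties on points are broken alphabetically purely for simplicity, where a real table would use goal difference.

```python
from collections import Counter


def position_probabilities(simulated_seasons):
    """Return, per team, the share of runs in which it finished in each position.

    `simulated_seasons` is a list of {team: points} dicts, one per simulated run.
    """
    counts = {}
    for points in simulated_seasons:
        # Highest points first; alphabetical order breaks ties (a simplification).
        ranked = sorted(points, key=lambda team: (-points[team], team))
        for position, team in enumerate(ranked, start=1):
            counts.setdefault(team, Counter())[position] += 1
    runs = len(simulated_seasons)
    return {
        team: {pos: n / runs for pos, n in sorted(tally.items())}
        for team, tally in counts.items()
    }


# Four toy runs with made-up points totals.
toy_runs = [
    {"A": 60, "B": 50, "C": 40},
    {"A": 55, "B": 58, "C": 30},
    {"A": 61, "B": 47, "C": 49},
    {"A": 50, "B": 52, "C": 41},
]

if __name__ == "__main__":
    for team, shares in position_probabilities(toy_runs).items():
        print(team, shares)
```

With 10 000 runs instead of four, each share becomes a reasonable estimate of the probability of that finish under the model's assumptions.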
So this is what the data suggests will happen, which I hope will prove an interesting talking point. Personally, I think the major problems with the prediction are that it doesn’t take into account what has happened during the offseason and that it doesn’t account for managerial changes. For what it’s worth, if I were to adjust the simulation in accordance with these factors, I would have HJK and possibly Inter and Ilves a bit higher, and RoPS a bit lower.
If you found this interesting, and/or you’re interested in updates as the season progresses, please follow me on Twitter @Minor_LS.