Using Network Science to Quantify the Identifiability of Football Teams | Stats Perform

Javier Buldú and David Garrido exhibited a poster presentation at the 2020 OptaPro Forum, which introduced a Network Science approach to quantifying how the playing style of a team is maintained over the course of a season, based on the persistence of passing patterns, using Opta data.

In this guest blog they outline the methodology behind their presentation, together with a summary of the key findings.

Introducing Football Teams as Complex Systems

“A complex system is a system composed of many components which may interact with each other. Complex systems are systems whose behaviour is intrinsically difficult to model due to the dependencies, competitions, relationships, or other types of interactions between their parts or between a given system and its environment”

The brain, the Earth's climate and ecosystems are obvious examples of complex systems. We would argue that football has a strong case for being categorised in the same way.

Why?

Well, during a football match twenty-two components, called players, interact with each other in a complex way, creating dependencies, competing and, more importantly, generating emergent properties such as “playing patterns”. For these reasons, complexity sciences offer a viable alternative for analysing football datasets, introducing new perspectives on the analysis of the beautiful game.

The reasons for this lie in the complex nature of football, which, paraphrasing the foundational paradigm of complexity sciences, “cannot be analysed by looking at its components individually (i.e., players) but, on the contrary, by considering the system as a whole”. Even the most successful player in a match recognises that “it’s not just me, it’s the team.”

Translating team activity into a complex network is one of the many approaches based on complexity sciences. The organisation of a team can be analysed by considering the interaction between its players through passes. We can construct passing networks, which contain information about how the ball has been moved, from player to player, during the whole match.

Passing networks are “complex networks” for two main reasons:

1. They are composed of nodes (players) and links (passes) between them;

2. The interplay between nodes follows certain “complex” rules.

Furthermore, these networks are not easy to analyse given they are directed (i.e., links between players have a certain direction), weighted (the weight of each link is the number of passes between the two players), spatially embedded (i.e., the Euclidean position of the ball and players is highly relevant) and time evolving (i.e., the network continuously changes its structure).
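As a minimal sketch of the directed, weighted structure described above, the following Python snippet builds a player passing network from a list of pass events. The event format (passer, receiver) and the player labels P1 to P3 are hypothetical and chosen only for illustration; real event data carries many more fields, including positions and timestamps.

```python
from collections import defaultdict

# Hypothetical pass events as (passer, receiver) pairs; labels are illustrative.
pass_events = [
    ("P1", "P2"),
    ("P2", "P1"),
    ("P1", "P2"),
    ("P2", "P3"),
    ("P1", "P3"),
]

def build_passing_network(events):
    """Return a directed, weighted network: net[a][b] = number of passes a -> b."""
    weights = defaultdict(lambda: defaultdict(int))
    for passer, receiver in events:
        weights[passer][receiver] += 1
    # Convert to plain dicts so missing links are not created on lookup.
    return {a: dict(targets) for a, targets in weights.items()}

def out_strength(net, player):
    """Total number of passes made by a player."""
    return sum(net.get(player, {}).values())

def in_strength(net, player):
    """Total number of passes received by a player."""
    return sum(targets.get(player, 0) for targets in net.values())

network = build_passing_network(pass_events)
print(network)
print(out_strength(network, "P1"), in_strength(network, "P3"))
```

Because the links are directed, the weight from P1 to P2 is stored separately from the weight from P2 to P1. The spatial and temporal dimensions mentioned above are deliberately left out of this sketch.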

Figure 1 below shows an example of the Real Madrid passing network during a match against Barcelona from the 2017/18 season. In the plot, player sizes are proportional to their importance in the passing network. Players are placed in the average position from where their passes were made. The widths of the links are proportional to the number of passes between two players. Finally, substitutions are highlighted in green.

From this graphic, we can rapidly get an idea of how Real Madrid played, how they occupied the field, how their players interacted with each other, and how the organisation changed following substitutions.

Figure 1: A schematic illustration of a passing network, Real Madrid vs Barcelona, 2017/18 season.

This is just a snapshot of how translating team activity into a network can help us to understand team organisation. A diversity of metrics can be extracted from the network structure at different spatial and temporal scales, leading to a better comprehension of how a team is organised and how players contribute to the team’s performance (Buldú et al., 2018).

This is a task for Network Science, the branch of complexity sciences that analyses network structures and dynamics. Network Science is yet to be fully adapted for data analysis in football, but has the potential to provide new perspectives on performance in years to come.

Pitch Passing Networks

There are other ways of creating passing networks. If we are more concerned about the spatial organisation of a team, instead of the role of the players, we can construct and analyse pitch passing networks. In this instance, the nodes of the network are not specific players but specific regions of the field, which are connected through passes made by the players occupying them.

Figure 2 shows examples of Barcelona’s pitch passing networks against Real Madrid.

Figure 2: Plots, from left to right, are the 3×3, 5×6 and 10×10 passing networks for Barcelona, where nodes are regions of the pitch and links account for the number of passes between them.

Why do we plot three networks instead of one? The reason is that the pitch can be divided into areas of different sizes, leading to pitch networks of different scales. In this way, the three networks of Figure 2 correspond to the same team during the same match and the only difference is the size of the partitions. However, note that the structure of the network is different depending on the number of divisions, indicating that analysis of the network properties at different scales is required.

The Identifiability of Football Teams

Now that we have defined the framework, it is time for the questions:

Is it possible to quantify to what extent a team has a defined playing style?
Which teams adapt to their opposition and which teams remain loyal to their style?
Can we quantify which teams impose their style on an opponent during a match?
Which teams behave differently when playing away?

To answer all these questions, we applied Network Science to analyse the organisation of pitch passing networks.

Using the event data from a match, we constructed the multi-scale passing networks associated with each team and analysed their structures using different methodologies from Network Science.

In this way, we were able to identify:

Which teams imposed their playing style over their opponents;
What to expect from their opponents before a match;
How to evaluate whether a team played in line with what was expected.

Our only source of information was the way each team passed the ball, disregarding the number of shots, goals, tackles, dribbles or any other action. However, as we will see, passing patterns are still able to capture the essence of a team’s organisation.

We divided the pitch into n × m regions (with n = 1, 2, 3, …, 10 and m = 1, 2, 3, …, 10) and constructed the pitch passing networks, where nodes corresponded to the N = n × m regions of the pitch and a_ij accounted for the number of passes from region i to region j.
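The construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: pass coordinates are assumed to be on a 0 to 100 scale in both directions, n is assumed to count divisions along the length of the pitch and m along its width, regions are indexed row by row, and passes that start and end in the same region are counted on the diagonal. The sample passes are made up.

```python
def region_index(x, y, n, m, length=100.0, width=100.0):
    """Map a pitch position to a region index in an n x m grid (row-major)."""
    col = min(int(x / length * n), n - 1)  # n divisions along the pitch length
    row = min(int(y / width * m), m - 1)   # m divisions along the pitch width
    return row * n + col

def pitch_passing_matrix(passes, n, m):
    """Return the N x N matrix (N = n * m) with a[i][j] = passes from region i to j."""
    size = n * m
    a = [[0] * size for _ in range(size)]
    for x_start, y_start, x_end, y_end in passes:
        i = region_index(x_start, y_start, n, m)
        j = region_index(x_end, y_end, n, m)
        a[i][j] += 1
    return a

# Made-up passes as (x_start, y_start, x_end, y_end).
passes = [
    (10.0, 10.0, 50.0, 50.0),
    (50.0, 50.0, 90.0, 90.0),
    (10.0, 20.0, 55.0, 45.0),
    (100.0, 100.0, 10.0, 10.0),
]

A = pitch_passing_matrix(passes, 3, 3)

# One matrix per spatial scale, for every n and m between 1 and 10.
matrices = {
    (n, m): pitch_passing_matrix(passes, n, m)
    for n in range(1, 11)
    for m in range(1, 11)
}
```

With a 3×3 grid, the first and third passes both go from the corner region 0 to the central region 4, so A[0][4] is 2. The same passes fall into different regions at finer scales, which is why each scale produces a structurally different network.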

We analysed, across the whole season, the properties of the resulting connectivity matrices A = {a_ij} at different spatial scales. The elements of the connectivity matrices are the number of passes between the regions of the pitch, i.e., the mathematical abstraction of the passing patterns of each team. We calculated the consistency parameter (C) of each team by quantifying how similar a given team's connectivity matrices were across the season.

In short, teams with a high consistency maintained the structure of their ing networks throughout the season, while teams with a low consistency changed their organisation from match to match.
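To make the idea of consistency concrete, here is a hedged sketch. The text does not state which similarity measure was used, so this example assumes the Pearson correlation between flattened connectivity matrices, averaged over every pair of matches; the function names and the tiny 2×2 matrices are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def flatten(matrix):
    """Turn a square connectivity matrix into a flat list of link weights."""
    return [value for row in matrix for value in row]

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm = sqrt(sum((a - mean_u) ** 2 for a in u) * sum((b - mean_v) ** 2 for b in v))
    return cov / norm if norm > 0 else 0.0

def consistency(match_matrices):
    """Assumed definition: mean pairwise similarity of one team's match matrices."""
    pairs = list(combinations(match_matrices, 2))
    if not pairs:
        raise ValueError("at least two matches are needed")
    return sum(pearson(flatten(p), flatten(q)) for p, q in pairs) / len(pairs)

# Invented data: one team keeps the same pattern, the other changes it.
stable_team = [[[0, 5], [3, 0]], [[0, 10], [6, 0]], [[0, 5], [3, 0]]]
changing_team = [[[0, 5], [3, 0]], [[5, 0], [0, 3]], [[0, 5], [3, 0]]]

print(consistency(stable_team))
print(consistency(changing_team))
```

Correlation ignores the overall volume of passes, so a team that doubles its pass count while keeping the same pattern still scores as fully consistent. Other similarity measures would be equally valid choices for a sketch like this.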

Next, we quantified how unique each team's pitch passing networks were. This can be done by comparing the structure of the passing networks of a given team with those of the rest of the teams in the competition. We call this parameter the rival similarity (R). Finally, we defined the identifiability parameter (I) of a team as the consistency parameter C minus the rival similarity R, i.e., I = C − R.

Teams with a high identifiability parameter are those who are consistent and, at the same time, different from the rest.
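The relation I = C − R can be illustrated with a short sketch. As before, the similarity measure (Pearson correlation) and the averaging scheme are assumptions made for this example only, and the three teams and their flattened connectivity vectors are invented.

```python
from itertools import combinations, product
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm = sqrt(sum((a - mean_u) ** 2 for a in u) * sum((b - mean_v) ** 2 for b in v))
    return cov / norm if norm > 0 else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def identifiability(team, season):
    """Return (I, C, R) for one team, with I = C - R."""
    own = season[team]
    rivals = [vec for name, vecs in season.items() if name != team for vec in vecs]
    # C: how similar the team's own matches are to each other.
    c = mean(pearson(p, q) for p, q in combinations(own, 2))
    # R: how similar the team's matches are to those of every other team.
    r = mean(pearson(p, q) for p, q in product(own, rivals))
    return c - r, c, r

# Invented data: each team has two matches, stored as flattened matrices.
season = {
    "Team A": [[0, 9, 1, 0], [0, 8, 2, 0]],  # consistent and distinctive
    "Team B": [[4, 1, 5, 2], [1, 5, 2, 4]],  # changes from match to match
    "Team C": [[5, 1, 4, 2], [2, 4, 1, 5]],  # changes, and resembles Team B
}

scores = {team: identifiability(team, season)[0] for team in season}
print(scores)
```

In this toy season, Team A gets the highest score because its two matches resemble each other far more than they resemble anyone else's.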

Our methodology has both descriptive and prospective applications. On the one hand, we were able to identify which teams maintained their playing style (“high identifiability”) throughout the season and those that, on the contrary, did not have a consistent style (“low identifiability”).

In collaboration with LaLiga, we computed the identifiability parameter of the 2017/18 Spanish top-flight teams.

Figure 3 shows the values of the identifiability of Barcelona and Málaga, the teams who finished top and bottom of the table respectively. On the horizontal axis, we have plotted the number of nodes into which the pitch is divided since, as we explained, all scales must be analysed. Interestingly, we observed that pitch divisions of around 50 areas (nodes) were the ones leading to the best identification of Barcelona's playing style. Concerning Málaga, we can see that their identifiability was rather low at all scales.

Figure 3: The graphic on the left plots the identifiability parameter of Barcelona, based on the number of divisions (nodes) of the pitch. On the right, the same analysis is displayed for Málaga.

Application

Crucially, this information can help coaching teams prepare for a match through identifying the expected approach of their opponents.

For example, a team can evaluate the identifiability of its next opponent and decide whether to adapt its own approach to the opposition’s playing style (when the opposing team has a high identifiability) or try to impose its own style (when facing a team with a low identifiability).

We can also use identifiability to quantify, for every single match, which team played most similarly to their own style.

Table 1 shows the match-by-match identifiability difference between home and away teams during the 2017/2018 LaLiga season.

The matches where the home team, listed on the vertical axis, imposed its own playing style (i.e., had a higher identifiability) are highlighted in yellow, while the green cells correspond to away teams imposing their styles. The teams have been ordered based on the final league standings, with the aim of showing the connection between identifiability and the performance of a team. The yellow cells mainly appear above the diagonal of the matrix, indicating that when two teams play, the one ranked in a higher position has a higher probability of imposing its own playing style.

If we highlight individual teams, we can see that Barcelona won the “identifiability contest” in more matches than any other team, both at home and away, followed closely by Real Madrid. However, it is worth stressing that a difference in identifiability is not always an indicator of the match result, since there are some matches where identifiability was higher for the team that lost. This is the nature of football, where playing your way does not always guarantee success.

Table 1: Home teams are listed on the vertical axis, arranged by their final league position. Away teams are on the horizontal axis, arranged in the same way. The match result is displayed inside each cell. In yellow, the home team imposed its style; in green, the visiting team. Cells in blue correspond to matches where there was no clear difference between the identifiability of both teams.
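The colouring rule of Table 1 can be expressed as a simple comparison. The sketch below assumes that a per-match identifiability value is already available for each team, and the tolerance used to decide that there is "no clear difference" is a made-up value; the text does not state what threshold was applied.

```python
def style_winner(home_identifiability, away_identifiability, tolerance=0.05):
    """Classify a match by which team imposed its style (tolerance is hypothetical)."""
    difference = home_identifiability - away_identifiability
    if abs(difference) <= tolerance:
        return "no clear difference"  # blue cell in Table 1
    # Yellow cell if the home team is more identifiable, green otherwise.
    return "home" if difference > 0 else "away"

# Invented values for three matches.
print(style_winner(0.30, 0.10))
print(style_winner(0.10, 0.12))
print(style_winner(0.05, 0.40))
```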

To conclude, it is also worth highlighting that it is possible to obtain a real-time estimation of the identifiability parameter as a game is taking place, highlighting when a team, or their opponent, is playing as expected. This is valuable information, which could potentially inform key in-game decision making from the bench.

Further applications of this methodology would also allow analysts to evaluate which teams behave differently when playing at home or away, or to identify those regions of the pitch where deviations from a team’s expected passing patterns occur during a game.

References

Buldú, J. M., Busquets, J., Martínez, J. H., Herrera-Diestra, J. L., Echegoyen, I., Galeano, J., & Luque, J. (2018). Using network science to analyse football passing networks: dynamics, space, time and the multilayer nature of the game. Frontiers in Psychology, 9, 1900.

Buldú, J. M., Busquets, J., Echegoyen, I., & Seirul.lo, F. (2019). Defining a historic football team: Using Network Science to analyze Guardiola’s FC Barcelona. Scientific Reports, 9(1), 1-14.

Possessing a PhD in Applied Physics, Javier Buldú is the coordinator of the Complex Systems Group at the King Juan Carlos University in Madrid, as well as being the Principal Investigator of the Laboratory of Biological Networks at the Center for Biomedical Technology.

He can be contacted via email at: [email protected]

David Garrido is a PhD student, studying at the Center for Biomedical Technology & King Juan Carlos University, Madrid, Spain.
