
Documat


Summary of Machine learning and multivariate statistical tools for football analytics

Pilar Malagón Selma

  • This doctoral thesis focuses on studying, implementing, and applying machine learning and multivariate statistics techniques in the emerging field of sports analytics, specifically in football. Commonly used procedures and new methods are applied to solve research questions in different areas of football analytics, both in the field of sports performance and in the economic field. The methodologies used in this thesis enrich the techniques used so far to obtain a global vision of the behaviour of football teams and are intended to help the decision-making process. In addition, the methodology was implemented using the free statistical software R and open data, which allows for reproducibility of the results.

    This doctoral thesis aims to contribute to the understanding of the behaviour of machine learning and multivariate models for sports prediction, comparing their predictive capacity and studying the variables that most influence their predictions. Since chance plays an important role in football, this document proposes methodologies that help to study, understand, and model the objective part of the sport. The thesis is structured into five blocks, each distinguished by the database used to achieve the proposed objectives.

    The first block describes the most common study areas in football analytics and classifies them according to the available data. This part contains an exhaustive study of the state of the art in football analytics: part of the existing literature is compiled according to the objectives achieved, with a review of the statistical methods applied. These methods are the pillars on which the new procedures proposed here are based.

    The second block consists of two chapters that study the behaviour of teams with respect to their ranking at the end of the season: top (qualifying for the Champions League or Europa League), middle, or bottom (relegated to a lower division). Several machine learning and multivariate statistical techniques are proposed to predict the teams' position at the end of the season. Once the prediction has been made, the model with the best predictive accuracy is selected to study the game actions that most discriminate between positions. In addition, the advantages of the proposed techniques over the classical methods used so far are analysed.

    The third block consists of a single chapter in which web scraping code is developed to facilitate the retrieval of a new database with quantitative information on the game actions carried out over time in football matches. This block focuses on predicting match outcomes (win, draw, or loss) and proposes combining a machine learning technique, random forest, with a Skellam regression model, a classical method commonly used to predict goal difference in football. Finally, the predictive accuracy of the classical methods used so far is compared with that of the proposed multivariate methods.
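
    As a rough illustration of why a Skellam model suits goal difference: if home and away goals are treated as independent Poisson counts, their difference follows a Skellam distribution, and win, draw, and loss probabilities are sums over that distribution. The sketch below is not the thesis's implementation (which was written in R and combines the model with a random forest). The function names and the scoring rates are hypothetical, and in a regression setting the rates would be estimated from match covariates rather than fixed by hand.

```python
import math


def poisson_pmf(n: int, mu: float) -> float:
    """P(N = n) for a Poisson variable with mean mu > 0 (log-space for stability)."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def skellam_pmf(k: int, mu_home: float, mu_away: float, max_goals: int = 60) -> float:
    """P(home goals - away goals = k), via a truncated convolution of two Poissons.

    max_goals caps the away-goal count in the sum; for realistic scoring
    rates the truncated tail is negligible.
    """
    total = 0.0
    for away in range(max_goals + 1):
        home = away + k
        if home >= 0:
            total += poisson_pmf(home, mu_home) * poisson_pmf(away, mu_away)
    return total


def outcome_probabilities(mu_home: float, mu_away: float, max_diff: int = 30) -> dict:
    """Win/draw/loss probabilities for the home side from the goal-difference law."""
    win = sum(skellam_pmf(k, mu_home, mu_away) for k in range(1, max_diff + 1))
    draw = skellam_pmf(0, mu_home, mu_away)
    loss = sum(skellam_pmf(k, mu_home, mu_away) for k in range(-max_diff, 0))
    return {"win": win, "draw": draw, "loss": loss}


if __name__ == "__main__":
    # Hypothetical expected goals per match, chosen only for illustration.
    probs = outcome_probabilities(mu_home=1.6, mu_away=1.1)
    for outcome, p in probs.items():
        print(f"{outcome}: {p:.3f}")
```

    The convolution is used here only to keep the sketch dependency-free; a closed form based on a modified Bessel function exists and is what statistical libraries typically provide.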

    The fourth block also comprises a single chapter and pertains to the economic side of football. This chapter applies a novel procedure to develop indicators that help predict transfer fees. Specifically, it shows the importance of popularity when calculating players' market value, and it therefore proposes a new methodology for collecting information on players' popularity.

    The fifth block presents the most relevant contributions of this thesis to research and football analytics, including future lines of work.

