Book of Abstracts
ordered alphabetically by corresponding author

by Tobias Berger, Frank Daumann, and Björn A. Kuchinke
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Generating Competitive Advantages in the NBA using Data Analytics

Professional sports – as a highly competitive field – has recognized the data analytics trend and seems to invest more and more money, time and faith in it. In the pursuit of sustainable financial and on-field success, many ambitious and forward-thinking sports organizations, leagues and teams are implementing data analytics in their management processes to gain potential benefits.

This paper takes first steps in researching the application of data analytics in sports, with a special focus on the franchises of the National Basketball Association. “Competing on analytics” has become a serious path to success for many NBA teams. Still, the question remains: what impact does this development have? Can franchises use data analytics as an organizational resource to gain a competitive advantage over their opponents? If so, in which areas can they apply data analytics, and can we measure the benefits that result from integrating this approach into the management process? This paper seeks answers to these questions.

The theoretical part provides a framework for how data analytics should be viewed in the sports context. After defining data analytics as a structured four-step process and showing that it can be a source of competitive advantage, we discuss concrete areas of application for analytics in sports, and especially in basketball.

Drawing on the Sports Policy Analytics Functional Relationships Model, we show that data analytics could help NBA franchises improve their management choices in talent acquisition, maximize the on-field output of their athletes, and lower their player salary costs.

Empirical Research

First, we build a model to measure the data analytics involvement and competency of every NBA franchise. Building on Liberatore and Luo's (2010) view of data analytics as a process, our research design uses five factors: analytics human capital, analytics experience, analytics open-mindedness of the management, potential analytics excellence of the employees, and unique data acquisitions by the team. The score derived from these factors measures each franchise's data analytics competence.
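The abstract does not state how the five factors are aggregated into a score. As a minimal sketch, assuming each factor has already been coded numerically per franchise and that the score is an equal-weighted sum of z-standardised factors, the composite could be computed as below. The factor keys and the function `analytics_scores` are hypothetical, not the authors' implementation.

```python
from statistics import mean, pstdev

# Hypothetical keys for the five factors named in the abstract.
FACTORS = ["human_capital", "experience", "open_mindedness",
           "potential_excellence", "unique_data"]

def analytics_scores(teams: dict[str, dict[str, float]]) -> dict[str, float]:
    """Equal-weighted sum of z-standardised factor values per franchise (assumed aggregation)."""
    stats = {}
    for f in FACTORS:
        values = [t[f] for t in teams.values()]
        sd = pstdev(values)
        # Guard against a factor that does not vary across franchises.
        stats[f] = (mean(values), sd if sd > 0 else 1.0)
    return {
        name: sum((t[f] - stats[f][0]) / stats[f][1] for f in FACTORS)
        for name, t in teams.items()
    }
```

Standardising first keeps a factor measured on a large scale (e.g. headcount) from dominating one measured on a small scale; any other weighting scheme would change the ranking.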

In a second step, we selected player contracts and the NBA draft as further research fields, because theory suggests that better decisions informed by data analytics could yield an advantage in both areas. After mining the biographical, performance, and financial data of every NBA player from 2009 to 2016 from an NBA statistics platform, we evaluated every contract and draft decision made by every team during this period.

We then used regression models to combine these two main components and test (a) whether analytics-savvy teams get more on-field performance, measured by value over replacement player (VORP), for every contract dollar they pay, and (b) whether they acquire better players in the draft, controlling for draft position.

Both models are highly significant. Data analytics can be a source of slight competitive advantages for NBA franchises in player contracts and draft choices.


Davenport, Thomas H., Harris, Jeanne G. and Gary Loveman (2010): Competing on analytics - The new science of winning, Boston, Mass.

Evans, Richard (2014): Sports Policy Analytics for Professional Sports Leagues, London.

Liberatore, Matthew J. and Wenhong Luo (2010): The Analytics Movement: Implications for Operations Research, in: Interfaces, Vol. 40, pp. 313–324.

by Paul Bradshaw and Alex Homer
Data Journalism Session II - 14.00 - 15.30, Room: RAA-G-01 AULA

'Closed shop' Premier League clubs £100m richer than EFL before bumper TV rights

In 2017 the BBC Data Unit set out on its most ambitious sports data journalism project to date: a club-by-club guide to the financial health of the 92 clubs in the Premier League and English Football League. Over months we have pored over accounts published as scanned PDFs to build our own database and unlock data that we feel is hiding in plain sight about our national game. Our master sheet, which contains around 17,000 individual data points so far, was entered manually and covers the clubs' accounts for 2014-16 inclusive.

We have analysed those three years of accounts so far and our findings broadly cover: the clubs’ webs of ownership (both domestic and offshore/in tax havens), how they make their money, where that money goes and whether spending equates to success. We have also discovered the clubs piling up the biggest debts, those whose wages cost the most and their directors’ pay.

The initial results of this work can be seen in the story "'Closed shop' Premier League clubs £100m richer than EFL before bumper TV rights". It showed how the gap between Premier League and Championship clubs grew in each season from 2014-16, reaching as much as £100m by 2016. The story used medians rather than averages so that the figures would not be distorted by a few extreme clubs, and was intended to draw a line in the sand before the impact of the 2015 television rights deal for Premier League games (71% higher than the previous deal) became visible. That impact is not expected to be clear until the full 2017 accounts are published.
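Why medians: a single very rich club pulls a league's mean upwards but barely moves its median. The sketch below illustrates this with made-up revenue figures (in £m), not BBC data; `revenue_gaps` is a hypothetical helper.

```python
from statistics import mean, median

def revenue_gaps(premier: list[float], championship: list[float]) -> dict[str, float]:
    """Gap between two leagues' typical club revenue, as a difference of means and of medians."""
    return {
        "mean_gap": mean(premier) - mean(championship),
        "median_gap": median(premier) - median(championship),
    }

# Made-up revenues: doubling the richest club's revenue moves the mean gap, not the median gap.
base = revenue_gaps([100, 110, 120, 130, 500], [20, 25, 30])
richer = revenue_gaps([100, 110, 120, 130, 1000], [20, 25, 30])
```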

We found that the impact of that growing gap was most clearly visible by 2016, when 16 out of 24 Championship clubs (two thirds of the division) spent more than 100% of their turnover on staff costs, spending beyond their means to chase the dream of promotion. We approached an expert on football finances with our findings, who argued that the spending constraints introduced under the banner of protecting clubs' futures and the integrity of our domestic leagues (initially called Financial Fair Play (FFP) rules, now Profitability and Sustainability rules) were in fact contributing to an increasingly entrenched hierarchy of big-money clubs.

The project is still ongoing with plans to publish a comprehensive, interactive database of football finances that allows fans to better scrutinise the operation of their favourite club: we hope to complete inputting the 2017 accounts before re-running our analysis and publishing our eventual series of analytical reports later in 2018.

The project also aims to better inform reporting on the sport's finances. Historically, sports reporters have not been as financially literate as might be expected, given that half of the clubs in English football have spent a period in administration since 1992.

Many lessons can be drawn from this project, including: how best to open up data locked in PDFs; the effective coordination of multiple team members on such a project and the role of data audits; inconsistencies in company accounts (we had to make our own judgements when categorising details from cash flow statements into three main categories of income: tickets, broadcasting, commercial); planning for interactivity; financial literacy; and negotiating internal dynamics when coordinating between subject specialists and data specialists (in this case, the sports team and the data unit; our publication has been delayed because the well-known Price of Football project, chronicling the cost of tickets, merchandise, food etc., was recommissioned for another year).

by Bruno Caprettini
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Does hosting a sports team boost the visibility of a city among tourists?

I answer this question by looking at the effect of playing in soccer's UEFA Champions League on air travel. I compare routes between cities whose teams were randomly drawn into the same group in the first phase of the competition with routes between cities whose teams were randomly allocated to different groups. Because teams are randomly allocated to groups, I can credibly estimate the causal effect of playing in the same group. On average, being drawn into the same group increases arrivals by 5 to 8 percent over the three months following the group stage, a period that coincides with a break in the competition.
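Because the draw is random, comparing treated and control routes already identifies an average effect. A minimal sketch, assuming arrivals per route over the three post-group-stage months and a plain difference in mean log arrivals, without the controls or fixed effects the actual study may include; `same_group_effect` is hypothetical.

```python
import math
from statistics import mean

def same_group_effect(treated: list[float], control: list[float]) -> float:
    """Percent difference in arrivals implied by the gap in mean log arrivals.

    treated: arrivals on routes linking cities whose teams share a group;
    control: arrivals on routes linking cities whose teams were drawn apart.
    """
    gap = mean(math.log(x) for x in treated) - mean(math.log(x) for x in control)
    return math.exp(gap) - 1.0  # e.g. 0.05 means 5 percent more arrivals
```

Working in logs makes the effect a proportional change, so large and small routes contribute comparably.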

by Dmitry Dagaev, Akash Adhikari, and Stanislav Anatolyev
Academic Session IV - 16.00 - 17.30, Room: RAA-E-27

Tilt in Chess: Mistakes Provoke New Mistakes

Even the best chess players make mistakes. Each mistake has a direct impact on subsequent play by worsening the position of the player who made the suboptimal move. We suggest that mistakes also influence the outcome in another way. After making a suboptimal move and realizing it, a player comes under psychological pressure, and this stress leads the same player to make more mistakes. One explanation of this phenomenon could be that the player wastes time and mental resources rethinking the position in which he or she made the mistake.

To test this hypothesis, we analyze several available chess databases. The games are encoded in PGN (Portable Game Notation), the de facto standard format for chess databases. We analyze the games with the Stockfish 8 engine, which is widely regarded as one of the strongest computer chess programs. Stockfish searches the game tree and evaluates the positions it reaches. Given the time limit (30 seconds per move by default), Stockfish has to prune the search, because 30 seconds are not enough to follow every possible game to its end. A set of rules determines which lines can be analyzed less thoroughly. For example, if one side gives away a rook, there is no need to search 20 moves deep to check whether there is compensation for it. Instead, Stockfish quickly looks at the best reachable position after 5-7 moves, and if that position is obviously bad, it abandons the line.

The number of lines that Stockfish analyzes more deeply can be set between 3 and 9. Raising it from 3 to 9 asks Stockfish to analyze more options deeply within the same 30 seconds, so the search depth for each line is lower. Evaluations can therefore differ across sets of search parameters. We fix the search time at 30 seconds per move and the number of lines at 9.

The best move is the move with the best evaluation. From the Stockfish analysis, we extract the best line for every move and the numerical evaluation of the position along that line. This gives a sequence of position scores after each move and tells us whether the player chose his or her best move. Let b_{t} be a dummy variable equal to 1 if the best move was played at move t and 0 otherwise. We are interested in the internal structure of the sequence b_{t}: does a suboptimal move at t-1 influence the probability of playing the optimal move at t? To answer this question, we use an autoregressive model, starting with the AR(1) model b_{t} = c + a b_{t-1} + \varepsilon_{t}, where a and c are parameters to be estimated. If a player who realizes that he or she made a suboptimal move comes under psychological pressure, we would expect that playing the best move at t-1 increases the probability of playing the best move at t. That is, the coefficient a should be positive and significant. Our results support this hypothesis.
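A minimal sketch of the two steps above: building b_t from the moves played and the engine's best moves, then estimating the AR(1) model by ordinary least squares (a linear probability model). It assumes moves are given as comparable strings (e.g. UCI notation) and that each sequence holds a single player's own moves, since White's and Black's moves alternate within a game. The function names are hypothetical, and the engine analysis itself is not shown.

```python
from statistics import mean

def best_move_dummies(played: list[str], best: list[str]) -> list[int]:
    """b_t = 1 if the move played equals the engine's best move at move t, else 0."""
    return [1 if p == b else 0 for p, b in zip(played, best)]

def fit_ar1(b: list[int]) -> tuple[float, float]:
    """OLS estimates (c, a) of b_t = c + a * b_{t-1} + e_t."""
    x, y = b[:-1], b[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("b_{t-1} has no variation; a is not identified")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return my - a * mx, a
```

With a binary regressor, the OLS slope a equals P(best at t | best at t-1) minus P(best at t | not best at t-1), so a > 0 means a mistake lowers the chance of finding the best move next time.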

by Stefan Feuerriegel, Markus Weinmann, and Oliver Müller
Academic Session II - 10.30 - 12.00, Room: SOE-E-2

Outcome vs. performance? Biased decisions behind management changes: Evidence from soccer


The success of organizations, and especially of firms, depends on their management. Hence, some of the most important organizational decisions concern the selection of qualified managers. When a manager (or, more precisely, the team he or she is responsible for) does not deliver the desired outcomes, organizations often react by replacing the manager. However, the measurable outcome need not reflect the true quality of decision-making; it can be subject to various unobserved effects or even luck. For instance, when a company operates in a shrinking market, its sales managers might still have done an excellent job by maintaining the company's market share, even though the resulting constant sales might not be as desired. Likewise, in soccer, measurable outcomes such as match victories can be a misleading indicator of true in-game performance, which must account for the strength of the opponent.

Using a unique soccer dataset, we empirically test whether management changes are driven by measurable outcomes (e.g., match results, table position) or the in-game performance (i.e., team strength during games). We find that a board decision to replace a manager is primarily based on outcome metrics and less on the actual in-game performance. Our work has direct managerial implications for better evaluations of managers and avoiding unfavorable biases behind management changes.


We draw on an extensive dataset of 1,140 matches played in the English Premier League in the three seasons from 2013/14 to 2015/16. The dataset contains 21 management changes in which the board dismissed the team manager. For each match, the dataset also contains the direct outcomes, that is, the match result and the position in the table. It further includes performance metrics, namely attack and defense strength estimated with a time-dependent version of the established Dixon and Coles (1997) model.
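For reference, the basic Dixon and Coles (1997) model treats home and away goals as Poisson variables whose rates multiply attack, defence and home-advantage parameters, adds a correction τ for the low scores 0-0, 0-1, 1-0 and 1-1, and down-weights older matches by exp(−ξt) in the likelihood. The sketch below shows these building blocks only; parameter values and function names are illustrative, the authors' time-dependent variant may differ in detail, and fitting (maximising the weighted log-likelihood) is not shown.

```python
import math

def tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles correction for the low-scoring results 0-0, 0-1, 1-0 and 1-1."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate ** k / math.factorial(k)

def score_probability(x: int, y: int, attack_home: float, defence_home: float,
                      attack_away: float, defence_away: float,
                      home_adv: float, rho: float) -> float:
    """P(home scores x, away scores y); a larger defence value means a weaker defence."""
    lam = attack_home * defence_away * home_adv  # expected home goals
    mu = attack_away * defence_home              # expected away goals
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)

def time_weight(days_ago: float, xi: float) -> float:
    """Weight of a past match in the log-likelihood: phi(t) = exp(-xi * t)."""
    return math.exp(-xi * days_ago)
```

The τ adjustments cancel in total, so the corrected score probabilities still sum to one.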


Since we are interested in management changes, our dependent variable is a binary indicator of whether a team manager was dismissed (1 = dismissed). We argue that management changes are influenced by measurable outcomes, by in-game performance, or by both. The outcome variables include goals scored, goals conceded, runs of consecutive losses, and the position in the table. The performance variables comprise the attack and defense strength of a team, as well as the same variables for the opponent.

Model development/specification

We specified a logit model to estimate the effect of in-game performance and match outcomes on the binary management-change variable, using a Bayesian estimation procedure. Among other advantages, this allows us to better check for potential multicollinearity and makes no explicit distributional assumptions. Consistent with earlier research, we applied oversampling to address the class imbalance in our dataset, since dismissals are rare.
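A minimal sketch of the two ingredients named above, under stated assumptions: simple random oversampling of the minority class (duplicating dismissal observations until both classes are equally frequent) and the logit link that maps covariates to a dismissal probability. The exact oversampling scheme used by the authors is not specified, the Bayesian estimation itself is not reproduced, and both function names are hypothetical.

```python
import math
import random

def oversample(rows: list, labels: list[int], seed: int = 0) -> tuple[list, list[int]]:
    """Randomly duplicate minority-class rows until both classes are equally frequent."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    minority, majority, min_label = (pos, neg, 1) if len(pos) < len(neg) else (neg, pos, 0)
    if not minority:
        raise ValueError("cannot oversample an empty class")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    new_rows = majority + minority + extra
    new_labels = [1 - min_label] * len(majority) + [min_label] * (len(minority) + len(extra))
    return new_rows, new_labels

def dismissal_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Logit link: P(dismissal) = 1 / (1 + exp(-(intercept + coefs . x)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Note that oversampling shifts the intercept, so predicted probabilities from a model fitted on balanced data overstate the true dismissal rate; slope comparisons are less affected.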


We find that outcome-related variables consistently have a stronger impact on management changes than performance-related variables. Clubs have a higher propensity to replace managers after scoring few goals (coefficient of −0.718; SE = 0.001) or conceding many goals (−0.616; SE = 0.001). Similarly, losing streaks (i.e., several games lost in a row) have a positive relationship with management changes (0.890; SE = 0.001). By contrast, we observe smaller effect sizes for the team's own attack strength (−0.445; SE = 0.001) and defense strength (0.250; SE = 0.001). The corresponding coefficients for the opponent are even higher in magnitude.

Altogether, we show that management changes are mainly driven by measurable outcomes and less by performance-related variables.

by Raphael Flepp and Egon Franck
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Wise and unwise in-season head-coach dismissals

Most recent studies find that in-season head-coach dismissals in professional football teams do not improve team performance (e.g., Balduck, Buelens, & Philippaerts, 2010; van Ours & van Tuijl, 2016). Although team performance generally improves after a dismissal, the effect is spurious because the performance of a control group of teams that kept their coach also improves. However, no study has distinguished between dismissals in which the work of the coach, and thus the team's true playing quality on the pitch, was indeed below expectations (wise dismissals) and dismissals in which the team had a run of bad results due to bad luck (unwise dismissals). The effect of wise dismissals might therefore be masked when the average effect of all dismissals is analyzed. Using in-season coach dismissals in the English Premier League, the French Ligue 1, the German Bundesliga, the Italian Serie A, and the Spanish La Liga in the four seasons from 2013/14 to 2016/17, we replicate the methods employed by van Ours and van Tuijl (2016). To distinguish between wise and unwise dismissals, we use expected goals as a performance evaluation measure. This measure is less prone to random variation and reflects a team's true playing quality on the pitch better than actual match results (Brechot and Flepp, 2018). Specifically, we use the difference between a team's rank in the official league table and its rank in a table based on expected goals. We define a dismissal as wise if the rank based on expected goals is equal to or worse than the rank in the official league table; in this case, the team performed below expectations because of poor playing quality on the pitch. By contrast, we define a dismissal as unwise if the rank based on expected goals is better than the rank in the official league table; here, the team performed below expectations because of bad luck. The counterfactual non-dismissals of the control-group teams are classified the other way round: a non-dismissal is wise if the rank based on expected goals is better than the rank in the official league table, and unwise otherwise.
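The classification rule above can be written down directly. A minimal sketch, assuming both ranks are already computed and count from 1 (top of the table), so a larger number is a worse rank; `classify` is a hypothetical helper.

```python
def classify(dismissed: bool, official_rank: int, xg_rank: int) -> str:
    """Label a dismissal or non-dismissal as wise or unwise from league-table ranks."""
    # Expected-goals rank no better than the real rank: the poor results reflect poor play.
    poor_play = xg_rank >= official_rank
    if dismissed:
        return "wise dismissal" if poor_play else "unwise dismissal"
    # Non-dismissals are classified the other way round.
    return "unwise non-dismissal" if poor_play else "wise non-dismissal"
```

For example, a coach sacked while 15th in the table but 10th on expected goals counts as an unwise dismissal: the underlying play merited a better position.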

When we compare all actual dismissals to all counterfactual non-dismissals, we find that both groups improve their performance similarly. This result is in line with van Ours and van Tuijl (2016), who conclude that replacing a head coach has no positive effect on performance. However, when we compare wise dismissals to unwise non-dismissals, we find that performance after wise dismissals improves significantly, whereas there is no effect on performance for unwise non-dismissals. Furthermore, performance after unwise dismissals and after wise non-dismissals improves similarly. Thus, there is no benefit from changing the coach if the team was performing below expectations only because of bad luck. These results contribute to the debate on whether coach dismissals affect team performance by showing that dismissing a coach is beneficial only if the run of bad results was indeed due to poor playing quality on the pitch.

by Garry Gelade
Academic Session I - 10.30 - 12.00, Room: SOE-E-1

Are fixtures between "rival" teams more hotly contested than fixtures between non-rivals?

It is widely thought that fixtures between "rival" teams are more hotly contested than fixtures between non-rivals. This paper examines whether this conjecture holds in English elite soccer.

The research consisted of two studies:

Study 1: Structure of rivalry relationships

Rivalries were determined from a survey of 1,200 soccer fans, who were asked to identify their team's main and secondary rivals. Three types of rivalry relationship were found. An R0 (null) rivalry exists between Team A and Team B when neither group of fans identifies the other team as rivals. An R1 (one-way) rivalry exists when Team A fans identify Team B as rivals, but Team B fans do not identify Team A as rivals. An R2 (reciprocal) rivalry exists when both groups of fans identify the other team as rivals. Network analysis revealed 10 rivalry clusters, which were largely geographically organised. This mirrors findings from the organizational research literature indicating that geographically proximate firms compete more intensely than distant ones (Baum & Mezias, 1992; Porac, Thomas, & Baden-Fuller, 1989; Yu & Cannella, 2007). Rivalry was also more salient between teams that had met more frequently.
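The three rivalry types follow mechanically from the directed survey nominations. A minimal sketch, assuming the survey is stored as a set of (fans' team, named rival) pairs with main and secondary rivals treated alike; the team labels and `rivalry_type` are hypothetical.

```python
def rivalry_type(a: str, b: str, nominations: set[tuple[str, str]]) -> str:
    """Classify the pair (a, b) as R0 (null), R1 (one-way) or R2 (reciprocal)."""
    a_names_b = (a, b) in nominations
    b_names_a = (b, a) in nominations
    if a_names_b and b_names_a:
        return "R2"
    if a_names_b or b_names_a:
        return "R1"
    return "R0"

# Illustrative nominations: A and B name each other; C's fans name D, but not vice versa.
nominations = {("A", "B"), ("B", "A"), ("C", "D")}
```

The same directed pairs form the edge list of the rivalry network used for the cluster analysis.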

Study 2: Consequences of Rivalry

I examined whether matches between rivals differed from other matches using on-the-ball performance metrics (OPTA event-level data). A (generalized) linear mixed model was specified for each metric, with random effects for both teams and a fixed effect for rivalry type as the independent variables. Comparison of the rivalry effects revealed statistically significant differences. R2 fixtures are more violent and more ill-disciplined than either R0 or R1 fixtures: there are more disciplinary offences and more tackles. Although there are fewer passes in R2 matches, a higher percentage of them are forward passes. This suggests a less flowing but more urgent style of play, which would be consistent with heightened tension among the players. Overall, the conjecture that matches between rivals in elite soccer are more hotly contested than other matches was supported.

by Andreas Groll
Academic Session IV - 16.00 - 17.30, Room: RAA-E-27

A Comparison of Covariate-based Prediction Methods for FIFA World Cups

Many approaches that analyze and predict the results of international soccer matches are based on statistical models incorporating several potentially influential covariates of a national team's success, such as the bookmakers' ratings or the FIFA ranking. Based on all matches from the four previous FIFA World Cups 2002-2014, we compare the most common regression models based on the teams' covariate information with respect to their predictive performance. Furthermore, we investigate an alternative class of models, so-called random forests (Breiman, 2001).

Within the framework of Generalized Linear Models (GLMs), the most frequently used type of regression models in the literature is the Poisson model. It can easily be combined with different regularization methods such as penalization (see, e.g., Groll and Abedieh, 2013; Groll et al., 2015) or boosting (Groll et al., 2018). Moreover, we analyze different predictor structures, including team-specific ability parameters and extensions to smooth, non-linear effects for metric covariates, which also can be tackled by suitable boosting techniques (compare, e.g., Bühlmann and Hothorn, 2007).

Random forests can be seen as a mixture of machine learning and statistical modeling and are known for their high predictive power. Here, we consider two types of random forests, depending on the choice of response. One type tries to predict the exact number of goals, while the other considers the three match outcomes win, draw and loss, using a special algorithm for ordinal responses recently proposed by Hornung (2017).

For all these modeling techniques, the predictive performance is compared with respect to several goodness-of-fit measures. Based on the estimates of the best-performing method, all match outcomes of the FIFA World Cup 2018 in Russia are repeatedly simulated (1,000,000 times), resulting in winning probabilities for all participating national teams.
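The core of such a simulation is drawing match scores from the fitted goal rates. A minimal sketch for a single match, assuming independent Poisson goal counts with given expected goals for each team (as in the basic Poisson model above); a full tournament simulation would chain such draws through the group and knockout stages. Function names and rates are illustrative.

```python
import math
import random

def sample_poisson(rng: random.Random, rate: float) -> int:
    """Knuth's multiplication method; adequate for the small rates of football scores."""
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def outcome_probs(lam_a: float, lam_b: float, n_sims: int = 100_000,
                  seed: int = 1) -> tuple[float, float, float]:
    """Monte Carlo estimates of P(A wins), P(draw), P(B wins)."""
    rng = random.Random(seed)
    wins = draws = 0
    for _ in range(n_sims):
        goals_a, goals_b = sample_poisson(rng, lam_a), sample_poisson(rng, lam_b)
        wins += goals_a > goals_b
        draws += goals_a == goals_b
    return wins / n_sims, draws / n_sims, 1 - (wins + draws) / n_sims
```

Monte Carlo error shrinks with the square root of the number of runs, which is one reason to simulate a tournament many times.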

1. L. Breiman (2001). Random Forests. Machine Learning, 45, 1, 5-32.
2. P. Bühlmann and T. Hothorn (2007). Boosting Algorithms: Regularization, Prediction and Model Fitting (with Discussion). Statistical Science, 22, 4, 477-505.
3. A. Groll and J. Abedieh (2013). Spain retains its title and sets a new record – generalized linear mixed models on European football championships. Journal of Quantitative Analysis in Sports, 9, 1, 51-66.
4. A. Groll, T. Kneib, A. Mayr, and G. Schauberger (2018). On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA European Football Championship 2016, Journal of Quantitative Analysis in Sports, to appear.
5. A. Groll, G. Schauberger, and G. Tutz (2015). Prediction of major international soccer tournaments based on team-specific regularized Poisson regression: an application to the FIFA World Cup 2014. Journal of Quantitative Analysis in Sports, 11, 2, 97-115.
6. R. Hornung (2017). Ordinal Forests. Technical Report, Department of Statistics, LMU, 212.

by Jessica Kunert
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

The Phantom Menace or a New Hope?: How German Sports Journalism is Affected by Automation in the Newsroom


This paper analyses how automated text and video generation affects sports journalism practices. Automated journalism has made its way into newsrooms and news agencies worldwide (Fanta 2017). This form of journalism is defined as “algorithmic processes that convert data into narrative news texts [and videos] with limited to no human intervention beyond the initial programming” (Carlson 2015: 417). Texts and videos can thus be generated from structured data, such as the number of goals and yellow cards in football [soccer]. Automated techniques are especially valuable for amateur sports, which have been largely neglected in media coverage due to a lack of resources and time.

While the impact of automated practices on journalism in general has been studied widely (e.g. Dörr 2015), the sports beat has not yet received much attention (one exception is Galily 2018). Despite its richness in data, sports journalism is often sidelined in these discussions. It is claimed that sports texts and videos need (human) creativity and display a range of emotions (Boyle 2006: 9-14), especially compared with other data-intensive beats (e.g. finance), which are better suited to a neutral tone. However, research has shown that the audience cannot differentiate between human-written and automatically created texts, even in the case of sports (Graefe et al. 2016). Nevertheless, common pitfalls of automated journalism also apply to sports, e.g. the need to anticipate news angles that go beyond a simple win/loss situation, and the reliance on possibly erroneous data streams (Clerwall 2014).

This paper studies how sports journalists in Germany perceive the automation of text and video in their newsrooms, using guideline-based interviews. Building on Thurman, Dörr and Kunert's research (2017) on British journalists from a plethora of beats, this study focuses exclusively on sports journalism and its specificities (e.g. emotional content, dramatization). Data collection is twofold: we interview journalists from a wide range of outlets (varying in e.g. reach and publication type) as well as technology providers in Germany. Taken together, the results offer a wide range of viewpoints from different news outlets and companies.

Preliminary results show that sports journalists who cover professional sports, while acknowledging its potential, do not expect automated journalism to become a fixture of their beat any time soon. They mostly name its constraints, such as the lack of creativity in the stories the software creates. They also criticize the fact that the software does not factor in contextual events: for example, the result of a match may be secondary to other factors such as protests in the stands. Data collection is ongoing.

We argue that sports journalism cannot be treated the same as finance or crime journalism with regard to automation. In addition, even though rich data is available for some sports in Germany, many other sports might not be able or willing to provide an ongoing stream of data. Automation may therefore not be achievable, or even desirable, for all sports, which is another topic the interviews cover. Moreover, the arrival of automated journalism puts long-standing reporting traditions under scrutiny: with an automated newsfeed, articles on all of a day's matches can be created at once without attending a single one of them.

This paper sheds new light on how automated journalism techniques are used in sports reporting, and what the future may hold for the relationship between data and sports journalism.

by Michael Lechner, Gabriel Okasa, Michael Knaus, Daniel Goller, and Alex Krumer
Academic Session IV - 16.00 - 17.30, Room: RAA-E-27

SEW Soccer Analytics

We develop a flexible machine learning estimator for the probabilities of ordered (ordinal) outcomes, based on the random forest algorithm. The estimator generalises common estimators such as ordered probit or ordered logit maximum likelihood and recovers essentially the same output as these standard estimators, such as probabilities and marginal effects. In particular, we use its predicted probabilities of a draw, a home win and an away win for the games of the German Football Bundesliga (BL1) from 2006 onwards to simulate the league table for every matchday, given the (large amount of) information available up to that date. Combining simulation with machine learning finally allows us to make statements about the likelihood that a particular team reaches specific places in the final league table (e.g. champion, relegation).
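Turning match-level probabilities into league-table statements can be sketched as a Monte Carlo loop over the remaining fixtures. A minimal sketch, assuming the predicted (home win, draw, away win) probabilities are given per fixture, 3/1/0 points, and ties on points broken at random instead of by goal difference; `title_odds` is hypothetical and only tracks the champion, though relegation places work the same way.

```python
import random
from collections import Counter

def title_odds(points: dict[str, int],
               fixtures: list[tuple[str, str, float, float, float]],
               n_sims: int = 20_000, seed: int = 0) -> dict[str, float]:
    """points: current points per team; fixtures: (home, away, p_home, p_draw, p_away)."""
    rng = random.Random(seed)
    champions = Counter()
    for _ in range(n_sims):
        table = dict(points)
        for home, away, p_home, p_draw, _p_away in fixtures:
            u = rng.random()
            if u < p_home:
                table[home] += 3
            elif u < p_home + p_draw:
                table[home] += 1
                table[away] += 1
            else:
                table[away] += 3
        best = max(table.values())
        leaders = [team for team, pts in table.items() if pts == best]
        champions[rng.choice(leaders)] += 1  # real tie-breakers ignored
    return {team: champions[team] / n_sims for team in points}
```

Re-running this after every matchday with updated probabilities yields the evolving title odds described above.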

by Jeremy Losak
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Daily Fantasy Sports: Chance or Skill, Applying The Efficient Market Hypothesis

A major legal question in the United States in recent years has been whether daily fantasy sports are games of chance or skill, and therefore whether or not they constitute gambling. This paper presents a maximum likelihood estimation approach to estimating the efficiency of DraftKings' player pricing mechanism for the National Football League. The results identify strategies that can be used to take advantage of this pricing mechanism. The existence of such strategies violates the efficient market hypothesis, providing evidence that certain elements of daily fantasy sports involve skill and that a long-run winning strategy exists for participants.

by Haluka Maier-Borst
Data Journalism Session I - 10.30 - 12.00, Room: RAA-G-01 AULA

Can A Machine Spot the Next James Rodríguez?

Four weeks can change a career. James Rodríguez was a young talent playing for AS Monaco when he went to the 2014 World Cup. After an impressive performance for Colombia and a goal that was called “a gift to football”, his market value skyrocketed and he was eventually bought by Real Madrid. But how much of this hype can be calculated? And how much of it is the pure irrationality of football? To find out, I first gathered players' stats from the last World Cup and data on their market values before and after it. I then trained a machine learning algorithm on this data to see whether it can make sense of the craziness of the player market.

The results will be an explainer article on the best indicators of a pricey player in 2014 and a machine-based market watch during the current World Cup. The idea is to rank players based on their performance and to predict changes in their market value. At the conference, I would like to talk about the possible success or failure of this method.

by Iuliia Naidenova, Petr Parshakov, and Sofia Paklina
Academic Session I - 10.30 - 12.00, Room: SOE-E-1

Determinants of football fans’ happiness in Russia

One of the main purposes of professional football games is to entertain match attendees and television audiences. This social role of football is generally taken into account in investment decisions such as building new stadiums (Castellanos & Sánchez, 2007; Coates, 2015; Mondello & Kellison, 2016) or the sponsoring of football clubs by local authorities (Castellanos et al., 2011; Storm & Nielsen, 2012). We assume that fans' happiness or satisfaction reflects the entertainment quality of the match. Following this idea, we plan to investigate the determinants of happiness during the matches of top-division clubs in Russia. Fan preferences are generally analyzed through the concept of demand for sports (see, for example, Gasparetto & Barajas, 2017; Nüesch & Franck, 2009; Feddersen & Rott, 2011; Coates et al., 2014, 2017). However, this approach focuses on expectations about the game, not its actual characteristics. In the current research, we plan to investigate fans' satisfaction ex post. Another approach to analyzing fans' interest in a football game is based on surveys of willingness to pay for a game ticket (Forrest & Simmons, 2002; Nalbantis et al., 2017). This approach yields detailed data but has been criticized for producing biased results.

The research methodology is based on regression analysis in which the dependent variable is a measure of match attendees’ happiness. This measure is obtained by collecting and analyzing pictures from VKontakte (literally “in contact”), the most popular Russian social network. Using the web pages of the home stadiums of five top football clubs, we identify photos related to football games. The photos are then analyzed with the emotion recognition software of Microsoft’s Face API. The software takes the facial expressions in an image as input and returns, for each face in the image, confidence scores across a set of emotions. Emotion recognition software was successfully used by Boychuk et al. (2016) to recognize emotions and their intensity during football games with fights. We will then try to identify the main drivers of fans’ emotions related to game and team features. The contribution of the research lies in a new way of analyzing the drivers of football fans’ satisfaction, which can potentially increase fans’ inclination to attend subsequent matches.
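Before the regression, the per-face emotion scores have to be aggregated into one happiness value per match. A minimal sketch, assuming each detected face has already been reduced to a plain dict of emotion confidences (for example `{"happiness": 0.9, "neutral": 0.1}`) and that the match-level measure is the simple mean happiness over all faces; both the data layout and the averaging rule are assumptions, not the authors' stated procedure.

```python
from statistics import mean

def match_happiness(photos):
    """Mean 'happiness' confidence over all faces in all photos of one match.

    photos: list of photos; each photo is a list of faces; each face is a dict
    mapping emotion names to confidences in [0, 1] (assumed layout).
    Returns None when no faces were detected.
    """
    scores = [face.get("happiness", 0.0) for photo in photos for face in photo]
    return mean(scores) if scores else None

# Two made-up photos from one match: two faces in the first, one in the second
photos = [
    [{"happiness": 0.9, "neutral": 0.1}, {"happiness": 0.5, "neutral": 0.5}],
    [{"happiness": 0.1, "sadness": 0.8, "neutral": 0.1}],
]
print(match_happiness(photos))
```

Averaging over faces rather than photos gives crowded pictures more weight; a per-photo average would be an equally plausible choice.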

by Cornel Nesseler, Iuliia Naidenova, Petr Parshakov, and Aleksei Chusovliankin
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Political Discrimination in Football: Evidence from Russian and Ukrainian Leagues

This paper examines employee discrimination based on political reasons. Previous studies mostly concentrate on racial or gender discrimination. The recent political crisis in Russian-Ukrainian relations, which began on 21 November 2013, provides a setting to test whether political issues can cause discrimination (we describe the Crimea crisis in more detail in the paper). We assess the presence of discrimination by examining data on the minutes played by Ukrainians in the Russian Football Premier League and vice versa. Do Ukrainian or Russian clubs consciously choose not to hire Russian or Ukrainian players, respectively, after the Crimea crisis?

Simple descriptive statistics show a sharp decrease in playing time for Russian players in the Ukrainian league in the 2015-2016 and 2016-2017 seasons (i.e., after the crisis). No similar decrease is visible for Ukrainian players in the Russian league (we visualize this result with a graph in our paper).

However, such a decrease might simply result from the lower skill of the particular players involved. It is therefore necessary to analyze minutes played conditional on player skill. For this reason, we estimate the following regression equation for the Russian Football Premier League:

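$$\log(\text{minutes}) = \beta_0 + \beta_1 \cdot \text{transfermarkt.value} + \beta_2 \cdot \text{age} + \beta_3 \cdot \text{Ukrainian} + \beta_4 \cdot \text{after.crisis} + \beta_5 \cdot \text{Ukrainian.after.crisis} + \gamma \cdot \text{SEASON} + \delta \cdot \text{TEAM}$$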

Here, transfermarkt.value is an estimate of the player's current transfer value from Transfermarkt, age is the player's age, Ukrainian is a binary indicator of nationality, after.crisis is a binary indicator for seasons starting after December 2013, and Ukrainian.after.crisis is the interaction of the last two indicators. SEASON represents season dummies and TEAM team dummies. This identification strategy is similar to difference-in-differences: our treatment, the political crisis, is exogenous, and we have control and treatment groups of players. The model for the Ukrainian league is the same, with a dummy indicator for being Russian instead of being Ukrainian.
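In the simplest case, with no covariates and no season or team dummies, the coefficient on Ukrainian.after.crisis reduces to the classic 2×2 difference-in-differences of group means. The sketch below computes that quantity from hypothetical `(ukrainian, after_crisis, log_minutes)` records; it illustrates the identification idea only and is not a substitute for the full regression with controls. The function name and record layout are assumptions.

```python
from statistics import mean

def did_estimate(rows):
    """2x2 difference-in-differences of group means.

    rows: iterable of (ukrainian, after_crisis, log_minutes) with 0/1 flags.
    Returns (treated after - treated before) - (control after - control before).
    """
    cells = {}
    for treated, after, y in rows:
        cells.setdefault((treated, after), []).append(y)
    if len(cells) != 4:
        raise ValueError("need observations in all four group/period cells")
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
```

A regression of log minutes on the two indicators and their interaction alone is saturated, so its interaction coefficient equals exactly this value; adding transfer value, age and the dummies turns it into a conditional comparison.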

Our dataset includes 4,384 players from the Russian Football Premier League and 3,981 players from the Ukrainian Premier League. Our first model is for the Russian league, the second for the Ukrainian league. Our results show that, controlling for the above-mentioned covariates, Russian players in the Ukrainian league play significantly less after the crisis than before. However, the playing time of both Russian and Ukrainian players in the Russian league did not change.

These results fit the sociological and political explanation that the Ukrainian government and most Ukrainian citizens viewed the Crimea crisis as an attack on their country's sovereignty, while the Russian government framed the Crimea crisis as a "natural annexation".

by Johannes Orlowski, Helmut Dietl, and Jorge García-Unanue
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Management dismissals and their effect on employee effort and ability: Evidence from the German Bundesliga

Previous literature has examined the effect of management changes on organizational performance (e.g., Shen & Cannella, 2002). Providing empirical evidence for such effects, however, proves difficult due to limited data availability. Therefore, several authors have turned to the field of sport to examine the impact of head coach dismissals on (short-term) team performance (see van Ours and van Tuijl, 2016 for a recent review). Results, however, remain inconclusive. While some authors found significant positive (e.g., Audas et al., 1997) or significant negative (e.g., Salomo & Teichmann, 2000) effects, the majority of research found no evidence of an effect of management dismissals on team performance (e.g., de Paola & Scoppa, 2008).

This research argues, in line with tournament theory (e.g., Lazear & Rosen, 1981) and the literature on promotions (e.g., Prendergast, 1993), that a change in management can be associated with a change in individual incentives to perform that is not necessarily beneficial to team performance. Building on Höffler and Sliwka (2003), individual performance is assumed to be a function of effort and ability. The change in management is expected to influence individual performance via two mechanisms. First, the new manager likely has less information about the optimal allocation of players to positions. This might lead to an ability/task mismatch and consequently to lower performance. Second, players have a stronger incentive to perform well in front of the new coach in order to present themselves and get assigned to their preferred task, i.e., a spot in the starting formation.

The aim of this research is, therefore, to analyse how a change in management (i.e., of the football head coach) affects the effort and ability of employees (i.e., players). The results will potentially shed more light on the relationship between management changes and organizational performance. Further, the findings might illustrate to what extent changes in organizational performance can be attributed to employee effort and ability, respectively, allowing a better understanding of the relationship between managerial changes and team performance. The results could have implications both for human resource management in general and for sports team owners specifically.

The underlying player-game-day data cover 957 players in 1,224 German Bundesliga matches from the 2012/2013 to the 2015/2016 seasons. The final sample consists of n = 22,748 observations. Data were collected from various publicly accessible sources, including the official website of the German Bundesliga for performance and match data, and further sources for contextual data such as substitutions and player information. In total, eight fixed-effects regression models are estimated using various measures of player effort and ability as dependent variables. Effort measures include total distance covered (in km), number of intensive runs, and number of sprints. Player ability is measured via possessions, shots on target, crosses, successful tackles, and pass completion rate. The main independent variable, management change, is measured as a dummy variable equal to 1 on the first game day under the new coach. Further, four dummy variables leading up to and following the management change are included to uncover lead and lag effects.
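The lead/lag design can be made concrete with a small helper that turns a team's game days into event-time dummies. This sketch assumes the four extra dummies are two leads and two lags around the change and that game days are numbered consecutively; the abstract does not specify the exact window, and the `event_dummies` name is hypothetical.

```python
def event_dummies(game_days, change_day, window=2):
    """Build the change dummy plus lead/lag dummies around a coaching change.

    game_days: ordered game-day numbers for one team
    change_day: first game day under the new coach
    window: number of leads and of lags (2 + 2 = four extra dummies, assumed)
    Returns one dict of 0/1 indicators per game day.
    """
    rows = []
    for day in game_days:
        rel = day - change_day  # event time relative to the change
        row = {"change": int(rel == 0)}
        for k in range(1, window + 1):
            row[f"lead_{k}"] = int(rel == -k)  # k game days before the change
            row[f"lag_{k}"] = int(rel == k)    # k game days after the change
        rows.append(row)
    return rows
```

Each player-game-day observation would then inherit the dummies of its team and game day before entering the fixed-effects regressions.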

Initial results reveal that while the level of effort within the team significantly increases due to the managerial change, the ability of the team remains unchanged or even significantly decreases. This might support previous literature that found no effect or significant negative effects of managerial turnover on team performance (e.g., de Paola & Scoppa, 2012).

by Anil Özdemir, Helmut Dietl, Giambattista Rossi, and Rob Simmons
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Are Workers Rewarded for Inconsistent Performance?

Following Lazear (1998), a small body of personnel economics literature has considered whether workers who demonstrate greater performance inconsistency than comparable workers of similar average productivity are rewarded more highly. Lazear conjectured that there would be an ‘upside potential to risky workers’, so that inconsistent performers would be paid higher salaries because they are capable of extraordinarily high productivity, albeit only on a few occasions.

Recently, Deutscher et al. (2017) and Deutscher and Büschemann (2016) have studied the relationship between player salaries and performance variation in the National Basketball Association and German Bundesliga soccer (using player ratings from Kicker), respectively. For basketball, the authors found that players were rewarded more for consistency in performance than for inconsistency: more consistent players produced more expected points for their teams and were rewarded with higher salaries. The results of Deutscher and Büschemann (2016) for Bundesliga soccer, however, point in a different direction. Contrary to the basketball study but in accord with Lazear’s upside potential of risky workers, Deutscher and Büschemann found that higher variation in player ratings was positively correlated with player valuation (salaries).

Our study uses actual salary data (not a proxy measure of ‘player value’) and actual performance data (not journalists’ assessments) from Italian Serie A soccer. We use match-level player performance data covering four seasons, from 2009/10 to 2012/13. The novel performance data were purchased from Panini Digital, which supplies these and other data to Italian clubs.

We model player salaries as a function of player productivity measures (mean and coefficient of variation). Salary levels in season t are regressed on performance levels and the associated coefficient of variation from season t-1; these performances may come from a different club if the player has switched teams. We can also assess player salaries by contract type. Moreover, we have detailed on-field performance data (e.g., shots, shots on target, assists, and passes), which we will use to understand the performance drivers.
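The two productivity measures can be computed per player and season from match-level ratings. A minimal sketch, assuming one composite rating per match and the population standard deviation; the abstract does not say which standard deviation is used, and the `performance_profile` name is hypothetical. The season t-1 profile would then be joined to the season t salary.

```python
from statistics import mean, pstdev

def performance_profile(match_ratings):
    """Return (mean, coefficient of variation) of one player's match ratings.

    Uses the population standard deviation (an assumption).
    """
    mu = mean(match_ratings)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return mu, pstdev(match_ratings) / mu
```

The coefficient of variation is scale-free, so a volatile star and a volatile squad player with proportionally similar swings receive the same value.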

This model is estimated by OLS with and without fixed effects, by quantile regression, and with suitable controls. In our preliminary estimation, we use IVG, a composite performance metric compiled by experts at Panini Digital that weights the key performance measures noted above.

Our focus is on the sign and size of the coefficient on the coefficient of variation of player productivity. The positive sign of this coefficient in our estimates indicates that performance volatility increases player salaries in Italian soccer, thereby supporting Lazear’s hypothesis of the upside potential of risky workers. A one-unit increase in the coefficient of variation of aggregated performance increases player base salary by 12.5%. Given this preliminary result, we will show in detail which on-field performance indicators drive this volatility effect.

Bollinger, C.R. and J.L. Hotchkiss, 2003, “The Upside Potential of Hiring Risky Workers: Evidence from the Baseball Industry”, Journal of Labor Economics, 21, 923-944.
Deutscher, C. and A. Büschemann, 2016, “Does Performance Consistency Pay Off Financially for Players? Evidence from the Bundesliga”, Journal of Sports Economics, 17, 27-43.
Deutscher, C., Gürtler, O., Prinz, J. and D. Weimar, 2017, “The Payoff to Consistency in Performance”, Economic Inquiry.
Lazear, E. P., 1998, “Hiring Risky Workers”, in I. Ohashi and T. Tachibanaki, eds., Internal Labour Market, Incentives, and Employment, New York: St. Martin’s.

by Petr Parshakov, Dennis Coates, and Sofia Paklina
Academic Session II - 10.30 - 12.00, Room: SOE-E-2

Video games and unemployment

In this study we use eSports prize money by country as a proxy for the popularity of video games to analyze its influence on youth unemployment. Video games are treated as an innovation in leisure activity that makes being unemployed more attractive than before, especially in rich countries because of cohabitation. We use the total prize money won by a country's representatives in a season in a panel regression model with country-year as the unit of observation. Our preliminary results show a positive influence of video game popularity on youth unemployment.
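The abstract does not state whether the panel model includes fixed effects. Assuming country fixed effects, the slope on prize money can be obtained by demeaning both variables within each country before computing an OLS slope (the within estimator). The sketch below does this for a single regressor, omits year effects and controls, and uses a hypothetical record layout and function name.

```python
from collections import defaultdict
from statistics import mean

def within_slope(rows):
    """Slope of y on x after removing country means (one-way fixed effects).

    rows: list of (country, year, x, y), e.g. x = eSports prize money,
    y = youth unemployment rate (hypothetical layout).
    """
    by_country = defaultdict(list)
    for country, _year, x, y in rows:
        by_country[country].append((x, y))
    num = den = 0.0
    for obs in by_country.values():
        mx = mean(x for x, _ in obs)  # country mean of x
        my = mean(y for _, y in obs)  # country mean of y
        for x, y in obs:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    if den == 0:
        raise ValueError("no within-country variation in x")
    return num / den
```

Demeaning removes all time-invariant country differences, such as general wealth or labour-market institutions, so the slope is identified only from changes within countries over time.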

by Surabhi Pasarakonda, Jan B. Schmutz, Patrick Lüthold, Pedro J. Ramos-Villagrasa, and
Gudela Grote
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Never Change a Winning Team—But What if Disruptions Do? Team Familiarity’s Role in Mitigating Harmful Team Based Disruptions

It is the final of the 2016 European Football Championship, and Portugal is playing against France. After 24 minutes, Portugal’s star captain and record scorer, Cristiano Ronaldo, collides with an opponent and leaves the game injured, a shock for the Portuguese team. The team has to react to this severe disruption of its constellation by replacing its star striker. Nevertheless, Portugal remains well coordinated. The team is able to adapt quickly to the changed circumstances, beat France and win its first European football title. However, such a situation can also go wrong. A recent example is the 2018 UEFA Champions League Final, in which Liverpool plays against Real Madrid. After 25 minutes, Liverpool’s top scorer and superstar Mohamed Salah gets injured and has to leave the game. In this case, Liverpool’s team is unable to cope with the pressure of the final, has trouble adapting its coordination to the sudden disruption and ultimately loses the Champions League Final to Real Madrid.

Sports teams need to be able to adapt to changing conditions to remain coordinated and perform well. In our study, we focus on how teams adapt to high-impact, unplanned and uncommon triggers that may affect team coordination, so-called team-based disruptions (TBD). We look at football teams, where TBD such as injuries or player suspensions can force teams to exchange or withdraw team members during a game. These changes in team constellation or team size are directly linked to coordination and therefore require teams to actively redistribute roles and responsibilities in order to maintain good coordination and performance. We are interested in understanding why certain football teams adapt better to TBD than others. We suggest that team familiarity (TF), defined as the shared experience a team has of working together, is an important factor when it comes to TBD. We investigate the effect of TF on team performance over time, with coordination as a mediator. Further, we hypothesize that TF has a non-linear relationship with coordination: we expect to find an inverted U-shaped effect of team familiarity on coordination.

We obtained data on all English Premier League (EPL) games from the 2006/2007 to the 2011/2012 seasons from Opta Sports. The data contain general game statistics (e.g., ball possession, shots on target, number of goals) as well as information on disruptive situations (e.g., suspensions due to yellow-red and red cards, and injuries). Additionally, we obtained data on the careers and transfers of EPL players from Transfermarkt to identify how long each player has been with a given team.
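One way to turn the Transfermarkt career data into a team familiarity score is to average, over all pairs of players on the pitch, the time the two have spent together at the club. This pairwise-tenure measure is a hypothetical operationalization for illustration, not the authors' definition; it assumes each player's tenure is continuous since the recorded join date and ignores time shared at previous clubs.

```python
from datetime import date
from itertools import combinations

def team_familiarity(join_dates, match_date):
    """Mean number of days each pair of players has spent together at the club.

    join_dates: join dates of the players on the pitch (hypothetical input)
    Pairwise shared tenure = days since the later of the two players joined.
    """
    pairs = list(combinations(join_dates, 2))
    if not pairs:
        return 0.0
    shared = [(match_date - max(a, b)).days for a, b in pairs]
    return sum(shared) / len(pairs)
```

Recomputing the score after a substitution forced by an injury or red card shows how much shared experience a disruption removes from the team on the pitch.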

We are currently analysing the data using nonlinear regression techniques and time series analysis. We expect to find that TBD have a stronger negative influence on coordination patterns when team familiarity is either too low or too high. We therefore expect to find a turning point at which a certain amount of team familiarity is ideal for a team's coordination. Our results should advance our understanding of how structural changes in sports teams, and potentially other types of teams, affect their coordination and performance. If our hypothesis holds, team familiarity will be an important factor to consider when managing a team. Our results will allow us to give managers of sports teams, and of teams more generally, suggestions on how to capitalize on team familiarity.
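A common way to test for an inverted U, assumed here rather than taken from the abstract, is to add a squared term: fit coordination = b0 + b1·TF + b2·TF², check that b2 < 0, and locate the turning point at TF* = −b1 / (2·b2). The sketch fits the quadratic by ordinary least squares using only the standard library; the function names are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # X'X for design columns (1, x, x^2) only needs the sums of x^0..x^4
    s = [sum(x ** k for x in xs) for k in range(5)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    n = 3
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    b = [0.0] * n
    for r in reversed(range(n)):
        b[r] = (rhs[r] - sum(a[r][c] * b[c] for c in range(r + 1, n))) / a[r][r]
    return b

def turning_point(b):
    """TF value at which the fitted parabola peaks (b2 < 0) or bottoms out (b2 > 0)."""
    _, b1, b2 = b
    return -b1 / (2 * b2)
```

The fit needs at least three distinct TF values; otherwise the normal equations are singular and the division fails.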

by Giambattista Rossi, Luca Andriani, and Antonio Zinilli
Academic Session I - 10.30 - 12.00, Room: SOE-E-1

Networks in the market of football agents

Networks can be thought of as evolving social structures with interdependencies between individuals or groups in society (Fulse, 2008; Poli, 2008). They differ in size and quality and develop constantly as opportunities arise for individuals within the network to change their function or make new connections. While a combination of elements determines the strength of networks, including time invested and emotional intensity, reciprocal exchanges in particular are thought to create a sense of trust that increases the probability of future cooperation while also helping an individual to develop a good reputation (Granovetter, 1985). In football, networks are particularly important for passing information between buyers (clubs) and sellers (players). Because information does not necessarily flow directly between these parties, the agent has become an entrepreneur who notices and exploits these gaps in networks, known as “structural holes” (Burt, 1992).


This paper analyses the network of football players’ intermediaries within the big five European leagues (Spain, Italy, Germany, England and France). The dataset covers nine seasons starting from 2010/11. By cross-checking a plurality of sources, we were able to ascertain the intermediary for almost 91% of the players. Additionally, we have each player's market value, which allows us to assess the potential economic power of each agency and its network.


The paper uses two approaches to study these dynamic complex networks: first, it identifies the observed distribution of links among agents through distribution models; then, it uses a stochastic model to understand how the links between players and agents change over time. The methodology proposed by Clauset et al. (2009) compares power-law behaviour with other distributions (e.g., exponential, stretched exponential and log-normal): it fits a power law to the data and tests whether alternative models fit the data better. The probability distribution of agents has been weighted by multiplying the market fee by the number of connections (degree), and it is examined as a fundamental step in understanding the statistical characteristics of the data. After determining the type of distribution that best approximates our data, a set of model specifications is proposed for the analysis of multilevel network structures. Specifically, a statistical analysis of exponential random graph models (ERGMs) for multilevel networks is proposed, involving interactions between node attributes and network structural effects across levels. We illustrate our methodological proposal using data on hierarchical subordination and informal communication relations between agents and players.
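The first step of the Clauset et al. (2009) procedure fits the power-law exponent by maximum likelihood. For continuous data above a lower bound x_min, the estimator is α̂ = 1 + n / Σ ln(x_i / x_min), with approximate standard error (α̂ − 1) / √n. The sketch below implements only this step, with x_min supplied by the user; the full method also chooses x_min by minimizing the Kolmogorov-Smirnov distance and compares alternatives such as the lognormal via likelihood-ratio tests. The function name is hypothetical.

```python
import math

def powerlaw_fit(xs, xmin):
    """Continuous power-law MLE for the observations x >= xmin.

    Returns (alpha_hat, standard_error) as in Clauset et al. (2009).
    """
    tail = [x for x in xs if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need observations strictly above xmin")
    alpha = 1 + len(tail) / log_sum
    return alpha, (alpha - 1) / math.sqrt(len(tail))
```

Applied to agent strengths, a poor power-law fit over the whole range combined with a good fit above a high x_min would match the pattern of a lognormal body with a power-law upper tail.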


The data do not exhibit exact power-law behaviour but rather follow a lognormal form. We show that a lognormal form describes the distribution of agents' strength very well. The lognormal distribution is positively skewed (right-tailed): a high frequency of low values is accompanied by a tail of much less frequent but very high values. On closer investigation, only the upper tail of the distribution can be approximated by a power law (we estimated a power law with cut-off). This means that the upper tail is consistent with a Yule process, which can explain the evolution of some properties of large systems in the players' market. In a Yule process, agents' degrees grow in increments proportional to their current degree (preferential attachment). Thus, the probability of an agent obtaining a new player is proportional to the number of players the agent already has: a new player is more likely to be signed by a well-established agent than by a less well-known one.
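The preferential-attachment mechanism can be illustrated with a small Yule-Simon style simulation: each arriving player either brings in a new agent with probability p or joins an existing agent with probability proportional to that agent's current number of players. The parameter values and the function name are illustrative assumptions; the sketch shows the mechanism, not the paper's estimated model.

```python
import random

def simulate_agent_market(n_players, p_new_agent, seed=0):
    """Yule-Simon style growth of the agent market; returns agent degrees.

    Each arriving player either brings in a new agent (probability p_new_agent)
    or signs with an existing agent chosen in proportion to its current degree.
    """
    if n_players < 1:
        raise ValueError("need at least one player")
    rng = random.Random(seed)
    degrees = [1]  # the first player signs with the first agent
    owners = [0]   # one entry per player: index of that player's agent
    for _ in range(n_players - 1):
        if rng.random() < p_new_agent:
            degrees.append(1)
            owners.append(len(degrees) - 1)
        else:
            # A uniformly random existing player's agent is agent i with
            # probability degrees[i] / sum(degrees): preferential attachment.
            agent = rng.choice(owners)
            degrees[agent] += 1
            owners.append(agent)
    return degrees
```

Sorting the returned degrees typically shows a few agents with very large client lists and many with only one or two, the kind of heavy upper tail described above.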

by Jure Stabuc
Data Journalism Session I - 10.30 - 12.00, Room: RAA-G-01 AULA

Football match reporter

Football Match Reporter (FMR) is an algorithm that generates match reports for the English Premier League (EPL) immediately after the final whistle. With FMR, no advance planning is needed before the match, which frees reporters to prepare for other aspects of the match. In this paper we discuss the prototype implementation of the FMR algorithm in Python and present initial results of using the algorithm to generate match reports for a set of historical football matches.

The motivation for this work is a desire to produce accurate, interesting and readable match reports in a timely and efficient fashion. Football fans, who can usually follow only one match live at a time, frequently use match reports to learn how other games in the league have progressed. Automated text generation is important for news organisations here: given the ongoing competition between them over who publishes first, such an algorithm plays a crucial role.

To generate articles about the EPL, the algorithm needs well-structured match-event data. Since such data are not openly or freely available, the data for the prototype were scraped from a website specializing in football analysis. Data for 20 matches of the 2016/2017 season were collected, including score, venue, attendance, etc. To add linguistic variety and additional information to the generated article, the player information database from the same website was also used as input. To reference table positions, match-day tables from the official EPL website were used.

FMR consists of a collection of components that together produce a written match report. After data on goal scorers, assist providers (if any), scoring teams, scoring times and goal types have been collected, a number of functions are run to produce the match report. The headline function chooses a template from a group of headlines and fills it with information. The goals function produces the main body of the match report: depending on the goal event, a template is randomly chosen from the corresponding group of templates, and a similarity comparison is carried out against the content already generated. This prevents the algorithm from choosing similar templates within one report. Finally, the table function returns a conclusion depending on the result and the table position. All functions use grammatical helper functions to produce an accurate report.
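The similarity check in the goals function can be sketched with Python's standard `difflib.SequenceMatcher`. This is a minimal illustration, not FMR's actual code: the templates, the 0.8 similarity threshold, the fallback rule and the `pick_sentence` name are all assumptions.

```python
import random
from difflib import SequenceMatcher

def pick_sentence(templates, facts, generated, threshold=0.8, rng=random):
    """Fill templates in random order, skipping ones too similar to earlier text.

    templates: format strings, e.g. "{scorer} scored after {minute} minutes."
    facts: dict of values to fill in
    generated: sentences already in the report
    Falls back to the least similar candidate if every one exceeds the threshold.
    """
    candidates = [t.format(**facts) for t in templates]
    rng.shuffle(candidates)

    def max_similarity(text):
        # Highest similarity ratio (0..1) against any sentence already written
        return max((SequenceMatcher(None, text, g).ratio() for g in generated), default=0.0)

    for text in candidates:
        if max_similarity(text) < threshold:
            return text
    return min(candidates, key=max_similarity)
```

Comparing filled-in sentences rather than raw templates means a repeated template is still rejected even when the scorer or minute differs.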

The output of the algorithm is a complete short article that starts with a headline referring either to the teams playing or to a player's performance. This is followed by the main body of text, in which the crucial events of the match (goals and assists) are mentioned. The article ends with a conclusion explaining what the result means for the teams' table standings and how many points each of them has. The algorithm was created by the author as part of an MSc dissertation in the MSc Computational and Data Journalism at Cardiff University. Source code and example output are available, and this talk will include a demo of the system.

by Arseniy Stolyarov and Gleb Vasiliev
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Thirst for Glory: Are Players Motivated by Milestones?


Today, increased media activity together with the wide availability of data makes players focus more on individual statistics. Several unofficial strikers’ clubs (e.g., the Fedotov club in Russia and the Blokhin club in Ukraine) keep track of the number of goals scored by players. Do players score more frequently when they are close to entering such ceremonial clubs? Our paper analyzes this question using statistical and econometric methods. If the hypothesis that players perform better when approaching a milestone is true, this may be useful:

1. To coaches, who may take into account the higher likelihood of a goal by a player about to enter a ceremonial club when choosing the line-up.
2. To both gamblers and bookmakers, as they may better estimate the probability of a goal by a player who is close to entering a ceremonial club.

The Data

The sample consists of players from the top five European leagues who scored 100 or more goals in one league during the last 20 years. Data from the leagues’ websites are used along with news from sports newspapers.

Hypotheses and Statistical Tests

The main hypothesis is that players score goals more frequently when their total number of goals in one tournament approaches a round number (e.g., 100, 200, etc.). This hypothesis is tested using the following statistical procedures:

1. Assuming that the time between two consecutive goals by a player is exponentially distributed, we obtain an estimate of the rate parameter $\lambda_1$. We then obtain a second estimate, of $\lambda_2$, using the sample of 100th goals and test $\lambda_1 = \lambda_2$ against $\lambda_1 < \lambda_2$. We perform robustness checks using different subsamples for the estimation of $\lambda_1$ and obtain a statistically significant difference between the two.

2. A regression of the form $t_i = \beta_0 d_i + \beta^\top X_i$ is estimated, where $t_i$ is the time between two goals, $d_i$ is a dummy variable equal to 1 if the next goal is the player's 100th in the league, and $X_i$ denotes control variables, including the strength of the opponent, the player's form, and the player's average time between goals. Several specifications are used, and a test of $\beta_0 = 0$ is performed in each case; in some specifications $\beta_0$ is significantly greater than 0.

3. A test of whether the proportion of players who scored their 10th goal in the last game, among those who had scored 9 goals in the tournament before the last game, is significantly greater than the proportion of players who scored their 11th goal in the last game, among those who had scored 10 goals before the last game of the tournament. The first proportion is significantly greater than the second for the English Premier League. Tests for the other leagues will follow.
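Procedures 1 and 3 can be sketched with standard tests, although the abstract does not say which test statistics are used. For procedure 1, the exponential rate MLE is the number of intervals divided by their total length, and the sketch compares two rates with a one-sided likelihood-ratio test (signed root, normal approximation). For procedure 3, it uses a one-sided pooled two-proportion z-test. Both test choices and all function names are assumptions; with small samples, exact tests would be preferable.

```python
import math

def exp_rate(times):
    """Exponential rate MLE: number of intervals divided by total time."""
    return len(times) / sum(times)

def rate_lr_test(times_all, times_milestone):
    """One-sided LR test of H0: lam1 = lam2 against H1: lam1 < lam2.

    times_all: times between a player's goals in general (estimates lam1)
    times_milestone: times before 100th goals (estimates lam2)
    """
    n1, n2 = len(times_all), len(times_milestone)
    lam1, lam2 = exp_rate(times_all), exp_rate(times_milestone)
    lam0 = (n1 + n2) / (sum(times_all) + sum(times_milestone))  # pooled rate under H0
    # At the MLE the exponential log-likelihood is n*log(lam) - n; the -n terms cancel
    lr = 2 * (n1 * math.log(lam1) + n2 * math.log(lam2) - (n1 + n2) * math.log(lam0))
    z = math.copysign(math.sqrt(max(lr, 0.0)), lam2 - lam1)  # signed root
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal Z
    return lam1, lam2, lr, p_value

def two_proportion_ztest(x1, n1, x2, n2):
    """One-sided pooled z-test of H0: p1 = p2 against H1: p1 > p2.

    x1 of n1 players on 9 goals scored their 10th in the last game;
    x2 of n2 players on 10 goals scored their 11th in the last game.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value
```

Both tests rely on large-sample normal approximations, so they are most trustworthy when each group contains a reasonable number of goals or players.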

by John Templon and Rosalind Adams
Data Journalism Session II - 14.00 - 15.30, Room: RAA-G-01 AULA

The Edge - Top-Level Figure Skating Judges Consistently Favor Skaters From Their Home Countries. Now Many Of Those Judges Are At The Olympics.

Figure skating, one of the most popular sports at the Winter Olympics, has a problem: Scoring is often slanted in favor of the judges' home countries. In this exclusive analysis, BuzzFeed News showed that one-third of the officials selected to judge the 2018 Winter Olympics had, in recent seasons, demonstrated a home-country preference so strikingly consistent that the odds of it occurring by random chance were less than 1 in 100,000.

In addition to the ground-breaking analysis, BuzzFeed News reporters John Templon and Rosalind Adams spent three months interviewing more than 20 current and former judges, coaches, and skaters around the world. These interviews revealed that home-country preference has been a well-known problem within figure skating and that the International Skating Union (ISU) — the organization that runs the world’s elite figure skating competitions — has done little to combat it.

After a vote-trading scandal at the 2002 Winter Olympics in Salt Lake City, the ISU made two major changes. First, it anonymized judges’ scores, which the ISU said would keep judges from feeling pressured by their national federations. Second, it replaced the old system where judges gave just two marks per performance — artistic and technical — with a more complicated formula, which the ISU said would put more emphasis on a skater’s performance and less on the judges’ subjectivity.

Many criticized the shift to anonymous judging, arguing that it decreased transparency. In 2016, the members of the ISU council voted to abolish anonymized judging. Starting with the 2016–17 season, the public could finally know which judges provided which scores.

The return to transparency also provided the first chance in more than a decade to detect home-country preferences. And, indeed, our analysis showed that judges — despite the changes to the scoring system — were still giving higher marks to their countries’ skaters.

To perform our analysis, we extracted reams of judging data from 17 high-level competitions between October 2016 and December 2017, scores that the ISU publishes only as PDFs. We then deciphered the ISU’s complex scoring system and calculated the total score each skater would have received from each individual judge, something that even the ISU does not do.

Our analysis found that home-country preference was widespread.

To identify judges who favored their own countries’ skaters to a highly statistically significant degree, we wrote custom Monte Carlo simulations. Using that method, we found 27 judges who up-scored their home-country skaters so consistently that the odds of it occurring by chance alone were less than 1 in 100,000. Sixteen of those 27 judges were among the 48 officials selected to judge the 2018 Winter Olympics in Pyeongchang, South Korea.
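The abstract does not describe how the simulations were set up. One standard approach, sketched here as an assumption rather than BuzzFeed's exact method, is a permutation test: for each judge, take how far the judge's total for each skater deviates from the rest of the panel's average, then repeatedly draw random "home" groups of the same size and count how often their mean deviation is at least as large as the real home-country mean. The function name and inputs are hypothetical.

```python
import random
from statistics import mean

def home_bias_pvalue(deviations, is_home, n_sims=10_000, seed=0):
    """Monte Carlo permutation test for one judge's home-country preference.

    deviations: per skater, the judge's total minus the rest of the panel's average
    is_home: parallel booleans, True if the skater shares the judge's country
    Returns the share of random "home" groups whose mean deviation is at least
    the observed home mean, with a +1 correction so the p-value is never 0.
    """
    k = sum(is_home)
    if k == 0:
        raise ValueError("judge has no home-country skaters")
    rng = random.Random(seed)
    observed = mean(d for d, home in zip(deviations, is_home) if home)
    hits = 0
    for _ in range(n_sims):
        fake_home = rng.sample(deviations, k)  # random group of the same size
        if mean(fake_home) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)
```

With the +1 correction the smallest attainable p-value is 1/(n_sims + 1), so certifying odds below 1 in 100,000 requires at least 100,000 simulations per judge.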

We also programmed a replica of the International Skating Union’s own judge-evaluation system, which showed how ineffective that system was at detecting errors or favoritism. We found that the ISU’s system would have flagged barely 1% of all scores for technical elements and an even smaller fraction of scores for artistic components.

Data alone can’t explain why these patterns emerged. The judges — who are chosen by their national federations — might not even have been aware that their scoring showed a consistent pattern, and their judging could simply reflect a preference for a regional style of skating. But in close competitions, judges’ home-country preferences can boost their skaters up in the final standings.

by Gleb Vasiliev and Arseniy Stolyarov
Academic Session IV - 16.00 - 17.30, Room: RAA-E-27

Fantasy Football Meets Machine Learning: the Dynamic Game Case and a Note on Strategy

Fantasy football is a popular game in which participants assemble a squad of real-world footballers and earn points for their players' successful performances on matchdays. In this paper, we continue the search for optimal ways of playing one of the most popular fantasy football competitions, the Fantasy Premier League, that we began in [3]. In [3], we presented a model for selecting the best possible team for the upcoming gameweek in terms of the expected points of all players, as predicted by machine learning algorithms.
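The myopic selection problem can be illustrated with a brute-force search over a toy player pool. Under the Fantasy Premier League rules, a squad has 15 players (2 goalkeepers, 5 defenders, 5 midfielders, 3 forwards), a fixed budget and at most three players from any one club; the sketch takes the quotas, budget and club limit as parameters and ignores the starting eleven and captaincy. Exhaustive search is only feasible for tiny pools, which is why an integer programming formulation is used in practice; the data layout and function name are hypothetical.

```python
from itertools import combinations, product

def best_squad(players, quotas, budget, max_per_club=3):
    """Exhaustively search for the squad with the highest expected points.

    players: list of dicts with keys "name", "pos", "club", "cost", "xpts"
             (hypothetical layout; xpts = model-predicted expected points)
    quotas: required number of players per position, e.g. {"GK": 2, "DEF": 5}
    Returns (squad, total_xpts), or (None, -inf) if no squad is feasible.
    """
    by_pos = {pos: [p for p in players if p["pos"] == pos] for pos in quotas}
    best, best_pts = None, float("-inf")
    # Every combination of the required number of players at each position
    for groups in product(*(combinations(by_pos[pos], n) for pos, n in quotas.items())):
        squad = [p for group in groups for p in group]
        if sum(p["cost"] for p in squad) > budget:
            continue  # over budget
        clubs = [p["club"] for p in squad]
        if any(clubs.count(c) > max_per_club for c in set(clubs)):
            continue  # too many players from one club
        pts = sum(p["xpts"] for p in squad)
        if pts > best_pts:
            best, best_pts = squad, pts
    return best, best_pts
```

The objective and constraints here mirror what an integer program would encode with one binary variable per player.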

This article extends the basic myopic model from [3] in several ways. First, we modify our objective function to include information on a number of future gameweeks. To this end, we change the target variable in our algorithm and predict the sum of points to be scored by a player in the next N gameweeks, where the optimal N is estimated empirically. Second, we transform the whole integer programming problem into a dynamic one, which requires modifying both the objective function and the set of constraints. Note that this approach allows us to calculate the maximum possible number of points a manager could score if all match results were known beforehand [1]. Third, we reformulate the problem of predicting the expected points of all players in the league as a binary or multi-class classification problem in order to maximize the number of high scorers in the selected fantasy squad.
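The multi-gameweek target can be built from each player's points history by summing points over a forward window. A minimal sketch, assuming gameweeks are listed in chronological order and that gameweeks without N observations ahead are dropped; the function name is hypothetical.

```python
def future_points_target(points, n):
    """Target for gameweek t: total points in gameweeks t, t+1, ..., t+n-1.

    points: one player's points per gameweek, in chronological order.
    Gameweeks without n observations ahead are dropped.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sum(points[t:t + n]) for t in range(len(points) - n + 1)]
```

Features for gameweek t must use only information available before that gameweek; otherwise the forward-looking target leaks into the predictors.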

The remainder of the paper addresses how to play optimally against a specific opponent. For example, consider a league of two players, A and B, where one of them is ahead on points. What is the optimal strategy for A and B when only one gameweek is left and all chips have already been played? In such a situation, the optimal choice of one player may depend on the opponent's squad. For instance, if A is ahead, A may try to replicate B's squad; in this case, B has no chance of overtaking A in any scenario. To incorporate such strategic choices into the playing algorithm, we build a separate predictive model that takes a manager's previous squad as input and predicts that manager's squad for the upcoming gameweek. The main result of the paper is a significant improvement in the predictive power of the models compared with those in [3].
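The replication argument rests on simple arithmetic. When both squads are identical, every player's points enter both totals, so the leader's margin cannot change. The sketch below checks this over hypothetical point scenarios. It assumes simplified scoring that ignores captaincy, vice-captaincy and bench choices. The player names and points are made up for illustration.

```python
from typing import Dict, Iterable, List


def final_margin(lead: float, squad_a: Iterable[str], squad_b: Iterable[str],
                 points: Dict[str, float]) -> float:
    """A's lead over B after the last gameweek (captaincy and bench ignored)."""
    return lead + sum(points[p] for p in squad_a) - sum(points[p] for p in squad_b)


def b_can_overtake(lead: float, squad_a: List[str], squad_b: List[str],
                   scenarios: List[Dict[str, float]]) -> bool:
    """True if B finishes strictly ahead of A in at least one scenario."""
    return any(final_margin(lead, squad_a, squad_b, s) < 0 for s in scenarios)


# Hypothetical point outcomes for three players in the final gameweek.
scenarios = [
    {"x": 2, "y": 10, "z": 1},
    {"x": 12, "y": 0, "z": 3},
]
squad_b = ["x", "y"]
print(b_can_overtake(5, ["x", "z"], squad_b, scenarios))  # True: B's 'y' can outscore A's 'z'
print(b_can_overtake(5, squad_b, squad_b, scenarios))     # False: identical squads keep the lead
```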

[1] Jeroen Belien, Dries Goossens, and Daam Van Reeth. Optimization modelling for analyzing fantasy sport games. INFOR: Information Systems and Operational Research, 3:1–20, January 2017.
[2] Daniel G. Goldstein, Randolph Preston McAfee, and Siddharth Suri. The wisdom of smaller, smarter crowds. In Proceedings of the Fifteenth ACM Conference on Economics and Computation (EC '14), pages 471–488, New York, NY, USA, 2014. ACM Press.
[3] Arseniy Stolyarov and Gleb Vasiliev. Predict to Succeed: Optimal Fantasy Football Squad Formation Using Machine Learning Tools. Presented at the International Conference on Economics of Football, Kazan, 2017.

by Daniel Vogler and Tobias R. Keller
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Diversity in sports coverage - A topic modeling analysis of sports media coverage in Switzerland from 2010 to 2017

Sport plays an important role in news media coverage. It has its own section with specialized journalists, and big sporting events regularly dominate the headlines. However, little is known about the sports content of news media. So far, research has mostly focused on single events (e.g. the Olympic Games) or single issues (e.g. gender representation), and large-scale content analyses of sport in news media remain very scarce. We fill this gap by applying topic modeling to sports coverage, focusing on how diverse the sports coverage of Swiss news media was from 2010 to 2017. In our ongoing study, we analyze which sports dominate the coverage and whether coverage differs between media types. Additionally, our study shows the proportion of international and national sporting events in the coverage, as well as quantitative differences in reporting on male and female athletes in Swiss media.

Our sample included full-text print media articles from 11 online news sites from Switzerland, published in German between 2010 and 2017 (n=3581). The selected articles stemmed from a larger research project measuring quality in journalism. We therefore worked with pre-structured data and included only articles with sport as their main topic (reference to sport in the headline or lead). We applied a topic model based on LDA (Blei, Ng, & Jordan, 2003) to our corpus using the “topicmodels” package in R. Before modeling, we prepared the corpus: we converted all characters to lower case and removed special characters and common German stop words. We also removed a customized list of stop words, such as names of journalists and media outlets and general sport-specific terms (e.g. sport, win or lose). To identify the ideal number of topics, we ran a perplexity test and identified 66 topics as the best fit for our corpus. We then categorized the topics and manually coded whether each topic focuses primarily on Switzerland or has a foreign or international focus. Finally, we measured whether there were quantitative differences in reporting on male and female athletes across the topics.
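The preprocessing steps and the perplexity-based choice of the number of topics can be sketched in Python. This is not the authors' R/topicmodels code. The stop-word sets below are tiny illustrative stand-ins: the custom list renders "sport, win, lose" in German as a hypothetical example. The log-likelihood values are made up. Perplexity is computed as exp(-log-likelihood / number of tokens), usually on held-out documents, and lower values indicate a better fit.

```python
import math
import re
from typing import Dict, List, Set

# Illustrative subsets only; a real pipeline would use a full German stop-word list.
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "den", "mit"}
CUSTOM_STOP_WORDS = {"sport", "sieg", "niederlage"}  # hypothetical custom list
STOP_WORDS = GERMAN_STOP_WORDS | CUSTOM_STOP_WORDS


def preprocess(text: str, stop_words: Set[str]) -> List[str]:
    """Lower-case, replace everything except letters (incl. umlauts and ß)
    with spaces, then drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-zäöüß\s]+", " ", text)
    return [tok for tok in text.split() if tok not in stop_words]


def perplexity(total_log_likelihood: float, n_tokens: int) -> float:
    """Perplexity = exp(-log-likelihood / token count)."""
    return math.exp(-total_log_likelihood / n_tokens)


def pick_num_topics(loglik_by_k: Dict[int, float], n_tokens: int) -> int:
    """Return the number of topics k whose model has the lowest perplexity."""
    return min(loglik_by_k, key=lambda k: perplexity(loglik_by_k[k], n_tokens))


print(preprocess("Der FC Basel gewinnt 3:1 und feiert!", STOP_WORDS))
# ['fc', 'basel', 'gewinnt', 'feiert']
print(pick_num_topics({10: -5000.0, 66: -4000.0, 100: -4200.0}, n_tokens=1000))  # 66
```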

We identified topics for specific sports such as football, ice hockey and tennis, as well as reflexive topics such as fan violence, doping and corruption. Preliminary results indicate that football is the dominant sport in Swiss media coverage, followed by tennis and ice hockey. We also detect more topics with an international focus than with a Swiss focus. Additionally, our research shows that athletes' names define the topics to a great extent and that female athletes are underrepresented in the analyzed media. This is especially true for football coverage, whereas the coverage of tennis or skiing is more balanced between men and women. From an empirical standpoint, this is the first study that gives insights into the diversity of sports coverage in Swiss news media, focusing on differences between media outlets, a comparison of national and international events, and the representation of female and male athletes. Methodologically, combining data-driven identification with manual coding of topics proved a valuable approach for measuring topical diversity in semi-structured datasets of news media articles.

by Pamela Wicker, Johannes Orlowski, and Daniel Weimar
Academic Session II - 10.30 - 12.00, Room: SOE-E-2

Referee behavior and performance in the Football Bundesliga: The role of teams’ running distance and speed

The behavior and performance of referees in professional football are the subject of ongoing discussion. Football referees need diverse skills and characteristics, including stress resistance (Anshel & Weinberg, 1995), communication and player management (Cunningham et al., 2014), deliberate decision-making and physical fitness (Catteeuw et al., 2009; Helsen & Bultynck, 2004). In an effort to make football more attractive, FIFA has taken several measures to make the game more dynamic, such as sanctions for intentionally delaying the game and changes to the offside rule (Heineke, 2014). As the game has become faster and more dynamic, the physical demands on referees have increased as well. For example, top-class referees typically run between 10 and 12 km per game, with 10-15% of that distance covered at high intensity (Mallo et al., 2012). Running matters because referees have to be well positioned on the field to make accurate decisions (Mallo et al., 2012), so it ultimately affects the quality of refereeing.

Existing research in football has studied a variety of factors affecting referee behavior, including home advantage (e.g., Boyko et al., 2007; Dawson et al., 2007), pressure of the crowd and social forces (Dohmen, 2008; Garicano et al., 2005), nationality (Dawson & Dobson, 2010), and within-game information (Buraimo et al., 2010; Watanabe et al., 2015). Previous studies examining within-game information included several factors, such as the score, injuries, goals, cards, and substitutions. However, existing research has not yet examined whether and to what extent teams’ running performance affects referee behavior. The purpose of this study is to investigate the effect of teams’ running distance and speed on the behavior and performance of referees in the German Football Bundesliga.

The data include all league games played in four Bundesliga seasons (2012/13-2015/16). Information about teams, players, and their running performance was retrieved from and aggregated to the team level. The final sample includes n=2,436 observations on a team-game-day basis. Team running performance is measured by the mean and standard deviation of distance covered (in km), number of intensive runs (>20 km/h and <24 km/h), and number of intensive sprints (>24 km/h). Referee performance is captured by the Kicker grade for each match (1=very good; 6=insufficient) and by the evaluation provided by an official Bundesliga referee website, while awarded free kicks, penalties, yellow cards and red cards reflect referee behavior. The empirical analysis consists of a set of regressions on the determinants of referee performance (at game-day level; n=1,218) and behavior (at team level; n=2,436). The models also control for home games, attendance, running track, uncertainty of outcome, player nationality, and referee characteristics (e.g., height, FIFA referee status), and for other unobserved characteristics through team and season dummies.
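The team-level running variables can be illustrated with a short sketch. It applies the speed bands stated above and aggregates player data for one team in one game into a mean and a standard deviation per metric. The field names and example values are hypothetical. The choice of the sample standard deviation is an assumption, since the abstract does not say which one is used.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple


def speed_band(speed_kmh: float) -> Optional[str]:
    """Classify a movement using the speed thresholds given in the abstract."""
    if speed_kmh > 24:
        return "sprint"
    if 20 < speed_kmh < 24:
        return "intensive_run"
    return None  # at or below 20 km/h, or exactly 24 km/h, which neither band covers


def team_running_profile(players: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Aggregate player-level running data for one team in one game:
    (mean, sample standard deviation) for each metric."""
    profile = {}
    for metric in ("distance_km", "intensive_runs", "sprints"):
        values = [p[metric] for p in players]
        profile[metric] = (mean(values), stdev(values))
    return profile


team = [  # hypothetical player data for one team-game
    {"distance_km": 10.5, "intensive_runs": 40, "sprints": 20},
    {"distance_km": 11.0, "intensive_runs": 50, "sprints": 25},
    {"distance_km": 9.5, "intensive_runs": 30, "sprints": 15},
]
print(team_running_profile(team)["intensive_runs"])  # (40, 10.0)
```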

Initial regression results for referee performance (Kicker grades) show a significant negative effect of the average number of intensive runs: the more intensive runs the teams on the pitch perform, the better the Kicker grade. A more detailed analysis reveals that the average number of sprints by away teams has a significant positive effect, meaning that the more sprints, the worse the grade. Average distance covered is insignificant, implying that referees have a sufficient level of general physical fitness for their performance not to be influenced by player running. Yellow cards for both home and away teams have positive effects, i.e., they are associated with a worse grade.

by Bruno Wüest and Garret Binding
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

Context-driven evaluation of single player performances in Swiss ice hockey using large data sets, expected goal values and elastic net regressions

In team sports, coaches, officials, spectators and journalists demand ever more precise information on the performance of individual players. While individual players' skills and motivation are therefore increasingly in focus, it is easy to forget that individual performance must always be placed in the context of teammates and opponents. After all, a player is only as good as their linemates and opponents allow. However, a reliable context-driven evaluation is not easy to achieve, especially in ice hockey, a sport that is particularly demanding because of its rapidly changing game situations and high playing speed.

We start from proprietary data collected manually by the Swiss Ice Hockey Federation at each game of the Swiss National League in the last two seasons. At each game, two data collectors per team recorded the events on ice in real time. The data includes both game events such as shots, goals, or faceoffs as well as on- and off-ice movements of players. We then merge the source data into a single play-by-play dataset with roughly 1.5 million events and an average of around 2,250 events per game.

By drawing on an expected goals model including factors such as shot position, strength state, and score state, we can estimate a continuous expected goal value for each unblocked shot in the data. This creates a fine-grained indicator of shot quality, allowing us to distinguish shots with a low probability of becoming a goal from those highly likely to end up in the net. In the final step of the preprocessing, we match all the lines of different teams that played against each other and average the expected goals for and against the respective lines. All averages are additionally weighted by the ice-time of the corresponding lines.
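The two preprocessing steps can be sketched as follows. The first scores each unblocked shot with a logistic expected-goals function of shot position, strength state and score state. The second takes the ice-time-weighted average of expected goals for and against across the opposing lines that a line faced. The logistic form is a common choice for xG models but is an assumption here. The coefficients are hypothetical and are not the authors' estimates.

```python
import math
from typing import List, NamedTuple, Tuple


class Shot(NamedTuple):
    distance_m: float   # shot position: distance from the goal
    angle_deg: float    # shot position: absolute angle to the goal centre line
    power_play: bool    # strength state: shooting team has a man advantage
    score_diff: int     # score state: shooting team's goals minus opponent's


# Hypothetical coefficients, for illustration only (not the authors' model).
COEF = {"intercept": -1.0, "distance_m": -0.08, "angle_deg": -0.02,
        "power_play": 0.4, "score_diff": -0.05}


def expected_goal(shot: Shot) -> float:
    """Logistic xG: modelled probability that an unblocked shot becomes a goal."""
    z = (COEF["intercept"]
         + COEF["distance_m"] * shot.distance_m
         + COEF["angle_deg"] * shot.angle_deg
         + COEF["power_play"] * shot.power_play
         + COEF["score_diff"] * shot.score_diff)
    return 1.0 / (1.0 + math.exp(-z))


def ice_time_weighted_xg(matchups: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """matchups: (ice_time_seconds, xg_for, xg_against) for one line against each
    opposing line it faced. Returns ice-time-weighted average xG for and against."""
    total_time = sum(t for t, _, _ in matchups)
    xg_for = sum(t * f for t, f, _ in matchups) / total_time
    xg_against = sum(t * a for t, _, a in matchups) / total_time
    return xg_for, xg_against


close, far = Shot(5.0, 10.0, True, 0), Shot(15.0, 10.0, True, 0)
print(expected_goal(close) > expected_goal(far))  # True: closer shots score more often here
print(ice_time_weighted_xg([(60, 0.2, 0.1), (120, 0.5, 0.4)]))
```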

We subsequently fit a series of regularized linear regressions on the prepared data, first to estimate the performance of single lines and then that of single players. Because our approach is inductive (we do not preselect indicators but use all available ones), we implement an elastic net regularization that combines the L1 and L2 penalties of the lasso and ridge methods. We also cross-validate our estimates to prevent overfitting.
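A compact way to see how the elastic net blends the two penalties is cyclic coordinate descent on the glmnet-style objective (1/2n)·||y − b0 − Xb||² + λ·(α·||b||₁ + (1 − α)/2·||b||₂²). Setting α = 1 gives the lasso and α = 0 gives ridge. The pure-Python sketch below also selects λ by k-fold cross-validation. It is a teaching sketch, not the authors' implementation. The toy data, the fold scheme and the λ grid are assumptions.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the L1 (lasso) shrinkage step."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: List[List[float]], y: List[float], lam: float, alpha: float,
                max_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Cyclic coordinate descent for the elastic net; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centring absorbs the intercept
    resid = [v - y_mean for v in y]                              # residuals for beta = 0
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def cv_select_lambda(X: List[List[float]], y: List[float], lambdas: Sequence[float],
                     alpha: float, k: int = 4) -> float:
    """k-fold cross-validation: return the lambda with the lowest out-of-fold squared error."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds, no shuffling
    best_lam, best_sse = lambdas[0], float("inf")
    for lam in lambdas:
        sse = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(n) if i not in held_out]
            b0, beta = elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
            for i in test_idx:
                pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
                sse += (y[i] - pred) ** 2
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam


X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [3, 4, 6, 8]  # exactly y = 1 + 2*x1 + 3*x2
b0, beta = elastic_net(X, y, lam=0.0, alpha=0.5)
print(round(b0, 6), [round(b, 6) for b in beta])  # 1.0 [2.0, 3.0]
print(elastic_net(X, y, lam=100.0, alpha=1.0)[1])  # [0.0, 0.0]: strong L1 penalty zeroes all
```

With λ = 0 the updates reduce to ordinary least squares. A large λ combined with α = 1 shrinks every coefficient to exactly zero, which is the variable-selection behaviour that makes the elastic net attractive when all available indicators are entered.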

We show how taking context into account yields valuable quantitative insights into individual player performance, even in a team sport as fast as ice hockey. We validate our estimates by comparing them with baseline measures such as the number of goals scored or the success of the players' teams. In addition, we discuss the benefits of taking the strength of teammates and opponents into account. However, because our measure is less intuitive, it should be seen as a complement to, rather than a substitute for, the baseline measures usually used to evaluate individual player performance in team sports.

by Gabi Wüthrich, Joel Floris, Harald Mayr, and Ulrich Woitek
Academic Session III - 13.00 - 14.00, Room: RAA Entrance Hall

The Effect of Status upon Longevity: Swiss Wrestlers, 1735-1918

A body of literature finds that the rich and famous live longer than poor and ordinary individuals, but the direction of causality is unclear. Two studies on the Academy Awards are an example: winning actors and actresses live longer than nominees (Redelmeier and Singh, 2001a), but winning screenwriters do not (Redelmeier and Singh, 2001b). In analyzing these effects, potential immortal time bias has to be taken into account (Sylvestre et al., 2006; Wolkewitz et al., 2010). As Rablen and Oswald (2008) put it, “[t]he ideal experiment would be one in which extra status could somehow be dropped upon a sub-sample of individuals while those in a control group of comparable individuals received none.” In their study, they compare the longevity of Nobel Prize winners and nominees, arguably comparable groups, and find that winning the Nobel Prize is rewarded with 1-2 years of extra longevity. Other studies look at the longevity of popes relative to artists (Carrieri and Serraino, 2005; Hanley et al., 2006), of jazz musicians (Spencer, 1991; Rothman, 1992; Haaga, 1992; Spencer, 1992), and of pop stars (Bellis et al., 2007). For athletes, there is the paper by Abela and Kruger (2005) on baseball players. Besides tackling immortal time bias, such analyses also have to disentangle the effect of reputation on longevity from income effects. With these caveats in mind, we study the longevity of a very specific group of athletes, Swiss wrestlers: do champions live longer than wrestlers without a title?

The national sport Schwingen (Swiss wrestling) is a form of belt wrestling and part of Alpine herdsmen culture. It has a long history and can be traced back as a distinct form of tournament to about 1600, especially in parts of Berne and Lucerne (Emmental, Haslital, Entlebuch; Treichler 2010). Schwingen gained greater inter-regional significance through the festivals in Interlaken (Unspunnenfeste) in 1805 and 1808. Rudolf Schärer's textbook (1864) also made Schwingen popular among gymnasts (Turner), leading to two types of Schwinger: the traditional Sennenschwinger (blue shirts) and the Turnerschwinger (white shirts). The federal organisation (Eidgenössischer Schwingerbund) was founded in 1894. Currently, there are three regular inter-regional events: the already mentioned Unspunnenfest (every six years), the Eidgenössisches Schwing- und Älplerfest (every three years, first event: 1895) and the Kilchbergerschwinget (every six years, first event: 1927). Successful wrestlers (top 15 per cent) are awarded crowns, and the winner of the Eidgenössisches Schwing- und Älplerfest is proclaimed Schwingerkönig. Before the Eidgenössischer Schwingerbund was founded, the winners of inter-regional events in Unspunnen and Berne had been proclaimed Schwingerkönig since the mid-18th century (Eidgenössischer Schwingerverband, 1924).

Our dataset consists of 1048 athletes (missing information: 23) born in the period 1735-1918 (main data sources: Kuhn and Knoll 1947; Knoll 1948). We deliberately stop at 1918 because the second half of the 20th century was characterized by increasing professionalisation, which makes it difficult to disentangle reputation from income effects. Until then, prizes could be seen as mere compensation for travel expenses and lost income (Eidgenössischer Schwingerverband, 1924). We have information on name, municipality of origin, type (Sennen-/Turnerschwinger), year of birth, year of death, remarks (nickname, cause of death, etc.), and whether the athlete was awarded a crown at one of the events mentioned above. The data set contains 53 individuals with the title of Schwingerkönig. Our preliminary results show that crowned wrestlers do indeed seem to live longer than “normal” wrestlers.
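The comparison of crowned and uncrowned wrestlers, and the reason immortal time bias matters, can be illustrated with a small sketch. A wrestler has to survive long enough to win a crown, so early deaths fall only into the uncrowned group. A simple landmark-style guard compares only wrestlers who survived to a given age. The records and the landmark age of 30 are hypothetical. Lifespans are computed from birth and death years only, and records with missing years are skipped. This is not the authors' method, which the abstract does not detail.

```python
from statistics import mean
from typing import List, Optional, Tuple

# (year_of_birth, year_of_death, crowned); None marks missing information
Record = Tuple[Optional[int], Optional[int], bool]


def mean_lifespan(records: List[Record], crowned: bool, landmark_age: int = 0) -> float:
    """Mean age at death (from birth and death years) for crowned or uncrowned
    wrestlers. With landmark_age > 0, only wrestlers who lived to at least that
    age are included, a crude guard against immortal time bias."""
    spans = [
        death - birth
        for birth, death, c in records
        if c == crowned and birth is not None and death is not None
        and death - birth >= landmark_age
    ]
    return mean(spans)


records: List[Record] = [  # hypothetical toy data
    (1800, 1870, True), (1810, 1875, True),
    (1800, 1825, False), (1805, 1875, False), (1820, 1880, False),
    (1830, None, False),  # year of death missing, skipped
]
print(mean_lifespan(records, crowned=True))                    # 67.5
print(mean_lifespan(records, crowned=False))                   # naive: inflated gap
print(mean_lifespan(records, crowned=False, landmark_age=30))  # 65
```

In this toy example the naive gap of almost 16 years shrinks to 2.5 years once the early death, which could never have ended in a crown, is excluded.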

by Angelo Zehr, Julian Schmidli, Duc-Quang Nguyen, Tania Boa, Luc Guillemot
Data Journalism Session II - 14.00 - 15.30, Room: RAA-G-01 AULA

Roger Federer - 20 Years 20 Titles

20 years ago, he played his first professional match. Roger Federer has now won his 20th Grand Slam title. A data analysis of all the matches he has played reveals how he became the best tennis player of all time. In tennis, every move and every point is tracked, but none of it is made accessible in a machine-readable way. Our aim was to analyse all available data from the last 20+ years of tennis and to make it accessible to others as well. We did so by publishing not only the story but also the code we used, plus a making-of (in German). The story was published in many languages simultaneously in collaboration with Swissinfo.

We at SRF Data are trying to set new standards for transparency and reproducibility. Not only do we publish all the code used for the analysis; we also publish the original data so that others can understand how we transformed and analyzed it. Furthermore, data journalism and sports have seldom been combined in the way we did. We tried to keep the dimensions as simple as possible so that everybody, even readers with no prior knowledge of tennis, can understand what we are trying to convey.

With this story, we reached more than four times as many visitors as we usually do. By applying the data-journalistic approach to a popular topic like tennis, we tried to introduce more people to our kind of journalism. The feedback was extremely positive: leading experts in tennis analytics praised the story, and so did other data journalists.

Source and methodology
The ATP (Association of Tennis Professionals, the governing body of the men's professional tour) does not offer an API for its data, and even on request it does not publish or sell any of its collected data in a machine-readable format. That's why we collaborated with Mileta Cekovic, a Serbian computer scientist who scraped all the available information from the ATP website and organized it in a database. Furthermore, we used the deuce package by Stephanie Kovalchik, which makes all of Jeff Sackmann's collected historical tennis data available for further analysis.

Technologies Used
Ultimate Tennis Statistics by Mileta Cekovic was written in Java on Windows. Its backend is a PostgreSQL database, which we were also able to port to our Unix (Linux, Mac) computers. From there, we used R Markdown and the RPostgreSQL package to read data from the database. In R, we used ggplot to analyze the data visually. With jsonlite, we exported only the data needed for the final chart into a JSON file, which we imported into our frontend, built with a React stack and d3. To update the data, we can scrape the latest data from the ATP website and simply rerun our R Markdown document, which automatically exports new JSON files and updates the UI. To translate the story into 10 languages in total, we collaborated with Swissinfo. Swissinfo entered the translated texts into a Google Drive spreadsheet, from which we downloaded them automatically and fed them into our translation functionality.
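The "export only what the chart needs" step is simple to reproduce outside R. The sketch below is a hypothetical Python analogue of the jsonlite export, not the SRF Data code. The match fields and values are made up. It trims each record to the chart's fields and serialises the result to JSON that a d3/React frontend could load.

```python
import json
from typing import Any, Dict, List, Optional


def export_chart_data(matches: List[Dict[str, Any]], fields: List[str],
                      path: Optional[str] = None) -> str:
    """Keep only the fields the chart needs and serialise them to JSON.
    If a path is given, also write the JSON to that file for the frontend."""
    slim = [{f: m[f] for f in fields} for m in matches]
    payload = json.dumps(slim, ensure_ascii=False)
    if path is not None:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(payload)
    return payload


matches = [  # hypothetical records, as they might come out of the database
    {"season": 1998, "tournament": "Gstaad", "won": False, "opponent_rank": 88},
    {"season": 2003, "tournament": "Wimbledon", "won": True, "opponent_rank": 48},
]
print(export_chart_data(matches, ["season", "won"]))
# [{"season": 1998, "won": false}, {"season": 2003, "won": true}]
```

Keeping the exported file minimal keeps the page light. Rerunning the export after each scrape regenerates the JSON the UI reads.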