Football event data approach
In recent years, the term xG has become a familiar part of match statistics on TV and social media.
xG stands for expected goals. It is built on event data: each shot is an event, and the variables attached to it, such as the assist type, are used to measure the quality of that shot.
Expected Goals (xG) measures the quality of a shot based on several variables such as assist type, shot angle and distance from goal, whether it was a headed shot and whether it was defined as a big chance. — Opta.
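Two of the variables in that definition, shot distance and shot angle, can be derived from the coordinates of the shot. The sketch below is only an illustration of those two inputs, not Opta's model. It assumes a pitch 120 units long and 80 units wide, with the goalposts at y = 36 and y = 44; both the geometry constants and the function name `shot_distance_and_angle` are assumptions made for this example.

```python
import math

# Assumed pitch geometry (illustrative): 120 x 80 units, goal line at x = 120,
# goalposts at y = 36 and y = 44.
GOAL_X = 120.0
POST_LOW_Y = 36.0
POST_HIGH_Y = 44.0


def shot_distance_and_angle(x: float, y: float) -> tuple[float, float]:
    """Return (distance to the goal centre, angle in degrees subtended by the goal mouth)."""
    goal_centre_y = (POST_LOW_Y + POST_HIGH_Y) / 2
    dx = GOAL_X - x
    distance = math.hypot(dx, goal_centre_y - y)
    # Angle between the two lines that join the shot location to each post.
    angle = math.atan2(POST_HIGH_Y - y, dx) - math.atan2(POST_LOW_Y - y, dx)
    return distance, math.degrees(angle)


# A central shot taken 12 units from the goal line.
distance, angle = shot_distance_and_angle(108.0, 40.0)
print(f"distance={distance:.1f}, angle={angle:.1f} degrees")
```

A wider angle and a shorter distance generally point to a better chance, which is why both usually appear among the inputs of an xG model.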
What is event data?
Event data quantifies what happens on the pitch in relationship with individual player actions. For example, each time a player passes the ball, tackles an opponent, shoots on goal, or saves a shot is one event. (PFSA)
How to collect event data?
Originally, human analysts collected the data by registering each action with the help of software. The operators watch the game and record each action by pressing a key. Each record contains information about:
- player name or id
- team id
- match
- period (first half or second half)
- event time
- event name
- positions
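As a rough sketch, one recorded action could be modelled as a small data structure. The class name `RecordedEvent`, the field names and the types are hypothetical and do not belong to any vendor; the sample values are borrowed from the Wyscout record shown later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedEvent:
    """One action logged by an operator (hypothetical field names)."""

    player_id: int
    team_id: int
    match_id: int
    period: str        # e.g. "1H" for first half, "2H" for second half
    event_time: float  # seconds elapsed in the period
    event_name: str    # e.g. "Pass", "Shot"
    positions: list[tuple[float, float]] = field(default_factory=list)


event = RecordedEvent(
    player_id=122,
    team_id=3187,
    match_id=2852473,
    period="1H",
    event_time=3.3,
    event_name="Pass",
    positions=[(50, 51), (30, 63)],
)
print(event.event_name, event.positions)
```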
Today, analysts have advanced technical support, such as live data enrichment powered by computer vision and AI. According to Opta, trained analysts using collection tools capture up to 3,000 actions during each match.
Interacting with event data
Opta collects around 60 kinds of events, while StatsBomb has 30 principal ones. In my last Medium post, I talked about data repositories and vendors.
The variables depend on the vendor. However, there are similarities, such as goal, shot, assist, save, pass, cross, dribble, and so on. Each event carries additional information, nested inside an object named after that event type.
The following record comes from StatsBomb; it is a “Ball Receipt” event by the player Jordi Alba. Its location is an array [x, y].
{
"id": "8bd23deb-1500-46b0-9629-b7bf0c45f911",
"index": 925,
"period": 1,
"timestamp": "00:21:06.246",
"minute": 21,
"second": 6,
"type": {
"id": 42,
"name": "Ball Receipt*"
},
"possession": 36,
"possession_team": {
"id": 217,
"name": "Barcelona"
},
"play_pattern": {
"id": 3,
"name": "From Free Kick"
},
"team": {
"id": 217,
"name": "Barcelona"
},
"player": {
"id": 5211,
"name": "Jordi Alba Ramos"
},
"position": {
"id": 6,
"name": "Left Back"
},
"location": [
93.3,
2.9
],
"related_events": [
"cd62b7d6-c090-4bdf-b745-f80135c852f5"
]
}
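To show how the nested fields can be read, here is a minimal sketch that parses a trimmed copy of the record above with Python's standard `json` module. The variable names are just for this example.

```python
import json

# Trimmed copy of the StatsBomb record above.
raw = """
{
  "minute": 21,
  "second": 6,
  "type": {"id": 42, "name": "Ball Receipt*"},
  "team": {"id": 217, "name": "Barcelona"},
  "player": {"id": 5211, "name": "Jordi Alba Ramos"},
  "position": {"id": 6, "name": "Left Back"},
  "location": [93.3, 2.9]
}
"""

event = json.loads(raw)

# "location" is a two-element array, so it unpacks directly into x and y.
x, y = event["location"]

# Descriptive fields are nested objects holding an "id" and a "name".
clock = f'{event["minute"]}:{event["second"]:02d}'
player = event["player"]["name"]
team = event["team"]["name"]
action = event["type"]["name"]

summary = f"{clock} {player} ({team}): {action} at x={x}, y={y}"
print(summary)
```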
Wyscout records follow a similar format:
{
"id": 398601201,
"playerId": 122,
"teamId": 3187,
"matchId": 2852473,
"matchPeriod": "1H",
"eventSec": 3.3039120000000006,
"eventId": 8,
"eventName": "Pass",
"subEventId": 85,
"subEventName": "Simple pass",
"positions": [
{
"x": 50,
"y": 51
},
{
"x": 30,
"y": 63
}
],
"tags": [
{
"id": 1801,
"tag": {
"label": "accurate"
}
}
]
}
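The Wyscout record can be read in the same way. This sketch parses a trimmed copy of it and assumes that the first entry of `positions` is where the action starts and the second is where it ends; that interpretation, like the variable names, is an assumption of this example.

```python
import json

# Trimmed copy of the Wyscout record above.
raw = """
{
  "eventName": "Pass",
  "subEventName": "Simple pass",
  "positions": [{"x": 50, "y": 51}, {"x": 30, "y": 63}],
  "tags": [{"id": 1801, "tag": {"label": "accurate"}}]
}
"""

event = json.loads(raw)

# Assumed meaning: first position = start of the action, second = end.
start, end = event["positions"]
dx = end["x"] - start["x"]
dy = end["y"] - start["y"]

# Each tag wraps its label in a nested "tag" object.
labels = [item["tag"]["label"] for item in event["tags"]]

print(event["subEventName"], (dx, dy), labels)
```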
The most common applications of this data are:
- Designing advanced metrics (xG)
- Player recruitment
- Opposition analysis
- Match analysis (heat maps, passing networks)
- Machine Learning modeling
Laboratory
Fine, it’s time: no practice, no party. In the next code section, I explain how to interact with the StatsBomb event data and show visualizations of the actions.
First, to interact directly with StatsBomb data we can use the statsbombpy library. ‘mplsoccer’ will help us visualize the data and ‘pandas’ will help us manipulate it.
from statsbombpy import sb
import pandas as pd
from mplsoccer import VerticalPitch, Pitch
Three functions are important for accessing specific matches:
sb.competitions()
sb.matches(...)
sb.events(...)
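One possible way to chain these calls is sketched below: list the matches of a competition season, take the first one and download its events. The helper name `load_first_match_events` is hypothetical. The keyword arguments `competition_id`, `season_id` and `match_id`, and the `match_id` column of the matches table, reflect my understanding of statsbombpy and should be checked against the installed version.

```python
def load_first_match_events(competition_id: int, season_id: int):
    """Return the events table for the first listed match of a competition season."""
    # Imported lazily so that defining this helper needs neither the library
    # nor network access; calling it needs both.
    from statsbombpy import sb

    # One row per match of the chosen competition and season.
    matches = sb.matches(competition_id=competition_id, season_id=season_id)

    # Take the identifier of the first match in the table.
    first_match_id = int(matches["match_id"].iloc[0])

    # One row per event of that match.
    return sb.events(match_id=first_match_id)
```

The two ids can be looked up in the table returned by `sb.competitions()`, which lists the available competitions and seasons.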