index | feature | coefficient | odds increase / decrease (exp(coef) - 1) |
---|---|---|---|
7 | prev_driver_position | -0.169095 | -0.155571 |
8 | prev_construct_position | -0.089435 | -0.085553 |
10 | constructorId_9-strt_len_median | -0.011137 | -0.011075 |
13 | prev_construct_points | -0.007542 | -0.007513 |
30 | driverId_844-aero_reg | -0.006534 | -0.006512 |
5 | constructorId_9-avg_track_spd | -0.003273 | -0.003268 |
6 | constructorId_9-corner_spd_min | -0.003105 | -0.003100 |
11 | constructorId_9-windspeed | -0.001691 | -0.001690 |
25 | driverId_815 | -0.001439 | -0.001438 |
21 | constructorId_9-corner_spd_min-aero_reg | -0.000762 | -0.000762 |
24 | constructorId_9-years_since_major_cycle-round | -0.000129 | -0.000129 |
23 | constructorId_9-round-years_since_major_cycle | -0.000129 | -0.000129 |
22 | constructorId_9-years_since_major_cycle | 0.000005 | 0.000005 |
9 | constructorId_9-strt_len_max | 0.000255 | 0.000255 |
4 | constructorId_9-corner_spd_max | 0.000286 | 0.000286 |
2 | constructorId_9 | 0.000595 | 0.000595 |
17 | constructorId_9-num_fast_corners | 0.000747 | 0.000747 |
18 | constructorId_9-aero_reg | 0.000825 | 0.000825 |
1 | constructorId_9-circuit_len | 0.000994 | 0.000995 |
27 | constructorId_6-corner_spd_min-aero_reg | 0.001186 | 0.001187 |
14 | constructorId_9-num_slow_corners | 0.001399 | 0.001400 |
29 | constructorId_131-corner_spd_min-aero_reg | 0.001531 | 0.001533 |
15 | driverId_830 | 0.002034 | 0.002036 |
0 | constructorId_9-num_corners | 0.002251 | 0.002254 |
16 | prev_construct_wins | 0.003298 | 0.003304 |
19 | prev_driver_wins | 0.003342 | 0.003348 |
20 | constructorId_9-round | 0.003805 | 0.003813 |
28 | driverId_844-strt_len_median | 0.004471 | 0.004481 |
3 | constructorId_9-max_track_spd | 0.006688 | 0.006711 |
26 | constructorId_6-aero_reg | 0.008188 | 0.008221 |
12 | prev_driver_points | 0.014778 | 0.014887 |
About
F1 is a racing series in which the best drivers and teams from around the world compete, and it is widely recognized as the most prestigious motorsport series in the world. The championship has been running since 1950, but it has recently seen an explosion in popularity, driven by the Netflix docuseries “Drive to Survive” and by the extended marketing reach of social media platforms such as YouTube, Twitter, Instagram, and Reddit.
As the sport has grown, so have its sports betting and fantasy sports scenes. On Reddit, for example, users compete in race-by-race prediction challenges in which they predict race-weekend outcomes such as pole position (the driver who sets the fastest lap time in qualifying), the race winner, and other statistics. F1 also runs its own fantasy competition, in which users build teams from the available constructors and drivers (subject to certain restrictions) to score the most points over a season.
Naturally, this extends to sports betting, where predictions of race outcomes and driver performance can carry significant financial stakes.
A significant body of research has addressed this problem. For example, encoder-decoder and MLP networks have been developed for rank-position forecasting, Bayesian regression methods have been developed for predicting race finishing positions, and analytical approaches have been constructed to account for the combined effect of driver and car on race performance.
This website automatically generates predictions for each race of the 2024 F1 season, and possibly beyond, depending on whether I continue to maintain it.
Predictions and Problem Formulation
To simplify race outcome prediction, I defined two classes for the 20-driver grid: top 3 and bottom 17. In other words, a driver predicted to finish in the top 3 positions (on the podium) is classified as “top 3”, and otherwise as “bottom 17”. I modeled this binary classification problem with logistic regression, using the following data:
- data from the fastf1 API, used to generate track summary information, e.g.:
  - minimum corner speed
  - average track speed
  - etc.
- data from the Visual Crossing API, used to generate weather information (windspeed)
- Ergast data compiled on Kaggle for the 2022 and 2023 F1 seasons
- subjective regulation-change significance scores (aero_reg): the higher the score, the more significant the regulation change going into a particular season of F1
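As a rough illustration of this setup, the sketch below turns a finishing position into the binary label and shows one way feature names such as constructorId_9-circuit_len could arise. It assumes, without confirmation from the data sources above, that a name of the form constructorId_X-feature is a one-hot constructor indicator multiplied by a numeric feature; the RaceResult class, the helper names, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RaceResult:
    """Hypothetical record for one driver in one race."""
    driver_id: int
    constructor_id: int
    position: int       # finishing position, 1 = winner
    circuit_len: float  # example track-summary feature
    aero_reg: float     # subjective regulation-change score


def make_label(position: int) -> int:
    """1 = "top 3" (podium finish), 0 = "bottom 17"."""
    return 1 if position <= 3 else 0


def constructor_features(r: RaceResult, constructor_ids, numeric=("circuit_len", "aero_reg")):
    """One-hot encode the constructor, then multiply each indicator by each
    numeric feature (an assumed meaning of names like 'constructorId_9-circuit_len')."""
    feats = {}
    for cid in constructor_ids:
        onehot = 1.0 if r.constructor_id == cid else 0.0
        feats[f"constructorId_{cid}"] = onehot
        for name in numeric:
            feats[f"constructorId_{cid}-{name}"] = onehot * getattr(r, name)
    return feats


# Made-up example: a constructorId 9 driver finishing 2nd.
r = RaceResult(driver_id=830, constructor_id=9, position=2, circuit_len=5.0, aero_reg=2.0)
print(make_label(r.position))  # 1
print(constructor_features(r, [9, 6]))
```

With this encoding, interaction terms are zero for every constructor except the driver's own, which is why each constructor-specific column only affects that team's predictions.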
Because the modeling problem is imbalanced (only 14% of results are top-3 finishes), I used SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes. F-scores were then computed for each candidate feature, and the 30 highest-scoring features were used to fit the final model. An 80/20 train/test split was used.
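The oversampling and feature-ranking steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind this site: it assumes the F-score is the one-way ANOVA F-statistic (the quantity scikit-learn's f_classif computes), and the function names smote, f_scores, and top_k_features are hypothetical. The same steps are available off the shelf as imbalanced-learn's SMOTE and scikit-learn's SelectKBest(f_classif, k=30).

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: synthesise n_new minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise squared Euclidean distances between minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def f_scores(X, y):
    """One-way ANOVA F-statistic per feature (column) of X."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    grand = X.mean(axis=0)
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    df_between, df_within = len(classes) - 1, len(y) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def top_k_features(X, y, k=30):
    """Column indices of the k features with the highest F-scores."""
    return np.argsort(f_scores(X, y))[::-1][:k]


# Made-up data: 10 minority ("top 3") rows with 3 features, oversampled by 20.
rng = np.random.default_rng(0)
X_top3 = rng.normal(size=(10, 3))
synthetic = smote(X_top3, n_new=20, k=3, rng=1)
print(synthetic.shape)  # (20, 3)
```

One design choice worth noting in a pipeline like this: oversampling is commonly applied to the training split only, so that synthetic points built from test rows cannot leak into the evaluation.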
This model achieves an accuracy of 0.8971 and an F1-score of 0.7308. The features, their coefficients, and the associated increase or decrease in the odds of a top-3 finish are listed in the coefficient table, sorted from most negative to most positive coefficient.
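Because only about 14% of results are top-3 finishes, accuracy alone says little here: on data with that class ratio, a model that always predicts “bottom 17” would already reach 0.86 accuracy while scoring an F1 of 0 on the top-3 class. The sketch below computes both metrics from scratch in plain Python to make that baseline concrete. The helper names are hypothetical, and it assumes the reported F1-score refers to the top-3 class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive ("top 3") class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# 100 results at the 14% podium rate; the baseline always predicts "bottom 17" (0).
y_true = [1] * 14 + [0] * 86
baseline = [0] * 100
print(accuracy(y_true, baseline))  # 0.86
print(f1_score(y_true, baseline))  # 0.0
```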
Note that the constructor and driver IDs listed correspond to the following:
- constructorId_9: Red Bull Racing
- constructorId_6: Ferrari
- constructorId_131: Mercedes
- driverId_830: Max Verstappen
- driverId_844: Charles Leclerc
- driverId_815: Sergio Perez
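The odds column in the coefficient table follows the standard logistic-regression reading of a coefficient: a one-unit increase in a feature multiplies the odds of a top-3 finish by exp(coef), so the listed change is exp(coef) - 1 (for prev_driver_position, exp(-0.169095) - 1 ≈ -0.155571). The sketch below reproduces that column and adds a hypothetical helper that spells out feature names using the ID mapping above. The intercept used in any example is made up, and rendering the hyphen in feature names as an interaction (×) is an assumption about how those names were built.

```python
import math

# ID mapping from the list above.
NAMES = {
    "constructorId_9": "Red Bull Racing",
    "constructorId_6": "Ferrari",
    "constructorId_131": "Mercedes",
    "driverId_830": "Max Verstappen",
    "driverId_844": "Charles Leclerc",
    "driverId_815": "Sergio Perez",
}


def odds_change(coef: float) -> float:
    """Fractional change in the odds of a top-3 finish per one-unit feature increase."""
    return math.exp(coef) - 1.0


def podium_probability(intercept: float, coefs, x) -> float:
    """Logistic model: P(top 3) = 1 / (1 + exp(-(intercept + sum(coef_i * x_i))))."""
    z = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def readable(feature: str) -> str:
    """Replace IDs with names; '-' is assumed to join interaction terms."""
    return " × ".join(NAMES.get(part, part) for part in feature.split("-"))


print(round(odds_change(-0.169095), 6))   # -0.155571
print(readable("driverId_844-aero_reg"))  # Charles Leclerc × aero_reg
```

Because the effect is multiplicative on the odds, the same coefficient shifts the podium probability by different amounts depending on where a driver starts: a small odds change matters little for a long shot but more for a driver near 50%.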