About

F1 is a racing series in which the best drivers and teams from around the world compete, and it is widely recognized as the most prestigious motorsport series in the world. The series has been running for a long time, but it has recently seen an explosion in popularity thanks to the release of the TV series “Drive to Survive” and the extended marketing reach provided by social media platforms such as YouTube, Twitter, Instagram, and Reddit (to name a few).

As the sport has continued to grow, so have the sports betting and fantasy sports scenes around it. For example, on Reddit, users compete in a series of race-by-race prediction challenges. In these challenges, participants predict race-weekend outcomes such as the pole-position qualifier (the driver who sets the fastest lap time during qualifying), the race winner, and other relevant statistics. Moreover, F1 itself runs a fantasy sports competition in which users construct teams from the available manufacturers and drivers (with certain restrictions) to achieve the highest point total over the course of a given year.

Naturally, this extends to sports betting where the prediction of race outcomes and driver performances can have significant financial implications.

A significant amount of research has been conducted toward this end. Namely, encoder-decoder and MLP networks have been developed for rank position forecasting, Bayesian regression methods have been developed for predicting race finishing positions, and analytical approaches have been constructed to account for the combined effect of driver and car on race performance.

This website has been designed to automatically generate predictions for each race of the 2024 F1 season and potentially beyond (depending on whether I decide to continue supporting it).

Predictions and Problem Formulation

To simplify the problem of race outcome prediction, I established two classes: top 3 and bottom 17. In other words, if a driver is predicted to finish in the top 3 positions (on the podium), then they would be classified as “top 3”, otherwise they would be classified as “bottom 17”. To model this binary classification problem, I used logistic regression under the following setting:

  • data from the FastF1 API, used to generate track summary information
    • minimum corner speed
    • average track speed
    • etc.
  • data from the Visual Crossing API, used to generate weather information (windspeed)
  • Ergast data compiled on Kaggle for the 2022 and 2023 F1 seasons
  • a subjective score for the significance of regulation changes (aero_reg)
    • the higher the score, the more significant the regulation change going into a given F1 season
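
As a minimal sketch of the two-class target described above, the snippet below maps finishing positions to a binary label. The function name `podium_label` and the toy 20-car race are illustrative assumptions, not the code behind this site.

```python
def podium_label(position: int) -> int:
    """Return 1 for a top 3 (podium) finish and 0 for the bottom 17."""
    return 1 if position <= 3 else 0


# Toy example: one race in which all 20 cars are classified (assumed grid size).
positions = list(range(1, 21))
labels = [podium_label(p) for p in positions]

# Share of positive ("top 3") labels in this toy race.
podium_share = sum(labels) / len(labels)
print(f"top 3 share in this toy race: {podium_share:.2f}")
```

Only 3 of the 20 labels in this toy race are positive, which gives a sense of how lopsided the two classes are.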

Because the modeling problem is imbalanced (14% of results are top 3 finishes), I used SMOTE (synthetic minority over-sampling technique) to improve the balance of the data. F-scores were then computed to assess the significance of each candidate feature, and the top 30 features were used to fit the final model. An 80-20 train-test split was used.
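
To illustrate what SMOTE does to the minority class, here is a minimal pure-Python sketch of its core interpolation step. It is not the implementation used for this site, which is not specified here; the function name `smote_sketch`, the value of `k`, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data: 3 "top 3" rows with two made-up features, against 17 "bottom 17" rows.
top3 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
n_bottom17 = 17

new_rows = smote_sketch(top3, n_new=n_bottom17 - len(top3), k=2)
balanced_top3 = top3 + new_rows
print(len(balanced_top3), n_bottom17)
```

Each synthetic row lies on a line segment between two real minority rows, so no values outside the observed range are invented. Oversampling of this kind is usually applied to the training split only, so that the held-out rows remain real observations.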

This model achieves an accuracy of 0.8971 and an F1-score of 0.7308. The features, their coefficients, and the increase or decrease in odds associated with each are as follows:

index  features                                       coefficients  odds increase / decrease
    7  prev_driver_position                              -0.169095  -0.155571
    8  prev_construct_position                           -0.089435  -0.085553
   10  constructorId_9-strt_len_median                   -0.011137  -0.011075
   13  prev_construct_points                             -0.007542  -0.007513
   30  driverId_844-aero_reg                             -0.006534  -0.006512
    5  constructorId_9-avg_track_spd                     -0.003273  -0.003268
    6  constructorId_9-corner_spd_min                    -0.003105  -0.003100
   11  constructorId_9-windspeed                         -0.001691  -0.001690
   25  driverId_815                                      -0.001439  -0.001438
   21  constructorId_9-corner_spd_min-aero_reg           -0.000762  -0.000762
   24  constructorId_9-years_since_major_cycle-round     -0.000129  -0.000129
   23  constructorId_9-round-years_since_major_cycle     -0.000129  -0.000129
   22  constructorId_9-years_since_major_cycle            0.000005   0.000005
    9  constructorId_9-strt_len_max                       0.000255   0.000255
    4  constructorId_9-corner_spd_max                     0.000286   0.000286
    2  constructorId_9                                    0.000595   0.000595
   17  constructorId_9-num_fast_corners                   0.000747   0.000747
   18  constructorId_9-aero_reg                           0.000825   0.000825
    1  constructorId_9-circuit_len                        0.000994   0.000995
   27  constructorId_6-corner_spd_min-aero_reg            0.001186   0.001187
   14  constructorId_9-num_slow_corners                   0.001399   0.001400
   29  constructorId_131-corner_spd_min-aero_reg          0.001531   0.001533
   15  driverId_830                                       0.002034   0.002036
    0  constructorId_9-num_corners                        0.002251   0.002254
   16  prev_construct_wins                                0.003298   0.003304
   19  prev_driver_wins                                   0.003342   0.003348
   20  constructorId_9-round                              0.003805   0.003813
   28  driverId_844-strt_len_median                       0.004471   0.004481
    3  constructorId_9-max_track_spd                      0.006688   0.006711
   26  constructorId_6-aero_reg                           0.008188   0.008221
   12  prev_driver_points                                 0.014778   0.014887
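
The third column of the table is consistent with exp(coefficient) - 1, the fractional change in odds for a one-unit increase in a feature under a logistic regression model. This relationship is inferred from the numbers in the table rather than stated explicitly, and the helper name `odds_change` is hypothetical. A short sketch that reproduces three rows:

```python
import math


def odds_change(coefficient: float) -> float:
    """Fractional change in odds per one-unit increase of the feature."""
    return math.exp(coefficient) - 1.0


# Coefficients copied from three rows of the table above.
coefficients = {
    "prev_driver_position": -0.169095,
    "prev_construct_position": -0.089435,
    "prev_driver_points": 0.014778,
}

for feature, coef in coefficients.items():
    print(f"{feature}: {odds_change(coef):+.4f}")
```

Read this way, each one-unit increase in prev_driver_position lowers the modeled odds of a top 3 finish by roughly 15.6%, assuming the feature was not rescaled before fitting.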

Note that the constructor and driver IDs listed above correspond to the following:

  • constructorId_9: Red Bull Racing
  • constructorId_6: Ferrari
  • constructorId_131: Mercedes
  • driverId_830: Max Verstappen
  • driverId_844: Charles Leclerc
  • driverId_815: Sergio Perez