How It Works

The math behind the rankings — explained for humans.

1. The Question

Who's the greatest F1 driver of all time? It's the pub argument that never ends. Verstappen vs Senna vs Schumacher vs Hamilton: how do you compare drivers who never raced each other, in different cars, across different eras?

You can't compare raw stats. Wins depend on the car. Championships depend on the era. Points systems changed. Grid sizes changed. Everything changed. Except one thing: teammates drive the same car.

2. The Model: Bradley-Terry

We use the Bradley-Terry model, a statistical framework for pairwise comparisons that is widely used in chess, sports analytics, and machine learning. Each driver gets a strength parameter λ. The probability that driver A beats driver B is:

    P(A > B) = λ_A / (λ_A + λ_B)

The model is fit by maximum likelihood estimation: iterate over all teammate comparisons until the parameters converge. Unlike Elo (which is path-dependent and order-sensitive), Bradley-Terry optimizes globally, so the final ratings don't depend on which comparison you process first.

The killer feature: transitive strength propagation. If Verstappen dominated Ricciardo, and Ricciardo beat Vettel, and Vettel beat Räikkönen, and Räikkönen was teammates with Massa, who was teammates with Schumacher... the model propagates strength through the entire chain. Prost benefits from Senna's strength, even though Prost retired before half the grid was born.
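To make the fitting step concrete, here is a minimal sketch of a weighted Bradley-Terry fit using the standard minorization-maximization (MM) update. It is an illustration under stated assumptions, not the site's actual solver: the function name `fit_bradley_terry`, the `(winner, loser, weight)` tuple layout, and the normalization to a mean strength of 1 are all hypothetical choices.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, max_iter=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser, weight) tuples.

    Hypothetical helper: uses the classic MM update
        strength_i <- wins_i / sum_j n_ij / (strength_i + strength_j)
    where n_ij is the total weight of comparisons between i and j.
    """
    wins = defaultdict(float)                         # weighted wins per driver
    games = defaultdict(lambda: defaultdict(float))   # weighted meetings per pair
    for winner, loser, weight in comparisons:
        wins[winner] += weight
        games[winner][loser] += weight
        games[loser][winner] += weight

    drivers = list(games)
    strength = {d: 1.0 for d in drivers}

    for _ in range(max_iter):
        updated = {}
        for d in drivers:
            denom = sum(n / (strength[d] + strength[o]) for o, n in games[d].items())
            updated[d] = wins[d] / denom
        # Strengths are only defined up to scale, so fix the mean at 1.
        scale = len(drivers) / sum(updated.values())
        updated = {d: v * scale for d, v in updated.items()}
        delta = max(abs(updated[d] - strength[d]) for d in drivers)
        strength = updated
        if delta < tol:
            break
    return strength


def win_probability(strength, a, b):
    """P(a beats b) under the fitted model."""
    return strength[a] / (strength[a] + strength[b])


# Toy data: A beats B 7 times out of 10, B beats C 6 times out of 10.
# A and C never meet, yet the model still relates them through B.
toy = (
    [("A", "B", 1.0)] * 7 + [("B", "A", 1.0)] * 3
    + [("B", "C", 1.0)] * 6 + [("C", "B", 1.0)] * 4
)
toy_strength = fit_bradley_terry(toy)
```

On this toy data `win_probability(toy_strength, "A", "C")` comes out at roughly 0.78, even though A and C were never compared directly. That is the transitive propagation at work. Note that a driver with no wins at all would be pushed to zero strength by this plain update, so a real implementation would likely need a prior or regularization.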

3. What We Compare

For each race weekend where two drivers share the same constructor:

- Race result: Who finished ahead? If both finished (or had driver-fault DNFs like crashes), the better finisher wins.
- Qualifying position: Who qualified higher? This is a clean signal: same car, same track, same conditions.

Weighting:

- Race comparisons: full weight
- Qualifying comparisons: 0.7x weight (slightly less signal)
- Mechanical DNF vs finish: 0.5x weight (unreliable: you can't control engine failures)
- Both mechanical DNFs: excluded entirely
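The weighting rules above could be turned into comparison records along these lines. This is a sketch, not the production pipeline: the `WeekendEntry` fields and the `weekend_comparisons` function are hypothetical, and it assumes every driver has a classified race position, so that the driver classified higher takes the (possibly down-weighted) win.

```python
from dataclasses import dataclass

RACE_WEIGHT = 1.0        # full weight
QUALI_WEIGHT = 0.7       # slightly less signal
MECHANICAL_WEIGHT = 0.5  # exactly one driver had a mechanical DNF


@dataclass
class WeekendEntry:
    """One driver's weekend (hypothetical record layout)."""
    driver: str
    quali_pos: int                 # lower is better
    race_pos: int                  # classified order, lower is better
    mechanical_dnf: bool = False   # True if retired through no fault of the driver


def weekend_comparisons(a, b):
    """Return (winner, loser, weight) tuples for two teammates' weekend."""
    comparisons = []

    # Qualifying: same car, same track, same conditions.
    if a.quali_pos != b.quali_pos:
        ahead, behind = (a, b) if a.quali_pos < b.quali_pos else (b, a)
        comparisons.append((ahead.driver, behind.driver, QUALI_WEIGHT))

    # Race: excluded entirely when both cars failed mechanically.
    if a.mechanical_dnf and b.mechanical_dnf:
        return comparisons

    one_mechanical = a.mechanical_dnf or b.mechanical_dnf
    weight = MECHANICAL_WEIGHT if one_mechanical else RACE_WEIGHT
    if a.race_pos != b.race_pos:
        ahead, behind = (a, b) if a.race_pos < b.race_pos else (b, a)
        comparisons.append((ahead.driver, behind.driver, weight))
    return comparisons
```

For example, if `driver_a` qualifies ahead but retires with an engine failure while `driver_b` finishes, the weekend yields a 0.7-weight qualifying win for `driver_a` and a 0.5-weight race win for `driver_b`.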

4. Pair Cap: Preventing Volume Farming

A key insight: after ~10 races against the same teammate in a season, the signal saturates. Races 11-22 mostly confirm what we already know. Without capping, a driver who spent 5 years dominating a weak teammate would accumulate hundreds of low-value comparisons, inflating their rating. We cap at 10 comparisons per pair per season to prevent this. This single change dropped Alexander Albon (who dominated Williams backmarkers for years) from #6 to #27 in our rankings.
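One way the cap might be applied is sketched below. It assumes comparisons arrive in chronological order and simply keeps the first 10 per teammate pair per season. Which 10 are kept is not specified here, so that choice, along with the `cap_pair_comparisons` name and the `(season, winner, loser, weight)` tuple layout, is an assumption for illustration.

```python
from collections import defaultdict

PAIR_CAP = 10  # comparisons per teammate pair per season


def cap_pair_comparisons(comparisons, cap=PAIR_CAP):
    """Keep at most `cap` comparisons per (season, pair).

    `comparisons` is an iterable of (season, winner, loser, weight) tuples,
    assumed to be in chronological order. The pair key ignores who won, so
    wins in both directions count toward the same cap.
    """
    seen = defaultdict(int)
    kept = []
    for season, winner, loser, weight in comparisons:
        key = (season, frozenset((winner, loser)))
        if seen[key] < cap:
            seen[key] += 1
            kept.append((season, winner, loser, weight))
    return kept
```

With this rule, a pair with 22 comparisons in one season and 5 in the next contributes 10 + 5 = 15 records to the fit instead of 27.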

5. Age Adjustment

The model applies a subtle age adjustment to account for the natural performance curve of racing drivers:

- Under 22: -5% (still developing)
- 22-27: Small penalty decreasing to zero
- 28-33: Peak window (no adjustment)
- 34-36: -1% (slight decline)
- 37-39: -3% (noticeable)
- 40+: -5% (significant decline)

Additionally, when scoring each comparison, we adjust the opponent's *effective* rating based on their age. Beating a 42-year-old Schumacher is worth less than beating a 30-year-old Schumacher, even at the same nominal rating.
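The age bands can be expressed as a simple multiplier. In this sketch the 22-27 band is assumed to ramp linearly from -5% at 22 back to zero at 28, since the exact shape is not stated, and the adjustment is assumed to multiply the opponent's strength. The names `age_factor` and `effective_strength` are hypothetical.

```python
def age_factor(age):
    """Multiplier applied to a driver's strength at a given age (whole years)."""
    if age < 22:
        return 0.95                            # still developing
    if age < 28:
        # Assumed linear ramp: -5% at 22, shrinking to 0% by 28.
        return 0.95 + 0.05 * (age - 22) / 6
    if age <= 33:
        return 1.0                             # peak window
    if age <= 36:
        return 0.99                            # slight decline
    if age <= 39:
        return 0.97                            # noticeable decline
    return 0.95                                # 40+: significant decline


def effective_strength(strength, age):
    """Opponent strength as seen by the model, discounted for age."""
    return strength * age_factor(age)
```

Under these assumptions, an opponent with nominal strength 2.0 counts as 2.0 at age 30 but only 1.9 at age 42, so a win against the older version of the same driver moves the winner's rating less.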

6. Teammate Chains

The most distinctive feature: you can trace a connection between almost any two drivers in F1 history through shared teammates.

    Verstappen → Ricciardo → Vettel → Räikkönen → Coulthard → Hill → Prost → Lauda

Each link represents a real teammate pairing. The chain confidence depends on the number of shared races at each link. Shorter chains with more comparisons = higher confidence.

We use Dijkstra's algorithm on the teammate graph, where edge weight = 1/comparisons (more comparisons = stronger link = shorter path).
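A minimal sketch of that chain search, using Dijkstra's algorithm with the 1/comparisons edge weight described above. The `strongest_chain` function and the `pair_counts` input format are hypothetical, and the driver labels in the usage note are placeholders rather than real data.

```python
import heapq


def strongest_chain(pair_counts, start, goal):
    """Find the lowest-cost teammate chain between two drivers.

    `pair_counts` maps (driver_a, driver_b) to the number of comparisons
    between them. Each edge costs 1 / comparisons, so well-sampled pairings
    are "short" and thinly sampled ones are "long".
    Returns (path, cost), or (None, inf) if the drivers are not connected.
    """
    graph = {}
    for (a, b), n in pair_counts.items():
        if n <= 0:
            continue
        cost = 1.0 / n
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {start: 0.0}   # cheapest known cost to reach each driver
    previous = {}         # back-pointers for rebuilding the chain
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue      # stale heap entry
        for neighbor, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if goal not in best:
        return None, float("inf")

    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]
```

With `pair_counts = {("A", "B"): 40, ("B", "D"): 40, ("A", "D"): 4}`, the search prefers A → B → D (cost 1/40 + 1/40 = 0.05) over the direct but thinly sampled A → D link (cost 1/4 = 0.25). A side effect of this weighting is that a longer chain of well-sampled pairings can outrank a short chain built on a handful of races.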

7. Limitations

- Pre-1960: Very few teammate comparisons (shared cars, small grids). These drivers are essentially unratable with our method.
- One-car eras: Some teams fielded a single car, leaving drivers with few comparison opportunities.
- Team orders: The model can't distinguish between genuine pace differences and team orders (e.g., Barrichello yielding to Schumacher).
- Car suitability: Some drivers extract more from specific car characteristics. A driver might look worse against a teammate simply because the car didn't suit their style.
- Sample size: Drivers with fewer comparisons have wider confidence intervals. A 3-season career can't be rated with the same confidence as a 20-season career.

8. Data Source

All data comes from the Jolpica F1 API (the community successor to the Ergast database), covering every race result and qualifying session from 1950 to present. The model processes 9,500+ teammate comparisons across 77 seasons involving 500+ drivers. Rankings update after each race weekend by re-running the full Bradley-Terry solve.