Elo vs. Glicko-2 vs. TrueSkill: Choosing the Right Ranking Algorithm
While the Elo rating system pioneered competitive matchmaking, modern Bayesian algorithms like Glicko-2 and TrueSkill offer superior precision by tracking player consistency and handling complex team dynamics.
By Factlen Editorial Team
- Competitive 1v1 Platforms
- Prioritize precise skill estimation and the mathematical handling of inactive players in massive online ladders.
- Multiplayer Game Developers
- Focus on fast convergence and the ability to balance complex, multi-team matches using Bayesian inference.
- Traditionalists & Organizers
- Value absolute transparency, mathematical simplicity, and the ability for players to manually verify their own rating changes.
What's not represented
- Casual gamers who prefer hidden matchmaking ratings
- Esports professionals facing algorithmic edge cases
Why this matters
Understanding how these algorithms work reveals why certain competitive platforms feel fair and balanced, while others struggle with uneven team dynamics. For developers and organizers, choosing the right mathematical model is one of the most consequential decisions in building a competitive ecosystem.
Key points
- The Elo system pioneered matchmaking but struggles with player inactivity and team dynamics.
- Glicko-2 introduced Rating Deviation, allowing the system to quantify uncertainty and handle inactive players.
- TrueSkill uses Bayesian inference to mathematically combine individual players into team distributions.
- Elo is best for transparent 1v1 games, Glicko-2 for massive online duels, and TrueSkill for team formats.
The invisible engine powering modern competitive sports and online gaming is the matchmaking algorithm. Whether a player is queuing up for a digital shootout or sitting down at a physical chessboard, a mathematical system is quietly estimating their probability of victory. The core challenge these systems face is that a player's true skill is a hidden variable; it cannot be measured directly like height or weight. Instead, algorithms must estimate this hidden variable by observing noisy, real-world outcomes like wins, losses, and draws over time. For decades, the industry standard was a single, elegant formula, but the explosion of complex multiplayer gaming has driven a quiet revolution in probabilistic mathematics.[1][6]
The baseline for all modern skill estimation is the Elo rating system, devised by physics professor and chess master Arpad Elo and adopted by the US Chess Federation in 1960. Adopted by the World Chess Federation (FIDE) in 1970, Elo revolutionized competition by replacing arbitrary point-accumulation systems with a zero-sum mathematical model. In the Elo system, every player is assigned a single number representing their skill. When two players face off, the difference between their ratings dictates the expected outcome of the match. If a player with a rating of 1500 plays another player rated 1500, each has an expected score of exactly 0.5, meaning they are evenly matched. If a high-rated player defeats a low-rated player, very few points change hands, but an upset transfers far more rating points, because the update is proportional to the gap between the actual and the expected result.[1][4]
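The expected-score and update rules described above can be sketched in a few lines. This is a minimal illustration of the standard logistic Elo formula, not any federation's exact implementation; the K-factor of 20 and the function names are assumptions chosen for the example, since real organizations set K by their own rules.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B on the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=20 is an illustrative assumption, not a value from the article.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


even_match = expected_score(1500, 1500)   # 0.5 for equal ratings
favourite_wins = update(1800, 1400, 1.0)  # small transfer of points
upset = update(1800, 1400, 0.0)           # much larger transfer
```

With these inputs the favourite gains under two points for the expected win but loses roughly ten times as much in the upset, which is the asymmetry the paragraph describes.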
When evaluating the Elo system, the argument for the algorithm rests on its absolute transparency and computational simplicity. Players can easily calculate their own rating changes on a napkin, and the system requires virtually no computing power to maintain. However, the argument against Elo centers on its rigidity. The formula assumes that all players are equally consistent and that a player's skill remains static even if they stop competing for a decade. The evidence for this limitation is seen in modern online environments, where new accounts or returning veterans often experience wildly inaccurate matchmaking because the system treats their single rating number with the exact same confidence as a player who competes every single day.[1][4][6]

To solve the problem of uncertainty, Harvard statistician Mark Glickman developed the Glicko rating system in 1995, followed by the refined Glicko-2 algorithm. Glicko-2 is a substantial probabilistic upgrade over Elo because it acknowledges that a single number cannot capture the full picture of human performance. Instead of just tracking a rating, Glicko-2 tracks three distinct parameters for every competitor: their rating, their Rating Deviation, and their Volatility. The Rating Deviation (RD) measures how uncertain the rating is, and a confidence interval follows from it: rather than stating that a player is exactly an 1850, the system says that, with an RD of 50, it is roughly ninety-five percent confident the player's true skill lies within two RDs of the rating, somewhere between 1750 and 1950.[2][4]
The mechanics of Rating Deviation fundamentally change how points are awarded. If a player competes daily, their deviation shrinks, meaning the system is highly confident in their skill level and their rating will only move in small increments. Crucially, if a player stops competing, their deviation slowly increases over time. When that inactive player finally returns, the system allows their rating to swing wildly, quickly recalibrating them to their current actual skill level. The third parameter, Volatility, measures the degree of expected fluctuation; it spikes when a player suddenly has highly erratic performances, further adjusting how quickly their rating can change.[2]
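To make the uncertainty mechanics concrete, here is a small sketch of two pieces: the roughly 95 percent interval derived from a rating and its RD, and the way Glicko-2 widens RD for each rating period in which a player does not compete. The scale constant 173.7178 and the idle-period rule (the new deviation is the square root of the old deviation squared plus the volatility squared, on the internal Glicko-2 scale) follow Glickman's published description. The function names, the volatility of 0.06, and the RD cap of 350 (borrowed from the original Glicko system) are assumptions made for this example.

```python
import math

GLICKO2_SCALE = 173.7178  # converts between the Glicko and Glicko-2 scales


def interval_95(rating: float, rd: float):
    """Approximate 95% interval: the rating plus or minus two RDs."""
    return rating - 2 * rd, rating + 2 * rd


def rd_after_inactivity(rd: float, volatility: float, idle_periods: int,
                        rd_cap: float = 350.0) -> float:
    """RD after a number of rating periods without any games.

    rd_cap is an assumption taken from the original Glicko system,
    where a brand-new player starts at RD 350.
    """
    phi = rd / GLICKO2_SCALE  # deviation on the internal Glicko-2 scale
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return min(phi * GLICKO2_SCALE, rd_cap)


low, high = interval_95(1850, 50)              # (1750, 1950)
rd_returning = rd_after_inactivity(50, 0.06, 12)
```

Because only RD changes during the break, the returning player's displayed rating is untouched while the interval around it widens, so their first results back move the rating further than they would for an active player.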
When evaluating Glicko-2, the argument for the system is its precision in one-on-one environments and its elegant handling of player inactivity. By quantifying uncertainty, it ensures that active and inactive players are treated differently, so a stale rating does not carry the same weight as a current one. The argument against Glicko-2 is its mathematical opacity; the update involves logistic functions, system constants, and an iterative step for the volatility that make it impractical for a casual player to calculate by hand. The evidence of its strength in one-on-one formats is its wide adoption by large digital platforms, including major online chess servers and competitive video games.[2][4][6]

While Glicko-2 refined the one-on-one duel, the rise of team-based video games exposed a new mathematical blind spot. Neither Elo nor Glicko-2 was designed to evaluate a match where four players team up against another team of four. To solve this, researchers at Microsoft, including Ralf Herbrich and Thore Graepel, developed the TrueSkill ranking system in 2005. Designed specifically for the Xbox Live network, TrueSkill abandons the logistic curves of its predecessors in favor of Bayesian inference over Gaussian distributions. Like Glicko, it tracks both a skill estimate and an uncertainty metric, but it processes them through a mathematical framework called message passing on factor graphs.[3][5]
The true breakthrough of TrueSkill is its ability to mathematically combine individual players into a cohesive team entity. When a team of four enters a match, TrueSkill sums the players' individual Gaussian distributions (the means add, and so do the variances) to create a single, wider bell curve representing the team's overall capability. After the match concludes, the algorithm looks at the result and works backward, distributing the rating updates to each individual player in proportion to their personal uncertainty. A new player on a winning team will see a large increase in their skill estimate, while a highly established veteran on the same winning team will see almost no movement, as the system is already confident about how good they are.[3][5]
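The team-combination idea can be illustrated with the well-known closed-form update for the simplest case: two teams, a clear winner, and no draws. This is a simplified sketch, not the factor-graph message passing that handles multiple teams and draws, and it is not Microsoft's production code. The function names are hypothetical, the performance-noise value beta of 25/6 is the commonly cited default and is an assumption here, and the small per-game dynamics term is left out.

```python
import math

BETA = 25.0 / 6.0  # per-player performance noise (assumed default)


def _pdf(x: float) -> float:
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def team_update(winners, losers, beta: float = BETA):
    """Update two teams of (mu, sigma) pairs after a decisive result."""
    players = winners + losers
    # Variances add when independent Gaussians are summed.
    c2 = sum(sigma ** 2 + beta ** 2 for _, sigma in players)
    c = math.sqrt(c2)
    # Team means are the sums of the individual means.
    t = (sum(mu for mu, _ in winners) - sum(mu for mu, _ in losers)) / c
    v = _pdf(t) / _cdf(t)  # size of the mean shift
    w = v * (v + t)        # size of the variance shrink, between 0 and 1

    def apply(team, sign):
        updated = []
        for mu, sigma in team:
            var = sigma ** 2
            # Each player's share scales with their own uncertainty.
            new_mu = mu + sign * (var / c) * v
            new_sigma = math.sqrt(var * (1.0 - (var / c2) * w))
            updated.append((new_mu, new_sigma))
        return updated

    return apply(winners, +1), apply(losers, -1)


# A newcomer (high sigma) and a veteran (low sigma) win together.
winners = [(25.0, 25.0 / 3.0), (30.0, 1.0)]
losers = [(27.0, 2.0), (28.0, 2.0)]
new_winners, new_losers = team_update(winners, losers)
```

In this example the newcomer's mean rises by several points while the veteran's barely moves, because each player's share of the update is scaled by their own variance.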
When evaluating TrueSkill, the argument for the system is its native, mathematically sound handling of multiplayer dynamics, including multi-team free-for-alls and asymmetric matches. It also converges quickly; Microsoft's published figures indicate that TrueSkill needs only about forty-six matches per player to identify skill levels in a four-versus-four environment. The argument against TrueSkill is its proprietary nature and computational weight. The algorithm is patented by Microsoft, and the Bayesian approximations require significantly more processing power than simple Elo updates. The evidence of its effectiveness is its massive scale, having operated on Xbox Live for nearly two decades, processing millions of complex game outcomes every single day.[3][5][6]

A side-by-side trade-off analysis of these three systems reveals a clear progression in both capability and complexity. Elo requires the least computational overhead but demands the highest number of matches to find a player's true skill, all while failing to account for time away from the game. Glicko-2 requires moderate computational power and fewer matches to converge, offering a massive leap in accuracy for one-on-one games by tracking time-based uncertainty. TrueSkill requires the most computational power and relies on complex Bayesian approximations, but it is the only system capable of untangling the chaotic web of multi-team, multiplayer environments without resorting to mathematical hacks.[1][6]
Ultimately, the choice of algorithm dictates the fairness of the competitive environment. Elo fits well when absolute transparency is required, when players demand the ability to verify their own scores, and when matches are strictly one-on-one, such as in over-the-board chess tournaments. It does not fit when players take long breaks between competitions or when the game involves any form of team dynamics. The rigidity of a single, static number simply cannot capture the fluid nature of modern digital competition.[4][6]
Glicko-2 fits well when managing massive online one-on-one ladders where players have wildly varying activity levels. It is the gold standard for digital duels, ensuring that a veteran returning after a year is quickly recalibrated without ruining the matchmaking pool for active players. It does not fit when matches involve more than two sides, as the core mathematics are strictly designed for pairwise encounters. TrueSkill fits well when matchmaking involves complex team structures, such as four-versus-four shooters or eight-player free-for-alls. It does not fit when developers require a completely unencumbered, public-domain algorithm for simple duels, though open-source approximations of Bayesian systems are becoming increasingly common.[2][3][6]
How we got here
1970
The World Chess Federation (FIDE) officially adopts the Elo rating system, establishing the mathematical baseline for competitive rankings.
1995
Statistician Mark Glickman publishes the Glicko rating system, introducing the concept of Rating Deviation to quantify uncertainty.
2001
Glickman publishes the refined Glicko-2 algorithm, adding a volatility parameter to better handle erratic player performances.
2005
Microsoft Research deploys the TrueSkill algorithm on Xbox Live, solving the mathematical challenge of team-based matchmaking.
Viewpoints in depth
Traditionalists & Organizers
Advocates for transparency and simplicity in competitive rankings.
For traditional sports organizations and over-the-board game federations, the primary goal of a rating system is not absolute mathematical perfection, but trust. This camp argues that players must be able to understand exactly why their rating changed and, ideally, calculate it themselves. They view complex Bayesian models as black boxes that alienate competitors. From this perspective, the original Elo system remains the gold standard because its zero-sum nature and logistic curves are completely transparent, fostering a sense of fairness that algorithmic complexity can sometimes obscure.
Competitive 1v1 Platforms
Operators of massive digital ladders who prioritize handling player inactivity.
Digital platforms hosting millions of daily one-on-one matches face a unique problem: player churn. This camp argues that a rating system's most important job is quantifying uncertainty. When a player takes a six-month break, their true skill has likely degraded, but a traditional Elo system treats them as if they never left, ruining matches for active players. By championing systems like Glicko-2, these platforms prioritize dynamic confidence intervals. They argue that accepting a slight increase in mathematical complexity is a necessary trade-off to maintain the integrity of a massive, constantly shifting player base.
Multiplayer Game Developers
Engineers building team-based competitive environments who require rapid convergence.
Modern video game developers operate in environments where one-on-one duels are the exception, not the rule. This camp argues that legacy systems are fundamentally broken for team-based games, as they cannot mathematically isolate an individual's contribution to a group win. They advocate for Bayesian inference models like TrueSkill because these systems can sum individual probability distributions into a single team entity. For these engineers, the massive computational cost of message passing on factor graphs is entirely justified by the system's ability to accurately rank a player after just a few dozen matches in a chaotic four-versus-four environment.
What we don't know
- How open-source alternatives to TrueSkill will evolve as Microsoft's patents age.
- Whether traditional sports organizations will ever adopt Bayesian models for physical team sports.
Key terms
- Bayesian Inference
- A statistical method that continuously updates the probability of a hypothesis (like a player's true skill) as more evidence (match results) becomes available.
- Rating Deviation (RD)
- A measure of uncertainty in a player's rating; a higher RD means the algorithm is less confident in the player's exact skill level.
- Volatility
- A parameter specific to Glicko-2 that measures the degree of expected fluctuation in a player's performance over time.
- Zero-sum Game
- A mathematical representation of a match where one player's gain in rating points is exactly balanced by the other player's loss.
Frequently asked
Why don't modern video games use the original Elo system?
Elo cannot handle team-based games and struggles to accurately rate players who take long breaks, as it assumes a player's skill remains static during inactivity.
Can TrueSkill be used for one-on-one games?
Yes. While TrueSkill was designed to solve team matchmaking, its Bayesian inference model also applies to one-on-one duels, where a match is simply two teams of one, and it offers precision comparable to Glicko-2.
What happens to my rating if I stop playing for a year?
Under Elo, your rating stays exactly the same. Under Glicko-2, your visible rating also stays the same, but your Rating Deviation grows with each rating period you miss, allowing your rating to move faster when you return. TrueSkill-style systems carry a similar uncertainty value, though how it grows during inactivity depends on the implementation.
Sources
[1] arXiv. Comparing Rating Systems: Elo, Glicko, and TrueSkill. (Perspective: Competitive 1v1 Platforms)
[2] Glicko.net. The Glicko-2 Rating System. (Perspective: Competitive 1v1 Platforms)
[3] Microsoft Research. TrueSkill Ranking System. (Perspective: Multiplayer Game Developers)
[4] Competier. Advantages and disadvantages of rating algorithms. (Perspective: Traditionalists & Organizers)
[5] GitHub. TrueSkill: A Bayesian skill rating system. (Perspective: Multiplayer Game Developers)
[6] Factlen Editorial Team. Synthesis by Factlen editorial team. (Perspective: Traditionalists & Organizers)