Elo vs. Glicko-2 vs. TrueSkill: Comparing the Top Competitive Ranking Algorithms
A deep dive into the mathematics and trade-offs of the three dominant matchmaking algorithms powering digital sports and competitive gaming.
By Factlen Editorial Team
- Team-Based Game Designers
- Creators of complex multiplayer titles rely on TrueSkill to parse individual contributions.
- 1v1 Platform Developers
- Developers of digital chess and fighting games favor Glicko-2 for its handling of uncertainty.
- Traditional Matchmakers
- Advocates for the classic Elo system prioritize transparency and simplicity.
What's not represented
- Casual players who prefer unranked, purely connection-based matchmaking without skill tracking.
- Esports professionals who advocate for manual tournament seeding over purely algorithmic rankings.
Why this matters
Whether you are a data analyst building a competitive platform or a player trying to understand why your rank won't budge, knowing how these algorithms calculate human skill is essential to navigating modern digital matchmaking.
Key points
- The Elo system offers unmatched transparency but struggles with new players and team dynamics.
- Glicko-2 introduces a Rating Deviation (RD) variable to measure uncertainty, making it ideal for 1v1 online games.
- TrueSkill uses Bayesian inference to handle complex team-based matches and individual performance metrics.
- A 2024 Cambridge study found TrueSkill achieved 62% accuracy in predicting CS:GO match outcomes.
- Choosing an algorithm requires balancing mathematical accuracy against the psychological impact on players.
In the high-stakes ecosystem of competitive gaming and digital sports, the algorithm that decides who plays whom is the invisible engine of player retention. Matchmaking systems must balance the psychological need for fair competition against the mathematical challenge of quantifying human skill. If a system is too slow to recognize a player's improvement, they languish in what the community calls "Elo hell"; if it is too volatile, the rankings lose their prestige and fail to reward consistent mastery. At the center of this data-analysis challenge are three dominant algorithms that power everything from international chess to professional esports: the classic Elo system, the modern Glicko-2 framework, and Microsoft's proprietary TrueSkill model. Each algorithm represents a different philosophical approach to measuring human performance, carrying distinct trade-offs in complexity, accuracy, and player psychology.[3][4][7]
The foundation of all modern competitive rankings is the Elo system, originally developed by physics professor Arpad Elo for the United States Chess Federation. It operates as a zero-sum exchange: when two players compete, the winner takes points directly from the loser. The number of points exchanged depends on the difference in their pre-match ratings, scaled by a constant called the K-factor. If a grandmaster defeats a novice, the rating change is minuscule; if the novice pulls off an upset, the point transfer is large. The case for Elo rests on its mathematical elegance and transparency. Players can calculate their expected rating change before a match even begins, which builds trust in the system. Because it tracks only a single variable, the player's current rating, it requires minimal computational overhead, making it trivial to implement for independent developers, local tournament organizers, or tabletop gaming communities.[3][6]
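The exchange described above can be sketched with the standard logistic Elo formulas. This is a minimal illustration, assuming the common 400-point scale and a fixed K-factor of 32; real federations and platforms pick their own K values (often varying by rating or experience), and the function names here are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b).

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss. Because both
    players share the same K, whatever A gains B loses (a zero-sum exchange).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Expected result: the 2400-rated favourite beats the 1400-rated player.
print(elo_update(2400, 1400, 1.0))
# Upset: the 1400-rated player beats the 2400-rated favourite.
print(elo_update(1400, 2400, 1.0))
```

Running it shows the asymmetry from the paragraph above: the favourite gains only a fraction of a point for the expected win, while the underdog's upset moves close to the full K of 32 points.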
However, the case against Elo centers on its inability to measure uncertainty or handle irregular play schedules. Elo assumes that a player with a 1500 rating who plays every single day is mathematically identical to a player with a 1500 rating who has not competed in five years. It also struggles significantly with new players, requiring dozens of matches to move a highly skilled "smurf" account out of the beginner brackets. Evidence of Elo's limitations is widespread in modern digital environments; data analysis of online matchmaking shows that basic Elo systems often result in "score creep" or rating inflation over time. This dynamic inadvertently rewards players simply for playing a high volume of matches rather than demonstrating genuine improvement. Furthermore, because Elo was designed strictly for one-on-one zero-sum games, it cannot natively handle team-based results without heavy, often unbalanced modifications.[1][3][5]
Ultimately, the Elo system fits well when a competition is strictly one-on-one, the player base competes regularly, and the community demands absolute transparency in how points are awarded. It remains the gold standard for traditional board games and localized tournaments where the player pool is relatively static and well-understood. Conversely, Elo does not fit when a platform features players who take long breaks, when the system needs to rapidly place new accounts into their correct skill brackets to protect beginners, or when matches involve teams of varying sizes. In these highly dynamic digital environments, the single-variable approach of Elo simply lacks the mathematical dimensionality required to maintain fair matchmaking.[3][5]

To address these shortcomings, Dr. Mark Glickman developed the Glicko system and later its successor, Glicko-2, which has become the gold standard for one-on-one digital matchmaking. Glicko-2 tracks three variables per player: their rating, their Rating Deviation (RD), and their rating volatility. The RD measures uncertainty and behaves like a standard deviation around the rating (the rating plus or minus two RDs gives an approximate 95% confidence interval); if a player has not competed recently, their RD expands, signaling to the algorithm that their true skill is currently uncertain. The case for Glicko-2 is built on this concept of "information gain." When a player with a high RD competes, their rating swings sharply based on the result, allowing the system to calibrate new or returning players in a fraction of the games Elo requires. The volatility metric adds a second layer by tracking how consistent a player's results are over time: erratic results, such as a sudden run of strong performances after a stable period, raise volatility and keep the rating responsive, while consistent play lowers it.[3][6]
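To make the three variables concrete, here is a compact sketch of one rating-period update following the steps in Glickman's published Glicko-2 description: conversion to an internal scale by the factor 173.7178, the g and E functions, and an iterative solve for the new volatility. The system constant tau = 0.5 and the convergence tolerance are assumed values, and production implementations add details (such as RD caps) that this sketch omits.

```python
import math

SCALE = 173.7178  # converts between the familiar rating scale and Glicko-2's internal scale


def _g(phi: float) -> float:
    # Reduces the weight of a result when the opponent's own rating is uncertain.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    # Expected score against opponent j, on the internal scale.
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """Update one player over one rating period (sketch; tau and eps are assumed settings).

    results: list of (opponent_rating, opponent_rd, score) with score in {1, 0.5, 0}.
    Returns (new_rating, new_rd, new_sigma).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # A period without games only widens the uncertainty.
        return rating, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma

    opps = [((r - 1500.0) / SCALE, d / SCALE, s) for r, d, s in results]
    # v: variance of the rating estimate implied by this period's games alone.
    v = 1.0 / sum(_g(pj) ** 2 * _expected(mu, mj, pj) * (1.0 - _expected(mu, mj, pj))
                  for mj, pj, _ in opps)
    # Weighted surprise: how far actual scores exceeded (or fell short of) expectations.
    surprise = sum(_g(pj) * (s - _expected(mu, mj, pj)) for mj, pj, s in opps)
    delta = v * surprise

    # Solve for the new volatility (Illinois variant of regula falsi; A, B named as in Glickman's steps).
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex) / (2.0 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2.0
        B, f_b = C, f_c
    new_sigma = math.exp(A / 2.0)

    # Widen by the new volatility, then narrow using the information from this period's games.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * surprise
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_sigma
```

With Glickman's own worked example (a 1500-rated player with RD 200 and volatility 0.06 who beats a 1400/30 opponent and loses to 1550/100 and 1700/300), the sketch should land near his published result of roughly 1464 with an RD of about 151.5. Feeding the same single win to a player with RD 300 and to one with RD 50 illustrates the information-gain point: the high-RD player's rating moves much further.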
The case against Glicko-2 is that it remains fundamentally limited to two-player interactions and sacrifices the intuitive transparency of Elo. While it is mathematically stronger for duels, it cannot natively parse the complexities of team-based games where individual contributions vary widely. The math is also significantly more opaque to the average user, who may be frustrated to see their rating barely move after a hard-fought win simply because the system already had a low RD (meaning high confidence) in their established skill level. Despite this psychological friction, the adoption of Glicko-family ratings by high-volume platforms like Lichess and Pokémon Showdown points to the approach's strength in one-on-one environments. Because the system adjusts its confidence dynamically, it helps keep inactive players from clogging the top of the leaderboards: their expanding RD eventually forces them to prove their skill again against active competitors.[6]
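The "expanding RD" mechanism for inactive players is simple to sketch. In Glicko-2, a rating period with no games only inflates the internal deviation by the player's volatility, so after t idle periods it becomes sqrt(phi^2 + t * sigma^2). The helper below applies that, assuming volatility stays constant while idle; the `max_rd` cap of 350 is a hypothetical implementation choice, not part of the published update.

```python
import math

SCALE = 173.7178  # Glicko-2 internal-scale conversion factor


def rd_after_idle(rd: float, sigma: float, idle_periods: int, max_rd: float = 350.0) -> float:
    """RD after `idle_periods` rating periods with no games (hypothetical helper).

    Each idle period applies phi' = sqrt(phi^2 + sigma^2) on the internal scale,
    so t periods give sqrt(phi^2 + t * sigma^2). `max_rd` is an assumed cap.
    """
    phi = rd / SCALE
    grown = math.sqrt(phi ** 2 + idle_periods * sigma ** 2) * SCALE
    return min(grown, max_rd)


# A well-established player (RD 50, volatility 0.06) taking progressively longer breaks.
for periods in (0, 12, 52, 200):
    print(periods, round(rd_after_idle(50.0, 0.06, periods), 1))
```

With a volatility of 0.06 the growth is gradual, so how quickly an idle player's RD widens depends heavily on how long a platform's rating periods are and which volatility and cap it uses; those are platform choices this sketch does not try to reproduce.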
Glicko-2 fits well when a platform hosts one-on-one matches, experiences high player churn, and needs to quickly identify the true skill of new accounts without ruining the experience for existing players. It is the definitive choice for digital chess, fighting games, and competitive one-on-one strategy titles where individual performance is the only variable that matters. However, Glicko-2 does not fit when a game involves teams of heterogeneous skill levels, or when developers need to predict the fairness of a match involving multiple factions and drop-in mechanics. Attempting to average out Glicko-2 scores across a five-person team often leads to wildly unbalanced matches, exposing the algorithm's inherent limitations outside of a dueling context.[5][6]
This brings the analysis to TrueSkill, a sophisticated Bayesian ranking system developed by Microsoft Research specifically for the chaotic, multi-layered environment of Xbox Live multiplayer games. Unlike Elo and Glicko, TrueSkill was built from the ground up to handle teams, free-for-all deathmatches, and scenarios where players might drop out mid-game. It uses a complex factor graph to model the performance of each individual player within a team context. The case for TrueSkill is its unparalleled flexibility in multiplayer environments. It can take a team of four random players, assess their individual Gaussian distributions of skill, and accurately predict their combined win probability against an opposing team of five. TrueSkill 2, a major upgrade published in 2018, pushes this even further by factoring in granular data like individual kill-death ratios, a player's tendency to quit matches early, and their historical performance in adjacent game modes.[2][4]
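The team-level reasoning can be illustrated with the closed-form update TrueSkill uses for a game between exactly two teams, where each player's skill is a Gaussian (mean mu, deviation sigma) and a team's performance is the sum of its members' performances. This is a simplified sketch, not Microsoft's implementation: it assumes no draws (a draw margin of zero), uses the commonly cited defaults (mu = 25, sigma = 25/3, beta = 25/6, tau = 25/300), and skips the general factor-graph message passing needed for three or more teams or for partial play.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0   # performance noise per player (assumed default)
TAU = 25.0 / 300.0  # per-game dynamics added to sigma (assumed default)


@dataclass
class Rating:
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rate_two_teams(winners, losers, beta=BETA, tau=TAU):
    """Return new (winners, losers) rating lists after `winners` beat `losers` (no draw)."""
    # Add dynamics so that ratings can keep drifting over time.
    winners = [Rating(r.mu, math.sqrt(r.sigma ** 2 + tau ** 2)) for r in winners]
    losers = [Rating(r.mu, math.sqrt(r.sigma ** 2 + tau ** 2)) for r in losers]
    # Spread of the team-performance difference: skill uncertainty plus performance noise.
    c = math.sqrt(sum(r.sigma ** 2 + beta ** 2 for r in winners + losers))
    t = (sum(r.mu for r in winners) - sum(r.mu for r in losers)) / c
    # Mean correction: large when the win was a surprise (t negative).
    # Numerically fragile for extreme upsets; real implementations handle the tails more carefully.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)  # variance correction, between 0 and 1

    def update(r, sign):
        var = r.sigma ** 2
        return Rating(r.mu + sign * var / c * v, math.sqrt(var * (1.0 - var / c ** 2 * w)))

    return [update(r, +1) for r in winners], [update(r, -1) for r in losers]


# Hypothetical lobby: four players with mixed track records beat five newcomers.
team_a = [Rating(30, 4), Rating(28, 5), Rating(25, 8), Rating(20, 6)]
team_b = [Rating() for _ in range(5)]
new_a, new_b = rate_two_teams(team_a, team_b)
```

Each player's shift is proportional to their own variance, so an uncertain newcomer on a team moves further than a well-established teammate after the same result.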
The case against TrueSkill revolves around its heavy computational cost, its proprietary nature, and the psychological weight of its accuracy. The algorithm is patented by Microsoft, meaning independent developers and competing studios cannot use it without navigating licensing restrictions. Additionally, the "stickiness" of TrueSkill rankings can be frustrating for players; the system is so confident in its Bayesian estimates that it can feel nearly impossible to grind out of a lower rank once the algorithm has settled on a player's skill. Yet the evidence supporting its efficacy is strong. A 2024 University of Cambridge study analyzing a large dataset of Counter-Strike: Global Offensive matches found that TrueSkill achieved a 62% accuracy rate in predicting match outcomes, substantially outperforming basic Elo models in both data efficiency and overall performance when dealing with complex five-versus-five team structures.[1][2][5]
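One common way to measure prediction accuracy is to ask the model which side is more likely to win before each match and count how often that side actually won. The sketch below does this with the usual TrueSkill-style two-team win probability, Phi((sum of mu_A - sum of mu_B) / sqrt(n * beta^2 + sum of sigma^2)), where n is the total number of players. The match data layout and helper names are hypothetical, beta = 25/6 is an assumed default, and this is not the Cambridge researchers' evaluation code.

```python
import math

BETA = 25.0 / 6.0  # assumed performance-noise default


def win_probability(team_a, team_b, beta=BETA):
    """P(team_a outperforms team_b); each team is a list of (mu, sigma) pairs."""
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    n = len(team_a) + len(team_b)
    sum_var = sum(sigma ** 2 for _, sigma in team_a + team_b)
    denom = math.sqrt(n * beta ** 2 + sum_var)
    return 0.5 * (1.0 + math.erf(delta_mu / (denom * math.sqrt(2.0))))


def prediction_accuracy(matches, beta=BETA):
    """matches: list of (team_a, team_b, a_won). Returns the fraction where the favoured side won."""
    if not matches:
        return 0.0
    correct = 0
    for team_a, team_b, a_won in matches:
        predicted_a = win_probability(team_a, team_b, beta) >= 0.5  # exact ties count as a pick for A
        correct += (predicted_a == a_won)
    return correct / len(matches)
```

Accuracy only checks whether the favourite won; it ignores how confident the prediction was, which is why rating-system comparisons often report calibration-sensitive metrics alongside it.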

TrueSkill fits well when a game features complex team dynamics, asymmetric factions, or free-for-all modes, and when the developer has the computational resources to run Bayesian inference on millions of simultaneous matches. It is the undisputed king of modern team-based shooter matchmaking, capable of parsing signal from the noise of random teammates. Conversely, it does not fit when a project is open-source, when the game is strictly one-on-one, or when the community demands a simple, transparent number that goes up predictably after every win. In these scenarios, the sheer weight of TrueSkill's mathematical machinery is overkill, and the licensing restrictions make it a non-starter for independent developers looking to build a grassroots competitive scene.[2][5]
Beyond the raw mathematics, the success of any ranking system relies heavily on how it manages player psychology and the dreaded phenomenon known as "Elo hell." This term describes the frustrating experience where a player feels their true skill is much higher than their current rank, but they are trapped by the system's mechanics—often blaming poor teammates or an inflexible algorithm. Elo systems are particularly vulnerable to this perception because they require a massive volume of games to correct a misplaced rating. TrueSkill and Glicko-2 attempt to mitigate this by using their uncertainty variables to rapidly adjust ranks during winning streaks, effectively fast-tracking players out of brackets where they do not belong. However, if a player's Rating Deviation shrinks too much, the system becomes stubborn, leading to a different kind of psychological friction where victories feel unrewarded. Designing a ranking system requires balancing the mathematical truth of a player's skill against the psychological need for progression and reward.[1][5]
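The point that Elo needs many games to correct a misplaced rating can be checked with a small deterministic simulation. It assumes a player whose true strength is 1800 but whose account starts at 1200, that matchmaking always pairs them with an opponent whose rating equals their current rating (and whose true strength matches that rating), and that each game moves the rating by its expected amount rather than by a random win or loss. These are simplifying assumptions for illustration, and `games_to_converge` is a hypothetical helper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation on the 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def games_to_converge(true_rating: float, start: float, k: float,
                      tolerance: float = 50.0, max_games: int = 10_000) -> int:
    """Count games until the displayed rating is within `tolerance` of the true rating."""
    rating = start
    for game in range(max_games):
        if abs(true_rating - rating) <= tolerance:
            return game
        # The opponent is rated at the player's current rating, so the system expects 0.5;
        # the player's real chance of winning comes from their true strength.
        actual = expected_score(true_rating, rating)
        rating += k * (actual - 0.5)
    return max_games


for k in (16, 32):
    print(k, games_to_converge(1800, 1200, k))
```

Under these assumptions, doubling K roughly halves the number of games needed, which is the same speed-versus-stability trade-off that Glicko-2 and TrueSkill try to manage automatically by tying the step size to their uncertainty estimates.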

Looking ahead, the future of competitive data analysis is shifting toward hybrid models that incorporate machine learning to evaluate granular in-game actions rather than just binary win-loss outcomes. While TrueSkill 2 began this trend by analyzing kill-death ratios and quit rates, emerging algorithms are looking at spatial positioning, resource management, and even communication metrics to assess a player's true value to a team. Until these AI-driven models become computationally cheap enough for widespread adoption, the industry will continue to rely on the established triad. Choosing the right algorithm remains a fundamental trade-off: Elo offers unmatched transparency, Glicko-2 provides rapid calibration for duels, and TrueSkill masters the chaos of multiplayer teams. For data analysts and game designers, selecting the correct ranking algorithm is not just a technical implementation—it is the foundational architecture that dictates the fairness, longevity, and competitive spirit of their entire community.[1][7]
How we got here
1960
The United States Chess Federation adopts Arpad Elo's rating system.
1995
Dr. Mark Glickman introduces the Glicko rating system, adding a measure of uncertainty.
2007
Microsoft Research publishes TrueSkill, bringing Bayesian matchmaking to Xbox Live.
2018
Microsoft releases TrueSkill 2, incorporating granular metrics like quit rates and individual performance.
2024
Cambridge researchers publish an empirical analysis of CS:GO rating systems, reporting TrueSkill's advantage over basic Elo in team-based matches.
Viewpoints in depth
Traditional Matchmakers
Advocates for the classic Elo system prioritize transparency and simplicity.
This camp argues that players need to understand exactly why their rating changed after a match. By keeping the math to a simple zero-sum exchange based on a single variable, Elo ensures that players never feel cheated by an opaque algorithm. They maintain that while Elo may be slower to find a true rank, its predictability builds long-term trust in the competitive ecosystem.
1v1 Platform Developers
Developers of digital chess and fighting games favor Glicko-2 for its handling of uncertainty.
For platforms with massive player churn and frequent one-on-one matches, this perspective emphasizes the importance of the Rating Deviation (RD) metric. They argue that Elo's inability to handle inactive players ruins leaderboards. By expanding the confidence interval when a player takes a break, Glicko-2 ensures that returning veterans must re-prove their skill, keeping the top ranks dynamic and accurate.
Team-Based Game Designers
Creators of complex multiplayer titles rely on TrueSkill to parse individual contributions.
This camp points out that neither Elo nor Glicko-2 can natively handle a five-versus-five match where players have wildly different skill levels. They argue that the computational cost of TrueSkill's Bayesian factor graphs is a necessary trade-off to achieve fair matchmaking in chaotic environments. For these designers, the ability to predict match quality and handle drop-in/drop-out mechanics is non-negotiable.
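TrueSkill's "match quality" has a closed form for two teams: it is the draw probability under the current ratings, normalised so that two teams known with certainty to be equal score exactly 1. The function below sketches that two-team formula, assuming the commonly cited default beta = 25/6; the (mu, sigma) data layout and function name are hypothetical.

```python
import math

BETA = 25.0 / 6.0  # assumed default performance noise


def match_quality(team_a, team_b, beta=BETA):
    """Two-team TrueSkill-style match quality in (0, 1]; teams are lists of (mu, sigma)."""
    n = len(team_a) + len(team_b)
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    spread = n * beta ** 2 + sum(sigma ** 2 for _, sigma in team_a + team_b)
    # First factor penalises rating uncertainty; second penalises the gap in summed skill.
    return math.sqrt(n * beta ** 2 / spread) * math.exp(-delta_mu ** 2 / (2.0 * spread))


newcomers = [(25.0, 25.0 / 3.0)] * 5
veterans = [(30.0, 2.0)] * 5
print(match_quality(newcomers, newcomers), match_quality(veterans, newcomers))
```

A matchmaker built on this idea would search candidate lobbies for a high score, trading it off against how long players have been waiting.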
What we don't know
- How effectively next-generation AI models will be able to evaluate granular in-game actions like spatial positioning compared to binary win-loss outcomes.
- Whether Microsoft will eventually open-source TrueSkill 2, whose licensing currently restricts independent developers from using the algorithm.
Key terms
- Zero-sum exchange
- A system where the exact number of points lost by the defeated player is gained by the winning player.
- Rating Deviation (RD)
- A metric used in Glicko systems to represent the algorithm's level of uncertainty about a player's true skill.
- Bayesian inference
- A statistical method used by TrueSkill that updates the probability of a player's skill level as more match data becomes available.
- Smurf
- A highly skilled player who creates a new, low-ranked account to easily defeat less experienced opponents.
- Elo hell
- A perceived state where a player feels trapped at a lower rank than they deserve, often blaming the matchmaking algorithm or teammates.
Frequently asked
Why do I lose more points for some matches than others?
In systems like Elo and Glicko, point changes are based on the rating difference between players. Losing to a lower-ranked opponent costs you more points than losing to a higher-ranked one.
Why does my rating barely change after a win?
If you play frequently, systems like Glicko-2 have a low Rating Deviation (high confidence) in your skill, meaning your rating will only move slightly unless you consistently beat higher-ranked players.
Can I use TrueSkill for my indie game?
TrueSkill is patented by Microsoft, so independent developers often use open-source alternatives like Glicko-2 or the Plackett-Luce model to avoid licensing restrictions.
Sources
[1] Skill Issues: An Analysis of CS:GO Skill Rating Systems. arXiv. Perspective: Team-Based Game Designers.
[2] TrueSkill: A Bayesian Skill Rating System. Microsoft Research. Perspective: Team-Based Game Designers.
[3] ELO vs Other Ranking Systems: Glicko, TrueSkill, and More. Chess-Elo. Perspective: Traditional Matchmakers.
[4] A Quick Rating System Comparison. Janzert Data Blog. Perspective: Team-Based Game Designers.
[5] What are the pros and cons of TrueSkill vs Glicko vs Glicko2? r/gamedev. Perspective: Team-Based Game Designers.
[6] Rating Systems: Elo vs Glicko-2. Lichess. Perspective: 1v1 Platform Developers.
[7] Synthesis by Factlen editorial team. Factlen Editorial Team. Perspective: 1v1 Platform Developers.