Elo vs. Glicko-2 vs. TrueSkill: The Invisible Math Behind Matchmaking
Competitive gaming relies on complex algorithms to balance matches and rank players. We compare the trade-offs of the three foundational systems dictating your digital rank.
By Factlen Editorial Team
- Mathematical Purists: Prioritizing predictive accuracy and rapid convergence over user experience.
- Player Psychology Advocates: Focusing on the emotional reward of the grind and the clarity of the leaderboard.
- System Architects: Balancing computational overhead with the specific needs of the game mode.
What's not represented
- Casual players who do not play ranked modes but are still affected by hidden matchmaking ratings (MMR).
- Esports tournament organizers who must manually seed brackets when automated systems fail.
Why this matters
Every time you queue for a competitive online game, an invisible mathematical tribunal decides your fate. Understanding whether a game uses Elo, Glicko-2, or TrueSkill explains why your rank sometimes feels 'stuck,' why returning after a break is so punishing, and how developers balance fairness against the dopamine of the grind.
Key points
- The Elo system, invented in 1960, remains the transparent standard for 1v1 games but struggles to quickly rank new players.
- Glicko-2 solves Elo's slow convergence by tracking 'Rating Deviation,' allowing inactive or new players to calibrate rapidly.
- Microsoft's TrueSkill uses Bayesian inference to natively handle team games and free-for-alls, deducing individual skill from team results.
- While mathematically superior, complex systems like TrueSkill and Glicko-2 can frustrate players who feel their rank becomes 'stuck' over time.
Every time a player clicks "Find Match" in a competitive online game, they surrender themselves to an invisible, highly complex mathematical tribunal. Behind the spinning loading icon lies a sophisticated algorithm tasked with an impossible balancing act: finding an opponent of exactly equal skill, ensuring the match is fair, and updating a global leaderboard in real-time. This is the domain of skill rating systems, a field of mathematics that has evolved from quiet chess clubs in the mid-20th century to the sprawling server farms powering modern esports. The numbers next to a player's name are not just a reward; they are a predictive model. But how those models calculate human capability varies wildly, sparking intense debate among game developers, statisticians, and the players themselves.[1][2]
The stakes for getting this math right are existential for competitive platforms. If a system is too volatile, players feel their rank is based on luck rather than skill. If it is too rigid, players hit a wall, feeling trapped in a rank where their improvement goes unrewarded. Furthermore, modern gaming has moved far beyond the one-on-one duels of chess. Today's algorithms must account for teams of varying sizes, players dropping out mid-match, and the chaotic variables of free-for-all arenas. To solve this, the industry relies primarily on three foundational frameworks: the classic Elo system, the volatility-aware Glicko-2, and Microsoft's Bayesian powerhouse, TrueSkill.[1][3][4]
The grandfather of all modern matchmaking is the Elo rating system, invented in 1960 by Hungarian-American physics professor and chess master Arpad Elo. Designed specifically to rank chess players in one-on-one settings, Elo operates on a brilliantly simple premise: every match has an expected outcome based on the difference in the two players' current ratings. If a grandmaster plays a novice, the system expects the grandmaster to win. If the expected outcome occurs, very few points change hands. But if the novice pulls off an upset, a massive transfer of points occurs. It is a zero-sum economy where the winner takes exactly what the loser sheds.[2][4]
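The expected-outcome premise can be sketched in a few lines. This is a minimal illustration of the standard logistic Elo formula, assuming a K-factor of 32 and the conventional 400-point scale; the function names and the example ratings are illustrative choices, not taken from any particular game.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return (new_a, new_b). score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses (both players share the same K here).
    return rating_a + delta, rating_b - delta


# Hypothetical ratings for a grandmaster and a novice.
grandmaster, novice = 2400.0, 1600.0
expected_result = elo_update(grandmaster, novice, score_a=1.0)
upset_result = elo_update(grandmaster, novice, score_a=0.0)

print(f"favourite wins: {expected_result[0] - grandmaster:+.2f} points for the grandmaster")
print(f"novice upset:   {upset_result[1] - novice:+.2f} points for the novice")
```

With an 800-point gap, the expected win moves about a third of a point, while the upset transfers nearly the full 32. The same asymmetry explains why a favourite can lose far more for a defeat than they gain for a win.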
**The Case For Elo:** The enduring appeal of the Elo system lies in its absolute transparency and computational simplicity. Because it only requires the pre-match ratings of two players to calculate the post-match adjustment, it can be run on a pocket calculator. Players intuitively understand the stakes: beating a higher-ranked opponent yields a massive reward, while losing to a lower-ranked one is disastrous. This creates a highly dramatic, easily readable narrative for leaderboards, which is why it remains the gold standard for traditional sports, tabletop games, and simple digital duels.[1][2]

**The Case Against Elo:** However, Elo's simplicity is also its fatal flaw in the digital age. The system has no concept of "uncertainty." It treats a player who has maintained a 1500 rating over 10,000 matches exactly the same as a brand-new account that just calibrated to 1500 after five lucky placement games. Because Elo adjusts ratings at a fixed, flat rate, it takes an agonizingly long time for a highly skilled player on a new account to climb to their accurate rank, leaving a trail of unfairly crushed novices in their wake.[2][4]
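To get a feel for how slow a fixed rate is, here is a rough, deterministic sketch. It assumes a player whose true strength is 2000 starts at 1500, always faces an opponent who is accurately rated at the player's current displayed rating, and gains the average (expected) number of points per game instead of random wins and losses. All of these are simplifying assumptions, and `games_to_converge` is a hypothetical helper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def games_to_converge(true_skill: float, start: float, k: float, tolerance: float = 50.0) -> int:
    """Count games until the displayed rating is within `tolerance` of true skill."""
    rating, games = start, 0
    while true_skill - rating > tolerance:
        # Real win chance depends on true skill versus the opponent's rating.
        p_win = expected_score(true_skill, rating)
        # Elo itself expects 0.5 against an equally rated opponent,
        # so the average gain per game is K * (p_win - 0.5), always below K / 2.
        rating += k * (p_win - 0.5)
        games += 1
    return games


games_k32 = games_to_converge(2000.0, 1500.0, k=32.0)
games_k16 = games_to_converge(2000.0, 1500.0, k=16.0)
print(f"K=32: {games_k32} games, K=16: {games_k16} games")
```

Because the gain per game is capped at half the K-factor, closing a 450-point gap at K=32 cannot take fewer than 29 games even in this idealised model, and halving K stretches the climb further. Raising K speeds things up but makes every established rating noisier, which is the trade-off Elo cannot escape without a notion of uncertainty.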
**The Evidence:** A 2024 empirical analysis of rating systems published on arXiv highlighted this exact deficiency. When researchers modeled Elo against modern datasets, they found its predictive accuracy lagged significantly during the early stages of a player's lifecycle. It simply requires too many data points to converge on a true skill level. Furthermore, because Elo was built strictly for 1v1 encounters, attempting to average the Elo of five players to create a team rating mathematically breaks down, failing to account for individual carry performances or weak links.[2]
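The team-averaging problem is easy to demonstrate. In the sketch below, two very different five-player rosters (the ratings are made up for illustration) share the same average, so an averaged-Elo model gives them identical predictions against the same opposition.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical rosters: one evenly matched, one with a single carry and four novices.
balanced = [1500, 1500, 1500, 1500, 1500]
carry_plus_novices = [2300, 1300, 1300, 1300, 1300]
opponents = [1500, 1500, 1500, 1500, 1500]

p_balanced = expected_score(mean(balanced), mean(opponents))
p_lopsided = expected_score(mean(carry_plus_novices), mean(opponents))

# Both print 0.50: the average hides the carry and the weak links entirely.
print(f"balanced roster: {p_balanced:.2f}")
print(f"lopsided roster: {p_lopsided:.2f}")
```

After the match, the same blindness applies to the update: every member of a team receives the same adjustment, regardless of who carried.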
**Fits well when / Does not fit when:** Elo fits well when you are building a simple, one-on-one competitive environment where computational resources are highly limited, transparency is paramount, and players compete regularly without long absences. It thrives in local chess clubs and straightforward digital duels where players want to manually calculate their stakes before a match begins. It does not fit when your platform features team-based modes, free-for-all matches, or a player base that frequently takes long breaks, as the system mathematically cannot adjust for the rust a player accumulates during time off.[1][2]
Recognizing Elo's limitations, statistician Mark Glickman developed the Glicko system in 1995, and later its successor, Glicko-2. Glickman's breakthrough was introducing the concept of a "confidence interval" to a player's rating. In Glicko-2, your rank is not just a single number; it is a combination of your Rating, your Rating Deviation (RD), and your Volatility. RD measures how confident the system is in your rating. If you play every day, your RD shrinks, and your rating becomes highly stable. If you stop playing for six months, your RD expands, meaning the system admits it no longer knows how good you are.[4][5]
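The way RD expands during inactivity can be sketched with the original Glicko formula, which is simpler than its Glicko-2 counterpart (Glicko-2 performs an analogous step driven by the player's volatility). The constant `c` below follows the example in Glickman's description of Glicko, chosen so that an RD of 50 drifts back to the maximum of 350 after about 100 idle rating periods. Treating one rating period as one month is purely an assumption for this example.

```python
import math

MAX_RD = 350.0  # RD assigned to a completely unknown player in Glicko


def inflate_rd(rd: float, periods_inactive: int, c: float = 34.6) -> float:
    """Glicko-1 style RD growth: uncertainty widens with every idle rating period."""
    return min(math.sqrt(rd ** 2 + c ** 2 * periods_inactive), MAX_RD)


active_rd = 50.0
after_break = inflate_rd(active_rd, periods_inactive=6)   # six idle periods
long_gone = inflate_rd(active_rd, periods_inactive=500)   # capped at MAX_RD

print(f"RD while active:     {active_rd:.1f}")
print(f"RD after six months: {after_break:.1f}")
print(f"RD after years away: {long_gone:.1f}")
```

Under these assumptions, six idle periods roughly double the uncertainty, so the returning player's next few results will move their rating much more than they would have before the break.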
**The Case For Glicko-2:** By tracking uncertainty, Glicko-2 solves the "new account" problem beautifully. A new player starts with a massive RD, meaning their first few wins or losses will swing their rating wildly, allowing them to reach their appropriate skill bracket in a fraction of the time it would take under Elo. Furthermore, the Volatility metric tracks consistency. If a highly stable player suddenly starts playing erratically—perhaps trying a new strategy or playing under the influence—the system detects the anomaly and increases their volatility, allowing their rank to adjust more fluidly until they stabilize again.[4][6]
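The effect of RD on the size of a rating swing can be sketched with a single-game update. For brevity this uses the original Glicko (Glicko-1) equations, which omit the volatility term that Glicko-2 adds; it should be read as an illustration of the uncertainty mechanism, not as a Glicko-2 implementation. The player ratings are invented for the example.

```python
import math

Q = math.log(10) / 400.0


def g(rd: float) -> float:
    """Discount factor: results against uncertain opponents count for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def glicko1_update(r: float, rd: float, opp_r: float, opp_rd: float, score: float):
    """One-game Glicko-1 update. score is 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10.0 ** (-g_opp * (r - opp_r) / 400.0))
    d2_inv = Q ** 2 * g_opp ** 2 * expected * (1.0 - expected)
    precision = 1.0 / rd ** 2 + d2_inv
    new_r = r + (Q / precision) * g_opp * (score - expected)
    new_rd = math.sqrt(1.0 / precision)
    return new_r, new_rd


# Same rating, same opponent, same win; only the uncertainty differs.
newcomer = glicko1_update(1500.0, 350.0, opp_r=1500.0, opp_rd=50.0, score=1.0)
veteran = glicko1_update(1500.0, 50.0, opp_r=1500.0, opp_rd=50.0, score=1.0)

print(f"newcomer: rating {newcomer[0]:.0f}, RD {newcomer[1]:.0f}")
print(f"veteran:  rating {veteran[0]:.0f}, RD {veteran[1]:.0f}")
```

In this setup the newcomer gains roughly 175 points from one win while the veteran gains roughly 7. That gap is what lets new accounts calibrate quickly, and it is also why a long-established player's rating moves so little per match.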
**The Case Against Glicko-2:** The psychological downside of Glicko-2 is what players often refer to as the "two meters of mud" effect. Once a player grinds hundreds of matches and their RD drops to its absolute minimum, the system becomes supremely confident in their rank. At this point, winning a match yields microscopic rating gains. Players can feel trapped, complaining that the algorithm has stubbornly decided their skill ceiling and refuses to let them climb, even if they go on a moderate winning streak. The math is accurate, but the user experience can feel unrewarding.[5]

**The Evidence:** Despite the psychological friction, the mathematical superiority of Glicko-2 has led to massive industry adoption. When Valve overhauled the matchmaking system for Dota 2, they explicitly cited the shift to a modified Glicko algorithm to solve "undesired clumping" in the lower MMR brackets. They noted that returning players were previously ruining matches because their old Elo ratings no longer reflected their degraded skill. Glicko's expanding RD for inactive players instantly solved this, forcing returning veterans into high-stakes calibration matches to prove they still had it.[5][6]
**Fits well when / Does not fit when:** Glicko-2 fits well when you are running a modern, large-scale one-on-one or solo-queue ladder where players have highly variable play frequencies, taking extended breaks and returning months later. It provides incredibly fast convergence for new accounts, protecting your veteran player base from being stomped by under-ranked newcomers. It does not fit when you need to rank heterogeneous teams, as the math becomes exponentially complicated when trying to calculate the overlapping confidence intervals of ten different players in a single lobby.[4][5]
This brings us to the heavyweight champion of team-based matchmaking: TrueSkill. Developed by Microsoft Research for Xbox Live and published in 2006, with Halo 2 beta matches among its early test data, TrueSkill abandoned the logistic probability models of Elo and Glicko entirely. Instead, it uses Bayesian inference and factor graphs. TrueSkill was built from the ground up to answer a question the other systems couldn't: if Team A beats Team B in a 4v4 match, how much should each individual player's rating move when all the system sees is the team result?[3][5]
**The Case For TrueSkill:** TrueSkill is a mathematical marvel for complex game modes. It natively supports multi-team games, free-for-alls, and asymmetric matches (like 2v3). By using message-passing algorithms across a factor graph of the match, TrueSkill can update individual skill estimates even when a player is buried inside a team result. It compares the team's result with what each member's rating and uncertainty predicted, then adjusts every player individually: a teammate whose rating is still uncertain moves a lot, while a long-established teammate moves very little. Rewarding in-game statistics such as kills is the territory of the later TrueSkill 2, not the original algorithm.[2][3]
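For the simplest case (two teams, a clear winner, no draws), the TrueSkill update has a closed form that fits in a short sketch. This is a simplified illustration under stated assumptions: it uses the conventional defaults (mu = 25, sigma = 25/3, beta = 25/6), ignores the draw margin and the dynamics factor, and does not implement the general factor-graph message passing needed for more than two teams. In this form, the size of each player's adjustment is driven by that player's uncertainty; per-player statistics such as kills do not enter it. The `Rating` class and `update_two_teams` function are hypothetical names.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # assumed performance noise (conventional default)


@dataclass
class Rating:
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


def _pdf(x: float) -> float:
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_teams(winners, losers, beta: float = BETA):
    """Closed-form win/loss update for two teams (no draws, no dynamics)."""
    players = winners + losers
    c2 = len(players) * beta ** 2 + sum(p.sigma ** 2 for p in players)
    c = math.sqrt(c2)
    t = (sum(p.mu for p in winners) - sum(p.mu for p in losers)) / c
    # v and w are the mean and variance correction factors of the
    # truncated Gaussian; _cdf(t) can underflow for extreme upsets.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)

    def adjust(p: Rating, sign: float) -> Rating:
        var = p.sigma ** 2
        # Each player's share of the update scales with their own variance.
        return Rating(p.mu + sign * var / c * v, math.sqrt(var * (1.0 - var / c2 * w)))

    return [adjust(p, +1.0) for p in winners], [adjust(p, -1.0) for p in losers]


veteran = Rating(mu=30.0, sigma=1.5)   # well-established rating
newcomer = Rating()                    # default, highly uncertain
opponents = [Rating(mu=27.0, sigma=3.0), Rating(mu=27.0, sigma=3.0)]

new_winners, new_losers = update_two_teams([veteran, newcomer], opponents)
for name, old, new in zip(("veteran", "newcomer"), (veteran, newcomer), new_winners):
    print(f"{name}: mu {old.mu:.2f} -> {new.mu:.2f}, sigma {old.sigma:.2f} -> {new.sigma:.2f}")
```

Both teammates share the same win, yet the newcomer's estimate moves far more than the veteran's, because the model attributes the surprise in the result mostly to the player it knows least about.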
**The Case Against TrueSkill:** The primary drawback of TrueSkill is its computational and conceptual complexity. A general update runs approximate Bayesian inference over a factor graph, which demands considerably more server processing power than the simple arithmetic of Elo. Furthermore, because the algorithm is so complex, it is largely opaque to the player. A user might win a match but see their rank barely move, or even drop slightly in edge cases, because the size of the change depends on how surprising the result was and how uncertain each rating is, not just on who won. This can lead to intense community frustration.[1][3][5]

**The Evidence:** The real-world performance of TrueSkill is staggering. Microsoft's internal data from the Halo 2 beta showed that TrueSkill could accurately identify a player's rank in just 10 to 20 games, a process that would take Elo over 50 games. More recently, the 2024 arXiv empirical analysis evaluated TrueSkill's performance on a dataset of Counter-Strike: Global Offensive matches. The researchers found that TrueSkill maintained a 62% prediction accuracy in highly chaotic, heterogeneous team environments—a scenario where traditional Elo models essentially devolved into random guessing.[2][3]
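For context on what "prediction accuracy" means here, a common way to turn TrueSkill-style ratings into a match prediction is to compare the summed team means against the combined uncertainty. The sketch below assumes Gaussian skills, the conventional beta of 25/6, and no draws; the ratings are invented, and this is not the evaluation code from the cited study.

```python
import math

BETA = 25.0 / 6.0  # assumed performance noise (conventional default)


def win_probability(team_a, team_b, beta: float = BETA) -> float:
    """Probability that team_a beats team_b; teams are lists of (mu, sigma) tuples."""
    players = team_a + team_b
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    variance = sum(sigma ** 2 for _, sigma in players) + len(players) * beta ** 2
    # Gaussian CDF of the standardised difference in team performance.
    return 0.5 * (1.0 + math.erf(delta_mu / math.sqrt(2.0 * variance)))


# Hypothetical asymmetric lobby: two strong players against three average ones.
duo = [(32.0, 2.0), (31.0, 2.5)]
trio = [(25.0, 3.0), (24.0, 3.0), (26.0, 4.0)]

p_duo = win_probability(duo, trio)
print(f"duo wins with probability {p_duo:.2f}")
```

A prediction counts as correct when the side given more than 50% actually wins, so accuracy figures like the one above measure how often that call is right across a dataset.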
**Fits well when / Does not fit when:** TrueSkill fits well when you are building a modern, team-based shooter, a battle royale, or any game where players frequently queue up in groups of varying skill levels. It is the definitive mathematical solution for multi-player chaos and asymmetric lobbies. It does not fit when you are building a simple one-on-one game like digital chess or a traditional fighting game, where the massive computational overhead of Bayesian factor graphs provides almost no predictive advantage over a much lighter, easier-to-maintain Glicko-2 implementation.[1][2][3]
Ultimately, the choice between Elo, Glicko-2, and TrueSkill is not just a mathematical decision; it is a psychological one. Developers must weigh the desire for predictive accuracy against the player's need for a rewarding grind. Elo offers the thrill of the climb but is slow to find a player's true level. TrueSkill offers the strongest predictions in team play but can feel like an inscrutable black box. Glicko-2 sits in the middle, knowing exactly how good you are, and knowing exactly when it's time to admit it doesn't know you at all. In the invisible war of matchmaking, the best algorithm is the one that keeps the player hitting "Find Match" one more time.[1][5]
How we got here
1960
Arpad Elo invents the Elo rating system for one-on-one chess matches.
1995
Mark Glickman introduces the Glicko system, adding Rating Deviation (RD) to track uncertainty.
2006
Microsoft Research publishes TrueSkill, built for Xbox Live, introducing Bayesian inference for team matchmaking.
2018
Microsoft publishes TrueSkill 2, incorporating player experience and quit-rates into the math.
2023
Valve overhauls Dota 2's matchmaking, shifting to a modified Glicko system to fix lower-bracket clumping.
Viewpoints in depth
Mathematical Purists
Prioritizing predictive accuracy and rapid convergence over user experience.
For statisticians and researchers, a rating system is purely a predictive model. They champion TrueSkill and Glicko-2 because these systems track uncertainty and volatility, allowing the algorithm to estimate a player's skill in a fraction of the games Elo needs. To this camp, the fact that a player feels 'stuck' is irrelevant if the math accurately reflects their win probability.
Player Psychology Advocates
Focusing on the emotional reward of the grind and the clarity of the leaderboard.
Community managers and players often push back against hyper-accurate Bayesian models. They argue that games are meant to be fun, and a system that rigidly locks a player into a rank (the 'two meters of mud' effect) kills motivation. This camp often prefers the transparent, zero-sum swings of Elo, where a lucky winning streak yields a massive, dopamine-inducing rank increase, even if it temporarily breaks the system's predictive accuracy.
System Architects
Balancing computational overhead with the specific needs of the game mode.
For the engineers actually running the servers, the choice of algorithm is a resource allocation problem. While TrueSkill is a marvel for team games, its complex integrals require significant processing power. Architects advocate for matching the math to the mode: deploying lightweight Elo or Glicko-2 for 1v1 fighting games, and reserving the heavy computational lifting of TrueSkill strictly for chaotic, multi-team shooters where simpler systems fail.
What we don't know
- How next-generation AI models will alter matchmaking by analyzing in-game telemetry (like mouse movement) rather than just win/loss results.
- Whether the industry will ever settle on a universal, open-source standard for multi-player skill rating.
Key terms
- Rating Deviation (RD): A measure of a system's uncertainty about a player's true skill, which increases during periods of inactivity.
- Volatility: A metric in Glicko-2 that tracks how consistent a player's performance is over time.
- Bayesian Inference: A statistical method used by TrueSkill that updates the probability of a player's skill level as more match data becomes available.
- Zero-Sum System: A ranking model where the exact number of points gained by the winner is lost by the loser.
Frequently asked
Why do I lose more points than I gain for a win?
If the system expects you to win based on a higher rating, a loss is considered an upset and penalizes you heavily, while a predicted win yields minimal points.
Why does my rank feel 'stuck' in Glicko-2?
As you play more games consistently, the system's confidence in your rank increases (low RD), meaning it requires a significant streak of unexpected results to shift your rating.
Can Elo be used for team games?
Not natively. While some games average the Elo of team members, the system cannot mathematically determine which individual player contributed most to a team's win or loss.
Sources
[1] Factlen Editorial Team (System Architects). Synthesis by Factlen editorial team.
[2] arXiv (Mathematical Purists). Empirical Analysis of Elo, Glicko2 and TrueSkill through Surrogate Modelling.
[3] Microsoft Research (Mathematical Purists). TrueSkill Ranking System.
[4] Glicko.net (Mathematical Purists). The Glicko and Glicko-2 Rating Systems.
[5] Hacker News (Player Psychology Advocates). Discussion on TrueSkill vs Glicko.
[6] Reddit (Player Psychology Advocates). Elo vs. Glicko-2 (Melee Top 25).