Elo vs. TrueSkill: Choosing the Right Matchmaking Algorithm for Competitive Systems
From traditional chess federations to modern multiplayer shooters, competitive platforms rely on complex mathematics to ensure fair matches. Here is how the Elo, Glicko-2, and TrueSkill algorithms compare in speed, accuracy, and team dynamics.
By Factlen Editorial Team
- Multiplayer Game Developers
- Focus on complex team dynamics, asymmetric game modes, and Bayesian inference to sort massive player bases quickly.
- Online Platform Architects
- Prioritize handling player inactivity and rapid convergence through uncertainty tracking, heavily favoring Glicko-2.
- Traditional Matchmaking Advocates
- Favor transparency, historical continuity, and simple expected-outcome math, often preferring Elo for 1v1 environments.
What's not represented
- Casual players who find visible rating volatility stressful
- Esports professionals who feel individual performance metrics in team games encourage selfish play
Why this matters
A poorly designed ranking system leads to unbalanced matches, frustrated users, and high platform churn. Understanding the trade-offs between these algorithms is essential for developers building competitive environments and players trying to understand why their rank fluctuates.
Key points
- Elo is simple and transparent but struggles with team games and returning players.
- Glicko-2 introduces uncertainty tracking, allowing for faster convergence and better handling of inactivity.
- TrueSkill uses Bayesian inference to natively support complex team dynamics and free-for-all modes.
- TrueSkill converges significantly faster than Elo, requiring only a handful of matches to rank a new player.
- Choosing the right algorithm depends on whether the game is 1v1, team-based, or requires absolute mathematical transparency.
Matchmaking is the invisible engine of the competitive world. Whether you are queuing up for a ranked match in a modern multiplayer shooter, sitting down for an online chess tournament, or evaluating the performance of cutting-edge artificial intelligence models, a mathematical algorithm is quietly calculating your exact skill level. The goal of these systems is deceptively simple: to find two opponents of equal strength and deliver a fair, engaging contest. However, the mathematics required to achieve that goal are staggeringly complex. A poorly designed ranking system leads to unbalanced matches, frustrated players, and inevitable churn. A well-designed system, on the other hand, keeps competitors engaged by consistently placing them in the elusive flow state where challenges perfectly match their abilities. Over the past six decades, the science of skill estimation has evolved from simple linear adjustments to complex Bayesian inference models. Today, three dominant algorithms power the vast majority of competitive ladders: the classic Elo system, the statistically robust Glicko-2, and Microsoft's proprietary TrueSkill. Understanding how these algorithms compare is essential for anyone building a competitive platform.[6]
The grandfather of all modern matchmaking is the Elo rating system, invented in the 1960s by Hungarian-American physics professor Arpad Elo. Originally designed to improve the ranking of chess players for the United States Chess Federation, Elo fundamentally shifted skill estimation from an absolute measurement to a relative one. In the Elo system, performance is not measured in a vacuum; it is inferred entirely from wins, losses, and draws against other rated players. The core mechanic relies on calculating an expected outcome based on the difference in ratings between two competitors. If a highly rated player defeats a novice, the expected outcome matches the actual outcome, and very few rating points change hands. However, if the novice scores a massive upset, the system recognizes a flaw in its previous estimations and transfers a significant number of points. The sensitivity of these changes is governed by the K-factor, a multiplier that dictates how volatile a player's rating can be after a single match.[2]
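The expected-outcome mechanic described above can be sketched in a few lines. This is a minimal illustration, assuming the standard logistic Elo curve with a 400-point scale and a K-factor of 32; the function names and the K value are illustrative choices, not any federation's official settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is A's actual result; k = 32 is an assumed K-factor.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Favorite (1800) beats novice (1400): only a few points change hands.
    print(update_elo(1800, 1400, score_a=1.0))
    # Novice scores the upset: a much larger transfer.
    print(update_elo(1800, 1400, score_a=0.0))
```

Because the update is proportional to the gap between actual and expected score, an expected result moves ratings only slightly while an upset moves them by almost the full K-factor.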
When evaluating Elo, the trade-offs are clear. For: The system is mathematically transparent and simple to implement. Players can calculate their own expected outcomes, which makes it highly trusted in traditional competitive environments. Against: It has no mechanism to measure uncertainty, struggles to adapt to players returning from long periods of inactivity, and cannot natively handle team-based games. Evidence: A player rated 100 points higher than their opponent has an expected score of about 64 percent, where a draw counts as half a win. However, if that higher-rated player hasn't competed in five years, Elo still treats their rating with the same confidence as that of an active grandmaster. This lack of uncertainty tracking means that returning players often suffer through a long string of unbalanced matches before their rating accurately reflects their current, potentially diminished, skill level.[2]

To solve the limitations of Elo, statistician Mark Glickman developed the Glicko rating system, later refining it into the modern Glicko-2 algorithm. Glicko-2 introduces a critical new dimension to skill estimation: uncertainty. Instead of representing a player's skill as a single absolute number, Glicko-2 tracks a Rating Deviation (RD) and a Rating Volatility metric. The RD effectively creates a confidence interval around the player's rating. When a player competes frequently, the system becomes highly confident in their exact skill, and the RD shrinks. When a player takes a long break from the game, the system's confidence decays, and the RD expands. The Volatility metric further refines this by tracking how consistently a player performs; a highly erratic player who frequently scores upsets will have a higher volatility than a rock-solid veteran who always beats lower-rated opponents and loses to higher-rated ones.[3][5]
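The way the RD widens during a break can be sketched as follows. This is a partial illustration only: it covers the inactivity step of Glicko-2, in which the deviation grows by the volatility once per idle rating period, and not the full rating update. The function names, the 350-point cap, and the example values are assumptions made for illustration.

```python
import math

GLICKO2_SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def inflate_rd(rd: float, volatility: float, idle_periods: int,
               max_rd: float = 350.0) -> float:
    """RD after `idle_periods` rating periods with no games played.

    Each idle period applies phi' = sqrt(phi^2 + sigma^2) on the Glicko-2
    scale. The max_rd cap is an assumed implementation choice.
    """
    phi = rd / GLICKO2_SCALE
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return min(phi * GLICKO2_SCALE, max_rd)


def interval_95(rating: float, rd: float):
    """Approximate 95% interval: two rating deviations either side."""
    return rating - 2 * rd, rating + 2 * rd


if __name__ == "__main__":
    active_rd = 50.0
    for periods in (0, 10, 50):
        rd = inflate_rd(active_rd, volatility=0.06, idle_periods=periods)
        print(periods, round(rd, 1), interval_95(1700, round(rd, 1)))
```

The longer the player stays away, the wider the interval around the same rating becomes, which is what lets the next few results move the rating by larger amounts.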
Analyzing Glicko-2 reveals a more sophisticated statistical engine. For: It explicitly models uncertainty through Rating Deviation and skill fluctuation through Volatility, allowing for large rating corrections when a player's actual performance deviates from expectations. Against: The mathematics are significantly more complex than Elo, requiring the concept of rating periods to process batches of games, and it remains fundamentally designed for zero-sum, two-player environments. Evidence: Lichess, a major online chess server, uses Glicko-2 and starts new players at a rating of 1500 with a confidence interval of plus or minus 1000 points. Because a 95 percent interval spans roughly two Rating Deviations on either side of the rating, this corresponds to an RD of about 500, and it means the system is 95 percent confident the player's true skill lies somewhere between 500 and 2500. Because the RD is so high, the algorithm allows the player's rating to swing by hundreds of points in their first few matches, zeroing in on their true skill level in a fraction of the time it would take a standard Elo system.[4][5]
While Glicko-2 perfected the one-versus-one environment, the explosion of online multiplayer gaming in the 2000s created a new mathematical crisis: how do you rank players in a four-versus-four team game, or an eight-player free-for-all? Enter TrueSkill, a Bayesian ranking system developed by Microsoft Research for the Xbox Live network. TrueSkill abandons the traditional rating formulas entirely, instead modeling every player's skill as a Gaussian distribution characterized by a mean and a standard deviation. When a match concludes, TrueSkill uses approximate message passing on a factor graph to update the skill distributions of every participant simultaneously. Crucially, TrueSkill models a team's overall strength as the sum of its individual players' skill distributions. This allows the algorithm to look at a team's victory and mathematically distribute the credit among the individual players based on their prior skill levels and uncertainty.[1]
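The idea of a team as a sum of Gaussian skills can be sketched without the factor graph. The example below is not the TrueSkill update itself; it only shows how summed means and variances give a team-versus-team win probability. It assumes the commonly cited defaults (mean 25, standard deviation 25/3, performance noise beta of 25/6) and ignores draws; the `Skill` type and function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Skill(NamedTuple):
    mu: float = 25.0           # assumed default mean
    sigma: float = 25.0 / 3.0  # assumed default standard deviation


BETA = 25.0 / 6.0  # assumed per-player performance noise


def team_performance(team: Sequence[Skill]):
    """Mean and variance of a team's performance as a sum of player Gaussians."""
    mean = sum(p.mu for p in team)
    variance = sum(p.sigma ** 2 + BETA ** 2 for p in team)
    return mean, variance


def win_probability(team_a: Sequence[Skill], team_b: Sequence[Skill]) -> float:
    """P(team A outperforms team B), ignoring draws."""
    mean_a, var_a = team_performance(team_a)
    mean_b, var_b = team_performance(team_b)
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


if __name__ == "__main__":
    veterans = [Skill(32.0, 1.5)] * 4
    newcomers = [Skill()] * 4
    print(win_probability(veterans, newcomers))
```

Since variances add along with means, a team full of uncertain newcomers produces a flatter prediction than a team of well-measured veterans with the same total mean.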
Microsoft's TrueSkill shifts the paradigm entirely toward Bayesian inference. For: It natively supports any combination of team sizes, free-for-all formats, and asymmetric matches, while converging on a player's true skill with unprecedented speed. Against: The algorithm is proprietary, mathematically opaque to the average player, and heavily patented, restricting its use in commercial indie projects outside of specific licensing agreements. Evidence: According to Microsoft Research, TrueSkill is astonishingly efficient at processing complex match outcomes. It requires only three matches to accurately estimate the skills of players in an eight-player free-for-all, a feat that would take traditional one-versus-one algorithms dozens of paired comparisons to achieve. To prevent players from boasting about unearned ranks, the public-facing TrueSkill leaderboard displays a conservative estimate calculated as the mean minus three times the standard deviation, ensuring that players must reduce their uncertainty through consistent play before climbing the ranks.[1]
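The conservative leaderboard estimate is simple enough to show directly. This sketch assumes the default starting values of mean 25 and standard deviation 25/3; the player names and numbers are made up for illustration.

```python
def conservative_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """Displayed rating: mean minus k standard deviations (k = 3 per the text)."""
    return mu - k * sigma


if __name__ == "__main__":
    # Hypothetical players: (mean, standard deviation)
    players = {
        "steady_veteran": (30.0, 1.0),
        "hot_newcomer": (35.0, 6.0),
        "fresh_account": (25.0, 25.0 / 3.0),
    }
    board = sorted(players, key=lambda n: conservative_rating(*players[n]),
                   reverse=True)
    for name in board:
        print(name, round(conservative_rating(*players[name]), 2))
```

In this example the newcomer has the higher mean but ranks below the veteran until more games shrink their standard deviation.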
When comparing the convergence speed of these three algorithms, the differences are stark. Convergence speed refers to how many matches a system needs to accurately place a brand-new player into their correct skill bracket. Elo is notoriously slow; because it relies on a fixed K-factor, a highly skilled player creating a new account must grind through dozens of low-level matches before their rating catches up to their actual ability. Glicko-2 solves this by using its Rating Deviation to allow massive initial jumps, usually finding a player's true rank within 10 to 15 matches. TrueSkill, however, is the undisputed king of convergence. Because it updates its Bayesian probabilities after every single event, and can extract data from multi-player free-for-alls where a single match provides multiple data points of comparison, it can pinpoint a player's skill in just a handful of games.[1][3][5]
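One way to see why uncertainty-aware systems converge faster is the closed-form TrueSkill update for the simplest case: two players, no draws. The sketch below is a simplification that leaves out the draw margin and the per-game dynamics term, and assumes a beta of 25/6; the function names are illustrative. It also does not guard against numerical underflow for extreme upsets.

```python
import math

BETA = 25.0 / 6.0  # assumed performance noise


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_1v1(winner, loser, beta: float = BETA):
    """Update (mu, sigma) pairs after a decisive two-player game."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # how surprising the result was
    w = v * (v + t)        # how much uncertainty the result removes
    # Each player's step is scaled by their own variance.
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


if __name__ == "__main__":
    newcomer = (25.0, 25.0 / 3.0)  # high uncertainty
    veteran = (25.0, 1.0)          # low uncertainty
    print(update_1v1(newcomer, veteran))
```

Unlike a fixed K-factor, the step size here depends on each player's own variance, so the uncertain newcomer's mean moves several points while the well-measured veteran's barely changes.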

The handling of team dynamics is another major fault line between the algorithms. Elo and Glicko-2 were fundamentally built for duels. Attempting to use them for a five-versus-five competitive shooter requires developers to average the ratings of the team members, treat the match as a single one-versus-one entity, and then distribute the resulting rating change equally among the players. This crude workaround fails to account for the fact that a team might consist of four veterans and one novice. TrueSkill's additive variance model natively understands that a team's performance is a composite of its parts. Furthermore, newer iterations like TrueSkill 2 have begun incorporating individual performance metrics—such as kill counts, objective time, and even a player's propensity to quit a match early—to further refine how much credit an individual deserves for a team's win.[1][6]
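The averaging workaround described above can be sketched to show its blind spot. This is an illustration of the crude approach, assuming the standard 400-point Elo curve and a K-factor of 32; the function name and ratings are hypothetical.

```python
def team_elo_update(team_a, team_b, a_won: bool, k: float = 32.0):
    """Average each team, treat the match as one 1v1, split the change equally."""
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    expected_a = 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / 400.0))
    delta = k * ((1.0 if a_won else 0.0) - expected_a)
    # Every player receives the same adjustment, regardless of their own rating.
    return [r + delta for r in team_a], [r - delta for r in team_b]


if __name__ == "__main__":
    mixed = [2000, 2000, 2000, 2000, 1000]  # four veterans and one novice
    even = [1800] * 5
    print(team_elo_update(mixed, even, a_won=True))
```

Both teams average 1800, so the match is scored as a coin flip, and the novice gains exactly as many points as the veterans.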
Ultimately, choosing the right algorithm requires matching the math to the environment. Elo fits well when transparency is the highest priority, the game is strictly one-versus-one, and the player base expects to understand exactly how many points are at stake before a match begins. It is the gold standard for traditional, over-the-board chess federations where mathematical simplicity and historical continuity are valued above rapid adaptation. Elo does not fit when the game involves teams, or when the platform experiences high player churn and long periods of user inactivity, as it cannot gracefully handle returning players whose skills have decayed.[2][6]

Glicko-2 fits well when managing large-scale, one-versus-one online ladders where players frequently take breaks and return. By explicitly increasing a player's Rating Deviation during periods of inactivity, it ensures that returning competitors are quickly re-sorted to their current skill level rather than their historical peak. This makes it the ideal engine for online chess servers, competitive puzzle games, and digital card games. Glicko-2 does not fit when the core gameplay relies on multiplayer team dynamics, as adapting its confidence intervals for multi-agent scenarios requires messy, non-standard workarounds that dilute its statistical accuracy.[3][4][6]
TrueSkill fits well when powering modern multiplayer video games, from squad-based tactical shooters to massive online racing simulators. Its unique ability to model a team as the sum of its individual players' skill distributions, combined with its lightning-fast convergence in free-for-all game modes, makes it the undisputed leader for complex matchmaking environments where speed is critical. TrueSkill does not fit when developers are building commercial indie products outside the Microsoft ecosystem, as the algorithm is heavily patented and requires strict licensing agreements. Furthermore, it is a poor choice when a competitive community demands absolute mathematical simplicity and transparency in how their public leaderboard is calculated, as the underlying Bayesian math is entirely opaque to the average user.[1][6]
How we got here
1960
The United States Chess Federation officially adopts the Elo rating system.
1995
Statistician Mark Glickman invents the Glicko rating system to introduce uncertainty tracking.
2005
Microsoft Research develops TrueSkill to handle complex multiplayer matchmaking for Xbox Live.
2018
Microsoft introduces TrueSkill 2, incorporating individual performance metrics like kill counts and quit rates.
Viewpoints in depth
The Traditionalist View
Why absolute transparency matters more than statistical perfection.
For traditional chess federations and purist competitive communities, the value of a rating system lies in its legibility. Players want to sit down at a board, look at their opponent's rating, and calculate exactly how many points are on the line using simple arithmetic. While Elo may be slower to converge and blind to inactivity, its zero-sum simplicity ensures that every point gained was visibly lost by someone else, fostering a sense of absolute fairness that complex Bayesian models struggle to replicate.
The Modern Developer View
Why uncertainty tracking is mandatory for online platforms.
Modern online games process millions of matches a day, with players constantly joining, quitting, and taking months-long breaks. Developers argue that treating a returning player's historical rating as absolute truth ruins the matchmaking experience for everyone involved. By embracing uncertainty metrics—whether through Glicko-2's Rating Deviation or TrueSkill's standard deviation—developers can intentionally create volatile rating swings that quickly push misplaced players out of the wrong skill brackets, sacrificing mathematical simplicity for a vastly superior user experience.
What we don't know
- How heavily modern proprietary systems like TrueSkill 2 weight individual performance metrics versus pure win/loss outcomes.
- Whether open-source alternatives to TrueSkill will eventually match its efficiency in handling massive, asymmetric team games.
Key terms
- K-factor
- The multiplier in the Elo system that determines the maximum number of rating points a player can win or lose in a single match.
- Rating Deviation (RD)
- A measure of uncertainty in Glicko-2; a higher RD means the system is less confident in the player's exact skill, allowing for larger rating swings.
- Bayesian Inference
- A statistical method used by TrueSkill to update the probability of a player's skill level as new match data arrives.
- Zero-sum game
- A competitive scenario where one player's gain is exactly equal to another player's loss, which forms the mathematical basis of the Elo system.
Frequently asked
Why do my ratings differ across different chess websites?
Different platforms use different algorithms and different player pools. Chess.com uses a version of the Glicko system, while Lichess uses Glicko-2, leading to different baseline numbers and volatility.
Can TrueSkill be used for 1v1 games?
Yes. While designed for complex team games, TrueSkill works perfectly for 1v1 matches, though its proprietary license limits commercial use outside of Microsoft.
Why does my rating change so much when I make a new account?
Algorithms like Glicko-2 and TrueSkill start new players with high uncertainty. This causes massive rating swings initially to quickly find your true skill level.
Sources
[1] Microsoft Research. TrueSkill Ranking System. (Viewpoint: Multiplayer Game Developers)
[2] Wikipedia. Elo rating system. (Viewpoint: Traditional Matchmaking Advocates)
[3] Wikipedia. Glicko rating system. (Viewpoint: Traditional Matchmaking Advocates)
[4] Lichess. Lichess Rating Systems FAQ. (Viewpoint: Online Platform Architects)
[5] Glicko.net. Example of the Glicko-2 system. (Viewpoint: Online Platform Architects)
[6] Factlen Editorial Team. Synthesis by Factlen editorial team. (Viewpoint: Multiplayer Game Developers)





