Factlen Deep Dive · Ranking Algorithms · Trade-off Analysis · Jun 15, 2026, 7:20 PM · 8 min read · #2 of 2 in meta

Beyond Elo: How Glicko-2 and TrueSkill Are Rewriting the Rules of Competitive Ranking

As competitive gaming and online matchmaking explode in popularity, traditional Elo ratings are being replaced by advanced Bayesian algorithms like Glicko-2 and TrueSkill. We compare the mathematical trade-offs of these three foundational ranking systems.

By Factlen Editorial Team


1v1 Platform Engineers 40% · Multiplayer Developers 40% · Traditionalists 20%

1v1 Platform Engineers: Prioritize rapid convergence and uncertainty tracking for massive online ladders.
Multiplayer Developers: Focus on Bayesian models that can handle complex team dynamics and varying party sizes.
Traditionalists: Value the historical continuity and mathematical transparency of the classic Elo system.

What's not represented

· Casual players who find complex rating systems opaque and frustrating.
· Esports tournament organizers who manually seed brackets outside of automated algorithms.

Why this matters

Whether you are playing online chess, queuing for a multiplayer video game, or being evaluated by a corporate algorithm, the mathematical model judging your skill determines your experience. Understanding how these systems calculate your worth reveals why some platforms feel fair and others feel endlessly frustrating.

Key points

Elo provides a single, easy-to-understand number but fails to track rating uncertainty.
Glicko-2 introduces Rating Deviation, allowing the system to adapt quickly to new or returning players.
TrueSkill uses Bayesian inference to evaluate individual contributions within multiplayer team games.
Online chess platforms have largely abandoned Elo in favor of Glicko-based systems.
TrueSkill remains proprietary to Microsoft, limiting its use in open-source projects.

Default starting rating in Elo and Glicko: 1500
Confidence interval tracked by Glicko's RD: 95%
Initial mean skill value in TrueSkill
TrueSkill accuracy in CS:GO matchmaking: 62%

The quest to mathematically quantify human skill is as old as competition itself. For decades, the gold standard has been the Elo rating system, a formula devised in 1960 by physics professor Arpad Elo to rank chess players. By assigning a single number to represent a competitor's strength, Elo created a zero-sum economy where points are exchanged after every match. If a grandmaster defeats a novice, the rating exchange is minimal; if the novice pulls off an upset, the point transfer is massive. This elegant logic made Elo the foundational algorithm for everything from international chess to early video games and even dating apps.[5]
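The point exchange described above follows from the standard Elo expected-score formula. The sketch below is a minimal illustration, not any federation's exact implementation: the K-factor of 20 and the two example ratings are assumptions chosen for the demonstration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return new (A, B) ratings after one game.

    k is an assumed K-factor; whatever A gains, B loses (zero-sum).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: a 2500 grandmaster against a 1500 novice.
gm, novice = 2500.0, 1500.0
favorite_wins = elo_update(gm, novice, score_a=1.0)  # expected result
upset = elo_update(gm, novice, score_a=0.0)          # novice wins

print("favorite wins:", favorite_wins)
print("upset:", upset)
```

With these assumed numbers the grandmaster gains a small fraction of a point for the expected win, while an upset moves almost the full K-factor from one player to the other.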

But the explosion of online gaming has pushed Arpad Elo's mathematical model to its breaking point. Modern matchmaking systems must process millions of games a day, dealing with new accounts, returning veterans, and complex team dynamics. A single number is no longer enough to capture the nuances of digital competition. In response, data scientists and engineers have developed sophisticated successors. Today, the competitive landscape is dominated by a fascinating three-way philosophical battle between the traditional Elo framework, Mark Glickman's highly adaptive Glicko-2, and Microsoft's proprietary TrueSkill algorithm.[1][3]

When evaluating the case for Elo, its primary advantage lies in its absolute transparency and historical continuity. The math is straightforward enough to be calculated on a napkin. Players understand exactly what is at stake before a match begins, and the system requires minimal computational overhead to maintain. This simplicity fosters a sense of trust; there are no hidden variables suppressing a player's climb up the ladder. For traditional over-the-board tournaments where players compete in a closed, highly regulated environment, Elo remains a perfectly calibrated instrument.[4][5]

The case against Elo centers on its inability to measure its own ignorance. In the Elo system, a player who has maintained a 1500 rating over five hundred games is treated exactly the same as a rookie who was just assigned a 1500 rating upon account creation. Elo has no mechanism to track uncertainty. Furthermore, it is notoriously slow to adapt. If a player takes a five-year hiatus and returns with diminished skills, Elo will stubbornly drain their rating one painful loss at a time, ruining the matchmaking experience for everyone involved in those corrective matches.[3][5]
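To get a rough feel for how slowly a fixed K-factor corrects a stale rating, the sketch below replaces each random game result with its expected value, so the run is deterministic. Everything here is an idealized assumption: the stored rating of 2000, the true strength of 1600, the K-factor, the 50-point tolerance, and a matchmaker that always pairs the player with an opponent at their stored rating.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def games_to_converge(stored: float, true_strength: float, k: float = 20.0,
                      tolerance: float = 50.0, max_games: int = 10_000) -> int:
    """Count games until the stored rating is within `tolerance` of true strength.

    Each game uses the average result the player would achieve at their
    true strength, not a random outcome (a deliberate simplification).
    """
    rating = stored
    games = 0
    while abs(rating - true_strength) > tolerance and games < max_games:
        opponent = rating  # matchmaker pairs by the stored rating
        average_result = expected_score(true_strength, opponent)
        rating += k * (average_result - expected_score(rating, opponent))
        games += 1
    return games


# Hypothetical returning player: stored 2000, actually playing at 1600.
print("games needed at K=20:", games_to_converge(2000.0, 1600.0))
print("games needed at K=40:", games_to_converge(2000.0, 1600.0, k=40.0))
```

Raising K shortens the correction but also makes every settled player's rating noisier, which is the trade-off a per-player uncertainty value is meant to avoid.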

Figure: How the three major algorithms represent a player's skill.

The evidence regarding Elo's modern viability is mixed. The International Chess Federation (FIDE) continues to use it, preserving decades of historical records and title qualifications. However, almost every major online platform has abandoned it. Data from massive multiplayer environments shows that Elo requires an impractical number of matches to converge on a player's true skill level. In fast-paced digital ecosystems where players expect fair matches immediately, Elo's slow convergence rate is a fatal flaw that leads to rampant mismatched lobbies and frustrated user bases.[3][4]

To solve these exact problems, Mark Glickman introduced the Glicko system and its successor, Glicko-2. The case for Glicko-2 rests on a concept inherited from the original Glicko: Rating Deviation (RD). Instead of just tracking a player's skill, the system also tracks how confident it is in that estimate, so a player's strength is reported as a 95 percent confidence interval (roughly the rating plus or minus two RDs) rather than as a single point. If a player is new or returning from a long break, their RD is high, allowing their rating to swing widely and find its correct placement quickly. As they play more consistently, the RD shrinks, stabilizing their rank.[3][5]
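The effect of RD on rating swings can be sketched with the update equations of the original Glicko system. This is deliberately the simpler predecessor, not Glicko-2 itself, which adds a volatility term and an iterative solve; the function names and the example players are assumptions made for illustration.

```python
import math

Q = math.log(10) / 400.0


def g(rd: float) -> float:
    """Dampening factor: discounts results against opponents with uncertain ratings."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(rating: float, rd: float, results):
    """One rating-period update in the original Glicko system.

    results: list of (opponent_rating, opponent_rd, score), score in {1, 0.5, 0}.
    """
    info = 0.0      # accumulates 1/d^2, the information gained from the games
    surprise = 0.0  # accumulates g * (actual score - expected score)
    for opp_rating, opp_rd, score in results:
        weight = g(opp_rd)
        expected = 1.0 / (1.0 + 10 ** (-weight * (rating - opp_rating) / 400.0))
        info += Q**2 * weight**2 * expected * (1.0 - expected)
        surprise += weight * (score - expected)
    precision = 1.0 / rd**2 + info
    return rating + (Q / precision) * surprise, math.sqrt(1.0 / precision)


def interval_95(rating: float, rd: float):
    """Approximate 95 percent interval for the player's true strength."""
    return rating - 1.96 * rd, rating + 1.96 * rd


# Same win against a 1500 (RD 50) opponent, for two hypothetical players.
newcomer = glicko_update(1500.0, 350.0, [(1500.0, 50.0, 1.0)])
regular = glicko_update(1500.0, 50.0, [(1500.0, 50.0, 1.0)])
print("newcomer:", newcomer, interval_95(*newcomer))
print("regular: ", regular, interval_95(*regular))
```

Under these assumptions the newcomer's rating jumps by well over a hundred points while the regular's moves by only a handful, and both RDs shrink after the game.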

The case against Glicko-2 is largely psychological rather than mathematical. Because the system locks in on a player's true skill and reduces their RD, highly active players often find themselves in a state of rating stagnation. A hard-fought victory might yield only a fraction of a point if the system is already highly confident in the player's placement. This can feel unrewarding to users who view ratings as a progression system rather than a strict measurement tool. Additionally, like Elo, Glicko-2 is fundamentally designed for one-on-one competition and struggles to parse team dynamics.[3][5]


The evidence supporting Glicko-style ratings is overwhelming in the realm of one-on-one digital competition. They power the world's largest chess servers, which process millions of matches daily: Lichess runs Glicko-2, while Chess.com uses the original Glicko system. Empirical analyses demonstrate that Glicko-2 predicts match outcomes with significantly higher accuracy than Elo, particularly in the crucial first fifty games of a new user's lifecycle. By aggressively adjusting the ratings of uncertain players, it moves highly skilled players on new accounts out of beginner brackets far faster than traditional methods, ensuring that the broader competitive ecosystem remains balanced and fair for genuine beginners.[2][3]

Figure: Glicko-2's uncertainty tracking allows it to find a player's true skill significantly faster than traditional Elo.

But what happens when competition moves beyond one-on-one duels? The case for TrueSkill, developed by Microsoft Research in 2005, is built entirely around the complexities of team-based multiplayer games. TrueSkill utilizes Bayesian inference and a Gaussian factor graph to evaluate matches where multiple players form heterogeneous teams. It can look at a four-versus-four match, analyze the individual skill bands of all eight participants, and accurately distribute rating updates based on the specific composition of the teams, even handling scenarios where players quit mid-match.[1][2]
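For the special case of two teams and no draws, the Gaussian update has a closed form that shows how one match result is split across individual players. The sketch below covers only that case and is not Microsoft's implementation: it omits draws, more than two teams, partial play, and the small per-match uncertainty increase, all of which the full factor-graph algorithm handles. The defaults (mean 25, sigma 25/3, beta equal to half of sigma) are the commonly published ones and are assumed here, as are the example players.

```python
import math

MU, SIGMA = 25.0, 25.0 / 3.0  # assumed defaults for a brand-new player
BETA = SIGMA / 2.0            # assumed per-player performance noise


def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_teams(winners, losers, beta: float = BETA):
    """Update every player's (mu, sigma) after a two-team match with no draw."""
    players = winners + losers
    c = math.sqrt(sum(s * s + beta * beta for _, s in players))
    t = (sum(m for m, _ in winners) - sum(m for m, _ in losers)) / c
    v = pdf(t) / cdf(t)  # mean shift: larger when the result was a surprise
    w = v * (v + t)      # variance reduction factor, between 0 and 1

    def shift(player, sign):
        mu, sigma = player
        new_mu = mu + sign * (sigma**2 / c) * v
        new_sigma = sigma * math.sqrt(1.0 - (sigma**2 / c**2) * w)
        return new_mu, new_sigma

    return [shift(p, +1) for p in winners], [shift(p, -1) for p in losers]


# Hypothetical 2v2: a new account and a settled veteran beat two settled players.
newcomer, veteran = (MU, SIGMA), (30.0, 1.5)
winners, losers = update_two_teams([newcomer, veteran], [(27.0, 2.0), (27.0, 2.0)])
print("newcomer:", winners[0])
print("veteran: ", winners[1])
```

Each player's shift is scaled by their own variance, so the uncertain newcomer moves far more than the veteran even though both were on the same winning team.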

The case against TrueSkill is rooted in its opacity and its proprietary nature. The mathematics behind TrueSkill are intensely complex, making it impossible for a player to manually verify why their rating moved the way it did. This black-box nature can breed community resentment. Furthermore, TrueSkill is patented by Microsoft. While it powers massive franchises like Halo and Gears of War, independent developers and open-source projects cannot legally use it without licensing agreements, forcing them to rely on public domain alternatives or build their own Bayesian approximations.[1][2]

The evidence for TrueSkill's efficacy in team environments is heavily documented. Microsoft's internal data from the Halo 2 beta testing phase showed that TrueSkill substantially outperformed Elo in predicting multiplayer outcomes. Independent researchers analyzing games like Counter-Strike: Global Offensive observed a 62 percent accuracy rate in match prediction using TrueSkill models. In 2018, Microsoft pushed the envelope further with TrueSkill 2, which incorporates granular in-game metrics like individual kills, deaths, and quit rates to speed up skill convergence beyond what the original algorithm achieves.[1][2]
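A prediction-accuracy figure like the one above is typically obtained by asking the model for a favorite before each match and counting how often the favorite wins. The sketch below shows one way to do that with Gaussian skill estimates. The win-probability formula ignores draws, the beta value is an assumed default, and the three matches are invented purely to exercise the code; they are not data from any study.

```python
import math


def win_probability(team_a, team_b, beta: float = 25.0 / 6.0) -> float:
    """P(team A beats team B) from each player's (mu, sigma), ignoring draws."""
    delta_mu = sum(m for m, _ in team_a) - sum(m for m, _ in team_b)
    variance = sum(s * s + beta * beta for _, s in team_a + team_b)
    # Normal CDF of delta_mu / sqrt(variance), written with erf.
    return 0.5 * (1.0 + math.erf(delta_mu / math.sqrt(2.0 * variance)))


def prediction_accuracy(matches) -> float:
    """Fraction of matches in which the predicted favorite actually won.

    matches: list of (team_a, team_b, a_won).
    """
    correct = sum((win_probability(a, b) > 0.5) == a_won for a, b, a_won in matches)
    return correct / len(matches)


# Hypothetical teams and results, for illustration only.
strong = [(30.0, 2.0), (28.0, 2.0)]
weak = [(22.0, 2.0), (24.0, 2.0)]
matches = [
    (strong, weak, True),   # favorite won
    (weak, strong, False),  # favorite won
    (strong, weak, False),  # upset
]
print("accuracy:", prediction_accuracy(matches))
```

Because closely matched lobbies are close to coin flips by design, a well-balanced matchmaker can produce accuracy figures that look modest even when the underlying skill estimates are good.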

When comparing these systems directly, the trade-offs become a matter of application rather than absolute superiority. Elo and Glicko-2 treat every match as a monolithic zero-sum event between two entities. If a team of five plays another team of five, these systems must treat the teams as single players, failing to recognize that one player might have carried four weaker teammates. TrueSkill fundamentally deconstructs the team, updating each player's individual Gaussian distribution based on the collective outcome, making it the undisputed leader in multiplayer environments.[1][4]
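The limitation of treating a team as a single entity is easy to see in code. The sketch below uses one common workaround, averaging the team's ratings and running an ordinary Elo update on the averages; the averaging rule, the K-factor of 20, and the example rosters are all assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def team_elo_update(team_a, team_b, a_won: bool, k: float = 20.0):
    """Treat each team as one player rated at the team average.

    Every member of a team receives the same point change.
    """
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(avg_a, avg_b))
    return [r + delta for r in team_a], [r - delta for r in team_b]


# Hypothetical rosters: one 2200 player with four 1400 teammates (average 1560)
# against five evenly rated 1560 players.
carried = [2200.0, 1400.0, 1400.0, 1400.0, 1400.0]
even = [1560.0] * 5
new_carried, new_even = team_elo_update(carried, even, a_won=True)
print(new_carried)
print(new_even)
```

The 2200 player and the four 1400 teammates all receive the identical adjustment, which is exactly the information loss that per-player Gaussian updates are designed to avoid.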

Figure: Unlike Elo and Glicko, TrueSkill deconstructs team performances to evaluate individual contributions.

Conversely, when comparing transparency and computational load, the hierarchy flips. Elo requires almost zero processing power, making it ideal for lightweight applications and physical tournaments. Glicko-2 introduces moderate computational complexity to track volatility and time-based decay. TrueSkill requires heavy statistical processing, running complex message-passing algorithms across factor graphs for every single match outcome. For a solo developer building a simple web game, implementing TrueSkill is akin to using a supercomputer to balance a checkbook.[2][4]

Ultimately, the traditional Elo system fits well when transparency is the highest priority, matches are strictly one-on-one, and the player pool is relatively small and highly active. It is the system of choice for physical, over-the-board tournaments where prestige and historical continuity matter more than rapid mathematical convergence. It does not fit when a platform experiences high player churn, massive influxes of new accounts, or sporadic participation, as its inability to track uncertainty leads to prolonged periods of mismatched games that can alienate a modern digital audience.[4][5]

Glicko-2 fits well when running a massive, high-volume one-on-one ladder. It is the perfect algorithm for online chess, fighting games, and digital card games. Its Rating Deviation mechanic ensures that new players are quickly sorted into their appropriate skill brackets, protecting the integrity of the broader ecosystem. It does not fit when the core gameplay loop involves varying team sizes, free-for-all deathmatches, or asymmetrical competition, as its mathematical foundation is strictly bound to pairwise comparisons that cannot untangle individual contributions from a group effort.[2][3]

Figure: The trade-off between mathematical accuracy and computational overhead.

TrueSkill fits well when matchmaking involves complex team dynamics, heterogeneous party sizes, and the need to extract individual skill data from collective outcomes. It is the gold standard for modern blockbuster esports and team shooters. It does not fit when building an open-source project, a low-budget indie game, or a platform where players demand absolute transparency in how their points are calculated. In the modern era of matchmaking, choosing the right algorithm is no longer just a math problem; it is the foundational design choice that dictates the entire player experience.[1][2][6]

How we got here

1960: Arpad Elo develops the Elo rating system, which is soon adopted by the US Chess Federation.
1995: Mark Glickman invents the Glicko system to introduce rating reliability into the math.
2005: Microsoft deploys TrueSkill for Xbox Live, revolutionizing team-based matchmaking.
2018: Microsoft publishes TrueSkill 2, incorporating in-game metrics like individual kills and quit rates.

Viewpoints in depth

Traditionalists and Organizers

Argue for Elo's simplicity and historical continuity.

For traditional tournament organizers, Elo's greatest strength is its transparency. Players can calculate their own rating changes with a pocket calculator, fostering trust in the system. This camp argues that while Elo may be slow to adapt, its stability is a feature, not a bug, ensuring that prestigious titles and historical rankings are not subject to the wild volatility of modern algorithms.

Online 1v1 Platform Engineers

Champion Glicko-2 for its rapid convergence and uncertainty tracking.

Engineers running massive online ladders point out that Elo breaks down when dealing with millions of anonymous accounts. By tracking Rating Deviation, Glicko-2 allows platforms to quickly identify 'smurfs' and accurately place returning players without ruining the experience for active users. They argue that in a digital ecosystem, speed of convergence is the most critical metric for user retention.

Multiplayer Game Developers

Rely on TrueSkill and Bayesian models to handle complex team dynamics.

Modern game developers argue that 1v1 math is useless for team shooters and multiplayer arenas. They rely on Bayesian models like TrueSkill to deconstruct team performances and reward individual contributions. While acknowledging the computational cost and opacity of these systems, this camp insists that only factor-graph mathematics can accurately balance a lobby of heterogeneous teams.

What we don't know

Whether open-source Bayesian alternatives will eventually break TrueSkill's dominance in the team-game market.
How future algorithms will account for multi-faceted skills—such as a player who is a great sniper but a poor driver—within a single rating.

Key terms

Rating Deviation (RD): A measure of uncertainty in a player's rating; a higher RD means the system is less confident in the exact skill level and will allow larger rating swings.
Bayesian Inference: A statistical method that updates the probability of a hypothesis as more evidence or information becomes available.
Zero-sum game: A situation where one person's gain is exactly equal to another person's loss, typical in 1v1 matches.
Smurfing: When a highly skilled player creates a new, low-rated account to easily defeat less experienced opponents.

Frequently asked

Why is my Chess.com rating different from my FIDE rating?

Chess.com uses the Glicko system and compares you against a massive online player pool, while FIDE uses traditional Elo for over-the-board games. The two systems calculate changes differently and measure against different populations.

Can TrueSkill be used for free?

No, TrueSkill is patented by Microsoft and requires a license for commercial use, though open-source Bayesian alternatives exist.

Why does my rating barely change when I win?

In systems like Glicko-2 and TrueSkill, if the uncertainty attached to your rating is very low (a small Rating Deviation in Glicko-2, a small sigma in TrueSkill), the system is highly confident in your skill and will adjust your rating only slightly after a predictable win.

Sources

[1] Microsoft Research (Multiplayer Developers): TrueSkill Ranking System
[2] arXiv (Multiplayer Developers): An Empirical Analysis of Elo, Glicko2 and TrueSkill
[3] Lichess (1v1 Platform Engineers): Chess rating systems
[4] Educational Testing Service (Multiplayer Developers): A Survey of Ranking Systems
[5] Tom Rocks Maths (Traditionalists): The Elo and Glicko Rating Systems
[6] Factlen Editorial Team: Synthesis by Factlen editorial team
