Every team starts at 1500. After each match, both teams' ratings move up or down based on the result, adjusted by how they won (point margin, set margin, match outcome), who they played (stronger opponents = bigger reward for a win), where they played (division tier and event weight; a bracket bonus is supported but currently disabled), and how much we trust the signal (more matches = more confidence).
The engine processes every match chronologically from the current season (Aug 2025 – Jul 2026). If you re-run the pipeline, the numbers are fully reproducible from the raw match CSV.
Before a match, the engine asks: given the two teams' current ratings, what's the expected win probability for each side? This is the classic ELO formula — a team rated 200 points higher has roughly a 76% expected win rate.
After the match, the engine compares the actual result to this expected result. Beating a team you were supposed to beat gains you a little. Beating a team you weren't supposed to beat gains you a lot. The gap between expected and actual is the engine of the rating change.
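The expectation-versus-result step can be sketched in a few lines. This is a minimal illustration of the classic ELO formula with the conventional 400-point scale; the function names and the plain win/loss `actual` score are assumptions for illustration, and the engine's real update also folds in the margin layers and multipliers covered in the rest of this page.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability for team A under the classic ELO formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_delta(rating_a: float, rating_b: float, actual: float, k: float = 32.0) -> float:
    """Rating change for team A: K times (actual - expected).

    `actual` is 1.0 for a win and 0.0 for a loss in this simplified sketch.
    """
    return k * (actual - expected_score(rating_a, rating_b))


# A 200-point favorite is expected to win about 76% of the time.
print(round(expected_score(1700, 1500), 2))  # 0.76

# An upset win gains far more than an expected win.
print(round(rating_delta(1500, 1700, actual=1.0), 1))  # 24.3 (underdog wins)
print(round(rating_delta(1700, 1500, actual=1.0), 1))  # 7.7 (favorite wins)
```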
A volleyball match isn't just a win or loss — it's a sequence of points inside sets inside a match. The ranking uses all three:
Points are the most granular signal. A 25–10 set transfers more rating than a 25–23 set. Margin is scaled logarithmically so 40-point blowouts don't dominate: beyond a certain margin, extra points give diminishing returns. This protects ratings from being skewed by mercy-rule wins against vastly weaker opponents.
Sets come next. Winning 2–0 is a stronger signal than winning 2–1, so the set layer rewards sweeps: a multiplier of roughly 1.15–1.3× applies based on the set differential.
The match result is the binary outcome. You won or you didn't. This is the classic ELO component and guarantees that winning always moves you up, regardless of margin.
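To show what "diminishing returns" means for the point-differential layer, here is a hedged sketch. The `log1p` form and the name `margin_factor` are assumptions chosen for illustration; the engine's actual scaling constants and layer weights live in its settings file and are not reproduced here.

```python
import math


def margin_factor(point_diff: int) -> float:
    """Log-scaled margin: grows with the point differential, but ever more slowly."""
    return math.log1p(abs(point_diff))


# A 25-10 set (margin 15) counts for more than a 25-23 set (margin 2)...
close_set = margin_factor(2)
blowout_set = margin_factor(15)
print(blowout_set > close_set)  # True

# ...but doubling a margin from 20 to 40 comes nowhere near doubling the signal.
ratio = margin_factor(40) / margin_factor(20)
print(ratio < 1.5)  # True
```

Any concave function would give the same qualitative behavior; the logarithm is used here only because the text names it.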
K controls the magnitude of the rating change. The bigger K, the faster ratings move. The engine uses an adaptive K:
| Stage | Description | K |
|---|---|---|
| New team | Fewer than 10 matches processed | 48 |
| Established | 10+ matches processed | 32 |
Newer teams calibrate faster — if a true top-tier team starts at 1500, higher K lets them climb to their real rating quickly. Once we have enough data on them, K drops so ratings stabilize and don't swing wildly after a single bad match.
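The K schedule in the table is simple enough to express directly. The constant and function names below are illustrative, not taken from the engine's code; the values come from the table.

```python
NEW_TEAM_MATCHES = 10  # teams with fewer processed matches count as "new"
K_NEW = 48
K_ESTABLISHED = 32


def k_factor(matches_processed: int) -> int:
    """Return the K-factor for a team given how many of its matches have been processed."""
    return K_NEW if matches_processed < NEW_TEAM_MATCHES else K_ESTABLISHED


print(k_factor(3))   # 48
print(k_factor(10))  # 32
```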
Not every match carries equal weight. The engine applies several multipliers on top of the base rating change:
| Tier | Code | Multiplier |
|---|---|---|
| Open | O | 1.0× |
| USA | U | 0.5× |
| Club | C | 0.5× |
Open-division matches carry full weight — that's the strongest competition and the most reliable signal of team quality. USA and Club matches count for half. A win in Club doesn't tell us as much about a team's ceiling as a win in Open, so it moves the rating less.
Pool-play and bracket matches currently carry the same weight (1.0×). The engine supports a bracket bonus, but it's disabled in the current settings because the sample of scouted matches already skews toward competitive pools.
If two teams meet multiple times in the same tournament (e.g., pool then gold bracket), the engine progressively dampens the rating change so a single event doesn't dominate:
| Meeting | Multiplier |
|---|---|
| 1st | 1.0× |
| 2nd | 0.75× |
| 3rd+ | 0.5× |
Some tournaments draw stronger fields than others. Individual events can be weighted up or down by an admin-configurable multiplier (typically between 0.5× and 1.0×) to reflect field strength. The biggest events currently carry the full 1.0× weight; regional and mid-tier events are weighted lower.
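As a rough sketch of how these multipliers might stack, the snippet below multiplies the tier, rematch, and per-event weights together. The tier and rematch values come from the tables above; the function names and the assumption that the factors combine by simple multiplication are illustrative and not confirmed by the engine's source.

```python
# Tier codes and multipliers from the tier table.
TIER_MULTIPLIER = {"O": 1.0, "U": 0.5, "C": 0.5}


def rematch_multiplier(meeting_number: int) -> float:
    """Dampening for the Nth meeting of the same two teams within one tournament."""
    if meeting_number <= 1:
        return 1.0
    if meeting_number == 2:
        return 0.75
    return 0.5


def match_weight(tier_code: str, meeting_number: int, event_weight: float = 1.0) -> float:
    """Combined weight for one match (assumes the multipliers simply multiply)."""
    return TIER_MULTIPLIER[tier_code] * rematch_multiplier(meeting_number) * event_weight


# Third meeting of two Club teams at a full-weight event.
print(match_weight("C", 3))  # 0.25
```

The resulting weight would scale the base rating change for that match.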
A team that's played 40 matches has a rating we can trust. A team that's played 3 matches does not, because those three results might be wildly atypical. Instead of showing raw ELO to everyone, the dashboard shows a confidence-blended display rating: display rating = 1500 + confidence × (raw ELO − 1500).
Confidence ramps linearly from 0 to 1 over the first 15 matches. A team with only 5 matches gets confidence = 5/15 ≈ 0.33, so its display rating is pulled about two-thirds of the way back toward the 1500 baseline. This prevents fresh teams with a lucky weekend from leapfrogging established programs on the leaderboard.
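The blend can be written out as a short function. This is a sketch inferred from the description above (linear ramp over 15 matches, pull toward the 1500 baseline); the names are illustrative.

```python
BASELINE = 1500.0
FULL_CONFIDENCE_MATCHES = 15


def display_rating(raw_elo: float, matches_played: int) -> float:
    """Blend raw ELO toward the baseline according to how many matches back it up."""
    confidence = min(matches_played / FULL_CONFIDENCE_MATCHES, 1.0)
    return BASELINE + confidence * (raw_elo - BASELINE)


# A team at raw 1800 after 5 matches keeps only a third of its 300-point lead.
print(round(display_rating(1800, 5)))   # 1600
# After 40 matches the display rating equals the raw rating.
print(round(display_rating(1800, 40)))  # 1800
```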
Teams with fewer than 20 matches are flagged as provisional in the rankings — their rating is shown but tagged as preliminary.
Matches are sorted by date before processing. This matters: an October win against a team that later develops into a powerhouse should use that team's October rating, not their April rating. Out-of-order processing would inflate or deflate the signal.
Sets with a 25–0 or 0–25 scoreline are treated as forfeits and excluded from ELO processing entirely. Forfeit scores would corrupt the point-differential layer.
Ratings can't drop below 800. This prevents teams from being driven into statistical oblivion after a rough stretch — they stay in the system so future results can still pull them back toward their true level.
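The three safeguards above (date ordering, forfeit exclusion, and the rating floor) could look roughly like this. The match dictionary layout (`date`, `sets` as score pairs) is a hypothetical structure for illustration, and dropping a match whose sets were all forfeits is an assumption.

```python
from datetime import date

RATING_FLOOR = 800.0


def is_forfeit_set(score_a: int, score_b: int) -> bool:
    """A 25-0 or 0-25 set is treated as a forfeit."""
    return (score_a, score_b) in {(25, 0), (0, 25)}


def prepare_matches(matches: list[dict]) -> list[dict]:
    """Remove forfeit sets, drop matches left with no sets, and sort by date."""
    cleaned = []
    for match in matches:
        sets = [s for s in match["sets"] if not is_forfeit_set(*s)]
        if sets:
            cleaned.append({**match, "sets": sets})
    return sorted(cleaned, key=lambda m: m["date"])


def apply_floor(rating: float) -> float:
    """Clamp a rating so it never drops below the floor."""
    return max(rating, RATING_FLOOR)


matches = [
    {"date": date(2026, 4, 5), "sets": [(25, 20), (25, 18)]},
    {"date": date(2025, 10, 12), "sets": [(25, 0), (25, 0)]},   # forfeit, dropped
    {"date": date(2025, 10, 11), "sets": [(23, 25), (25, 21), (15, 12)]},
]
print([m["date"].isoformat() for m in prepare_matches(matches)])
# ['2025-10-11', '2026-04-05']
```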
A team that plays in both 16 Open and 16 Club (or similar) is merged into a single entry in the universal rankings. Their ELO history across divisions is stitched together chronologically and their match totals are summed.
The Tournaments page ranks every event in the database with a 0–100 Competitive Index that answers the question: how strong was the field at this tournament?
Four factors feed the score. Each factor is independently percentile-ranked across every tournament in the system, then combined with fixed weights. Percentile-ranking matters — it means the weights are comparable across factors measured in different units (ELO numbers vs. club counts vs. ratios).
| Factor | What it measures | Weight |
|---|---|---|
| Top-10 Field Strength | Average ELO of the top 10 teams that attended | 25% |
| Field Depth | Average ELO across every team at the event | 20% |
| Geographic Reach | Number of distinct clubs represented (proxy) | 35% |
| Field Quality Mix | Log-scaled field size blended with ranked-team ratio | 20% |
Geography carries the heaviest weight (35%) because it's what most separates a genuine national event from a loaded regional. A stacked 12-team invitational can have elite top-end ELO but if every team is from the same area, it's a regional battle, not a national one. The reach weight prevents locally-dominant events from gaming the index.
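The percentile-then-weight step can be sketched as follows. The weights come from the table; the factor keys, the function names, and the exact percentile definition (share of events at or below a value) are assumptions, since the page does not specify how ties or endpoints are handled.

```python
WEIGHTS = {
    "top10_strength": 0.25,
    "field_depth": 0.20,
    "geographic_reach": 0.35,
    "quality_mix": 0.20,
}


def percentile_ranks(values: list[float]) -> list[float]:
    """Percentile rank (0-100) of each value: share of values less than or equal to it."""
    n = len(values)
    return [100.0 * sum(v <= x for v in values) / n for x in values]


def competitive_index(events: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of per-factor percentile ranks for every event."""
    names = list(events)
    scores = {name: 0.0 for name in names}
    for factor, weight in WEIGHTS.items():
        ranks = percentile_ranks([events[name][factor] for name in names])
        for name, rank in zip(names, ranks):
            scores[name] += weight * rank
    return scores
```

Because every factor is converted to a 0–100 percentile before weighting, an average ELO in the 1600s and a club count in the dozens end up on the same scale.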
This factor blends size and data quality into a single 0–1 score.
The log scaling on size means a 64-team event scores roughly 1.0 on size, while 16-team and 32-team events score ~0.67 and ~0.84. The ranked-team ratio penalizes events padded with unknown-quality teams we don't yet have data on.
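One log scaling that approximately reproduces the quoted size scores is log(team count) divided by log(64), capped at 1. That form is an assumption, as is the equal 50/50 blend with the ranked-team ratio used below; the page does not state the actual blend weights.

```python
import math


def size_score(team_count: int, full_size: int = 64) -> float:
    """Log-scaled field size in [0, 1]; events at or above `full_size` teams score 1.0."""
    if team_count <= 1:
        return 0.0
    return min(math.log(team_count) / math.log(full_size), 1.0)


def quality_mix(team_count: int, ranked_count: int, size_weight: float = 0.5) -> float:
    """Blend of size score and ranked-team ratio (the 50/50 split is a placeholder)."""
    ranked_ratio = ranked_count / team_count if team_count else 0.0
    return size_weight * size_score(team_count) + (1.0 - size_weight) * ranked_ratio


for n in (16, 32, 64):
    print(n, round(size_score(n), 2))
# 16 0.67
# 32 0.83
# 64 1.0
```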
| Tier | Score threshold |
|---|---|
| National | 75+ |
| Super-Regional | 55 – 74 |
| Regional | 30 – 54 |
| Local | < 30 |
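Mapping an index score to its tier is a simple threshold lookup. The function name is illustrative, and treating the bands as half-open (so a fractional score such as 74.5 falls in Super-Regional) is an assumption.

```python
def event_tier(index: float) -> str:
    """Map a 0-100 Competitive Index to its tier label."""
    if index >= 75:
        return "National"
    if index >= 55:
        return "Super-Regional"
    if index >= 30:
        return "Regional"
    return "Local"


print(event_tier(82))  # National
print(event_tier(41))  # Regional
```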
Per-event rating multipliers are configured separately in tournament_weights.json. The index is a public-facing signal about field strength; it is not used to compute ratings.
The pipeline is deterministic. Given the same input CSV and the same settings file, running the rebuild script produces the same ratings every time. The inputs are:
- combined_all_tournaments.csv: raw match results (teams, sets, division, date, source)
- elo_settings.json: layer weights, K-factors, tier multipliers, thresholds
- tournament_weights.json: per-event quality multipliers
- club_aliases.json: name normalization for clubs with duplicate spellings

Found a bug or have a suggestion? Use the Contribute page to submit an event, flag a duplicate, or share feedback.
Last updated: April 2026