WGPO Ratings System Mission Statement

  1. The WGPO Ratings System (“the System”) shall be used to maintain ratings for all WGPO members who participate in WGPO events.
    1. The initial primary focus of the System is to rate Scrabble Brand Crossword Game(R) (“Scrabble”) games played under a standard set of playing conditions. These standards may change over time, as determined by the WGPO rules committee.
      1. The ratings committee reserves the option to treat as “standard” any event whose playing conditions differ only slightly from what is considered standard. Tournament directors are encouraged to contact the ratings committee to determine whether a proposed Scrabble event can be rated as “standard”.
      2. When practical, the System may use separate ratings for events not played under standard conditions. For example, tournaments using a significantly different word source (e.g. Collins) may be rated under a separate Collins rating. (A WGPO member who has played in both WGPO standard and WGPO Collins events could have two separate ratings.) Further possibilities include rating different word games, such as anagrams, or rating Scrabble variants, including those in which a team may consist of multiple players. Again, directors are encouraged to contact the ratings committee to determine whether a proposed word game event can be rated under an existing or new rating.
      3. The ratings committee reserves the option to maintain combined-conditions ratings that rate events played under multiple playing conditions. For example, a comprehensive Scrabble rating may be used to rate all one-on-one Scrabble events, regardless of time control, dictionary, challenge rule, or other playing conditions.
    2. As much, or as soon, as is practical, the System shall rely solely on WGPO event data to maintain WGPO member ratings.
      1. After a WGPO member has participated in their first WGPO event, their rating shall be affected only by participation in subsequent WGPO events.
      2. The initial rating of a first-time WGPO event participant may be based on information outside of WGPO (e.g. past performance in non-WGPO events). This is a short-term practical measure to avoid a situation in which every, or nearly every, event participant is treated as unrated.
  2. A primary goal of the System is to achieve the most “accurate” player ratings possible using all commonly reported WGPO game data.
    1. Players are deemed to be rated “accurately” when the System, along with the ratings of the players, successfully predicts the win probabilities in games between those players. When a potential System improvement is proposed, mathematical study and/or simulation of actual tournament data will be used to objectively determine whether the proposal will lead to more accurate ratings (more accurate win probability estimation). New features that can be objectively shown to increase System accuracy may be implemented, and features that can be shown to decrease System accuracy may be eliminated.
    2. Commonly reported WGPO game data include player names, game date and time (or round number), order of play (who went first), and final game scores. From these data, the System may derive secondary information (e.g. a player’s rating, a player’s rating uncertainty based on their frequency of play within the past year, etc.) that can be used to improve System accuracy.
    3. Tournament directors are strongly encouraged to record and report game scores for every WGPO event game (rather than only win/loss/tie results). Directors are also encouraged, where possible, to record and report firsts and seconds (which player went first in each game), as these data may further aid the development of significant System improvements.
  3. In the long term, the System shall maintain ratings system “stability” through the adoption and enforcement of “benchmarks”.
    1. A ratings system is deemed to be “stable” when at least one rating and at least one rating difference each mean approximately the same thing from era to era. Historically, Scrabble ratings systems have experienced both ratings deflation and ratings “sprawl” (over the course of decades, ratings have both drifted downward and spread out). The stability of the System shall be continuously measured and enforced against benchmarks adopted by the ratings committee.
    2. The ratings committee shall adopt and enforce ratings system “benchmarks”. One benchmark shall deem a specific rating as having a particular meaning (relative to the strength of players in that era). The other benchmark shall deem a specific rating difference as representing a particular win probability for the higher-rated player.
    3. While any choice of benchmarks may lead to long-term stability of the System, the ratings committee shall strive to choose benchmarks that match the long-term goals of WGPO. For example, if a goal of WGPO is to welcome many “living room players” into our organization, then the committee shall avoid benchmarks that may result in negative ratings for such players, as negative ratings, while mathematically sound, may deter future WGPO participation.
    4. The ideal method for the System to maintain stability is through small-scale post-game corrections to each player’s rating. However, should the ratings committee adopt benchmarks that differ significantly from current ratings, a one-time across-the-board adjustment may be made to all ratings.
  4. The ratings committee reserves the option to develop and implement a future System whose structure may be significantly different from the current Elo-style System.
    1. The current System utilizes a single number to rate each player, with that rating representing an estimated strength relative to other players’ ratings.
    2. Any such proposed System must be shown to achieve superior accuracy and stability while preserving ratings that carry a clear, intuitive meaning for players.
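As an illustration only, the following Python sketch assumes an Elo-style logistic win-probability curve (the shape used in chess Elo ratings). This Mission Statement does not specify the System's actual formula, and the benchmark values, the function names, and the choice of log loss as the accuracy measure are all hypothetical. The sketch shows how a rating-difference benchmark (item 3.2) could fix the curve's scale, how the accuracy of competing proposals could be compared on actual game results (item 2.1), and how a one-time across-the-board adjustment (item 3.4) could move ratings to new benchmarks without changing any predicted win probability.

```python
import math
from typing import Iterable, Tuple

# HYPOTHETICAL benchmark values, chosen only for illustration (not adopted
# WGPO values): a BENCH_DIFF-point rating difference is deemed to give the
# higher-rated player a BENCH_WIN_PROB chance of winning.
BENCH_DIFF = 200.0
BENCH_WIN_PROB = 0.75


def logistic_scale(diff: float, win_prob: float) -> float:
    """Return the logistic scale at which `diff` rating points map to `win_prob`."""
    return diff / math.log10(win_prob / (1.0 - win_prob))


SCALE = logistic_scale(BENCH_DIFF, BENCH_WIN_PROB)


def win_probability(r_a: float, r_b: float, scale: float = SCALE) -> float:
    """Assumed Elo-style logistic estimate that player A beats player B."""
    return 1.0 / (1.0 + 10.0 ** (-(r_a - r_b) / scale))


def log_loss(games: Iterable[Tuple[float, float, float]], scale: float = SCALE) -> float:
    """Mean log loss over games given as (rating_a, rating_b, result_a).

    result_a is 1 for a win by A, 0 for a loss, and 0.5 for a tie.
    Lower values mean the ratings predicted the results more accurately.
    """
    eps = 1e-12  # keep probabilities away from 0 and 1 so log() is defined
    total, n = 0.0, 0
    for r_a, r_b, result_a in games:
        p = min(max(win_probability(r_a, r_b, scale), eps), 1.0 - eps)
        total -= result_a * math.log(p) + (1.0 - result_a) * math.log(1.0 - p)
        n += 1
    if n == 0:
        raise ValueError("at least one game is required")
    return total / n


def rescale(rating: float, old_anchor: float, new_anchor: float,
            old_scale: float, new_scale: float) -> float:
    """One-time across-the-board adjustment: move old_anchor to new_anchor and
    stretch rating differences by new_scale / old_scale, so every predicted
    win probability is the same before and after the adjustment."""
    return new_anchor + (rating - old_anchor) * (new_scale / old_scale)


if __name__ == "__main__":
    print(round(win_probability(1200, 1000), 3))  # 0.75 under the hypothetical benchmark
    games = [(1500, 1300, 1), (1400, 1450, 0), (1600, 1600, 0.5)]
    print(round(log_loss(games), 4))
```

Under the hypothetical benchmark above, a 200-point rating difference yields a 0.75 win probability for the higher-rated player. Two candidate proposals would be compared by computing log_loss for each over the same set of games, using the pre-game ratings each proposal produced; the proposal with the lower value predicted the observed results more accurately.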