Paired Comparisons Analysis

The goal of paired comparison statistics is to deduce a ranking from an uneven matrix of observed results, from which the contestants can be sorted from best to worst. In the knowledge that crushing all the complexities of the situation into just one number is a large simplification, one wishes to have the best one-dimensional explanation of the data.

Each game between two players (P_i, P_j) can be thought of as a random experiment in which there is a probability p_ij that P_i will win. The games actually observed are thus outcomes of a binomial experiment: any particular sequence of n games between P_i and P_j occurs with probability

$\begin{displaymath} P(\text {sample})=p_{ij}^{w_{ij}}(1-p_{ij})^{n-w_{ij}} \end{displaymath}$

(3.5)

where w_ij is the number of wins by player P_i.

We wish to assign a relative strength (RS) parameter $\lambda _{i}$ to each of the players involved in a tournament, where $\lambda _{i}>\lambda _{j}$ implies that player P_i is better than player P_j.

A probability function F such that F(0)=0.5 and F(x)=1-F(-x) (for all $x\in \Re$) is chosen arbitrarily; following [73] we use the logistic function

$\begin{displaymath} F(x)=\frac{1}{1+e^{-x}} \end{displaymath}$

(3.6)

The model describes the probabilities p_ij as a function of the RS parameter $\lambda _{i}$ for each player:

$\begin{displaymath} p_{ij}=F(\lambda _{i}-\lambda _{j}) \end{displaymath}$

(3.7)

so the outcome of a game is a probabilistic function of the difference between the two opponents' strengths. The conditions imposed on F imply that players of equal strength are estimated to be equally likely to win or lose, and that the probability of P_i winning is equal to that of P_j losing.
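
Eqs. 3.6 and 3.7 can be illustrated with a minimal Python sketch. The function names `logistic` and `win_probability` are ours, chosen for illustration, and the branch on the sign of x is only a guard against floating-point overflow for large negative arguments.

```python
import math

def logistic(x: float) -> float:
    """Eq. 3.6: F(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)          # x < 0, so e lies in [0, 1)
    return e / (1.0 + e)

def win_probability(lam_i: float, lam_j: float) -> float:
    """Eq. 3.7: probability that a player of strength lam_i beats one of strength lam_j."""
    return logistic(lam_i - lam_j)

# Equal strengths give an even game; a one-point RS advantage gives about 73%.
print(win_probability(0.0, 0.0))   # 0.5
print(win_probability(1.0, 0.0))
```

Only the difference of the two strengths enters the result, so shifting both arguments by the same amount leaves the probability unchanged.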

The observed data is a long sequence of games between opponent pairs, each one either a win or a loss. According to eq. 3.7, the probability of that particular sequence was

$\begin{displaymath} P=\prod _{i,j}F(\lambda _{i}-\lambda _{j})^{w_{ij}}\left( 1-F(\lambda _{i}-\lambda _{j})\right) ^{n_{ij}-w_{ij}} \end{displaymath}$

(3.8)

for any choice of $\lambda _{i}$'s. The set of $\lambda _{i}$'s that best explains the observations is thus the one that maximizes this probability. The well-known method of maximum likelihood can be applied to find the maximum of eq. 3.8, generating a large set of implicit simultaneous equations in $\lambda _{1},\ldots ,\lambda _{M}$ that are solved by the Newton-Raphson algorithm.
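
As a sketch of where those equations come from (a standard calculation for the logistic choice of F, not spelled out in the text, and assuming that w_ij counts the wins of P_i over P_j and that n_ij=n_ji is the number of games between them): since $F'(x)=F(x)(1-F(x))$, taking the logarithm of eq. 3.8 and setting its derivative with respect to each $\lambda _{i}$ to zero gives

$\begin{displaymath} \sum _{j}w_{ij}=\sum _{j}n_{ij}F(\lambda _{i}-\lambda _{j}),\qquad i=1,\ldots ,M \end{displaymath}$

that is, at the maximum each player's observed number of wins equals the number of wins the model expects.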

An important consideration is that the $\lambda _{i}$'s are not fully determined by the data: the equations involve only the paired differences $\lambda _{i}-\lambda _{j}$, so adding the same constant to every $\lambda _{i}$ leaves eq. 3.8 unchanged. One point therefore has to be chosen arbitrarily to be the zero of the RS scale.
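
The fitting procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used for the experiments: the names `fit_strengths` and `solve_linear`, the layout of the `wins` matrix and the example data are all hypothetical, and the sketch assumes every player has at least one win and one loss within a connected set of opponents, since otherwise the maximum is not finite. The reference player's strength is pinned at zero and Newton-Raphson runs on the remaining unknowns.

```python
import math

def logistic(x):
    """Eq. 3.6, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_strengths(wins, reference=0, max_iter=50, tol=1e-10):
    """Maximize eq. 3.8 by Newton-Raphson; wins[i][j] = games i won against j."""
    players = len(wins)
    free = [k for k in range(players) if k != reference]
    column = {k: idx for idx, k in enumerate(free)}
    lam = [0.0] * players                            # lam[reference] stays 0
    for _ in range(max_iter):
        grad = [0.0] * len(free)                     # gradient of log-likelihood
        neg_hess = [[0.0] * len(free) for _ in free] # minus its Hessian
        for k in free:
            for j in range(players):
                if j == k:
                    continue
                games = wins[k][j] + wins[j][k]
                p = logistic(lam[k] - lam[j])        # eq. 3.7
                curvature = games * p * (1.0 - p)
                grad[column[k]] += wins[k][j] - games * p
                neg_hess[column[k]][column[k]] += curvature
                if j != reference:
                    neg_hess[column[k]][column[j]] -= curvature
        step = solve_linear(neg_hess, grad)          # Newton step
        for k in free:
            lam[k] += step[column[k]]
        if max(abs(s) for s in step) < tol:
            break
    return lam

# Hypothetical round robin: row i, column j = wins of player i over player j.
wins = [[0, 3, 4],
        [2, 0, 3],
        [1, 2, 0]]
print(fit_strengths(wins))
```

Choosing a different `reference` shifts every strength by the same constant and leaves all the predicted win probabilities unchanged.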

A similar method permits assigning a rating to the performance of any smaller sample of observations (one player, for example): fixing all the $\lambda _{i}$'s in equation (3.8) except one, we obtain

$\begin{displaymath} \text {wins}=\sum _{i}F(\lambda -\lambda _{i}) \end{displaymath}$

(3.9)

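where $\lambda$ is the only unknown: "wins" is the number of games won in the sample, the sum runs over the games in the sample, and $\lambda _{i}$ is the known strength of the opponent in game i. The single indeterminate can be found by the same procedure.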
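
A minimal sketch of solving eq. 3.9 for a single unknown with a one-dimensional Newton-Raphson iteration is given below. The name `performance_rating`, the starting point (the mean opponent strength) and the cap on the step size are assumptions made for this illustration. Note that eq. 3.9 has no finite solution when the sample contains only wins or only losses, so the sketch rejects those cases.

```python
import math

def logistic(x):
    """Eq. 3.6, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def performance_rating(results, opponent_strengths, max_iter=100, tol=1e-10):
    """Solve eq. 3.9 for lambda; results[i] is 1 for a win, 0 for a loss."""
    games = len(results)
    wins = sum(results)
    if not 0 < wins < games:
        raise ValueError("eq. 3.9 has no finite solution for all wins or all losses")
    lam = sum(opponent_strengths) / games            # assumed starting point
    for _ in range(max_iter):
        probs = [logistic(lam - s) for s in opponent_strengths]
        excess = sum(probs) - wins                   # expected minus observed wins
        slope = sum(p * (1.0 - p) for p in probs)    # derivative of the sum
        step = excess / slope
        step = max(-2.0, min(2.0, step))             # safeguard, our assumption
        lam -= step
        if abs(step) < tol:
            break
    return lam

# Hypothetical sample: three wins in four games against known opponents.
print(performance_rating([1, 1, 0, 1], [0.0, 0.5, 1.0, -0.5]))
```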

A player's history of games is a vector $(x_{1},\ldots ,x_{N})$ of win/loss results, obtained against opponents with known RS's $\lambda _{i_{1}},\ldots ,\lambda _{i_{N}}$, respectively. Eq. (3.9) can be solved iteratively, using a "sliding window" of size n<N, to obtain strength estimates for $(x_{1},\ldots ,x_{n})$, then for $(x_{2},\ldots ,x_{n+1})$, and so on. Each successive value of $\lambda$ estimates the strength with respect to the games contained in the window only.

With this window method we can do two important things: analyze the changing performance of a single player over time, and, putting the games of a group of players together into a single indeterminate, observe their combined ranking as it changes over time.
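
The sliding-window procedure can be sketched as follows. For brevity this illustration solves eq. 3.9 in each window by bisection rather than by Newton-Raphson, which is valid because the right-hand side of eq. 3.9 increases with $\lambda$. The function names, the search bracket of [-20, 20] and the convention of returning `None` for windows made up only of wins or only of losses (where no finite estimate exists) are our assumptions.

```python
import math

def logistic(x):
    """Eq. 3.6, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def window_strength(results, opponents, lo=-20.0, hi=20.0, tol=1e-9):
    """Solve eq. 3.9 for one window by bisection (assumed bracket [lo, hi])."""
    wins = sum(results)
    if wins == 0 or wins == len(results):
        return None                                  # no finite solution
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(logistic(mid - s) for s in opponents)
        if expected < wins:
            lo = mid                                 # estimate too low
        else:
            hi = mid
    return (lo + hi) / 2.0

def sliding_strengths(results, opponents, n):
    """Strength estimates for games 1..n, then 2..n+1, and so on."""
    return [window_strength(results[s:s + n], opponents[s:s + n])
            for s in range(len(results) - n + 1)]

# Hypothetical history: ten games against opponents of strength 0.
history = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
print(sliding_strengths(history, [0.0] * len(history), n=4))
```

To follow a group of players rather than an individual, the same function can be applied to the group's games merged into one chronological list, as the text describes.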

Altogether, the paired comparisons model yields:

A performance scale that we have called Relative Strength (RS). The zero of the scale is set arbitrarily (to the strength of a fixed sample player, agent 460003).
An ordering of the entire set of players in terms of proficiency at the game, as given by the RS's.
An estimation, for each possible game between two arbitrary players, of the win-lose probability (eq. 3.7). With it, an estimation of exactly how much better or worse one is, as compared to the other.
A way to measure performance of individuals or groups over time.
A possible fitness measure: the better ranked players can be chosen to survive.

Paired comparisons statistics are an open research area. Desirable improvements include better accuracy, lower computational costs, estimation of error, and time sensitivity. New models such as Glickman's [54] may offer improvements in those areas.

Pablo Funes
2001-05-08