next up previous
Next: Analysis of Results Up: Statistical Relative Strength (RS) Previous: Statistical Relative Strength (RS)

Paired Comparisons Analysis

The goal of paired comparison statistics is to deduce a ranking from an uneven matrix of observed results, from which the contestants can be sorted from best to worst. In the knowledge that crushing all the complexities of the situation into just one number is a large simplification, one wishes to have the best one-dimensional explanation of the data.

Each game between two players (Pi, Pj) can be thought of as a random experiment where there is a probability pij that Pi will win. Games actually observed are thus instances of a binomial distribution experiment: Any sample of n games between Pi and Pj occurs with a probability of

P(\text {sample})=p_{ij}^{w_{ij}}(1-p_{ij})^{n-w_{ij}}
\end{displaymath} (3.5)

where wij is the number of wins by player Pi.

We wish to assign a relative strength (RS) parameter \( \lambda _{i} \)to each of the players involved in a tournament, where \( \lambda _{i}>\lambda _{j} \)implies that player Pi is better than player Pj.

A probability function F such that F(0)=0.5 and F(x)=1-F(-x)(for all \( x\in \Re \)) is chosen arbitrarily; following [73] we use the logistic function

\end{displaymath} (3.6)

The model describes the probabilities pij as a function of the RS parameter \( \lambda _{i} \) for each player:

p_{ij}=F(\lambda _{i}-\lambda _{j})
\end{displaymath} (3.7)

so the outcome of a game is a probabilistic function of the difference between both opponent's strengths. The conditions imposed on F imply that players with equal strength are estimated to be equally likely to win or lose, and that the probability of Pi winning is equal to that of Pj losing.

The observed data is a long sequence of games between opponent pairs, each one a either a win or a loss. According to eq. 3.7, the probability of that particular sequence was

P=\prod _{i,j}F(\lambda _{i}-\lambda _{j})^{w_{ij}}\left( 1-F(\lambda _{i}-\lambda _{j})\right) ^{n_{ij}-w_{ij}}
\end{displaymath} (3.8)

for any choice of \( \lambda \)i's.The set of \( \lambda \)i's that best explains the observations is thus the one that maximizes this probability. The well known method of maximum likelihood can be applied to find the maximum for eq. 3.8, generating a large set of implicit simultaneous equations on \( \lambda _{1},\ldots \lambda _{M} \) that are solved by the Newton-Raphson algorithm.

An important consideration is, the \( \lambda \)i's are not the true indeterminates, for the equations involve only paired differences, \( \lambda _{i}-\lambda _{j} \). One point has to be chosen arbitrarily to be the zero of the RS scale.

A similar method permits assigning a rating to the performance of any smaller sample of observations (one player for example): fixing all the \( \lambda \)i's on equation (3.8), except one, we obtain

\text {wins}=\sum _{i}F(\lambda -\lambda _{i})
\end{displaymath} (3.9)

where \( \lambda \) is the only unknown -- all the other values are known. The single indeterminate can be found with identical procedure.

A player's history of games is a vector \( (x_{1},\ldots \, x_{N}) \) of win/loss results, obtained against opponents with known RS's \( \lambda _{i_{1}},\ldots ,\lambda _{i_{N}} \), respectively. Eq. (3.9) can be solved iteratively, using a ``sliding window'' of size n<N, to obtain strength estimates for \( (x_{1},\ldots \, x_{n}) \), then for \( (x_{2},\ldots \, x_{n+1}) \), and so on. Each successive value of \( \lambda \) estimates the strength with respect to the games contained in the window only.

With this window method we can do two important things: analyze the changing performance of a single player over time, and, putting the games of a group of players together into a single indeterminate, observe their combined ranking as it changes over time.

Altogether, the paired comparisons model yields:

Paired comparisons statistics are an open research area. Desirable improvements include: better accuracy, lower computational costs, estimation of error, and time sensitivity. New models such as Glickman's [54] may offer improvements on those areas.

next up previous
Next: Analysis of Results Up: Statistical Relative Strength (RS) Previous: Statistical Relative Strength (RS)
Pablo Funes