I was at the International Workshop on Computational Economics and Econometrics last week, which I help to organise and which is held in Rome each summer. For the second year, I've put up a BayesCamp prize for the presentation with the clearest exposition. Last year, I got everyone to vote at the end of the three-day workshop, but I suspected that recall bias had pushed the votes toward the most recent talks. Also, anyone who had to leave early didn't get a vote.
So, this time I had what at first appeared to be a Good Idea: I would put out a voting sheet at the end of each half-day session. Then it occurred to me that differing turnout across sessions meant we couldn't just award the prize to the talk with the most votes. Nor was it obvious that we could give it to the talk with the highest proportion of votes within its session, because that makes it easier to win in a low-turnout session. I wanted something that would estimate the desired unknown (the chance of being the best talk in the whole workshop), but there is no information to let us judge between-session quality, only within-session quality.
I thought about doing something Bayesian. I considered having a beta distribution for each talk, but this runs into a low-turnout, high-posterior-variance bias: if there were only 2 votes in a session and you got 1, you could still look better than someone who got 10 out of 20 in their session, just because of your wider beta distribution. (Luckily, there was a clear winner on within-session vote share.)
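To see that bias concretely, here's a quick Monte Carlo sketch of those two hypothetical talks; the uniform Beta(1, 1) prior is my assumption, not something decided at the workshop:

```python
import numpy as np

rng = np.random.default_rng(42)

# Posterior for a vote probability under a uniform Beta(1, 1) prior
# is Beta(1 + yes, 1 + no)
low_turnout = rng.beta(1 + 1, 1 + 1, 100_000)     # 1 vote out of 2
high_turnout = rng.beta(1 + 10, 1 + 10, 100_000)  # 10 votes out of 20

# Both posteriors have mean 0.5, but the low-turnout talk's wider
# posterior gives it a much fatter upper tail...
print((low_turnout > 0.7).mean())   # roughly 0.22
print((high_turnout > 0.7).mean())  # roughly 0.03

# ...and it wins a head-to-head posterior draw about half the time,
# despite resting on a tenth of the evidence
print((low_turnout > high_turnout).mean())  # roughly 0.5
```

So any criterion that rewards the upper tail of the posterior quietly rewards small sessions.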
I looked up electoral systems online (goodness, what a mess they are) but didn’t find any mechanistic way of dealing with it apart from having an electoral college of the organisers picking the final winner from the short-list for each session. That seems unsatisfactory to me (though it might be what we do in future; at least if we tell people that’s the method up front).
What I actually used was a multinomial logit in each session, with flat priors on the log-odds parameters. Those parameters get softmaxed into the multinomial probabilities, and one parameter in each session is fixed at zero so that the model is identifiable. That rather gives the game away: we should just be fixing one parameter for the whole workshop.
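The identifiability point can be shown in a couple of lines: softmax is invariant to adding a constant to all the log-odds, so without pinning one parameter there are infinitely many parameter vectors giving the same probabilities. A minimal numpy sketch, with made-up log-odds for a hypothetical four-talk session:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical log-odds for a four-talk session, first talk pinned at zero
theta = np.array([0.0, 1.2, -0.3, 0.5])
p = softmax(theta)

# Shifting every log-odds by the same constant leaves the multinomial
# probabilities unchanged, hence the need to fix one parameter
p_shifted = softmax(theta + 10.0)
print(np.allclose(p, p_shifted))  # True
```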
Here's the desirable but currently unattainable data generating process. There is a latent, unobservable quality for each talk, and therefore a ranking of those qualities. The qualities relate in some monotonic way to the probability of attracting a vote, and the observed votes are drawn from those probabilities, plus noise. You can identify the highest quality in each session and relate it to that session's votes, but the model remains under-identified because several combinations of latent qualities are compatible with the same data; only the relative ranks within each session are identified, not ranks across the whole workshop.
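That process, and its under-identification, can be sketched in a short simulation; the session sizes, turnouts, and the softmax link from quality to vote probability are all my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical workshop: three sessions of four talks, each talk having
# a latent quality drawn from a standard normal
quality = [rng.normal(0, 1, 4) for _ in range(3)]
turnout = [6, 20, 11]  # votes cast in each session

# Each voter picks one talk in their session, so the observed counts are
# multinomial with within-session softmax probabilities
votes = [rng.multinomial(n, softmax(q)) for n, q in zip(turnout, quality)]

# Shifting a whole session's qualities by any constant leaves its vote
# distribution unchanged: the between-session comparison is unidentified
for q in quality:
    assert np.allclose(softmax(q), softmax(q + 5.0))

print(votes)
```

The same vote counts are consistent with "session two was stacked with brilliant talks" and with "session two was mediocre throughout", which is exactly the problem.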
And here are some ways of getting there:
- primaries and a final election; if we are decent judges of quality, we'll arrive at the right winner, if nothing else, but it is tedious to have to wait for the final session's results and then hold another vote while some participants are heading for the airport and the organisers are saying their thank-yous
- visual analogue scale (VAS) or Likert rating for each talk; this gives us lots of information but is burdensome on the participants
- electoral college; if we are decent judges of quality, we’ll arrive at the right winner, if nothing else
Mathematically, the VAS would be best, but in reality people won’t fill it out. They are not there to vote for my silly prize, nor do they want to offend their fellow participants, and it won’t be 100% confidential. I think we must therefore go for the dreaded electoral college.
This was one of those little mental puzzles in probability and analysis that appear simple, turn out to be tricky with no universal solution, and end up being quite educational.