One of the most common complaints about K5's story moderation system is
that it approves the wrong number of stories: too many go to the
front page, or too few; too many get posted at all before being sent
back for a re-edit. Currently, K5 posts or dumps stories based on their
"score", which is the absolute number by which "post it" votes exceed
"dump it" votes; when the score goes above a certain positive
threshold, the story is posted, and when it goes below a negative one
it is dumped. So it seems that by adjusting these thresholds we
should be able to turn the faucet of stories up or down, and thus keep
the balance of the front page about right. Currently, this happens to
some extent dynamically, as the threshold is based on the size of the
But it doesn't work: turning the faucet barely affects the flow of
posted stories, except briefly. This seems counterintuitive at first,
but if you vote on stories regularly you'll have noticed that you can
generally tell within the first few votes whether a story is going to
go forward: a positive early score means success and a negative one
means failure. This is because these early scores reflect the way
people are tending to vote, and these tendencies are reflected in
later votes: a small positive score will always tend to result in a
big positive score if you wait long enough.
This means that the flow of stories posted cannot effectively
be regulated with simple faucet adjustments. Further, a story that
only 51% of K5 voters like (such as "Profanity Reconsidered") will
go forward eventually, after a very long wait in the queue, while
other much more popular stories wait to achieve the necessary score.
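To put rough numbers on this, here is a back-of-envelope sketch in Python. It assumes each vote is independent, counts +1 with probability equal to the story's approval and -1 otherwise, and ignores the dump threshold entirely. The function name and the +40 threshold are made up for illustration and are not K5's actual settings.

```python
def expected_votes_to_post(approval, post_threshold):
    """Expected number of votes before an absolute score reaches post_threshold.

    Each vote is modelled as +1 with probability `approval` and -1 otherwise,
    so the score drifts upward by (2 * approval - 1) per vote on average.
    """
    drift = 2 * approval - 1
    if drift <= 0:
        return float("inf")  # the score never trends upward
    return post_threshold / drift


# Hypothetical threshold of +40, chosen only to show the shape of the effect.
for approval in (0.51, 0.60, 0.75, 0.90):
    votes = expected_votes_to_post(approval, post_threshold=40)
    print(f"{approval:.0%} approval: about {votes:.0f} votes to reach +40")
```

At 51% approval the drift is only 0.02 per vote, so a +40 threshold takes on the order of 2000 votes, against 80 votes at 75%. Raising the threshold lengthens the wait, but under this model any story above 50% still gets there eventually.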
The root of the problem is the use of an absolute score, counted in
votes, as the determiner of a story's popularity. The scheme is
simple, but fixing the problems it raises is difficult. Much better
would be to determine the popularity of a story based on the
proportion of yea to nay votes. This genuinely does give the
thresholds extremely fine control over the rate of story approval.
Better yet, we can choose the threshold in software based on the rate
at which we'd like stories to be approved.

An Alternative Moderation Scheme

In the simplest scheme, we have three thresholds: a quorum, a post
threshold, and a dump timeout. A story goes forward if the quorum and
the post threshold are met, and is dumped if it stays in the queue for
more time than the dump timeout. However, there are many ways we can
The first question is: how do we select these thresholds? We
could just set them, and tweak them as necessary. This would
work perfectly well for the dump timeout, but for the other two
parameters, especially the quorum, this could mean a lot of tweaking.
Instead, we can use rolling averages of story behaviour to get much
more appropriate thresholds: the quorum is twice the rolling average
of the number of votes that stories in the queue receive within, say,
their first three hours. This should ensure that stories tend to queue for about
six hours before reaching quorum. You could, of course, declare a
quorum after exactly six hours, but this could have problems:
at quiet times, stories could receive disproportionately few votes
after six hours and go ahead when a larger vote would have rejected them.
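A rolling-average quorum of this kind might be sketched as follows. The class name, the window of 50 stories, and the fallback quorum of 20 are assumptions made for the example; only the "twice the three-hour average" rule comes from the text.

```python
from collections import deque


class RollingQuorum:
    """Quorum = factor * rolling mean of the votes stories got in three hours."""

    def __init__(self, window=50, factor=2.0, initial=20):
        # A bounded deque drops the oldest sample once `window` is exceeded.
        self.samples = deque(maxlen=window)
        self.factor = factor
        self.initial = initial  # hypothetical fallback before any data exists

    def record(self, votes_in_first_three_hours):
        """Add one story's three-hour vote count to the rolling window."""
        self.samples.append(votes_in_first_three_hours)

    def quorum(self):
        """Current quorum, rounded to a whole number of votes."""
        if not self.samples:
            return self.initial
        mean = sum(self.samples) / len(self.samples)
        return round(self.factor * mean)


q = RollingQuorum(window=3)
for votes in (30, 40, 50):
    q.record(votes)
print(q.quorum())  # twice the mean of 30, 40 and 50
```

Because the quorum follows recent voting volume, it falls at quiet times and rises at busy ones, which a fixed six-hour cutoff cannot do.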
Similarly, we can decide a rate at which we wish stories to be posted,
and automatically calculate a threshold which would achieve that based
on average behaviour over the past, say, week; this is more complex,
but the principle is the same.
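One way to derive the post threshold from a target rate is to rank the past week's stories by approval ratio and ask what cutoff would have let the desired number through. This is only a sketch: the function name, the sample ratios, and the idea of expressing the rate as "posts per week" are assumptions, not part of the proposal itself.

```python
def post_threshold_for_rate(past_ratios, target_posts):
    """Approval ratio that would have posted about target_posts past stories.

    past_ratios: approval ratios (post votes / total votes) of the stories
                 that reached quorum over the look-back period.
    target_posts: how many stories we would like posted in that period.
    """
    if not past_ratios:
        raise ValueError("need at least one past story")
    ranked = sorted(past_ratios, reverse=True)
    # Clamp so that we always keep at least one story and at most all of them.
    keep = min(len(ranked), max(1, target_posts))
    return ranked[keep - 1]  # the lowest ratio that still makes the cut


# Made-up ratios for ten stories from the past week.
last_week = [0.90, 0.40, 0.75, 0.60, 0.55, 0.30, 0.80, 0.65, 0.50, 0.70]
print(post_threshold_for_rate(last_week, target_posts=3))
```

A story would then be posted when its ratio is at or above the returned value. Recomputing the value periodically is what turns the desired posting rate into the thing being controlled, rather than the threshold.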
We might improve things still further by paying close attention to the
voting patterns. Stories receive a flurry of votes not long after
posting, which tail off over time. Rather than setting strict quorums
and timeouts, we could look for the "knee" in the voting curve which
indicates that voting has tailed off, perhaps by artificially keeping
a story in the queue if it is still receiving n votes per
hour. This is a little like waiting for popcorn to be ready:
you can't always guess in advance how long it will take, but you
know when the frequency of the "pops" is low enough.
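The popcorn test could be as simple as counting votes in the most recent hour. The sketch below assumes vote times are stored as seconds since the story was submitted; the function name and the sample timestamps are hypothetical.

```python
def voting_has_tailed_off(vote_times, now, min_votes_per_hour, window_hours=1.0):
    """True once fewer than min_votes_per_hour arrived in the latest window.

    vote_times and now are in seconds since the story was submitted.
    """
    window_start = now - window_hours * 3600
    recent = sum(1 for t in vote_times if window_start < t <= now)
    return recent / window_hours < min_votes_per_hour


# Made-up vote timestamps: a flurry early on, then a single straggler.
votes = [0, 60, 120, 300, 900, 4000]
print(voting_has_tailed_off(votes, now=1000, min_votes_per_hour=3))
print(voting_has_tailed_off(votes, now=4500, min_votes_per_hour=3))
```

A story would stay in the queue for as long as this returns False, and be judged on its proportion of votes once it returns True.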
However, none of these more complex proposals are needed to make
proportion-based voting work: the basic scheme with which I started
would be straightforward to implement, easy to tweak, and vastly
superior to the one we currently use in a variety of ways.
Well, unless it isn't. Is it?
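For anyone who wants to poke holes in it, here is a minimal sketch of the basic three-parameter scheme (quorum, post threshold, dump timeout). The Story fields, the function name, and the example numbers are all assumptions made for illustration; this is not how K5's code is organised.

```python
from dataclasses import dataclass


@dataclass
class Story:
    post_votes: int
    dump_votes: int
    hours_in_queue: float


def decide(story, quorum, post_threshold, dump_timeout_hours):
    """Return 'post', 'dump', or 'wait' under the basic proportional scheme."""
    total = story.post_votes + story.dump_votes
    # Post only when enough people have voted AND enough of them said yes.
    if total > 0 and total >= quorum:
        if story.post_votes / total >= post_threshold:
            return "post"
    # Anything that lingers past the timeout without qualifying is dumped.
    if story.hours_in_queue > dump_timeout_hours:
        return "dump"
    return "wait"


# Hypothetical settings: quorum of 80 votes, 60% approval, 24-hour timeout.
print(decide(Story(70, 30, 5.0), 80, 0.60, 24))
print(decide(Story(51, 49, 30.0), 80, 0.60, 24))
print(decide(Story(10, 2, 1.0), 80, 0.60, 24))
```

Under these settings a 51% story is dumped at the timeout however many votes it collects, whereas under the absolute score it would only need to wait long enough.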