Given the techno-centric nature of this site, I'm surprised that in the
design of the current system, as well as in the proposals to fix it, I
haven't seen many ideas based on statistical methods. Of course, I could be wrong in my criticism; I don't claim to be anything close to an expert.
The following is an idea based on only an elementary understanding of the
practice of statistics. Please correct any mistakes I have made.
The basic flaw with the current system is that it doesn't seem to be
well thought out from a statistical point of view. The concept of thresholds
being a percentage of the total number of users completely ignores the Law
of Large Numbers. Many people seem not to realize that a sufficiently
large random sample of users can estimate the "average" opinion
of all K5 users to an arbitrary confidence level. So even if K5 had a
million users, votes from a random several hundred would be quite an accurate
indicator of the general opinion! Thus I think there is no reason to use any
kind of increasing thresholds (and good reasons not to).
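As a rough illustration, here is a sketch in Python of the standard margin-of-error formula for a sample proportion (margin_of_error is just a name I made up for it); the point is that the total number of K5 users never appears in the formula:

```python
import math

# Margin of error (95% confidence) for a sample proportion p_hat
# estimated from n random voters. The size of the whole user base
# does not enter the formula at all.
def margin_of_error(p_hat, n, z_star=1.96):
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Worst case (p_hat = 0.5): a few hundred random voters already pin
# down the true proportion to within a few percent.
for n in (100, 400, 1000):
    print(n, round(margin_of_error(0.5, n), 3))
# 100 0.098
# 400 0.049
# 1000 0.031
```

So whether K5 has ten thousand users or a million, roughly the same number of votes buys the same precision.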
Something else I don't understand is that the current system
doesn't take into account statistical trends. Suppose rusty posts a story
that everyone can agree belongs on K5. If 14 out of the first 15 voters
want the story posted, then we can say with 99% confidence that more than 75%
of all K5 users want the story posted. But the system naively waits for the
number of votes needed to reach a threshold, which is especially bad for
time-sensitive stories, such as news.
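To sanity-check the 14-of-15 example with the normal approximation described later in this post (admittedly shaky at n = 15), a few lines of Python suffice:

```python
import math

# Check the 14-of-15 example using the normal approximation
# (questionable at such a small n, but it is the method proposed below).
n, posts = 15, 14
pi_0 = 0.75                       # the 75% threshold from the example
p_hat = posts / n
sigma = math.sqrt(p_hat * (1 - p_hat) / n)
z = (p_hat - pi_0) / sigma
print(round(z, 2))                # 2.85, above z_star = 2.58 for 99%
```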
My final complaint before I propose a solution is that there are different
post and dump thresholds. If a story's score rises at a certain positive
average rate over time, it will likely be posted; won't increasing the
threshold above a certain amount just delay the inevitable? The reverse
seems to occur for small dump thresholds--the stories that are dumped
quickly with small negative thresholds would probably have been voted down
anyway. N.B. I could be wrong on this point, though.
There are a few other, more minor problems, but I think you get the idea.
So it's easy to spout criticisms of a working system, especially when
some of them might not even be valid. What can be done to fix it? (Sorry
about the lack of symbols in what follows; it's difficult to do math in
HTML. :-( ).
Suppose that for a given story, there is a certain proportion of all
K5 users who want the story posted
(# of supporters) / (# of users eligible to vote); call it Pi. So a proportion
of 1-Pi don't want it posted. (From here forward, the votes of users who
"don't care" will simply be ignored for the purposes of scoring
posts, which seems to be the most natural choice.) Because of an unfortunate
disparity between the languages of K5 and statistics, the "post"
votes will have a value of one, while "dump" votes are represented
by zero.
It seems reasonable to post a story if and only if at least a certain proportion Pi_0 of the K5 voting population wants it posted (i.e. Pi > Pi_0). Of course, that proportion could be easily adjusted (perhaps on a per-user basis?!) to affect the quality of the stories posted. Let us also assume that people who vote represent a random sample and that votes are not
correlated with the time at which they are submitted (not really true,
but I believe close enough to the truth). Say n people vote, and call the
proportion of voters who want the story submitted (i.e. (# post)/n) p_hat.
Then it is relatively simple to use p_hat to test the hypothesis that
Pi > Pi_0. If that hypothesis is accepted, then the story
is posted; otherwise, the hypothesis that Pi < Pi_0 is tested;
if that hypothesis is accepted, the story is rejected (confused yet? :) );
otherwise, voting continues.
Glossing over much of the statistics, the Central Limit Theorem tells us
that p_hat is distributed normally,
and it is not difficult to show that the average of the numerical scores
(remember, post=1 and dump=0) is p_hat, and that the standard deviation of
p_hat is sqrt(Pi*(1-Pi)/n), which for deep statistical reasons we approximate by
sigma=sqrt(p_hat*(1-p_hat)/n). Then the z-score is
(p_hat-Pi_0)/sqrt(p_hat*(1-p_hat)/n). The z-score represents a value on the
standard normal curve, and an area under the curve represents a probability.
We can thus ensure that our method is accurate 95% of the time by comparing
the z-score with z_star=1.96. Perhaps better would be 99% confidence, which
corresponds to a z_star of 2.58. The drawback to a higher confidence level is
that more votes are required to reach that confidence. If the z-score is
greater than z_star, the story is posted; otherwise the process continues as
described above.
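Putting the pieces together, the whole decision rule fits in a few lines. This is only a sketch under the assumptions above (random, uncorrelated voters), and decide is a hypothetical name of my own:

```python
import math

def decide(posts, dumps, pi_0=0.5, z_star=1.96):
    """Return 'post', 'dump', or 'keep voting' for the current tally.
    Post votes score 1 and dump votes score 0, as in the text."""
    n = posts + dumps
    p_hat = posts / n
    if p_hat in (0.0, 1.0):
        # The approximate standard error degenerates to 0 here;
        # wait for more votes instead of dividing by zero.
        return "keep voting"
    sigma = math.sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - pi_0) / sigma
    if z > z_star:          # confident that Pi > Pi_0
        return "post"
    if z < -z_star:         # confident that Pi < Pi_0
        return "dump"
    return "keep voting"    # not enough evidence either way

print(decide(14, 1, pi_0=0.75, z_star=2.58))  # post
print(decide(5, 5))                           # keep voting
print(decide(2, 18))                          # dump
```

Note that the same z_star is used for both posting and dumping, matching the single-criterion idea from the footnotes.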
Well, sorry about the long post, but this is just an overview of my
thoughts about the K5 submission system. I can think of a number of problems
with a system like the one I've described, but I think I've written enough
for now and others can probably think of better objections anyway. Any
comments on whether such a system would be feasible would be
especially appreciated. While the math and statistics may have errors, those
can be fixed; problems with the assumptions that the system makes and
applying it to K5 could be more challenging.
Or anyone else, but this seems to happen to him most frequently. :)
Conversely and perhaps more importantly, this criticism also applies
to spam in the queue, unless an admin deletes it.
This convention might or might not be reflected in what the voter actually
sees. There wouldn't be any need to change the 1/0/-1 convention except
in the code, but I am attracted to the "all posts are assumed to
contribute" aspect of a 1/0/"don't care" system.
Note that once the required precision is reached, the story will be posted
or dropped. So if we are "certain" that 55% of the users want it
posted, but the threshold is 60%, then the story is dropped. That might seem
obvious, but it is different from the current system in that the same
criterion is used for both accepting and rejecting.
The method I present here uses the Wald normal approximation to the
binomial distribution, which is considered valid when min(np,n(1-p))>5.
I would have preferred to use the more exact Clopper and Pearson method,
which is based on the binomial distribution, but I must confess that I can't
remember a few of the details of that method and I don't have a statistics
book handy. :-( In practice, the results should be virtually identical
regardless of which method is used.
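For what it's worth, the exact one-sided tail probability can be computed directly from the binomial distribution without any special tables (binom_tail is my own name for the helper; the Clopper-Pearson interval is obtained by inverting exactly this kind of test). At a very small n like the 14-of-15 example, the exact and approximate answers can disagree noticeably, which is just the regime the min(np,n(1-p))>5 rule warns about; with typical queue sizes they should agree closely:

```python
import math

# Exact one-sided tail P(X >= k) for X ~ Binomial(n, pi_0) -- the
# quantity the normal approximation estimates.
def binom_tail(k, n, pi_0):
    return sum(math.comb(n, i) * pi_0**i * (1 - pi_0)**(n - i)
               for i in range(k, n + 1))

# Chance of 14 or more post votes out of 15 if only 75% of users
# actually support the story:
print(round(binom_tail(14, 15, 0.75), 4))  # 0.0802
```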