A word of introduction -- I'm one of the people who's had a lot of
influence in designing the Mojo and moderation systems at K5 and in
Scoop. When I first came across the site a year ago, I realized it
had the potential to avoid a lot of the mistakes Slashdot had
stumbled into -- rather nice of them, as they showed us which
minefields to avoid in the process.
The following is abstracted from an email exchange between myself,
Rusty, and a K5 participant, and a few other random documents. Though
I'm not distinguishing among them, my foil in this discussion is
actually a composite personality.
I believe this should help address some of the design intentions,
pitfalls to be avoided, concessions, and limitations of the Scoop
moderation system.
I'll also add that my own comments are rated over a wide range, some
of my stories get posted, some don't, and I've got my own gripes about
the system. But I think it's fundamentally right.
Fairness, Group Think, Filtering Tools
The issue of "fairness" is such a sticky one because it's such a
subjective one.
The biggest problem I see with ratings is that people really are
using them to push ideological views and personal biases at the
expense of those with marginal views.
I think it's deadly, I think group think is a cancer.
Think of this as a feature, not a bug. First, people will do this,
it's human nature. The goal then is to come up with a system which
identifies posts in which agenda-pushing (or just plain radical
disagreement between two or more camps) exists. IMO this will be
apparent in the statistics behind the post -- standard deviation, a
measure of variance, will be large for these posts. Computing and
displaying this needs to be implemented, but the problem can be
addressed.
K5 moderation is an amalgam of many things. Both personal opinion,
and personal response to differing opinion, are going to be part of it.
I will moderate down posts I feel are just plain dumb, while I'll give
credit to a thoughtful response to views similar to mine. Depends on
context, mood, etc.
Similarly, K5 users cannot be authenticated -- there's no way of
preventing me from creating multiple accounts, or sharing a single
account with multiple people. While there are systems which deal with
the issue of strong authentication (e.g.: online voting schemes), in
those the authentication step is assumed, not solved.
There is no technical fix which is going to change these facts;
they are features of the system. Collaborative filtering and positive
incentive mechanisms must take them into account as givens.
There's a distinction between group-think (everyone voting the same)
and a difference in opinion resulting in a strong degree of disagreement
over a comment's true value. The latter, as I said earlier, will be
apparent with additional statistics such as standard deviation as a
conflict measure. This was part of my original suggestion for the Scoop
moderation scheme, and remains a gap in the system.
Specifically, I'd like to see the following statistics. Note
particularly how I term them -- these are proxies for some real
quantity; they can only approximate (and sometimes poorly) that
quantity:
- n: Number of moderations. Proxy for interest.
- mean: Average value of moderations. Proxy for "value".
This is what moderation currently provides.
- std dev: Standard deviation of moderations. Proxy for
controversy.
...adding the ability to filter according to complex rules (show me
only highly rated or highly controversial stories with more than 5
ratings, and all stories with fewer than 5 moderations) will produce a
system which can both support high S/N and be relatively free from abuse
of various sorts.
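To make that concrete, here's a minimal sketch of the three statistics
and the filter rule above. Everything in it -- function names,
thresholds, the use of Python -- is illustrative, not anything Scoop
actually implements:

    import statistics

    def comment_stats(ratings):
        # n proxies interest, mean proxies value, std dev proxies controversy
        n = len(ratings)
        mean = statistics.mean(ratings) if n else None
        stddev = statistics.stdev(ratings) if n > 1 else 0.0
        return n, mean, stddev

    def passes_filter(ratings, min_mean=4.0, min_stddev=1.5, min_n=5):
        # Show highly rated OR highly controversial comments with at least
        # min_n moderations, plus everything with fewer than min_n
        # moderations (too little data to judge either way).
        n, mean, stddev = comment_stats(ratings)
        if n < min_n:
            return True
        return mean >= min_mean or stddev >= min_stddev

The point is that mean and standard deviation answer different
questions, and a filter can ask either one.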
Another means to the same goal is to allow inclusion or exclusion of
specified users' opinions in your rating scheme -- essentially coming up
with a "buddy list" of editors. This could be a manual or automated
process, or a bit of both.
Yet another idea I like immensely is the ability to apply a personal
bonus/decrement value to a member's score. This is more flexible than
blacklisting (essentially slapping a -5 on a user), and could itself be
used to provide feedback to the system.
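Both ideas could share one mechanism: recompute a comment's score from
a single reader's point of view. A sketch, with hypothetical names and
no claim that Scoop works this way:

    def personalized_score(ratings_by_user, buddies=None, bias=None):
        # ratings_by_user: {user: rating 1-5}
        # buddies: set of users whose ratings count (None means everyone)
        # bias: {user: bonus or decrement} applied to that user's ratings
        if buddies is None:
            buddies = set(ratings_by_user)
        bias = bias or {}
        adjusted = [r + bias.get(u, 0)
                    for u, r in ratings_by_user.items() if u in buddies]
        if not adjusted:
            return None
        # Clamp back into the 1-5 moderation range.
        return max(1.0, min(5.0, sum(adjusted) / len(adjusted)))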
The problem with conferring trusted status upon people who write
good posts is that it does not follow that a person who writes well
is trustable (demagogues; did you read Ender's Game? Locke/Peter)
and it does not follow that a person who does not write well is
untrustworthy. In managerial contexts this is called "the folly of
rewarding one behaviour to get another."
Very true. It's a complex problem in itself, and I've been thinking of
ways in which it might be improved. Essentially, you've got three
behaviors at K5:
- Writing (both stories and comments)
- Moderating (again, stories and comments)
- Reading
Moderation works reasonably well. There needs to be (IMO) more of it, and
more of it needs to be of high quality. What's a quality moderation? Good
question. Some heuristics (a scoring sketch follows the list):
- One which helps establish the status of a post. Early
moderations, or moderation of posts which have few or
no moderations, count for a lot.
- Moderation patterns which tend to agree, within reason, with the
group consensus. Someone who's consistently an outlier probably
isn't feeding much signal to the system.
- Moderation patterns which don't slavishly follow group
trends. Moderation works best by increasing differentiation between
comments (preferably in a meaningful way). Note that a moderate
deviation from the norm is quite healthy.
- Patterns which aren't pedantically consistent. The guy who mods
all 1s or all 5s isn't adding much to the system.
- Clustering -- moderators who cluster very strongly together,
particularly in isolation from or in opposition to outsiders, may be
trying to game the system, and should at least get some scrutiny.
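The sketch promised above -- one hedged guess at how several of these
heuristics could fold into a moderator-quality score. The weights are
arbitrary, and detecting the clustering case would need cross-moderator
correlation, which this omits:

    import statistics

    def moderator_quality(history):
        # history: list of (rating_given, comment_consensus, n_prior_mods)
        score = 0.0
        given = []
        for rating, consensus, n_prior in history:
            given.append(rating)
            early_weight = 1.0 / (1 + n_prior)   # early mods count a lot
            agreement = 1.0 - abs(rating - consensus) / 4.0
            score += early_weight * agreement
        # Penalize pedantic consistency: all 1s or all 5s adds little signal.
        if len(given) > 1 and statistics.stdev(given) == 0:
            score *= 0.5
        return score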
What's the goal? Maximize content quality. This means encouraging
both good writers and good moderators. Your population is composed of
those who write (who may also moderate), those who only moderate (but
don't write), and those who read. We can ignore the problem of
trusting Readers, simply because they don't participate in the system.
I would, however, like to be able to both assess moderation quality, and
find a way to reward those who moderate well, though they may not write.
The current Mojo scheme is a bit of a hack, but it seems to produce
mostly good results. The real key is to identify the outlier Really
Bad Elements that come along, and quickly. Trust and trusted member
privileges/rights are a possibility, but really a secondary goal.
Moderation Frequency, Reputation Attacks
Since not enough readers rate comments, a few poor ratings can
destroy an honest contributor's "Trusted" status.
This is a bug in the system. Moderation by small numbers of people
should have less influence than moderation by large numbers of people,
all else being equal. The Mojo calculation algorithm needs to be
modified to accommodate this. Rusty and I are still discussing why and
how.
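One way to damp the influence of small numbers of moderators is to pull
the average toward a neutral prior that fades only as real ratings
accumulate. A sketch of the idea -- the prior and its weight are
made-up knobs, not what Mojo actually uses:

    def damped_mean(ratings, prior=2.5, prior_weight=5.0):
        # With few ratings the prior dominates, so one or two hostile
        # ratings can't destroy a "Trusted" user; with many ratings the
        # data dominates the prior.
        return (prior * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

With those knobs, a single 1 rating moves a fresh score from 2.5 only
to 2.25; it takes a sustained pattern of ratings to move it far.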
Encouraging moderation is another issue. Ultimately this feedback
should also be built into the system so that people have an incentive to
contribute quality moderation.
This is exactly the same problem as "karma whoring" on
Slashdot: those who know how to emotionally motivate moderators get
rewarded.
Crucial distinction, frequently iterated.
- Slashdot karma is cumulative, confers special powers, and
tends to be self-reinforcing. The fact that both karma and
moderation at Slashdot had to be artificially constrained (1-50 and
1-5, respectively) should be a clear indicator that something is
wrong.
- Scoop moderation has several crucial properties:
- It is bounded, restricted to a range of 1-5.
- It is convergent -- with more moderation, a single final value
emerges from the noise.
- It is independent of the number of people moderating -- a
comment's moderated level can range from 1 to 5, regardless of
whether one person has moderated or 100.
- It is continuous. By allowing fractional values (in reality, two
decimal places), Scoop allows for many distinct moderation values,
making arbitrarily fine distinctions in ranking possible (though
not necessarily meaningful).
Mojo shares many of these attributes, but modifies the
convergence rule: it is short-term convergent (in the short term,
more moderations tend to converge toward a value), but weighted
toward recent activity. It also does *not* convey special
privileges in posting comments (e.g.: boosted score).
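A sketch of that modified convergence rule -- an exponentially decayed
mean, where newer moderations weigh more, so the value stays responsive
to current behavior. The decay constant is illustrative; this is not
Scoop's actual Mojo formula:

    def recency_weighted_mean(ratings, decay=0.9):
        # ratings ordered oldest to newest; age 0 is the newest rating
        total_weight = 0.0
        total_value = 0.0
        for age, rating in enumerate(reversed(ratings)):
            w = decay ** age
            total_weight += w
            total_value += w * rating
        return total_value / total_weight if total_weight else None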
Slashdot karma allows "whores" to accumulate a quantity which is not
immediately responsive to current behavior, which tends to reinforce the
score (more highly moderated posts tend to get more moderation --
moderation score is not independent of the number of moderators,
quite the contrary), and abusive behavior becomes possible. While it's
possible to get an elevated K5 status, it takes continued effort to
keep it -- and the powers conferred by it are rather thin.
I've explained bits of this at the Scoop website somewhere, in a
followup comment to an article on moderation. (Unfortunately, this
comment did not survive the relaunch of Scoop to a new site -- the
comments database was corrupted).
Slashdot False Negatives (Undermoderation)
Because of this I've given up on reading Slashdot with a
threshold set and read it uncut at -1; there are simply too many
gems which sit at 0 or 1.
See above discussion of K5 moderation. Malda claims Slash moderation
is effective because setting a cutoff of +2 or +3 screens out bad posts.
And yes, the false positive incidence is low. However, the false
negative incidence (high quality posts with low moderation) is very
high. Moreover, the likelihood of a post, however worthy, being
moderated up to a viewable level falls drastically as a discussion ages,
discouraging additional contributions to the topic.
Moderation Quality -- Factual/Technical
There are often technically accurate posts which never gain moderation
approval. And this is the part where I vigorously disagree, Rusty.
Those statements which can be verified as accurate and factual,
technically or by reference, should always gain a higher
rating than those which simply promote a personal opinion. If one
can defend their unpopular views with a strong argument backed by
references, as in academic debate, IMO this should always have an
advantage over simple personal opinion.
Agreement in goals, disagreement in degree. I'd say that the
factually supported arguments should tend to have higher ratings. I'll
allow for occasional lapses and variance. Imperfect worlds, and all
that. Unfortunately, a rigid implementation of this sort of rating
requires some sort of recognition of a moderator's knowledge and
authority in a field, and must recognize that a strong background in one
area doesn't necessarily translate to others.
About bias vs. objectivity in rating: I think the main problem here
is that comment rating serves two somewhat unrelated purposes. We
each see one of them as more important which colors our view of how
rating should be done. Here are the two basic things rating does:
In one sense, rating comments simply provides an ordering of
comments, reflecting the view of the community as a whole.
That is not to say that the dissenting view should be suppressed,
especially a good expression of it, merely that one facet of rating
can and, I think should, highlight the view of the "community as a
whole". The only way to determine what that is, is if people do rate
partially on the basis of agreement.
From this perspective, "Karma Whoring" is not a problem, because
the basic mode of the karma whore is to define and express the
majority view of the community.
IMO the main issues with K5 are different. But that's just my
opinion.
The other main purpose of comment rating is to provide an
"objective" idea of how much commitment to discussion an individual
has, in order to select those who have the highest commitment to
good discussion, so that they can be provided the tools to help
administer the discussions and keep them high-signal. "Trusted"
status is determined by a combination of average rating ("Mojo"),
and number of comments contributing to that rating. Trusted users
must maintain a high average rating across a reasonable number of
comments in order to be trusted.
These criteria should, IMO, be expanded.
This use is the one your view focuses on, and in this sense it is
clear that rating according to bias is a terrible thing. In this
case, there is no link at all between reflecting the consensus view
and being a good contributor to the site. One user may clearly and
consistently disagree with the majority view, and be the best
candidate for trusted status in the world, but if everyone rates
according to agreement alone, they will never be able to assist in
the way they should, because they will never become trusted. The
danger here is that by rating according to personal bias, only
readers who agree with the common view will be trusted, and at some
point, they may mistake their trusted status for a general ability
to suppress the unpopular view, and start rating comments they
disagree with down below the normal threshold.
So, if bias were the only relevant factor, the system as designed
is deeply flawed, because it would ultimately lead to a massively
bovine state of groupthink, where any dissenting opinion is
suppressed. Obviously, we don't want this to happen.
So, the question is, how do we balance these two sides of the
coin? On the one hand, some rating should be based on agreement,
IMO. It does serve the purpose of clarifying the majority view. On
the other hand, some rating must be based solely on the merits of a
comment as argued, and not on whether the rater agrees or disagrees.
The hope is that some people will see it one way, some will see
it the other, and perhaps some will see it both, or not think about
it at all, and just go on gut feeling. On the whole, then, I expect
the camps to balance each other, and strike a good balance overall.
This is the "good enough for now" solution, and I tend to be so
minded.
This may be simply wishful thinking on my part. Your fear is that
the bias-voters will overwhelm the objective voters, based on
knowledge of human nature. I don't know if this is true or not, and
I have seen cases where comments have clearly been rated according
to bias alone. I've also seen cases where well-expressed comments
that utterly fly in the face of the consensus opinion are rated
highly, presumably based on their merits as writing.
I have no objective idea whether one side is "winning" or not. I
do know who is trusted, and some of them might surprise you.
[Specific user reference deleted - and no, it's not Sig11]
Several [...] well-known "dissidents" are also trusted users. I get
a daily report of who is trusted, and if anything, dissenting views
are overrepresented; i.e. there are fewer dissenters (by
definition), but almost all of the known and consistent dissidents
are in fact trusted. This is the main reason why I am not yet as
concerned as you about the trends in rating.
Good information. Some sort of posting of this to the site might be
useful, though I don't know how it would best be accommodated. How do
you identify "dissenters", BTW?
However, it is very wise to consider a potential problem before
it becomes a real problem. So, are there ways that the rating system
could be "fixed" to clarify and perhaps separate its roles?
I agree with you that one really good possibility is to make the
rating system as transparent as the voting system. This has been an
idea from the start, and is very likely to appear soon. Having a way
to see who rated what comment and what the rating was would, I
think, tip the scales in favor of objective rating.
Also opens the door to retaliatory moderating...but, if that
increases participation, it might not be all bad <g>.
Bias Solution -- Multiple Moderation Dimensions?
Another idea would be to divorce the two roles entirely. Have two
rating choices, one which rates agreement (thus providing the data
for the site's "consensus reality"), and the other rating clarity
and quality of expression, thus determining who is likely to be a
good trusted user.
Ah, you technologists are all the same. Still, the naivete is
touching ;-). What's to say that a person wouldn't take the opportunity
to knock an objectionable post down twice, or an agreeable one up? You
still have to trust the rater. A single combined metric means the two
purposes have to be assumed mixed, and you're counting on honesty (or
proportional representation) to work things out. A "controversy" metric
(std. dev.) would also help. Probably immensely.
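For what the two-axis proposal would look like as data, a hypothetical
record -- which also shows why it doesn't remove the trust problem:

    from dataclasses import dataclass

    @dataclass
    class Rating:
        user: str
        agreement: int   # 1-5: do I agree with the comment?
        quality: int     # 1-5: is it well argued, whether or not I agree?
        # Both axes are filled in by the same fallible rater, so a biased
        # user can still "knock a post down twice".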
I'm not sure where you get accuracy. There is no built in means to
detect or correct an error in fact.
I believe Rusty was referring to accuracy in measurement, not
content. Still, it's a known limitation of K5. Adding a metric for
"validity" only sidesteps the problem: you're now assuming that the
moderator has the background to make the judgement.
Ultimately, IMO, truth, goodness, accuracy, interest, humor, etc.,
are emergent within a metric called "value". Typically, an utterly
incorrect statement will generate a counterclaim, often with supporting
evidence (usually as Web links). As new readers (or those who've
already moderated -- you can change your vote) mull the evidence, the
truth is taken into account. Ultimately, it would be nice to have
recognized experts in a field, but then you'd need to categorize
comments by field of appropriateness -- weighing unfavorably on the
accuracy v. complexity scale.
And Some Words About Paradise
I think k5 is "lucky" to have such a high signal to noise ratio.
Lucky in the sense it was formed by a good team. Rusty and the rest
of the cabal, and Scoop, have done a really good job.
But also consider that Slashdot has greatly improved in quality
since the majority of detractors have moved to their Shang Ra La at K5.
That's Shangri La, BTW, from Lost Horizon by James Hilton, 1933.
Interesting you should bring it up: "Everything in moderation, even
moderation."
A Slightly Non-Sequitur Footnote:
Very interested. Have you read Lawrence Lessig's Code
and Other Laws of Cyberspace? I'm in the middle of that
right now, and it's really helping me clarify how online
communities (and K5 in particular, of course) operate.
I read this book about a year ago, it also influenced my thinking in
developing the moderation system at K5. Highly recommended. I'll do a
capsule summary on request (probably as a K5 post).
Karsten M. Self
SCO -- backgrounder on Caldera/SCO vs IBM
Support the EFF!!
There is no K5 cabal.