I don't think you realize the size of the dataset Google deals with.
It's just not practical to keep a variable on each indexed page.
As it is, Google recomputes its weights each month, over a period of several days. We could expect at least the same with your scheme, and I'm not sure you'd be too happy with a voting system that has a one-month lag.
Here's what Google does, in a nutshell (ok, it's speculation ;-):
the crawlers produce unordered compressed repositories of web pages.
If you take a look at the Google contest web page, you'll see what I mean.
The pages can't be ordered according to domain when they come in, since crawling a whole site at once is bad form, and often triggers retaliation by web admins.
Next, the unordered repositories are distributed over lots of computers and iteration begins. Now, the PR equation is a sparse matrix equation, but nevertheless, at several billion web pages you're looking at a lot of inter-computer communication to synchronize the calculations. Since the repositories are initially unordered, I'd expect n^2 communication between the number-crunching computers, although this _might_ be reducible by reordering the web pages in the repositories. However, that would amount to sorting the data according to the shape of the web graph, which isn't so easy either. Bottom line: the calculations are a major pain in the ass, and not as highly localized as one would expect.
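To make the iteration concrete, here's a toy single-machine sketch of PageRank power iteration (my own names and damping factor; obviously nothing like Google's real distributed code, where the rank vector and link graph are sharded across machines and this inner loop is what generates all that communication):

```python
# Toy PageRank power iteration over a sparse link graph.
# 'links' maps each page to the list of pages it links to.
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}       # start uniform
    for _ in range(iters):
        # everyone gets the "teleport" share up front
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # a page splits its rank evenly among its out-links
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # dangling page: spread its rank over everything
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Tiny example: B and C both link to A, A links back to B.
print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
```

In a distributed setting, each machine holds a slice of `links` and has to ship its contributions to `new` to whichever machines own the target pages, which is exactly why an unordered repository layout hurts so much.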
After the pagerank and other weights have been calculated satisfactorily, the servers receive little chunks of the result. A sorted list of documents is associated with each valid keyword, truncated after the first few thousand results (ever noticed that you can never read the 87,000th result in a Google search? They'd be insane to keep it in their lists). In this way, the servers can quickly look up the sorted lists and present you with results quasi-instantaneously.
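The keyword-to-truncated-list idea might look something like this (again a toy sketch, all names mine, and `TOP_K` is a made-up cutoff):

```python
# Precompute, offline: for each keyword, a posting list of documents
# sorted by rank and truncated to the top K. Serving a query is then
# just a dictionary lookup -- which is why result 87,000 never exists.
TOP_K = 1000

def build_index(docs, rank):
    # docs: {doc_id: set of keywords}; rank: {doc_id: score}
    index = {}
    for doc_id, words in docs.items():
        for w in words:
            index.setdefault(w, []).append(doc_id)
    for w in index:
        # sort once at build time, keep only the head of the list
        index[w].sort(key=lambda d: rank[d], reverse=True)
        index[w] = index[w][:TOP_K]
    return index

def query(index, word):
    # no ranking work at query time, just a lookup
    return index.get(word, [])
```

The point is that all the expensive work happens at index-build time; the front-end servers only ever do lookups, which is what makes the monthly-recompute model workable at all.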
Of course, some web queries are a lot more popular than others. I expect Google has a mechanism for replicating the servers which hold popular indexes, to reduce the load on the hardware. This is something that's going to take a lot of time to propagate, probably a couple of weeks.
So why is this incompatible with user-defined weights? In principle, a vote on one page can change the relative rankings of a whole lot of them. The only safe way is to recompute the whole pagerank and redistribute the sorted lists to the servers. Given the difficulties outlined above, I'd expect this sort of thing to happen about once a month, for the same reason that Google already recomputes once a month and not more frequently.
If you're happy with a lag of a whole month, then I guess it's doable...