Kuro5hin.org: technology and culture, from the trenches

Open Who's Whom Database

By Eloquence in Culture
Mon Oct 02, 2000 at 01:50:55 AM EST
Tags: Technology

It is often difficult to keep track of important people's backgrounds. Say you want to find out who Judge Kaplan worked for before the DeCSS trial (the movie industry), or about the Bushes' involvement in the oil industry. You can start with a search engine, but often you get too much information or too little. The existing Who's Who biographical databases are, to my knowledge, all commercial (Marquis Who's Who, for example). Besides, you will only find mainstream-compatible information there, often provided by the profiled people themselves. If an important banker is a member of the Catholic Opus Dei sect, this will hardly be mentioned in an official biography. Such information is usually "hard to find", but it is obviously very important for our individual decision-making.

How can we make it easier to find? I suggest the creation of a distributed semi-anonymous Who's Who database that could eventually be extended to become a distributed "Everything"-type database.


In a ShouldExist article, I have tried to explain roughly how a distributed dynamic database could work. Generally, I envision a Gnutella-style broadcast protocol with automatic caching. Everyone has a local store of shared profile data, which can be queried on different fields. Searches are routed to a number of neighbouring clients (perhaps not in Gnutella's chaotic structure but in the more organized "Grid"-like fashion that Ben Houston describes in this paper). Some may argue that such a network cannot scale, but that objection misses the point. Its scalability is obviously limited, but that doesn't necessarily hamper the operation of the network: with a limited TTL (time-to-live, the number of hops a query may travel), a search only reaches a limited number of neighbours, which caps the traffic caused by search broadcasts. (Note that most of the current traffic on the GnutellaNet is caused by pings and pongs, used to find all hosts on the network, and by push requests, used to initiate transfers through firewalls; both could be made much more efficient.)
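To make the TTL argument concrete, here is a minimal sketch, assuming a simple 4-connected grid of clients (a hypothetical stand-in for a "Grid"-like topology, not the design from Houston's paper) and a query that each node forwards only the first time it sees it. It counts how many nodes a query reaches and how many messages it costs.

```python
from collections import deque


def grid_neighbours(node, width, height):
    """4-connected neighbours of (x, y) on a width x height grid."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def flood_query(origin, ttl, width, height):
    """Broadcast a query from origin; each hop decrements the TTL.

    Returns (nodes_reached, messages_sent). A node forwards a given query
    only the first time it sees it (duplicate suppression via `seen`).
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node answers but does not forward
        for nb in grid_neighbours(node, width, height):
            messages += 1  # each forward is one message, duplicates included
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, remaining - 1))
    return seen, messages


reached, sent = flood_query((50, 50), ttl=3, width=100, height=100)
print(len(reached), sent)  # 25 52
```

On this 10,000-node grid a TTL-3 query touches only 25 nodes; away from the edges the reach is 2t² + 2t + 1 nodes for TTL t, no matter how large the network grows. That is the sense in which the TTL, not the network size, bounds search traffic.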

The results can be transferred directly back to the querying IP address. This makes flooding easier to detect (and reduces network load) but gives up the anonymity of searches. However, we still maintain a certain degree of anonymity for publishers by caching all data upon receipt: you don't know whether the IP you download from is the IP of the author (though in a small network, it will not be too hard to figure out).
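The caching idea can be sketched in a few lines, assuming hypothetical `Entry` and `Node` types and plain address strings. Each responder replies straight to the querier, and every querier caches what it receives, so after one download the original author and a caching node return identical answers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    subject: str
    body: str
    timestamp: int


@dataclass
class Node:
    address: str  # hypothetical address string, e.g. "10.0.0.1"
    store: dict = field(default_factory=dict)  # subject -> set of Entry

    def publish(self, entry):
        self.store.setdefault(entry.subject, set()).add(entry)

    def answer(self, subject):
        return set(self.store.get(subject, ()))

    def receive(self, entries):
        # Cache everything we download; we will serve it to later queriers,
        # so serving an entry says nothing about who wrote it.
        for entry in entries:
            self.publish(entry)


def direct_query(querier, candidates, subject):
    """Each responder replies straight to the querier's address (no relaying)."""
    replies = {}
    for node in candidates:
        hits = node.answer(subject)
        if hits:
            replies[node.address] = hits
            querier.receive(hits)
    return replies


author, alice, bob = Node("10.0.0.1"), Node("10.0.0.2"), Node("10.0.0.3")
author.publish(Entry("example subject", "first version", 1))
direct_query(alice, [author], "example subject")
replies = direct_query(bob, [author, alice], "example subject")
```

Bob gets the same entry from both addresses and cannot tell which one wrote it; every responder, however, sees Bob's address, which is the loss of search anonymity described above.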

One of the trickier aspects would be source authenticity. Cryptographically, the problem is solved by public-key cryptography and digital signatures. However, maintaining a directory of users and their public keys is not quite as easy if you do not want to rely on a central server. But perhaps this is not necessary: it might be enough to attach the author's public key to each database entry returned. Thus, if you have one entry by a certain author, you can easily identify other entries by the same author (and since nobody else can produce valid signatures for that key, nobody else can pose as that author).
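A minimal sketch of "attach the key to each entry", using toy textbook RSA built from known Mersenne primes so that it runs with the standard library alone. This is insecure and for illustration only; a real client would use a vetted signature library, and the helper names here are hypothetical. Entries whose signature verifies are grouped by a fingerprint of the attached public key.

```python
import hashlib
from collections import defaultdict

E = 65537  # public exponent


def make_keypair(p, q):
    """Toy textbook-RSA key from two primes (insecure, illustration only)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent; needs Python 3.8+
    return (n, E), d


def digest(message, n):
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n


def sign(message, public, d):
    n, _ = public
    return pow(digest(message, n), d, n)


def verify(message, public, signature):
    n, e = public
    return pow(signature, e, n) == digest(message, n)


def fingerprint(public):
    n, e = public
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()[:16]


def entries_by_author(entries):
    """Group (message, public_key, signature) triples by key, dropping bad signatures."""
    groups = defaultdict(list)
    for message, public, sig in entries:
        if verify(message, public, sig):
            groups[fingerprint(public)].append(message)
    return dict(groups)


alice_pub, alice_key = make_keypair(2**61 - 1, 2**89 - 1)
bob_pub, bob_key = make_keypair(2**107 - 1, 2**127 - 1)
entries = [
    ("entry A", alice_pub, sign("entry A", alice_pub, alice_key)),
    ("entry B", alice_pub, sign("entry B", alice_pub, alice_key)),
    ("entry C", bob_pub, sign("entry C", bob_pub, bob_key)),
    ("forged", alice_pub, sign("forged", bob_pub, bob_key)),  # Bob posing as Alice
]
groups = entries_by_author(entries)
```

No key directory is involved: the forged entry claims Alice's key but was signed with Bob's, so it fails verification and never joins Alice's group.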

Now say I want to create an entry about Jack Valenti. I store the relevant data (for example, information about the nice parties he throws for "his" politicians) on my node and log in to the network. Now anyone who queries for Jack gets results from everyone who provides this information (with a timestamp and perhaps a signature plus the author's public key). The timestamp helps in finding the most current version of a document.
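Since the same entry may come back from many caching nodes, a client has to collapse duplicates and old versions. A minimal sketch, assuming each result carries a subject, an author identifier (for instance a key fingerprint), and an author-supplied timestamp:

```python
from typing import NamedTuple


class Result(NamedTuple):
    subject: str
    author: str     # e.g. a public-key fingerprint
    timestamp: int  # as claimed by the author, e.g. seconds since the epoch
    body: str


def newest_versions(results):
    """Keep only the most recent entry per (subject, author), newest first.

    Keying on the author means a newer timestamp can only supersede that
    author's own earlier entries, not somebody else's.
    """
    latest = {}
    for r in results:
        key = (r.subject, r.author)
        if key not in latest or r.timestamp > latest[key].timestamp:
            latest[key] = r
    return sorted(latest.values(), key=lambda r: r.timestamp, reverse=True)


results = [
    Result("Jack Valenti", "a1b2", 100, "version 1"),
    Result("Jack Valenti", "a1b2", 200, "version 2"),
    Result("Jack Valenti", "a1b2", 100, "version 1"),  # same entry from a cache
    Result("Jack Valenti", "c3d4", 150, "another author's entry"),
]
latest = newest_versions(results)
```

The design choice worth noting is the `(subject, author)` key: if timestamps were compared across authors, anyone could bury an entry simply by posting something with a later date.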

Fine-tuning would include making the network searchable by any conceivable criterion, client-side indexing, etc.
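Client-side indexing of the local store could be as simple as an inverted index keyed on (field, word). This is a minimal sketch with a hypothetical `FieldIndex` class and made-up records; queries combine fields with AND.

```python
from collections import defaultdict


class FieldIndex:
    """Inverted index over a node's local store: (field, word) -> entry ids."""

    def __init__(self):
        self.entries = {}
        self.index = defaultdict(set)

    def add(self, entry_id, record):
        self.entries[entry_id] = record
        for field_name, value in record.items():
            for word in str(value).lower().split():
                self.index[(field_name, word)].add(entry_id)

    def query(self, **criteria):
        """AND-query: every given field must contain every given word."""
        matches = None
        for field_name, text in criteria.items():
            for word in text.lower().split():
                ids = self.index.get((field_name, word), set())
                matches = set(ids) if matches is None else matches & ids
        return sorted(matches or ())


idx = FieldIndex()
idx.add(1, {"name": "Jane Doe", "employer": "Example Studios"})
idx.add(2, {"name": "John Roe", "employer": "Example Oil"})
idx.add(3, {"name": "Jane Roe", "employer": "Example Studios",
            "memberships": "chess club"})
```

For example, `idx.query(name="jane", employer="studios")` returns `[1, 3]`. Indexing locally keeps the network protocol simple: a broadcast query only needs to carry field/word pairs, and each node answers from its own index.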

As an implementation platform, anything but Java or Linux-only would be fine. The "client" would have to be easy-to-use and easy-to-install for Windoze users. (The size of the network is essential to its success.)

Why do this in a separate network instead of using an existing one? Because the existing distributed networks work according to different principles. Freenet routes all data across several clients, and it remains to be seen how scalable this approach will be in an environment that is heterogeneous in speed, with many dial-up users between busy routes (as Freenet is currently used mostly by "power users", this doesn't seem to be much of a problem yet). The same is true for Blocks. Gnutella has serious flaws in the protocol (too many pings/pongs, push requests, too many different client implementations) and is too busy for a specialized task such as this one. JungleMonkey is Linux-specific. Finally, the files this network would provide would generally be very small, while the systems mentioned deal with sharing very large files (MP3s or movies); this also calls for a separate, lightweight network, something that could run in the background without hindering your normal Internet use.

And now, back to the cultural aspects of such a network: initially, it would make it very easy to share information about politicians, economic leaders, journalists, programmers etc.; people could even use it to distribute their own personal records. This information would be of interest to all of us when voting, deciding which product to buy, or deciding which organization to support. I don't perceive it as a threat to privacy: most information that could be shared would have to be popular to be distributed over a large number of nodes and to be visible to all users.

While searching would be non-anonymous in the approach outlined above, providing files (especially popular ones) would be relatively anonymous. As the network grows, it could be extended to allow the storage of different kinds of information, for example, the behavior of certain corporations, lyric sheets, and guitar tabs (all information which has been censored on the WWW in the past). The database used should be as extensible as possible (XML-based?).
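If the format were XML-based, one simple option (an assumption here, not a settled design) is a generic entry element with a type attribute and named fields, so new kinds of information need no new parser. A minimal sketch with the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET


def make_entry(entry_type, fields):
    """Serialize as <entry type="..."><field name="...">value</field>...</entry>."""
    root = ET.Element("entry", {"type": entry_type})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


def read_entry(xml_text):
    """Parse an entry back; unknown field names are kept, not rejected."""
    root = ET.fromstring(xml_text)
    fields = {f.get("name"): f.text or "" for f in root.findall("field")}
    return root.get("type"), fields


tab = make_entry("tabs/guitar", {"title": "Example Song", "tab": "e|--0--3--|"})
```

A Who's Who profile and a guitar tab then travel in the same container, and an older client can still store and forward a type it does not understand.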

Why start with a task like the "Who's Who" and not make it more general from the start? In order to avoid confusion. If everyone starts their own database schemes immediately, things will get wacky. Perhaps some centralized elements could be used: an expendable "closest-node finder" (reducing the need for pings and pongs on the network) and perhaps a "datasheet server" which defines the database structures that can be read from and written to. While clients could route "alternative" databases (similar to the alt.* Usenet hierarchy), especially within private networks, the central server would define a general consensus. (Should it get shut down, this wouldn't be a problem: existing clients wouldn't be affected, and new ones would simply download the sheets from somewhere else.)
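What a "datasheet" might contain is left open above. A minimal sketch, assuming a datasheet is just a named set of required and optional field names (the `whoswho/person` sheet below is hypothetical), with a client-side check of incoming entries:

```python
# Hypothetical datasheet, as a central "datasheet server" might publish it.
DATASHEETS = {
    "whoswho/person": {
        "required": {"name"},
        "optional": {"born", "employer", "memberships", "sources"},
    },
}


def validate(entry_type, fields, datasheets):
    """Return a list of problems; an empty list means the entry fits its sheet."""
    sheet = datasheets.get(entry_type)
    if sheet is None:
        return [f"unknown datasheet: {entry_type}"]
    present = set(fields)
    allowed = sheet["required"] | sheet["optional"]
    problems = [f"missing field: {n}" for n in sorted(sheet["required"] - present)]
    problems += [f"unexpected field: {n}" for n in sorted(present - allowed)]
    return problems
```

Because the sheets are passed in rather than hard-coded, a client could merge sheets fetched from any mirror, or add alt.*-style sheets for a private network, without changing the validation logic.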

Why do I submit this to the K5 community? Because I want the broadest possible feedback from a tech-oriented readership, and /. doesn't usually publish articles this long unless they are written by Jon Katz ;-). While I would love to choose such a database system as a university project, I currently cannot do much regarding its implementation, and I feel it is urgent to show the potential of Gnutella-like technology for freedom of speech. I would like to cooperate with others on it, but I can't do the main work.

So what do you think? Would you use such a database? Are there serious flaws in my modest proposal? Is there perhaps even an easy-to-use, open-source free system that could be immediately used with minor modifications?


Poll
An open, distributed Who's Who DB is ..
o a good idea, I would love to use it 27%
o a good idea, I would love to code it 13%
o a good approach to the wrong problem 7%
o a bad approach to the right problem 13%
o a bad approach to the wrong problem 13%
o Who is Edna Graustein? 23%

Votes: 65

Related Links
o Marquis Who's Who
o Everything
o ShouldExist article
o this paper


Open Who's Whom Database | 45 comments (42 topical, 3 editorial, 0 hidden)
Moderation would be crucial (4.40 / 10) (#2)
by zavyman on Sun Oct 01, 2000 at 11:30:14 PM EST

What you are proposing is a network similar to what Freenet is trying to accomplish, where information gets stored in this meta-network shared by many computers throughout the world.

The goal of your project, as I understand it, is to gather a database of rumors / facts that are believed to be true. For this database to work out, there would have to be extreme protection against spamming. After all, if it were possible to submit any type of information about someone, the information could easily be as false as it could be true. Somehow the truth would have to be sorted out from the falsehoods.

But how would one go about such a feat? The network would be sharing what only a minority knows about with what a majority is clueless about. So any kind of popular voting scheme for the truthfulness of a statement would be completely out of the question. Facts would need to be authenticated by a trusted authority, one who would be known to be honest and have the inside scoop.

A PGP-signed message system would theoretically work, allowing those close at hand to vouch for a certain tidbit with his/her signature. But PGP keys are not very popular overall, and certainly most politicians and aides do not possess PGP keys. They could, of course, generate one, but then the question is how it gets authenticated. It would have to follow the standard PGP web of trust, since a central key authority would be out of the question; this is a distributed, unofficial network, remember.

And signatures would totally destroy any anonymity that would have been possible as a result of this network. It would be the opposite of what was wanted. Furthermore, most people have only a limited ability to get stuff into the public mind through normal means.


Your goal is to create an anonymous network that still possesses credibility and usefulness, so that it does not turn into a rumor mill. Unfortunately it seems that the two are mutually exclusive in this case. I would love to see it happen, but no amount of programming or cryptography will make it what it dreams to be.

Re: Moderation would be crucial (2.66 / 3) (#5)
by royh on Sun Oct 01, 2000 at 11:53:46 PM EST

But how would one go about such a feat? The network would be sharing what only a minority knows about with what a majority is clueless about. So any kind of popular voting scheme for the truthfulness of a statement would be completely out of the question.

Well, as you say, there would have to be trusted authorities. Hard-coding the trusted authorities is unnecessary, though. There could be a "popular" moderation system, but it would be "personalized": you decide whom you trust, and only those people's (or organizations', or whatever's) moderation counts. This system has the advantage of completely mimicking the effect of a trusted authority for anyone who trusts the authority in question.

But PGP keys are not very popular overall

They will be eventually...

And signatures would totally destroy any anonymity that would have been possible as a result of this network

Right, which is why it shouldn't be mandatory. No one with any sense will trust the average anonymous source, but sometimes it's useful.

Your goal is to create an anonymous network that still possesses credibility and usefulness, so that it does not turn into a rumor mill. Unfortunately it seems that the two are mutually exclusive in this case

But the two can still coexist; and all of the in-between states as well. The moderation determines how much rumor and how much "fact" it is. If you want to hear rumors, you can set your threshold low (or tell your computer you trust rumor mills)...
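The personalized moderation and threshold described in this comment can be sketched as a trust-weighted average. Everything here is an assumption for illustration: ratings in [-1, 1], trust weights in [0, 1], hypothetical moderator names, and unrated entries treated as neutral.

```python
def personal_score(ratings, my_trust):
    """Weighted average of ratings, counting only moderators I trust.

    ratings:  moderator -> rating in [-1, 1] for one entry
    my_trust: moderator -> weight in [0, 1]; untrusted moderators are absent
    """
    total_weight = sum(my_trust.get(m, 0.0) for m in ratings)
    if total_weight == 0:
        return None  # nobody I trust has rated this entry
    return sum(r * my_trust.get(m, 0.0) for m, r in ratings.items()) / total_weight


def visible(entries, my_trust, threshold):
    """Names of entries whose personal score reaches my threshold."""
    shown = []
    for name, ratings in entries.items():
        score = personal_score(ratings, my_trust)
        if score is None:
            score = 0.0  # unrated by anyone I trust: treat as neutral (a design choice)
        if score >= threshold:
            shown.append(name)
    return shown


entries = {
    "entry 1": {"mod_a": 1.0, "mod_b": -1.0},
    "entry 2": {"mod_b": 1.0},
    "entry 3": {"mod_c": -1.0},
}
```

With `my_trust = {"mod_a": 1.0}` and a threshold of 0.5 only "entry 1" is shown; lowering the threshold to -1.0 shows everything, which is the "set your threshold low" case for those who want the rumors.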



Moderation (none / 0) (#45)
by dennis on Wed Dec 13, 2000 at 04:58:47 PM EST

I read somewhere that a moderation system that lets people moderate the moderators, and other people moderate them, and so on as far back as you want, eventually converges on accurate ratings. I don't have a reference, unfortunately.
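One way "moderating the moderators" could be iterated is a PageRank-style power iteration over who vouches for whom. This sketch is an assumption, not the system the comment refers to: the damping factor guarantees the scores converge to a fixed point, but it says nothing about whether that fixed point is accurate.

```python
def meta_moderate(endorse, damping=0.85, tol=1e-10, max_iter=1000):
    """Credibility via power iteration, in the style of PageRank.

    endorse: user -> {other_user: non-negative weight}, i.e. how much each
             user vouches for the others' moderation.
    Returns user -> credibility; the values sum to 1.
    """
    users = sorted(set(endorse) | {v for out in endorse.values() for v in out})
    n = len(users)
    cred = {u: 1.0 / n for u in users}
    for _ in range(max_iter):
        new = {u: (1.0 - damping) / n for u in users}
        for u in users:
            out = endorse.get(u, {})
            total = sum(out.values())
            if total == 0:
                # A user who vouches for nobody spreads credibility evenly
                for v in users:
                    new[v] += damping * cred[u] / n
            else:
                for v, w in out.items():
                    new[v] += damping * cred[u] * w / total
        if sum(abs(new[u] - cred[u]) for u in users) < tol:
            return new
        cred = new
    return cred


cred = meta_moderate({"a": {"c": 1}, "b": {"c": 1}, "c": {"a": 1}})
```

Here "c" is vouched for by two users and ends up most credible, "a" gains from being vouched for by "c", and "b", whom nobody endorses, keeps only the baseline share.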

Signatures aren't necessarily contrary to anonymity--they let you verify that the same person is behind a set of posts, without necessarily knowing who that person is.


Why distributed? (2.57 / 7) (#4)
by abe1x on Sun Oct 01, 2000 at 11:44:40 PM EST

I don't get it: why does it need to be P2P? Isn't the goal to provide one centralized Who's Who? I can't see a single advantage to making it distributed; putting it on a centralized server would make it far faster and far easier to use.

Re: Why distributed? (3.66 / 6) (#7)
by Eloquence on Mon Oct 02, 2000 at 12:15:36 AM EST

  • ensure relative anonymity of publishers
  • avoid trouble with copyright law
  • avoid trouble with information that some people think should not be published
  • ensure long-time existence of database and avoid commercialization (banners etc.)
  • remove central point of technical failure
  • avoid potentially corrupt authorities

--
Copyright law is bad: infoAnarchy Pleasure is good: Origins of Violence
spread the word!
Re: Why distributed? (3.00 / 2) (#19)
by Jack9 on Mon Oct 02, 2000 at 07:39:07 AM EST

Conversely, you do not need to make it distributed to:

  • ensure relative anonymity of publishers
  • avoid trouble with copyright law
  • avoid trouble with information that some people think should not be published
  • ensure long-time existence of database and avoid commercialization (banners etc.)
  • remove central point of technical failure
  • avoid potentially corrupt authorities

There are more conventional ways to do all of these things. There is no Internet-wide distributed system that works better than a centralized one at this time. So let's not get ahead of our expectations.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.

Separate Database (2.40 / 5) (#6)
by royh on Mon Oct 02, 2000 at 12:09:12 AM EST

Why doing this in a separate network instead of using an existing one? Because the existing distributed networks work according to different principles. Freenet routes all data across several clients, and it remains to be seen how scalable this approach will be in an environment that is speed-heterogenous with many dial-up users between busy routes

So you don't want to put this database on top of Freenet because you're not sure it'll work? If it were proven to work, would you put this database on it? What's this principle you're talking about?

As the network would grow, it could be extended to allow the storage of different kinds of information, for examples, the behavior of certain corporations, lyrics sheets, guitar tabs

I like your idea, mostly because it sounds like some ideas I have had, which involve the whole web becoming a distributed database. Is that what you're getting at?



Pros and Cons (4.33 / 9) (#8)
by abe1x on Mon Oct 02, 2000 at 12:18:00 AM EST

OK this is both a great and awful idea, here's a quick pros and cons list off the top of my head.

Pro

-Increases the transparency of information. I'd kill for something like this when hiring freelancers (if anyone runs into a fast-talking animator from Atlanta named Todd, don't even think about hiring him; the only art he's good at is the con game).

-Would force the creation of some really powerful moderation/trust system.

-Fun to look up info on people.

-Best to do it now and open source before Doubleclick does it.

-It forces people to be honest.

Con

-Information from one part of your life would be open to people from other parts of your life. I really don't need potential clients reading about the amazing acid I hooked a friend up with in college, or rants from ex-girlfriends, or professors revealing my grades.

-Bigbrother@home (see elsewhere in this story).

-Needs a really powerful moderation/trust system.

-There are things I really don't want to know about people.

Overall I'm torn: part of me loves the idea, part of me hates it. Sort of leaning pro. I'd rather the database be open and free than run by the government or some big corporation. Also, my optimistic side can envision something like this causing society to become a bit more honest and understanding about the truth. All of a sudden people will realize that everyone and their mom smoked weed, has sex, etc... Slowly the retarded laws built up by misguided politicians start to melt away... Nah, it'll never happen, but at least we'd have lots of info on the private lives of our favorite politicians.

Ethics (3.20 / 10) (#9)
by Sunir on Mon Oct 02, 2000 at 12:39:37 AM EST

Before you get carried away, I think you should think long and hard about why you want people talking about you anonymously.

"Look! You're free! Go, and be free!" and everyone hated it for that. --r

Re: Open Who's Who Database (3.37 / 8) (#11)
by Suanrw on Mon Oct 02, 2000 at 01:16:18 AM EST

At first glance, the concept has appeal. But, by the end of the piece, I found it downright scary. The biggest potential problems are privacy, and truth. Could we believe anything that appeared in the database? Could the subject person deny a false accusation? Could he/she confirm or correct or clarify an anecdotal submission?

Globally distributed information database? (2.50 / 8) (#12)
by charter on Mon Oct 02, 2000 at 03:18:54 AM EST

Wait... we already have one. It's called "the Internet." You might want to look into it; I hear it's pretty nifty.

-- Charter



Re: Globally distributed information database? (2.33 / 6) (#13)
by charter on Mon Oct 02, 2000 at 03:22:50 AM EST

Hang on, I've just re-read your article more carefully. It's either the most clever bit of social satire I've ever read (in which case my hat's off to you), or you're an unusually clueless person.

I prefer to think the former. Congratulations - it's been a long time since I've been suckered!

-- Charter



Everything sucks (3.66 / 9) (#14)
by Potsy on Mon Oct 02, 2000 at 04:25:09 AM EST

I have to say that you picked a poor example. The everything database sucks. It contains zero useful information. Instead, each entry is made entirely of smarmy in-jokes and smart-ass remarks, most of which are not even remotely funny.

And people actually link to it as though it were the @#$%ing dictionary!

Now, if you're talking about its technical merits, that's another matter. The everything database is a pretty good example of how to set up a web database. If you want to follow its model of links and nodes, but with an attempt at some serious information, then that's not a bad idea.

However, I think you should make two important changes from the "everything" model:

  • Fill it up with some serious information of your own to get it started and set the tone for what kind of information belongs in the database. A set of guidelines for people who add information would be essential.
  • Include external links!! The everything database is completely self-referential: all links within nodes merely point to other nodes. If you're going to be putting information about the rich and powerful in this thing, you'd better be able to back up everything you say, and that means references!

That said, I think a much better example of what you want to create is The Smoking Gun.

Re: Everything sucks (4.00 / 3) (#17)
by linuxonceleron on Mon Oct 02, 2000 at 06:40:48 AM EST

I'll have to disagree: while on the surface everything2 might seem like a database full of elitism, it actually has quite a bit of useful information. There are plenty of noders who write about nothing but sex, and still others who try to start flame wars (the now-gone DMan comes to mind...). But I suggest you look at E2 a little harder before you berate it for having no information. If your idea of information consists of more than *nix information, then you'll realize that E2 has much to offer. There are no external links for a number of reasons, mostly because they don't want people using the "everyone" account to post goatse links on a bunch of nodes. As far as truly informative nodes go, try something like [Everything Bartender] or some of the many entries on bands/songs/people/etc. The thing to remember is that the site is *not* all about Linux just because some guys from /. made it. At least 50% of its population are non-geeks.
I'm working on an AIM bot @ http://trisomy21.dhs.org
Re: Everything sucks (1.00 / 1) (#41)
by Potsy on Tue Oct 03, 2000 at 04:37:28 AM EST

Certainly my idea of information consists of more than *nix information. I never went to the Everything database looking for that. (The only places I do look for that are man pages and ORA books. :v) No, what I'm complaining about is that if you follow virtually any link to or within the Everything database on pretty much any topic you just get jokes. Stupid jokes. And links to more jokes. I did not expect to find *nix or even very much computer info there, but I was expecting to find an at least halfway serious attempt at sharing knowledge.

Re: Everything sucks (2.33 / 3) (#18)
by Jack9 on Mon Oct 02, 2000 at 07:34:37 AM EST

Everything and subsequently everything2 failed to start with even a modicum of actual information. Such a system is bound to fail without proper seeding. It's fair to say that Everything(1)-2 have fallen flat on their face, in terms of being a trusted source of information or even being able to convey general knowledge in a consistent manner. Ignoring these experiments, half-empty has essentially begun the same experiment with a twist. Their deal is validity/visibility according to popularity. This is also a very silly way of going about documenting knowledge, although it may mature into an effective way to generate populist news articles. Kuro5hin makes no claims to being a knowledge base. This has allowed it to function properly as a news reporting site.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.

Re: Everything sucks (1.00 / 1) (#26)
by codemonkey_uk on Mon Oct 02, 2000 at 11:59:46 AM EST

> validity/visibility according to popularity

AKA Mob Rule.

[sorry]

Thad
---
Thad
"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell
Re: Everything sucks (4.00 / 2) (#21)
by WWWWolf on Mon Oct 02, 2000 at 07:51:52 AM EST

I have to disagree. I've found E2 a very useful site in many ways.

But the netizen nature being what it usually is, people tend to submit stuff they shouldn't submit anywhere. I keep the E2 editors in very high regard for working hard to mop up the place =)

Now, where have you last time seen people who have submitted stuff they shouldn't have? Well, personally, just twice today. Oh, wait, many more times, if you count Usenet.

People troll. People can't contribute, goddamn it.

Personally, like in many other communities, I don't understand what the hell people are whining about! I read Usenet, and couldn't care less about the trolls; I read K5 and could not care less about "Glory Of The Past" whines...

... and I use Everything2 daily, try to give useful information for people to consult, and I couldn't care less about the trolls. In short: I don't quit. I just live on, and try to make the world a better place for everyone.

I've seen the "this community sucks, I'm leaving, goddamn it" happening way too many times, in many many places. I can't understand that. I wish you all could just understand that

  • There is no such thing as perfect community, and
  • YOU can make the world a better place if you want to! =)

(End of the Rant about Cowards Who Don't Believe In The Better Tomorrow =)

-- Weyfour WWWWolf, a lupine technomancer from the cold north...


Re: Everything sucks (1.00 / 1) (#42)
by Potsy on Tue Oct 03, 2000 at 05:08:50 AM EST

I realize you weren't necessarily speaking about me, but I feel it worth noting that I didn't leave E2 because I thought it was going downhill ("This place sucks now, I'm leaving"). I never participated in it in the first place!

Re: Everything sucks (2.66 / 3) (#22)
by dlc on Mon Oct 02, 2000 at 08:12:10 AM EST

I thought Sturgeon said 90% of Everything is crud?

All kidding aside, I have to disagree with you about Everything. It's not that it contains nothing useful; it's just that it is horribly misnamed. For the majority of its users (the type who revel in inside jokes, smarmy attacks on the deserving targets within their own community, and self-referential silliness), Everything is exactly what it bills itself as. For the rest of us, it is a place to look and say, "Gee, I hope (insert favorite site/community/UseNET group/whatever) never gets like that."

So I guess it's fair to say that, for some (very) limited definition of "everything", Everything does contain everything you need.

Now, what I'd like to see is someone take The Illuminatus! Trilogy and convert that to an Everything database...


(darren)

Re: Everything sucks (1.00 / 1) (#33)
by Spendocrat on Mon Oct 02, 2000 at 02:57:37 PM EST

Everything seems to be a watered down wanna-be version of a.r.k, on prozac (without pants, not in bed, and with Everything-the-clown noticeably absent).

Everything's Flaws (2.75 / 4) (#30)
by Eloquence on Mon Oct 02, 2000 at 01:50:00 PM EST

Don't get me started on E2. It is true, Everything2 is severely flawed in several ways. One major problem is the screwed-up rating system. For example, in order to see the rating a node has, you first have to rate it yourself. So, if I want to see the most interesting nodes on a subject, I first have to read them all; how much sense does that make? Also, the experience points / level / voting / "cooling" system guarantees a high addiction factor, but a low overall quality of nodes. People on E2 node to get "XP", and they have to provide a certain quantity (not quality) in order to reach the next level. While the creators argue that this is no problem since bad nodes will get voted down (and thereby lose XP, which you need to advance a level), this is not true:

Remember that E2 has a separate XP count and node count. In order to advance, you need a certain amount of XP and nodes. You lose XP if you get voted down extensively. But you get XP not only for writing nodes, but also for voting! So essentially, even if your nodes are voted down to -5 or lower, by spending your own amount of votes you can easily compensate for that. Furthermore, a low quality node can easily be saved by "cooling". Cooling is a feature that has been designed to make high quality nodes more visible, and only those with a certain level do have this ability. Behind this is the belief that those with a certain level will act responsibly. This is not the case. For example, most nodes by the notorious troll DMan have been cooled by other fellow trolls. Cooling gives 10 XP, while voting down sometimes gives -1 XP. See the imbalance?

Then there's the idea (even spread by the powerful leaders of E2) that everything must be linked to everything else. While this is a sweet thought, it makes little sense other than to provide a perfect environment for verbal masturbation. Example: You write a fact node about Winston Churchill, and you put a link on the word "sleep" (completely random, yet authentic example). Why would anyone want to go to sleep from Churchill (no pun intended)? Yet, this kind of behavior is massively encouraged, and nodes that don't have a high degree of linking are voted down. Also, very short fact-oriented nodes tend to be ignored or even voted down.

Then there's an issue that will probably make me lose all my K5-Mojo: E2 has a large percentage of female users, and while I certainly appreciate that (it is probably because of the language-oriented nature of the site), it has led to a high number of nodes that deal with social problems, love, sex, boyfriends, fingernails and generally uninteresting stuff that women seem to like to talk about all day. This would be no problem if there were killfiles or personal scorefiles, but there aren't. There's also no logical difference between fact nodes and rant nodes.

And don't even get me started on removal of documents on E2. The Editors have the power to nuke or kill nodes, and they use it extensively, sometimes simply because they disagree with a node. If they wanted to nuke all nodes without value, they could remove about 50% on their own. This kind of selective removal is unacceptable. And there's the E2 Copyright Problem. Some people on E2 have taken the stance that they have to actively watch out for all possibly infringing documents and report them. Even the authors themselves often report their own documents. This is scary; even in a centralized system, a far more logical approach is "Wait until someone complains, then remove". In a distributed approach, you don't have to care about (C) at all, and frankly, being an outspoken infoanarchist, I am very glad about that.

There are a lot of positive things about E2, though. Some nodes are very interesting, and if you only use the search box you sometimes actually get valuable info. Nowadays, E2 is a site that I visit when I look for specific info that I can't find somewhere else. Sometimes it works quite well for that. Generally I think E2 has demonstrated some interesting concepts, but now we should move on to the next level.
--
Copyright law is bad: infoAnarchy Pleasure is good: Origins of Violence
spread the word!

Re: Everything's Flaws (5.00 / 1) (#40)
by WWWWolf on Mon Oct 02, 2000 at 06:00:39 PM EST

So essentially, even if your nodes are voted down to -5 or lower, by spending your own amount of votes you can easily compensate for that. [...] Cooling gives 10 XP, while voting down sometimes gives -1 XP. See the imbalance?

Yep, E2's experience/voting system still needs a lot of work.

Furthermore, a low quality node can easily be saved by "cooling".

Cooling doesn't save the low quality nodes from the Wrath of Dem Bones and his Merry Gang®. Editors kill tons of crap every day (Yes, that's what they mostly seem to do with the nuking power). Furthermore, killing a node gives XP penalty. "Ack! You lost experience!"

If a crappy writeup gets C!'ed, I'd call it a kiss of death. =)

Then there's the idea (even spread by the powerful leaders of E2) that everything must be linked to everything else. [...]

I've seen this as an exhortation to make the hard links from nodes point to relevant nodes, or to nodes that don't exist but bloody well ought to. =) For example, a link from Winston Churchill would lead not to "sleep", but to "sleeping habits of great historical figures" (or something else that people would do research on).

Over-linking (more links than what's required) and under-linking (too few links) are both bad. Making good links requires certain level of "eye"; I doubt even I have had it enough over time =)

And there's the E2 Copyright Problem. Some people on E2 have taken the stance that they have to actively watch out for all possibly infringing documents and report them.

I thought the E2 Copyright Violations page nowadays states that only copyright holders should report violations. Dammit, I even once "turned in" most "violating" documents that I had noded and could find at that time, and none of them got nuked!

Even the authors themselves often report their own documents. This is scary, even in a centralized system, a far more logical approach is "Wait until someone complains, then remove".

Ex...cuse me? Please restate that. The authors report about violations against their copyrights, it's scary, so it'd better be that the authors would report about violations against their copyrights? =)

-- Weyfour WWWWolf, a lupine technomancer from the cold north...


What about lies? (3.66 / 6) (#15)
by Nickus on Mon Oct 02, 2000 at 04:44:21 AM EST

But what if I create entries about people that are full of lies? There is no way to know which facts are accurate and which are not. That makes the system pretty useless if anyone can say anything about anybody without any control.

Due to budget cuts, light at end of tunnel will be out. --Unknown
Re: What about lies? (3.00 / 1) (#31)
by Eloquence on Mon Oct 02, 2000 at 01:57:07 PM EST

I find it scary to learn that you seem to rely completely on authorities in order to distinguish between truth and falsehood. There are, of course, many other ways: plausibility, logical consistency, primary and secondary sources etc. However, even the authority concept can be implemented using the described pseudonymity through digital signatures.
--
Copyright law is bad: infoAnarchy Pleasure is good: Origins of Violence
spread the word!
Re: What about lies? (none / 0) (#37)
by Nickus on Mon Oct 02, 2000 at 04:21:37 PM EST

No, I certainly do not rely on authorities to decide what is true or false. But I don't rely on things I read on the net either. The problem with all network-based communication is that it lacks personality in most cases, and text-based communication is the worst kind. It is difficult to get your message across exactly as you meant it, and a lot easier to misunderstand things than in real life. And in real life it is far easier to get your mess cleaned up.



Due to budget cuts, light at end of tunnel will be out. --Unknown
Re: What about lies? (none / 0) (#38)
by Nickus on Mon Oct 02, 2000 at 04:24:43 PM EST

I forgot one thing. The language barrier is much higher in a text-based environment. English isn't my native language (as most have probably guessed), and sometimes I find it hard to get my message across because I don't always find the right words.

But I have drifted away from the topic now. Oh well...



Due to budget cuts, light at end of tunnel will be out. --Unknown
Ideas (3.33 / 3) (#16)
by Simon Kinahan on Mon Oct 02, 2000 at 06:29:51 AM EST

I think the idea of a massively distributed database system is a good one, but that's just because I've been working on one myself :) I had rather different applications in mind, though, the main one being a web annotation system a bit like critlink, but distributed. I haven't got to the point of really testing my ideas yet, but I'll tell you what they are anyway.

I tend to agree with the ShouldExist article that a more sophisticated data model than Gnutella's is a good idea. There are several reasons for that, but the one that stands on its own is that you get better search results. If you structure the data according to a precise schema and use similarly precise queries, you get less traffic and better answers. I wouldn't want to use the full relational model: transactions, query optimisation and referential integrity are all very hard in distributed systems. But with just select and project, you can still do a lot better than keyword search.
More sophisticated operators can be run on the client.
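To make the select/project versus keyword search point concrete, here is a rough in-memory sketch. It assumes a hypothetical `Profile` schema with made-up fields and entries; in the distributed setting each node would presumably run the same two operators over its local store, with anything fancier left to the client as Simon suggests.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Iterable

@dataclass(frozen=True)
class Profile:
    # Hypothetical schema; the fields are illustrative, not a proposed standard.
    name: str
    occupation: str
    employer: str
    country: str

def select(rows: Iterable[Profile], pred: Callable[[Profile], bool]) -> list[Profile]:
    """Relational selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def project(rows: Iterable[Profile], fields: tuple[str, ...]) -> list[tuple]:
    """Relational projection: keep the named columns and drop duplicate tuples (order kept)."""
    return list(dict.fromkeys(tuple(getattr(r, f) for f in fields) for r in rows))

def keyword_search(rows: Iterable[Profile], word: str) -> list[Profile]:
    """Gnutella-style baseline: match the word anywhere in any field."""
    w = word.lower()
    return [r for r in rows if any(w in str(v).lower() for v in asdict(r).values())]

# Made-up example entries.
profiles = [
    Profile("Alice Example", "judge", "US District Court", "US"),
    Profile("Bob Example", "banker", "Court Street Bank", "US"),
    Profile("Carol Example", "journalist", "freelance", "UK"),
]

# Keyword search for "court" also drags in the banker whose employer contains the word.
print([p.name for p in keyword_search(profiles, "court")])
# A precise query: select the judges, then project only the columns we need.
print(project(select(profiles, lambda p: p.occupation == "judge"), ("name", "employer")))
```

The precise query returns exactly one narrow tuple, which is also less data to send back across the network than whole matching records.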

Once you have a better data model, you are in a better position to judge how related pieces of data are. This matters, because one way around the inherent scalability problems of this kind of system is to keep data that is often needed together close together in the network. You can do this proactively, when the data is inserted, and this probably works better than doing it at request time. I have a lot of more precise thinking about this, but it's probably a bit intense to explain here. I do think a dynamically shifting topology based on relatedness would work better than a grid, though: it's a lot easier to limit the scope of broadcasts.





Simon

If you disagree, post, don't moderate
Trusted Information (3.66 / 3) (#20)
by Jack9 on Mon Oct 02, 2000 at 07:46:16 AM EST

The largest single problem facing collections of information on the internet (in whatever form) is the concept of 'a trusted source'. I can hear about a book and virtually GUARANTEE that Amazon will sell it in a couple of different versions. I can hear a fragment of a band name and search Napster to find their whole album. The concept of trusted sources includes the need to be comprehensive, simple/available, and accurate. Anonymous distributed systems have all but thrown these concepts out the window. You want a Who's Who Database? Start with 50 people and a fuckload of telephone books, and parse away. You want it to be used? Get as many government documents as you can, hyperlinking names to information on a per-city or per-state basis, for a couple of months. Let people browse for free. Voilà. Accurate, simple, comprehensive. If you build it, they will come.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.

prehistory: the librarians' approach (3.33 / 3) (#23)
by dchud on Mon Oct 02, 2000 at 09:24:16 AM EST

Librarians have been doing something like this for years, and we have enormous, high-quality "name authority" databases to show for it. This mostly centers around the concept of authorship, but what you're suggesting isn't terribly different and the concept could certainly scope out.

The problem is that our standards were designed over 30 years ago, and thus our databases aren't designed for distributed access. To get a sense of what our encoding looks like, check out the MARC21 authority record page at LC. Yes, it's ugly... very ugly. But there are thousands of librarians trained to produce very accurate records according to this syntax, and for the most part it works. Well, it only really works in the context of library catalogs, but my point is that there are a _ton_ of librarians who would help populate such a beast if somebody built one that was open, free, and had a better architecture than what we're dealing with now.

This sort of database already exists (2.80 / 5) (#24)
by Cid Highwind on Mon Oct 02, 2000 at 09:45:08 AM EST

Just walk into any run-down public restroom, and read the stuff scrawled on the walls. That's about how useful and accurate an anonymous who's-who database would be.

IMHO nobody would sign anything negative, out of fear of litigation, and the anonymous posters would fill up the database with useless "so-and-so is a fag" and "whoever is a slut and her phone number is 555-1212" type posts, the sort of "information" you could find on the wall of any gas station bathroom in the US.
0, 1 - just my two bits
Worst flaw shown in the highest level description. (2.75 / 4) (#25)
by zapman on Mon Oct 02, 2000 at 11:30:49 AM EST

"If an important banker is a member of the catholic Opus Dei sect..." When this kind of bias makes it into such a system, you have a problem, mainly because few people have ANY clue about Opus Dei. The coloring of this piece of data would make people cringe from a group that is largely dedicated to helping out the poor, education, and calling people to look beyond their own lives into the needs of others.

Jason
-- The request of a friend in need, is done by a friend in deed.
Just because you *can* do something... (2.60 / 5) (#27)
by Mr. Lunch on Mon Oct 02, 2000 at 12:24:02 PM EST

...doesn't mean you should. I really, really, really, really don't trust this. And sure, I'm paranoid, but paranoia is a healthy response to an irrational universe. What exactly does it say about a culture when this sort of Panopticon approach is considered a plausible way of interacting with each other? I don't want to know about my Prime Minister's sex life. Why? Because it's none of my god damned business, that's why. And because I wouldn't appreciate it if the Prime Minister used his power to investigate my personal life. Do unto others. Schoolyard rules. I'll grant you, this is certainly an interesting technological problem & a potentially clever bit of tech. But then again, so were Fat Man & Little Boy.
"Highly Professional."
What hypocrites you are... (3.33 / 6) (#28)
by Croatian Sensation on Mon Oct 02, 2000 at 12:28:19 PM EST

It's amazing that a group of people so worried about their own rights to privacy are even thinking about discussing such an intrusion into the private lives of so-called "public figures".

If you think that you have the right to collect and disseminate this type of private information about others, then others, such as the government, banks, Amazon.com or anybody else, should have the right to collect whatever the hell they want about your purchasing habits, who you associate with, or anything else for that matter.

How about trying to do something according to your own damn ideology instead of bashing others for what they believe in.

If your damn agenda is so righteous, put some real effort into promoting it. Go out, make yourself some money, start a bank, get elected, start a revolution.

Re: What hypocrites you are... (3.00 / 3) (#36)
by RadiantMatrix on Mon Oct 02, 2000 at 04:12:20 PM EST

I completely agree. The double-standard that this would reflect is horrific. The privacy legislation in the US is notoriously weak, but even so there is a lot of information that can't be discovered without breaking the law. For instance, AIDS is classified as a confidential disease - you don't have to reveal that you have it to anyone, not even your doctor. And if your doctor discovers it, they cannot put it on record without your specific permission - even then, it goes into a sealed portion that only they can see unless you specifically release it.

All of that, just for a disease. Why? Because people still fear AIDS, and the government is proactively helping to prevent discrimination because of it.

Now, if this database existed, someone you know personally might come across that information and anonymously post it to the database. That is just one example; I'm sure there are many more!
--
I'm not going out with a "meh". I plan to live, dammit. [ZorbaTHut]


Re: What hypocrites you are... (3.00 / 1) (#39)
by Nygard on Mon Oct 02, 2000 at 05:43:51 PM EST

Don't make the mistake of ascribing the attitudes of a few to an entire group. Remember that a heterogeneous population can have many disjoint sets, even if they share a common attribute.

If person A states in a public forum that "X" should be true, that does not mean that all members of that forum believe "X".

If person B states that "!X" should be true, that does not mean that all members of the forum believe "!X".

In particular, it does not mean that person A believes both "X" and "!X".

Put another way, suppose members of set S hold "K5 is useful". Some subset, S', holds "Privacy is good, deny access to private data". Some other subset, S", holds "Privacy is dead, equal access for all". These are not contradictory, nor are the members of S' or S" hypocrites.

In other words, just because we all read K5 doesn't mean that all of us agree with everything written here. Quite the contrary.


Ethics, Rights, Privacy, Trust (4.20 / 5) (#29)
by Eloquence on Mon Oct 02, 2000 at 01:24:48 PM EST

Some have argued that an Open Who's Who DB would potentially harm individual rights to privacy. Let me examine this argument.

First, as I wrote in the article, the persons catalogued by the DB would likely not be people like you and me, because of the nature of the distributed database. Even if someone puts up information about you, it is highly unlikely that anyone within network reach of that person's node will search for it, and the number of requests is unlikely to be high enough for the data to be cached across a sufficient number of nodes.

So the persons you could successfully search for would generally be more or less popular public figures.
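The reachability argument above can be played through on a toy model. This sketch assumes a Gnutella-style flood in which every hop decrements a TTL and nodes drop queries they have already seen; the chain topology, the node names and what each node stores are all made up, and caching is modelled simply as extra copies of an entry on other nodes.

```python
from collections import deque

def flood(graph: dict[str, list[str]], origin: str, ttl: int) -> set[str]:
    """Nodes reached by a query from `origin` that loses one TTL unit per hop.

    Nodes drop copies of a query they have already seen, so this is a
    breadth-first search cut off after `ttl` hops.
    """
    seen = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops_left - 1))
    return seen

def find(graph, stores, origin, ttl, subject):
    """Reached nodes whose local store (original or cached copy) has an entry on `subject`."""
    return {n for n in flood(graph, origin, ttl) if subject in stores.get(n, set())}

# Hypothetical chain topology a-b-c-d-e-f.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
# An entry about a private person exists only at f; a popular entry has been cached at b and e.
stores = {"f": {"you"}, "b": {"public figure"}, "e": {"public figure"}}

print(find(graph, stores, "a", 3, "you"))            # nothing in range
print(find(graph, stores, "a", 3, "public figure"))  # the cached copy at b
```

An entry only becomes findable from far away if the TTL is raised or copies are cached nearer the searcher, and under the article's assumptions only frequently requested entries get cached widely.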

Second, to those who have argued that because corporations and governments shouldn't know all about us, we shouldn't know all about corporations or governments: this is a dangerous and -- sorry -- stupid attitude. There's a slight imbalance of power between me and Amazon.com or GBJ. It is obvious that the more information we have about those who have power over others, the less likely it is that this power will be abused.

I'm not talking about the president's sex life here, but those who base their political decisions on it should have the right to obtain the available information (whether they should believe it is a different question). I'm talking about corruption, crime, questionable organizational ties etc. Exactly the kind of thing the media doesn't tell us. Saying "I don't want to know, and I don't want anyone else to know" here is about the same as saying "Go ahead, exploit me, I will look away".

Now there's the issue of libel and trust. Let's assume for a moment that you or I are persons about whom information is stored in the database. Someone creates an entry that says "X is a member of NAMBLA and has openly called himself a 'compassionate pedophile' in the past." The question here is not whether we like this information, but whether it is dangerous to us. And seriously, who in his right mind would believe such a claim if it is not supported by sources and the person making it has not already built up a relationship of trust with us? Anonymous libel is perhaps the least dangerous form of libel imaginable.

Let me get back to the "private information about little-known people" argument. While it is unlikely that such information would be obtainable today for the technical reasons I have presented, in a world where broadband access is widespread it becomes a real possibility. What you must understand is that this does not depend on the open DB technology. Doubleclick, Radiate, Cydoor et al. are collecting extensive profiles about us today, including website preferences, search engine terms and program usage. While most of these companies claim that the information is completely anonymized, I trust 'em about as far as I can throw them. It is fairly easy to link a DC profile to a certain e-mail address once that address has been entered on a website that is part of the cookie-sharing network. (Lesson: enable cookies selectively, only on sites that you trust; they are one of the biggest privacy risks.)

Having this information in a public network (which, again, is not the goal of the project and only a possibility in the long term) would IMHO be better than having it on the hard disks of DC et al. At least you could find out "what they know about you"; the kind of profiles you can buy for $50 would be available for free. Which do you prefer: that only a few powerful corporations have this info, or that everyone has it and can rebut it if necessary?

Now, a few more words about trust. Perhaps I should have labeled the database "pseudonymous" and not "anonymous", since that is what is actually happening. You would create an entry and choose one of multiple pseudonyms to assign to it. E.g. I could write under the nickname Eloquence (with its own public key + digital signature), and people could be sure that it's always the same person without knowing who is behind this nickname (unless this information was published somewhere). You could use different nicknames for different subjects you write on. By having such pseudonyms, you will eventually have a loose "web of trust" with persons who have proven to deliver valuable information. A first-time user will first have to earn the trust of others. Extraordinary claims without extraordinary evidence by newbies will simply be ignored.
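The last two sentences suggest a filtering rule on the reader's side. Here is one possible reading, sketched with heavy assumptions: pseudonyms stand in for public-key fingerprints (actual signature checking is left out, since Python's standard library has no public-key signatures), trust is just "how many pseudonyms I already trust vouch for this one", and the `extraordinary` flag and the thresholds are hypothetical. `Alice`, `Bob` and `Newbie` are made-up pseudonyms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    author: str          # pseudonym; stands in for a public-key fingerprint (no signature check here)
    text: str
    extraordinary: bool  # hypothetical flag, e.g. set by the reader
    sources: int         # number of independent sources cited

def trust_score(endorsements: dict[str, set[str]], seeds: set[str], pseudonym: str) -> int:
    """Count how many pseudonyms the reader trusts vouch for `pseudonym`.

    The reader trusts their own seeds plus everyone the seeds vouch for (one hop).
    """
    trusted = set(seeds)
    for s in seeds:
        trusted |= endorsements.get(s, set())
    return sum(1 for t in trusted if pseudonym in endorsements.get(t, set()))

def accept(claim: Claim, score: int, min_trust: int = 1, min_sources: int = 2) -> bool:
    """Ignore a claim only if it is extraordinary, poorly sourced AND from an untrusted newcomer."""
    newbie = score < min_trust
    weak_evidence = claim.sources < min_sources
    return not (claim.extraordinary and weak_evidence and newbie)

# Hypothetical web of trust: the reader trusts Eloquence, who vouches for Alice, who vouches for Bob.
endorsements = {"Eloquence": {"Alice"}, "Alice": {"Bob"}}
seeds = {"Eloquence"}

rumour = "X is a member of a secret society"
print(accept(Claim("Newbie", rumour, True, 0), trust_score(endorsements, seeds, "Newbie")))  # ignored
print(accept(Claim("Bob", rumour, True, 0), trust_score(endorsements, seeds, "Bob")))        # shown
```

A newcomer can still get an extraordinary claim through by citing enough sources, which matches the "without extraordinary evidence" condition rather than banning newcomers outright.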

I love rating systems and this could surely be widely extended to allow for much more extensive tagging of database entries (you could be asked to review/rate an entry after reading it, and when returning search results, you would also return the rating for the entry). I don't know if this should already be in the first version of the network, though. However, it is a much needed feature to further improve the quality of the database.
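Returning a rating alongside each search hit might look roughly like this. The 1 to 5 scale, the in-memory dictionaries, the entry texts and the function name are all assumptions for illustration; in the network, each node would presumably return whatever ratings it holds together with the entry.

```python
from statistics import mean

def search_with_ratings(entries: dict[str, str], ratings: dict[str, list[int]], query: str):
    """Return (entry_id, text, average_rating or None, number_of_ratings) for matching entries.

    Rated entries come first, best average first; unrated entries follow.
    """
    hits = []
    for eid, text in entries.items():
        if query.lower() in text.lower():
            rs = ratings.get(eid, [])
            hits.append((eid, text, mean(rs) if rs else None, len(rs)))
    # Sort key: (has any rating, average); reverse puts rated and high-average entries first.
    hits.sort(key=lambda h: (h[2] is not None, h[2] or 0), reverse=True)
    return hits

# Made-up entries and ratings on a hypothetical 1-5 scale.
entries = {
    "e1": "Judge K clerked for a film studio",
    "e2": "Judge K is an alien",
    "e3": "Banker B",
    "e4": "Judge K went to law school",
}
ratings = {"e1": [4, 5], "e2": [1, 2]}

for hit in search_with_ratings(entries, ratings, "judge k"):
    print(hit)
```

Returning the number of ratings next to the average lets a reader tell a well-reviewed entry from one that a single person rated highly.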
--
Copyright law is bad: infoAnarchy Pleasure is good: Origins of Violence
spread the word!

Re: Ethics, Rights, Privacy, Trust (2.66 / 3) (#32)
by Spendocrat on Mon Oct 02, 2000 at 02:38:58 PM EST

I think you've really underplayed the problems that can arise from people inserting false information (willingly or not!) into such a database. Don't get me wrong, I would love to be part of such a network, but I'd need a bit more information about where people are getting the info in the database. I think it's naive to think that we won't have people trying to push the signal-to-noise ratio of such a project so low that it becomes mostly useless. We've seen this kind of thing before on Usenet wrt Scientology. The real problem (IMHO) would be that people would come to rely on such a database in the same way a lot of people seem to rely on Slashdot for their tech news. It's convenient, but in this case you'd have even less of a chance to verify the veracity of any information, unless there was some kind of pointer to the originating resource.

The biggest obstacle seems to be having your anonymity and accurate information too.

It doesn't look like it could be a problem when you consider the extreme case, but if we're trying to make truly informed decisions, the extreme case isn't the only worry.


privacy (none / 0) (#44)
by Nyarlathotep on Tue Dec 12, 2000 at 01:18:10 PM EST

Actually, the reason that you give for the database not harming people's privacy is also a reason why it might not carry enough information. A random judge who gets involved in an important case would not necessarily have any more information available than a flamboyant intern who works for RedHat.

Unfortunately, I think you're going to need a lot more information available than the people who feel this is a privacy violation would like. This is not necessarily a bad thing, but it means that the "kind" of information which gets posted must be watched.

You could try the following system:

1) Anyone can enter a person's name into the database. Anyone can attach a question or statement to multiple people's names. Anyone can reply to a question or statement with a question or statement, but only a statement replying to a question may also be associated with a name. A statement replying to a statement will be required to say true or false.

2) There are not supposed to be ANY duplicate statements. You would use an AI to detect possible duplicates and put them up for a vote. Duplication is the only reason information will ever be removed from the network, i.e. there will be exactly one hot grits troll regarding CmdrTaco.

3) The search engine will return only results which are associated with a name and will sort them by true votes, since a lot of true votes should mean a lot of corroboration and links.
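The three rules are concrete enough to sketch as a single-machine model. Several assumptions here are mine, not Nyarlathotep's: `WhoDB`, `Post` and the method names are hypothetical; `difflib.SequenceMatcher` with a 0.9 similarity threshold stands in for the "AI" duplicate detector; and "true votes" are read as the number of statement replies marked true. Distribution across nodes is left out entirely.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional

@dataclass
class Post:
    pid: int
    kind: str                       # "question" or "statement"
    text: str
    names: set[str] = field(default_factory=set)
    parent: Optional[int] = None
    verdict: Optional[bool] = None  # required for a statement replying to a statement

class WhoDB:
    def __init__(self, dup_threshold: float = 0.9):
        self.names: set[str] = set()
        self.posts: dict[int, Post] = {}
        self.dup_threshold = dup_threshold
        # Rule 2: suspected duplicates are queued for a vote, never deleted automatically.
        self.duplicate_queue: list[tuple[int, int]] = []

    def add_name(self, name: str) -> None:
        self.names.add(name)  # Rule 1: anyone can enter a name

    def post(self, kind: str, text: str, names=(), parent: Optional[int] = None,
             verdict: Optional[bool] = None) -> int:
        names = set(names)
        if kind not in ("question", "statement"):
            raise ValueError("kind must be 'question' or 'statement'")
        if not names <= self.names:
            raise ValueError("attach posts only to names already in the database")
        if parent is not None:
            p = self.posts[parent]
            # Rule 1: only a statement replying to a question may also carry names.
            if names and not (kind == "statement" and p.kind == "question"):
                raise ValueError("only a statement replying to a question may carry names")
            # Rule 1: a statement replying to a statement must say true or false.
            if kind == "statement" and p.kind == "statement" and verdict is None:
                raise ValueError("a statement replying to a statement needs a true/false verdict")
        pid = len(self.posts) + 1
        if kind == "statement":
            for other in self.posts.values():
                similarity = SequenceMatcher(None, text.lower(), other.text.lower()).ratio()
                if other.kind == "statement" and similarity >= self.dup_threshold:
                    self.duplicate_queue.append((pid, other.pid))
        self.posts[pid] = Post(pid, kind, text, names, parent, verdict)
        return pid

    def true_votes(self, pid: int) -> int:
        return sum(1 for p in self.posts.values() if p.parent == pid and p.verdict is True)

    def search(self, name: str) -> list[Post]:
        # Rule 3: only posts attached to the name, most corroborated first.
        hits = [p for p in self.posts.values() if name in p.names]
        return sorted(hits, key=lambda p: self.true_votes(p.pid), reverse=True)

db = WhoDB()
db.add_name("CmdrTaco")
s1 = db.post("statement", "CmdrTaco was covered in hot grits", names=["CmdrTaco"])
s2 = db.post("statement", "CmdrTaco founded Slashdot", names=["CmdrTaco"])
db.post("statement", "Yes, see the Slashdot about page", parent=s2, verdict=True)
s4 = db.post("statement", "CmdrTaco was covered in hot grits!", names=["CmdrTaco"])
print(db.duplicate_queue)
print([p.pid for p in db.search("CmdrTaco")])
```

The near-identical grits statement lands in the vote queue instead of being dropped, and the corroborated statement sorts to the top of the search.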

Campus Crusade for Cthulhu -- it found me!
Accuracy (3.00 / 4) (#34)
by Anonymous 6522 on Mon Oct 02, 2000 at 04:01:46 PM EST

One problem with a database of this type would be determining the accuracy of the information. Sure, you could cryptographically sign it and verify who wrote it, but what if the person is intentionally spreading misinformation or is just misinformed?

Some type of moderation system could work, but this would require moderators with the time and know-how to verify this data for authenticity. If the moderation system is open to all, I can see certain interested groups using their moderation powers to declare lies truth and truth lies.

Basically I don't think that a system like this can be trusted to provide accurate, truthful data with a minimum of spam and lies.

moderation is a bad idea (none / 0) (#43)
by Nyarlathotep on Tue Dec 12, 2000 at 12:50:37 PM EST

It's a bad idea to allow moderation of this database, since that makes it too easy to hack (by controlling a lot of accounts). Plus, you want the crazy rumors to be included too, and you really do not want anyone's data to be erased.

No, the correct solution would probably be to sort search results by the number of "Me too" ("Yes, this is correct and here is the evidence") results. Plus, identifying these "me too" results makes your search result presentation more efficient, since they do not need to be shown.
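Ranking by "me too" confirmations while hiding the confirmations themselves could be sketched like this. It assumes a confirmation is an explicit reply pointing at the report it backs up (the `me_too_of` field); `Report`, `ranked_view` and the example reports are hypothetical. Nothing is deleted, in line with the point that even the crazy rumors stay in.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Report:
    rid: int
    text: str
    me_too_of: Optional[int] = None  # set when this report just confirms an earlier one

def ranked_view(reports: list[Report]) -> list[tuple[str, int]]:
    """Collapse 'me too' confirmations into counts and rank original reports by them.

    Confirmations are hidden from the listing but still counted; nothing is removed.
    """
    confirmations = Counter(r.me_too_of for r in reports if r.me_too_of is not None)
    originals = [r for r in reports if r.me_too_of is None]
    return sorted(((r.text, confirmations[r.rid]) for r in originals),
                  key=lambda t: t[1], reverse=True)

# Made-up reports about one person.
reports = [
    Report(1, "rumour A"),
    Report(2, "claim B"),
    Report(3, "Yes, here is the court record", me_too_of=2),
    Report(4, "Me too, see the newspaper archive", me_too_of=2),
    Report(5, "Heard it too", me_too_of=1),
]
print(ranked_view(reports))
```

Because `Counter` returns 0 for reports nobody has confirmed yet, fresh rumors still appear, just at the bottom.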


Campus Crusade for Cthulhu -- it found me!
grammar (2.33 / 3) (#35)
by mattdm on Mon Oct 02, 2000 at 04:05:59 PM EST

Actually, despite the title, "Who's Who" is grammatically correct. For whatever that's worth.

Open Who's Whom Database | 45 comments (42 topical, 3 editorial, 0 hidden)

kuro5hin.org

All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.
Need some help? Email help@kuro5hin.org.
My heart's the long stairs.
