In a ShouldExist article, I have tried to explain roughly how a distributed dynamic database could work. Generally, I envision a Gnutella-style broadcast protocol with automatic caching. Everyone keeps a local store of shared profile data, which can be queried on different fields. Searches are routed to a number of neighbouring clients (perhaps not in Gnutella's chaotic structure but in the more organized "Grid"-like fashion that Ben Houston describes in this paper). Some may argue that this approach does not scale. Its scalability is certainly limited, but that doesn't necessarily hamper the operation of the network. With a limited TTL, a search only reaches a limited number of neighbours, which caps the traffic caused by search broadcasts. (Note that most of the current traffic on the GnutellaNet is caused by pings and pongs, used to find all hosts on the network, and by push requests, used to initiate transfers through firewalls; both could be made much more efficient.)
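To make the TTL argument concrete, here is a minimal simulation sketch. It assumes a hypothetical square-grid topology (each client has up to four neighbours) and per-query duplicate suppression; it is not Ben Houston's actual design, and the function names are made up for illustration. The point it shows: the number of nodes reached and messages sent depends only on the TTL, not on how large the grid is.

```python
from collections import deque

def grid_neighbours(node, width, height):
    """Four-way neighbours of (x, y) on a hypothetical width x height grid."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def flood_query(origin, ttl, width, height):
    """Forward a query hop by hop until its TTL runs out.

    Each node forwards a given query at most once (duplicates are dropped),
    so the set of reached nodes is bounded by the TTL.
    Returns (nodes_reached, messages_sent).
    """
    seen = {origin}
    frontier = deque([(origin, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue                      # TTL exhausted: do not forward
        for nb in grid_neighbours(node, width, height):
            messages += 1                 # every forward costs one message
            if nb not in seen:            # duplicates are not re-forwarded
                seen.add(nb)
                frontier.append((nb, hops_left - 1))
    return len(seen), messages
```

With a TTL of 3, a client in the middle of a 101 x 101 grid and one in the middle of a 1001 x 1001 grid both reach 25 nodes at a cost of 52 messages; only raising the TTL increases the load.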
The results can be sent back directly to the querying IP address. This makes flooding easier to detect (and reduces network load) but gives up the anonymity of searches. However, a certain degree of anonymity remains because every client caches all data upon receipt: you don't know whether the IP address you download from belongs to the author (although in a small network this will not be too hard to figure out).
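A minimal sketch of "cache on receipt", assuming a hypothetical `Node` class with an in-memory store: authored and relayed entries live in the same store and are answered the same way, so a response by itself does not reveal who wrote the entry.

```python
class Node:
    """Hypothetical client that caches every entry it receives.

    Cached entries are served exactly like locally authored ones, so a
    downloader cannot tell from the response alone whether this node wrote
    the entry or merely relayed it.
    """

    def __init__(self, address):
        self.address = address
        self.store = {}   # entry_id -> body; authored and cached entries mixed

    def publish(self, entry_id, body):
        self.store[entry_id] = body

    def receive(self, entry_id, body):
        # Cache on receipt: keep a copy unless we already hold one.
        self.store.setdefault(entry_id, body)

    def answer(self, entry_id):
        # The response carries only our address and the body, never a flag
        # saying whether we are the original author.
        if entry_id in self.store:
            return (self.address, self.store[entry_id])
        return None
```

If the author at 10.0.0.1 publishes an entry and a client at 10.0.0.2 downloads it, the second client afterwards answers queries for that entry with its own address and an identical body.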
One of the trickier aspects is source authenticity. Cryptographically, the problem is solved by public-key cryptography and digital signatures. However, maintaining a directory of users and their public keys is not quite as easy if you do not want to rely on a central server. But perhaps a directory is not necessary: it might be enough to attach the author's public key and signature to each database entry returned. Then, if you have one entry by a certain author, you can reliably identify other entries by the same author, and nobody can publish under that identity without the matching private key.
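The following sketch illustrates the idea of shipping the public key with every entry and grouping entries by a key fingerprint. It deliberately uses textbook RSA with toy keys built from small, well-known Mersenne primes so that it runs with the standard library alone; this is INSECURE and only demonstrates the principle. A real client would use a vetted signature scheme from a proper crypto library. All names (`make_entry`, `group_by_author`, the entry dict layout) are hypothetical.

```python
import hashlib

def make_key(p, q, e=65537):
    """Build a toy textbook-RSA key (n, e, d) from two known primes. NOT secure."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse (Python 3.8+)
    return (p * q, e, d)

def digest(message, n):
    # Hash the entry text and reduce it into the key's range.
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n

def make_entry(body, key):
    """Attach the author's public key (n, e) and a signature to an entry."""
    n, e, d = key
    return {"body": body, "pubkey": (n, e), "sig": pow(digest(body, n), d, n)}

def verify(entry):
    n, e = entry["pubkey"]
    return pow(entry["sig"], e, n) == digest(entry["body"], n)

def fingerprint(pubkey):
    # Short, stable author identifier derived from the public key.
    n, e = pubkey
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()[:16]

def group_by_author(entries):
    """Group the bodies of correctly signed entries by author fingerprint.

    Entries whose signature does not verify (tampered body, or someone
    else's key attached) are dropped.
    """
    groups = {}
    for entry in entries:
        if verify(entry):
            groups.setdefault(fingerprint(entry["pubkey"]), []).append(entry["body"])
    return groups

# Toy keys from the Mersenne primes 2^31-1, 2^19-1 and 2^61-1 (illustration only).
ALICE = make_key(2**31 - 1, 2**61 - 1)
BOB = make_key(2**19 - 1, 2**61 - 1)
```

Two entries signed with `ALICE` end up in the same group, an entry signed with `BOB` in another, while a copy with an altered body or with Bob's key swapped in fails verification and is discarded.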
Now say I want to create an entry about Jack Valenti. I store the data (for example, information about the nice parties he throws for "his" politicians) on my node and log in to the network. Anyone who queries for Jack now gets results from every node that provides this information (with a timestamp and perhaps a cryptographic signature plus the public key). The timestamp helps identify the most current version of a given document.
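Since the same entry may come back from many nodes in different versions, the client has to pick one. A minimal sketch, assuming hypothetical result fields `id`, `timestamp` (Unix seconds, set by the author) and `body`, and assuming signatures were already checked so that only the author could have set the timestamp:

```python
def latest_versions(results):
    """Keep only the newest version of each entry among query results.

    `results` is a list of dicts with hypothetical keys "id", "timestamp"
    and "body". On equal timestamps the first result seen is kept.
    """
    newest = {}
    for r in results:
        current = newest.get(r["id"])
        if current is None or r["timestamp"] > current["timestamp"]:
            newest[r["id"]] = r
    return newest
```

The order in which results arrive does not matter; whichever copy carries the highest timestamp wins.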
Fine-tuning would include making the network searchable by every conceivable criterion, client-side indexing, etc.
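Client-side indexing could be as simple as an inverted index keyed by field and word. The sketch below is a hypothetical `FieldIndex` class with whitespace tokenisation and AND semantics across fields; a real client would need smarter tokenising and ranking.

```python
from collections import defaultdict

class FieldIndex:
    """Hypothetical client-side inverted index over arbitrary entry fields."""

    def __init__(self):
        self.entries = {}
        self.index = defaultdict(set)   # (field, lowercase word) -> entry ids

    def add(self, entry_id, fields):
        self.entries[entry_id] = fields
        for field, value in fields.items():
            for word in str(value).lower().split():
                self.index[(field, word)].add(entry_id)

    def query(self, **criteria):
        """Return sorted ids matching every field=word criterion (AND)."""
        result = None
        for field, word in criteria.items():
            ids = self.index.get((field, word.lower()), set())
            result = ids if result is None else result & ids
        return sorted(result or ())
```

For example, after indexing two entries whose `name` fields both contain "Jack", `query(name="jack")` returns both ids, while `query(name="jack", role="lobbyist")` narrows the result to the entry that also matches the role.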
As an implementation platform, anything but Java or Linux-only would be fine. The client would have to be easy to use and easy to install for Windoze users, since the size of the network is essential to its success.
Why do this in a separate network instead of using an existing one? Because the existing distributed networks work according to different principles. Freenet routes all data across several clients, and it remains to be seen how well this approach scales in an environment with heterogeneous connection speeds, where many dial-up users sit between busy routes (as Freenet is currently used mostly by "power users", this doesn't seem to be much of a problem yet). The same is true for Blocks. Gnutella has serious flaws in its protocol (too many pings/pongs and push requests, too many different client implementations) and is too busy for a specialized task such as this one. JungleMonkey is Linux-specific. Finally, the files this network would provide would generally be very small, while the systems mentioned deal with sharing very large files (MP3s or movies); this also calls for a separate, lightweight network, something that could run in the background without hindering your normal Internet use.
And now, back to the cultural aspects of such a network: First, it would make it very easy to share information about politicians, economic leaders, journalists, programmers etc.; people could even use it to distribute their own personal records. This information would be of interest to all of us, whether we are voting, deciding which product to buy or choosing which organization to support. I don't perceive it as a threat to privacy: most information would have to be popular to be distributed over a large number of nodes and to be visible to all users.
While searching would be non-anonymous in the approach outlined above, providing files (especially popular ones) would be relatively anonymous. As the network grows, it could be extended to store other kinds of information, for example, the behavior of certain corporations, lyric sheets, or guitar tabs (all information which has been censored on the WWW in the past). The database format should be as extensible as possible (XML-based?).
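As an illustration of what an XML-based entry might look like, here is a hypothetical format: a few fixed attributes (`type`, `id`, `timestamp`) plus free-form child elements, parsed with Python's standard `xml.etree.ElementTree`. Because the parser simply collects whatever child elements it finds, new kinds of information need no parser change; the element and attribute names are assumptions, not an agreed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry format: fixed attributes plus free-form child elements.
SAMPLE = """
<entry type="person" id="valenti" timestamp="962409600">
  <name>Jack Valenti</name>
  <role>MPAA president</role>
  <note>A field added later; older clients just carry it along.</note>
</entry>
"""

def parse_entry(xml_text):
    """Turn one XML entry into a dict, keeping every child element as a field."""
    root = ET.fromstring(xml_text.strip())
    fields = {child.tag: (child.text or "").strip() for child in root}
    return {
        "type": root.get("type"),
        "id": root.get("id"),
        "timestamp": int(root.get("timestamp")),
        "fields": fields,
    }
```

Calling `parse_entry(SAMPLE)` yields the id `"valenti"`, an integer timestamp, and a `fields` dict containing `name`, `role` and the later-added `note`.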
Why start with a task like the "Who's Who" and not make it more general from the start? To avoid confusion. If everyone starts their own database schemes immediately, things will get wacky. Perhaps some centralized elements could be used: an expendable "closest-node-finder" (reducing the need for pings and pongs on the network) and perhaps a "datasheet server" which defines the database structures that can be read from and written to. While clients could route "alternative" databases (similar to the alt.* Usenet hierarchy), especially within private networks, the central server would define a general consensus. (Should it get shut down, this wouldn't be a problem: existing clients wouldn't be affected, and new ones would simply download the sheets from somewhere else.)
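A datasheet could be little more than a list of required and optional fields per schema name. The sketch below assumes a hypothetical sheet called `whoswho.person` (as a client might have downloaded it from the datasheet server or any mirror) and lets entries under an `alt.` name pass unchecked, loosely modelled on Usenet's alt.* hierarchy.

```python
# Hypothetical datasheets; each names the fields an entry must or may have.
DATASHEETS = {
    "whoswho.person": {"required": {"name"}, "optional": {"role", "note", "born"}},
}

def check_entry(sheet_name, fields, sheets=DATASHEETS):
    """Return a list of problems; an empty list means the entry conforms.

    Entries under an "alt." sheet name are accepted without checking.
    """
    if sheet_name.startswith("alt."):
        return []
    sheet = sheets.get(sheet_name)
    if sheet is None:
        return [f"unknown datasheet: {sheet_name}"]
    present = set(fields)
    problems = [f"missing field: {f}" for f in sorted(sheet["required"] - present)]
    allowed = sheet["required"] | sheet["optional"]
    problems += [f"unexpected field: {f}" for f in sorted(present - allowed)]
    return problems
```

Because the sheets are plain data, a client that loses contact with the central server can load the same dictionary from any other source without any code change.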
Why do I submit this to the K5 community? Because I want the broadest possible feedback from a tech-oriented readership, and /. doesn't usually publish articles this long unless they are written by Jon Katz ;-). While I would love to choose such a database system as a university project, I currently cannot do much regarding its implementation, and I feel the time is critical to show the potential of Gnutella-like technology for freedom of speech. I would like to cooperate with others on it, but I can't do the main work.
So what do you think? Would you use such a database? Are there serious flaws in my modest proposal? Is there perhaps even an easy-to-use, open-source free system that could be immediately used with minor modifications?