Kuro5hin.org: technology and culture, from the trenches

A standard for software metadata

By tagish in Culture
Mon Sep 18, 2000 at 06:31:08 PM EST
Tags: Software (all tags)

I'm sitting here preparing some Java source to release under the GPL and wondering how best to tell people about what I'm doing. It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.

Right now, when I release this code, if I want people to find it I have to

  • write a description of it
  • set up a home page for it
  • register that page with numerous search engines possibly using the description I wrote
  • visit the appropriate repository and announcement sites making submissions at each
  • find out whether there's an appropriate usenet group and post to it
Perhaps I won't bother.

It seems to me that there's a compelling need for a simple, extensible standard for software metadata -- an agreed way of describing, for any piece of code, what it does, who made it, what license it is available under, what platforms it supports, what it is compatible with and so on. The first question then is: does such a standard exist? And if it does, why is it not more popular? The closest thing I'm aware of is CPAN for Perl, but that doesn't necessarily scale well to other languages and situations.
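As a rough illustration of the kind of record described above, here is a minimal sketch in Python. The field names (`function`, `author`, `license`, `platforms`, `compatible_with`) and the sample values, including the project name, are assumptions made up for this example and do not come from any existing standard.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class SoftwareMetadata:
    """Hypothetical metadata record; every field name here is illustrative."""
    name: str
    function: str                 # what the code does
    author: str                   # who made it
    license: str                  # license it is available under
    language: str                 # implementation language
    platforms: tuple = ()         # platforms it supports
    compatible_with: tuple = ()   # things it is known to work with


def to_json(record: SoftwareMetadata) -> str:
    """Serialise a record so that a program, not just a person, can read it."""
    return json.dumps(asdict(record), sort_keys=True)


example = SoftwareMetadata(
    name="example-servlet-kit",   # made-up project name
    function="servlet utilities",
    author="tagish",
    license="GPL",
    language="Java",
    platforms=("any JVM",),
)
print(to_json(example))
```

The point of the sketch is only that each question (what, who, license, platform, compatibility) becomes a named field that software can read, rather than a sentence buried in a description.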

Assuming that such a standard doesn't exist, does anyone want to get together with me and devise one? I'm thinking of something (human) language independent, simple, capable of encompassing all types of code, and amenable to automatic processing. What about it?





A standard for software metadata | 17 comments (17 topical, 0 editorial, 0 hidden)
What is wrong with Freshmeat and Sourceforge? (2.20 / 5) (#1)
by Qtmstr on Mon Sep 18, 2000 at 06:01:09 PM EST

What's wrong with using Freshmeat and SourceForge? The combination is easy to set up, is free, and is visible to many people.

Kuro5hin delenda est!
Re: What is wrong with Freshmeat and Sourceforge? (4.50 / 2) (#2)
by PresJPolk on Mon Sep 18, 2000 at 06:03:09 PM EST

Yes, SourceForge has the "Trove" categories, which do allow one to look for software written in language X that does Y, given certain constraints Z.

[ Parent ]
Re: What is wrong with Freshmeat and Sourceforge? (3.66 / 3) (#8)
by tagish on Mon Sep 18, 2000 at 07:03:39 PM EST

That's getting close to what I had in mind, but I'd like to see a standard for comments that would allow the SourceForge snippets library to be built automatically, and I'd like to see (optionally) more detailed classification.

The first point is more important. I don't doubt that collections of source code can be built, but the current state of the art is that those collections have to be built 'by hand'. If the source were marked up using a standard metadata vocabulary, collections of that source would be self-ordering.
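To make the "self-ordering" idea concrete, here is a minimal sketch in Python. The `@meta key: value` comment convention, the key names and the sample files are all hypothetical, invented for this example; no such standard is assumed to exist.

```python
import re
from collections import defaultdict

# Hypothetical convention: a comment line of the form "@meta key: value".
META_LINE = re.compile(r"^\s*(?://|#|\*)\s*@meta\s+([\w-]+)\s*:\s*(.+?)\s*$")


def extract_metadata(source: str) -> dict:
    """Collect every '@meta key: value' pair found in comment lines."""
    found = {}
    for line in source.splitlines():
        match = META_LINE.match(line)
        if match:
            found[match.group(1).lower()] = match.group(2)
    return found


def build_index(sources: dict) -> dict:
    """Group file names by (language, function) with no manual curation."""
    index = defaultdict(list)
    for filename, text in sources.items():
        meta = extract_metadata(text)
        if "language" in meta and "function" in meta:
            index[(meta["language"], meta["function"])].append(filename)
    return dict(index)


# Made-up sample files, keyed by file name.
sample = {
    "Eval.java": "// @meta language: Java\n// @meta function: expression-evaluator\nclass Eval {}\n",
    "eval.cpp": "// @meta language: C++\n// @meta function: expression-evaluator\nint main() {}\n",
    "notes.txt": "no metadata here\n",
}
print(build_index(sample))
```

Files without the markup are simply left out of the index, so a collection built this way is only as complete as the markup in the source it scans.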

-- Hexten
[ Parent ]

Re: What is wrong with Freshmeat and Sourceforge? (3.33 / 3) (#6)
by tagish on Mon Sep 18, 2000 at 06:52:34 PM EST

Well, they're fine, but that doesn't mean we can't do better. In general, a repository that relies entirely on free-form descriptions is going to be less well ordered than one that uses a standardized classification space.

What I have in mind is a system that would allow you to express something useful about components as small as functions and as large as entire applications with only a few hundred characters of (human) language independent metadata that could be embedded in a comment.

If I search Freshmeat for expression evaluators written in C++ I'll find those items where the author referred to the code as an "expression evaluator" (rather than any other term) and decided to mention the fact that the code is written in C++. There's a good chance that I'd find a bunch of stuff that wasn't relevant and miss quite a bit that was. Using the system I'm proposing you'd get a list containing exactly the expression evaluators in C++ and nothing else.
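A small Python sketch of that difference, using a made-up three-entry catalogue. The entries, field names and classification values are assumptions for illustration only, and neither function reflects how Freshmeat actually searches.

```python
# Made-up catalogue: each entry has a free-form description plus
# hypothetical classification fields.
catalogue = [
    {"name": "calc-a", "description": "A fast expression evaluator in C++",
     "language": "C++", "function": "expression-evaluator"},
    {"name": "calc-b", "description": "Formula parser and interpreter",
     "language": "C++", "function": "expression-evaluator"},
    {"name": "doc-c",
     "description": "Tutorial on writing an expression evaluator; examples ported from C++ to Java",
     "language": "Java", "function": "documentation"},
]


def free_text_search(entries, *words):
    """Match entries whose description contains every word (case-insensitive)."""
    return [e["name"] for e in entries
            if all(w.lower() in e["description"].lower() for w in words)]


def classified_search(entries, **criteria):
    """Match entries whose classification fields equal the given values exactly."""
    return [e["name"] for e in entries
            if all(e.get(k) == v for k, v in criteria.items())]


print(free_text_search(catalogue, "expression evaluator", "C++"))
print(classified_search(catalogue, language="C++", function="expression-evaluator"))
```

On this sample the free-text search returns calc-a and doc-c: it misses calc-b, whose author used different words, and picks up a tutorial that merely mentions the terms. The classified search returns calc-a and calc-b.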
-- Hexten
[ Parent ]

language definition (3.00 / 6) (#3)
by madams on Mon Sep 18, 2000 at 06:11:57 PM EST

Defining a language (metadata) to describe software would be the most difficult part of this project. I wouldn't even know where to start in designing such a language.

Hierarchical categorization would be the easy choice, but this gets into the problem of bloat: horizontal or vertical?

The second question is, would such a system actually help you find code? I might easily ask such a system, "I need an XML parser in Java" and get 20 results in return. What other search constraints would I be able to add? This last question would be answered by the metadata format, which, as stated previously, would be difficult to design.

Mark Adams
"But pay no attention to anonymous charges, for they are a bad precedent and are not worthy of our age." - Trajan's reply to Pliny the Younger, 112 A.D.

Re: language definition (3.00 / 3) (#5)
by maketo on Mon Sep 18, 2000 at 06:43:34 PM EST

Distributed agent frameworks offer an answer to this problem by providing yellow-pages type servers where components can register their services and all other "agent" components can see these interfaces. I am working on a project where we are using XML described component interfaces for this purpose. The idea is for all components to be able to access the yellow-pages, query the service by keyword(s), get a list of agents implementing a given service and then pick agent interfaces in XML, reason on them and offer the data in the format understood by the agent providing the service. Coupled with CORBA it is easy to write components in different programming languages, XML gives a standardized way of describing the component interface yadda-yadda-yadda.
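A toy Python sketch of the yellow-pages idea, for readers who have not met it. The class, its method names and the XML interface strings are all hypothetical; this is an in-memory stand-in, not the API of any real agent framework or of CORBA.

```python
from collections import defaultdict


class YellowPages:
    """Hypothetical in-memory directory of agents and their services."""

    def __init__(self):
        self._by_keyword = defaultdict(set)
        self._interfaces = {}

    def register(self, agent, keywords, interface):
        """Record an agent, the keywords it answers to, and its interface description."""
        self._interfaces[agent] = interface
        for keyword in keywords:
            self._by_keyword[keyword.lower()].add(agent)

    def query(self, *keywords):
        """Return the agents registered under all the given keywords, sorted by name."""
        sets = [self._by_keyword.get(k.lower(), set()) for k in keywords]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

    def interface_of(self, agent):
        """Return the interface description the agent registered."""
        return self._interfaces[agent]


directory = YellowPages()
directory.register("spell-agent", ["spelling", "dictionary"],
                   "<interface name='check'><param name='word' type='string'/></interface>")
directory.register("thesaurus-agent", ["dictionary", "synonyms"],
                   "<interface name='lookup'><param name='word' type='string'/></interface>")
print(directory.query("dictionary"))
print(directory.query("dictionary", "spelling"))
```

A caller would query by keyword, pick an agent from the result, and then read its interface description to work out how to call it.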
agents, bugs, nanites....see the connection?
[ Parent ]
Re: language definition (4.00 / 2) (#7)
by faichai on Mon Sep 18, 2000 at 07:03:03 PM EST

That's exactly the first thing I thought when I first read the original article.

Say you got some XML meta data format to describe the functionality of a piece of software, then in order to provide a rich enough vocabulary you end up with a language in its own right, which is going to end up being a bigger PITA than the original code.

If it were going to be done, I think a metadata extraction tool should be devised that, with a little bit of help from the programmer, deduces the functionality of software automagically.

Then to make most use of such data, a natural language interface would be nice to query it, and then all of a sudden we are in bluesky, and require more QuBits than are available ;-)

What I am trying to get at is that metadata for describing programs is a hard problem, and I wouldn't want to hold my breath till it happens.

[ Parent ]

Re: language definition (4.66 / 3) (#9)
by ramses0 on Mon Sep 18, 2000 at 07:16:52 PM EST

I've heard of CPAN, but never gotten close enough to perl to actually want to take a look at it. Supposedly CPAN is a place for people to develop perl modules, and post them for reuse.

It must be easy to use because so many people talk about it, and the whole thing is focused on perl, right?

...so would it be possible to just "steal" cpan's organization, and add a "language" category to it? Possibly also "OO" v. "Non-OO" for projects which focus on one or the other?

This sounds like it would work as something to do right now to promote code reuse, but comments from some perl wizards would be welcome.

[ rate all comments , for great justice | sell.com ]
[ Parent ]

Rich Morin's Meta proposal (3.00 / 5) (#4)
by kmself on Mon Sep 18, 2000 at 06:18:45 PM EST

This seems to have elements of what Rich Morin is proposing with his Meta project, though the scope of Meta goes far beyond what you're proposing.

Much of the fundamental metadata you're talking about already exists in one form or another in various packaging formats, including Debian's DEB, the *BSD's ports collection, and the slightly battle-weary Red Hat RPM format.

Much of the functional infrastructure is provided by sites such as SourceForge, SourceXchange, and similar, as observed by others.

Karsten M. Self
SCO -- backgrounder on Caldera/SCO vs IBM
Support the EFF!!
There is no K5 cabal.

System 12 (3.00 / 4) (#10)
by IoaPetraka on Mon Sep 18, 2000 at 08:32:10 PM EST

What ever happened to Bowie Pogue's plan to create exactly what this person needs? When he quit the Propaganda root window tiles project about a year ago, he announced that he was going to be starting a website, or system, that offered code in a sorted manner, so that programmers could come along, pick the language, function, and requirements and hopefully walk away with a couple hours saved.

Does anybody know what became of this? Did it die, or has it not panned out to be what he wished for?

Ioa Aqualine Petra'ka

Re: System 12 (none / 0) (#15)
by Zarniwoop on Wed Sep 20, 2000 at 01:23:07 AM EST


He felt that sourceforge came too close to his ideas for the system12 project, and that VA had ripped him off. He subsequently canned it, and moved propaganda over to propaganda.tilez.org. He now rants quite a bit on slashdot... for his latest, check his user info.

[ Parent ]

There already is a standard. (3.83 / 6) (#11)
by DigDug on Mon Sep 18, 2000 at 08:43:08 PM EST

It's called PAD, which stands for Portable Application Description. It's XML-based, free and Free, and designed specifically for this use. It is already widely used by shareware authors, but I don't see why it can't be used by everyone else as well.

Yavista - if you haven't found a nice homepage yet.

Re: There already is a standard. (none / 0) (#16)
by Alhazred on Thu Sep 21, 2000 at 04:28:35 PM EST

Can you tell us more about where we can find information on this?

I can think of a number of related or overlapping technologies which might be considered or which might be worth at least being aware of when developing such a system. Including meta, and RDF amongst others.
That is not dead which may eternal lie And with strange aeons death itself may die.
[ Parent ]

how many websites did you post this to? (4.00 / 4) (#12)
by mihalis on Mon Sep 18, 2000 at 11:13:43 PM EST

It's on K5 and The Other Place already. This duplication is unwelcome in my book. If I have some thoughts I don't want to have to post them on both sites.
-- Chris Morgan <see em at mihalis dot net>
Re: how many websites did you post this to? (none / 0) (#17)
by tagish on Sun Sep 24, 2000 at 05:03:49 AM EST

Not intentional. It sat in the queue at /. so long (and I forgot that they tell you if a story is rejected -- mea culpa) I thought it had died.
-- Hexten
[ Parent ]
OK, lets have a go (4.00 / 3) (#13)
by daani on Tue Sep 19, 2000 at 12:45:09 AM EST

What kind of queries would you give a database of the proposed metadata? Say I wanted a platform-independent C++ interface to the system dictionary. To do something like:

Dictionary myDictionary;

to see if I spelled the word "animal" correctly. What kind of data can I pass my query?

* The language: C++
* The task : "spell check"
* Platforms : unix, windows, mac

So I could constrain it for a particular language and a list of platforms. But I still have to do a search of a free-form description for "spell check".

What other meta-tags would you add? Hmmm.

Announcements & Metadata (3.00 / 1) (#14)
by neuroman on Tue Sep 19, 2000 at 04:32:34 AM EST

I also had to make my (Java) software public, so I can contribute the (small) experience I accumulated in doing so.

These were the sites where making my announcement made (let's say) a difference:

   * java.co.uk
   * javalobby.org
   * javaworld.com
   * jdance.com
   * freshmeat.net
   * http://pharos.inria.fr/Java/
   * hotscripts.com
   * linuxlinks.com

From javalobby, javaworld and freshmeat the word spread a lot.

As for search engines, don't bother with automatic registration; just go to the biggest ones and submit manually (about 30 minutes in all), but don't expect to show up on them quickly.

Metadata is an interesting subject. I am also studying it for a new project and all I can tell you is that it is madness. Everyone is making his own standard.

Here is a document which describes quite clearly what is happening out there.
* http://www.onlineinc.com/onlinemag/OL1999/milstead1.html

and here is a link to metadata resources.
* http://web.simmons.edu/~schwartz/mymeta.html

But for software metadata I would definitely try to go with (or start from) RDF/DC (Resource Description Framework / Dublin Core), because this is what the W3C seems to support.
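For a feel of what starting from Dublin Core might look like, here is a minimal Python sketch that writes a software record using Dublin Core element names. The element names (`title`, `creator`, `subject`, `format`, `rights`) and the namespace URI are from the Dublin Core element set; the `metadata` wrapper element, the choice of which software property maps to which element, and the sample values are assumptions for this example. It emits plain XML, not full RDF.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record: dict) -> str:
    """Render a software record as XML using Dublin Core element names."""
    root = ET.Element("metadata")            # wrapper element name is an assumption
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample record; the mapping to DC elements is one possible choice.
record = {
    "title": "example-servlet-kit",
    "creator": "tagish",
    "subject": "servlet utilities",
    "format": "Java source",
    "rights": "GPL",
}
print(to_dublin_core(record))
```

Dublin Core was designed for documents in general, so software-specific properties such as supported platforms have no obvious element of their own and would need an agreed extension.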

About implementing such a thing I see 3 choices:

* set up a mailing list, invite everybody who is interested, agree on a standard, and do it. A SourceForge project, applications for the API and, yes, get freshmeat to support it. (Not very realistic, since the first point is the place where all the other projects stopped.)

* be silent, take the current freshmeat way of registering stuff as the initial release, make lots of software for it, get it accepted and used, and then build the standard to fit everything, since it will already be accepted. Not so nice, but a more realistic approach.

I hope we (those who love and work for the Open Source idea) can be more united and make this a reality.
"One must still have chaos in oneself to be able to give birth to a dancing star." [ Friedrich Nietzsche ]

