Extending the web with metadata profiles

By ubernostrum in Technology
Thu Sep 16, 2004 at 03:07:34 PM EST
Tags: Internet

If you hang around on web-related mailing lists long enough, you start getting the idea that the future is full of metadata. Now, this metadata may or may not be XML, or it may or may not be RDF or OWL or a dozen other technologies with impressive-sounding words like “ontology” in their names. It may or may not be the long-dreamt-of (and often derided) “Semantic Web.” In fact, it may or may not be a dozen different buzzwords, and it may or may not be a good thing. But whatever the future is, it will definitely be full of metadata; on this the experts agree.

To my mind there’s a problem with this: the argument always seems to depend on technologies which don’t exist or aren’t quite ready yet, so it always falls back to talking about how things will be “in the future,” which may never get here. Luckily, there's an easy way to add oodles of metadata to your documents right this minute, without having to learn anything more complicated than trusty old HTML 4.01. If it catches on, “the future” might get here a lot sooner than expected.


What is metadata?

In case I lost you in that opening paragraph, what I’m talking about here is the concept of “data about data,” or, more accurately, information which talks about itself a bit. Simple examples of metadata surround us all the time: in the process of fetching this page, for example, your browser was probably told how large a file it is (some number of bytes), what type of file it is (text marked up with HTML), and what language it’s written in (English). Even though it’s usually invisible to you, this information and more can and should be sent with every single web page, because this sort of metadata is very obviously useful. Telling the browser how big the file is lets your computer set aside memory for it, specifying the type of file helps figure out what type of program should deal with it, and stating the language up-front lets your computer know that the page should be displayed using the Western alphabet rather than, say, Chinese characters.[1]
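
To make this concrete, the response which delivered this page probably began with a handful of HTTP headers along these lines (the values here are invented for illustration):

HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
Content-Length: 18452
Content-Language: en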

And it’s common to go much further than this; for example, if you have a page about vineyards in Roanoke, Virginia, you probably wouldn’t be content to just give it a descriptive title:

<title>Roanoke Vineyards</title>

Knowing that HTML provides the <meta> element for expressing certain types of metadata, you’d probably also add some more information:

<meta name="description" content="Vineyards in and around Roanoke, Virginia, and other local wine-related info">
<meta name="keywords" content="wine, wine-growing, wine-tastings, vineyards, vintages, Roanoke, Virginia">

This will help search engines categorize your page and return it as a result in relevant searches[2], which is generally considered a good thing; good enough that at least half, and likely far more, of the pages you see on the web use this kind of simple, effective metadata. But <meta> tags are far from revolutionary and, as metadata goes, they’re still kid stuff.

There’s metadata and then there’s metadata

Real Web Authors are into the sort of hardcore metadata that (they think) HTML simply isn’t built to handle. They want to be able to express more interesting information than “this is a page about vineyards,” and they come up with some pretty complex and interesting ways to do it. To take a simple example, consider a statement like “Alice is Bob Jones’ friend.” This is potentially useful information (it can be used for networking, for “vouching” for a new acquaintance, and so on), but how could it be expressed in a way that, say, a search engine could understand? One solution is FOAF (Friend Of A Friend) markup. FOAF is based on RDF, a metadata standard from the W3C, and it can get a little unwieldy. For example, if Bob wanted to create a simple FOAF document explaining that Alice is his friend, it would look something like this:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:nodeID="me"/>
    <foaf:primaryTopic rdf:nodeID="me"/>
  </foaf:PersonalProfileDocument>
  <foaf:Person rdf:nodeID="me">
    <foaf:name>Bob Jones</foaf:name>
    <foaf:givenname>Bob</foaf:givenname>
    <foaf:family_name>Jones</foaf:family_name>
    <foaf:mbox rdf:resource="mailto:bob@example.com"/>
    <foaf:homepage rdf:resource="http://example.com/bob/"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Alice</foaf:name>
        <foaf:mbox rdf:resource="mailto:alice@example.com"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>

That translates, roughly, to “I’m Bob Jones, and Alice is my friend.”[3] And, up until very recently, even very knowledgeable people in the field would have told you that there was simply no way to express that sort of metadata in HTML; hence, the argument goes, we need the Semantic Web and technologies like RDF and OWL. Or at least, we need them if we assume there’s nothing in HTML which can provide this functionality; as it turns out, that assumption is wrong.

Link relationships and metadata

HTML currently provides two elements for linking to particular resources: the workhorse is the a element, which is what most of us mean when we’re talking about links. There’s also the link element, which the HTML spec says “conveys relationship information that may be rendered by user agents in a variety of ways” and which is responsible for most of the stylesheets and “favicons” on the web. For example, the following indicates that the file style.css is the page’s stylesheet:

<link rel="stylesheet" href="style.css" type="text/css">

The way a browser knows that this is the stylesheet is, of course, by the content of the rel attribute, which is where the magic happens. The spec says that rel should be a list of “link types,” and provides a variety of types to choose from as needed. And rel can also be applied to links created with a; for example, to provide a link back to a site’s index:

<a rel="index" href="index.html">Home</a>

Some browsers already provide a menu of navigation options based on link relationships in the page, and Mozilla’s link pre-fetching feature will pre-load the next page in a sequence if it finds a link with rel="next"; these are interesting features which require nothing more than adding a rel attribute to a link.
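
For instance, each page of a multi-part article could hint at what follows with a single line in its head (the filename here is invented for the example):

<link rel="next" href="part2.html">

But the list of link types in the HTML specification is pretty sparse, given the huge number of possible relationships between pages. And that’s where metadata profiles come in.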

The forgotten attribute: profile

Here’s a quick quiz to amuse your markup-savvy friends: without looking at the HTML specification, how many different attributes can you think of which legally apply to a <head> tag? The answer is three: lang, dir and profile.[4] You can be forgiven if you didn’t get any of them, and especially if you didn’t get profile; it’s the attribute that time forgot. But it’s also the attribute which makes rich, “hardcore” sorts of metadata possible in pure HTML.

As defined, profile simply “specifies the location of one or more meta data profiles” for the page, but that’s where the magic is. A metadata profile isn’t hard to create, and is even easier to use. The specification doesn’t actually outline the format of a metadata profile; the Dublin Core profile is a highly detailed document, but the very popular XFN profile is much simpler, and the XHTML Meta Data Profiles tutorial page provides a simple definition list as a sample profile. Returning to the example of Alice and Bob, Bob could avoid all that messy FOAF markup he created earlier by using the XFN (that’s “XHTML Friends Network”) profile with his page and linking to Alice’s site like so:

<a rel="friend" href="http://example.com/alice/">Alice</a>

And they said it couldn’t be done in HTML.

And a metadata profile can specify other types of information besides link types: the Dublin Core profile specifies information to be inserted in <meta> tags, and the scheme attribute in HTML allows for easy interpretation of troublesome formats like dates (use of scheme makes it possible, for example, to determine whether 09-11-2001 refers to September 11 or November 9).
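
Put together, a Dublin Core date might look something like this (a sketch; the profile URL and scheme name here are recalled from the Dublin Core documentation, so double-check them before relying on this):

<head profile="http://dublincore.org/documents/dcq-html/">
<meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2001-09-11">

Here the scheme says the date follows the W3C date-and-time format, year first, so 2001-09-11 unambiguously means September 11.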

The possibilities of profiles

Common use of profiles would make it possible to express a vast array of information without having to resort to convoluted, heavily-abstracted solutions like RDF; think of it as metadata for the people. For example, XFN was the first I ever heard of metadata profiles (and, I imagine, the first that a lot of people heard of them) and has become extremely popular in the weblogging community. With XFN’s profile, it’s possible and actually downright easy to turn a blogroll into much more than a list of links; for example, a simple script can index the pages of a group of people who all use XFN and build a map of their relationships to each other. From there it’s a simple step to being able to answer questions based on those relationships, and other interesting applications.[5]
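
A small XFN blogroll might look like this (names and URLs invented; XFN allows several space-separated values on one link):

<ul>
<li><a href="http://example.com/alice/" rel="friend met">Alice</a></li>
<li><a href="http://example.com/carol/" rel="co-worker">Carol</a></li>
<li><a href="http://example.com/dave/" rel="acquaintance">Dave</a></li>
</ul>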

And the potential doesn’t stop with keeping track of your social circle; there are even more interesting possibilities coming to light. For example, Ian Hickson (formerly of the Mozilla project, now working for Opera) recently wondered aloud about ways to fight spam comments on weblogs and other sites which allow open commenting:

I’m thinking that HTML should have an element that basically says "content within this section may contain links from external sources; just because they are here does not mean we are endorsing them" which Google could then use to block Google rank whoring.

The idea, of course, is that without the benefit of increased PageRank there would be much less incentive to post spam comments. A Web developer named Lachlan Hunt saw that comment and posted some ideas on using metadata profiles to implement this; for example, a profile could define an “unendorsed” relationship, which search engines could look for and use to adjust their ranking calculations, with an unendorsed link providing little or no benefit to the page linked.
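
A weblog’s comment template could then wrap every link a commenter supplies, something like this (the rel value is Lachlan’s proposal, not part of any current standard, and the URL is a placeholder):

<a rel="unendorsed" href="http://example.com/commenter-site/">a commenter's link</a>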

Lachlan also proposed a number of other interesting link relationships which would be handy in everyday use; sites like k5 and Slashdot could use rel="member-only" to indicate links to sites like The New York Times which require registration to view articles, and using the relationship comment would make it easy to quickly distinguish links to external pages from links to comments in a discussion forum. Altogether he has quite a list, ranging over accessible versus inaccessible sites, kid-friendly versus adult-themed, and plenty of others which could lead to interesting searching and cataloging utilities if commonly used.
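
In use, those two might look like this (again, proposed values only, with placeholder URLs):

<a rel="member-only" href="http://www.nytimes.com/example-story.html">a Times article</a>
<a rel="comment" href="http://example.com/story#comment-12">somebody's comment</a>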

The future

So it seems that metadata profiles can solve problems previously thought intractable in pure HTML; this would be a huge step forward for useful metadata on the web if it got into widespread use, but at the moment the technology is far too obscure. Resolving this would require several things to happen:

  • First and foremost, people need to be made aware that this technology exists; that's one of my main motivations for writing this article. The profile attribute has been sitting around in the HTML spec for years without much notice or use, but once people know about it the ease of use compared to other metadata solutions should help its popularity quite a bit.
  • Second, support for metadata profiles needs to be built into common web tools; weblogging systems are implementing XFN, but there needs to be support in more mainstream content-management systems and in web authoring applications (Dreamweaver, FrontPage, etc.), and it needs to not be specific to one profile.
  • Finally, a standard format for writing metadata profiles needs to be defined, the simpler the better. This could be a good project for the newly-formed WHAT-WG to tackle; it could get a specification out fairly quickly, and once that was established it would be much simpler to lobby for support and introduce new users to it.

The future will probably be full of metadata one way or another, but the technology to do it simply and efficiently in HTML exists today in the form of metadata profiles; all that's needed is for it to get into wider use and “the future” will be here.



Notes

[1]
Well, really it’s the character encoding, not the language, which determines the alphabet used, but character encoding is a whole ’nother article. Let’s keep this one simple and on-topic.
[2]
Or at least it will for some search engines. Due to rampant abuse (read: stuffing completely unrelated keywords into your page in hopes of coming up in more searches), there are a lot of search engines which ignore or don’t give much weight to keywords found in <meta> tags. Moral of the story: don’t piss in the metaphorical metadata well.
[3]
I say “roughly” because I don’t think I can really explain the abstractions of RDF adequately for an article on this level. If you’re interested in learning exactly what’s going on in all that markup, I recommend you pick up a good book on RDF.
[4]
For reasons which I don’t fully understand, it’s not possible to apply id or class; the “common” attribute set isn’t specified as applicable to head.
[5]
For example, if Bob was supposed to meet Alice but couldn’t remember where the party was, he could ask a search engine for the latest weblog entries of Alice’s friends and see if one of them mentioned it. This is the sort of application the Semantic Web working groups dream about, and it would be pretty simple to implement.




Extending the web with metadata profiles | 107 comments (103 topical, 4 editorial, 2 hidden)
i dunno what to think (2.33 / 3) (#2)
by the77x42 on Wed Sep 15, 2004 at 03:32:05 AM EST

I've finally figured out that XML is just a really, really poor-man's database. The big difference between XHTML1.0 and HTML4.01 still misses me. Using RDF for anything meaningful beyond syndication is foreign. Now using relationship links and profiles seems to be consistent with everything I just mentioned.

The only reason I can see going for complicated RDF instead of rel tags is some parsing reason. I mean, come on, XPaths are really cool.


"We're not here to educate. We're here to point and laugh." - creature
"You have some pretty stupid ideas." - indubitable ‮

well (none / 0) (#3)
by reklaw on Wed Sep 15, 2004 at 03:55:13 AM EST

Not really a "poor man's" database... I think it's a far better way of doing databases than people generally use now. It is kind of surprising when you realise that the whole "semantic web" thing has basically been a big database all along, though -- because no-one ever talks about it like that.

Between obfuscation of purpose and the dreadful language that is RDF-XML (seriously, what a piece of crap), people are prone to dismiss the whole thing, which is a shame. I'm actually quite interested in the possibilities of combining Notation3 (RDF that doesn't suck) with RDQL (SQL-like querying of RDF).
-
[ Parent ]

Disagree (3.00 / 3) (#8)
by vadim on Wed Sep 15, 2004 at 10:07:17 AM EST

If XML were that great for databases, everybody would be switching by now, and we aren't. The thing is that relational databases are perfectly good for most things, and much more efficient.

For example, there's no reason why you can't represent data stored in XML with a relational database. In fact, it probably will work much better, since XML is a pain to search or index.

Now, it would be really wonderful if every database could dump the tables in XML format. If we could come up with a universal way of doing that, you could easily write a parser that'd take a dump from Oracle and load it into MySQL, ignoring the parts MySQL doesn't understand. Or the reverse.

But XML itself as the database? Ew.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

hur hur. (none / 1) (#13)
by Nursie on Wed Sep 15, 2004 at 01:04:36 PM EST

you said "take a dump"

Meta Sigs suck.

[ Parent ]
database serialisation (none / 1) (#17)
by reklaw on Wed Sep 15, 2004 at 04:52:06 PM EST

It's important to note that we're not talking about XML here. XML sucks. It's just one way of describing RDF, and a bad one at that.

The reason I like RDF is the potential for a distributed database. Imagine if amazon and allmusic both made their music data available as RDF. Assuming both sets of data are decently organised, you could then cross-reference the two resources easily. Then perhaps you could cross-reference again with an RDF representation of the current charts. "artist x has an album, its name is y, and its RRP is z", says amazon. So you query your chart data -- what is the chart position of the album with the name of y by artist x? -- and you get your answer.

That was a slightly long-winded example, but you get the picture. The point is, import/export as XML in a standard form between databases would only get you so far. Having everyone's data available and cross-referenceable as RDF would be much more useful.
-
[ Parent ]

XML as a database.... (3.00 / 3) (#69)
by ckaminski on Thu Sep 16, 2004 at 10:29:53 PM EST

You are very correct.  An XML text file is a poor substitute for a database.  XML databases, however, are not.  A true XML database, like the Sonic XML Data server or the Xindice database from the Apache group, CAN provide quick queries where you simply specify:

http://webhost/url-to-script.xml?order/custid=customer/id[name='custname']

to return an XML document that is built in real time out of a database that contains the order history of a specified customer.

Even the big SQL vendors are implementing technology like this now.  The big benefit is when you leverage stuff like this with XSL.  Then your UI geeks create the perfect forms and html markup, while your developers simply plug in the code to make it all work, without a shitload of dynamic code that looks like this:

<%
code
%>

<html>

<%code%>

<more html>
<%code%>

which sucks ass to maintain.

[ Parent ]

Could you give more details? (none / 0) (#88)
by vadim on Fri Sep 17, 2004 at 10:07:31 AM EST

I haven't had a chance to play with XML databases yet, so what's the advantage over, say, MS SQL Server or Postgres? Generating XML is trivial in most cases. For example, I wrote the XML::Maker Perl module (available on CPAN). It took just a few days and the code and usage are very simple.

So, could you give me a concrete example of what one of those XML databases would make easier to do?
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

Easy question... (none / 0) (#107)
by ckaminski on Tue Sep 28, 2004 at 11:57:08 AM EST

complicated answer...

As more and more relational databases add xpath/xquery functionality to their command/language set, the benefit really becomes difficult to justify.  But once upon a time, if all your data was XML, in the sense that you had dynamic schemas and non-obviously-relational data (the true hallmark of XML vs. relational), the XML database provided a secure system that allowed you to perform XPath/XQuery operations without having to parse a text file (very expensive).

Dealing with XML isn't hard.  It's relatively trivial, especially with great inventions like expat and libxml.  Making XML fast enough for million-operations-per-day internet commerce required database software that could provide fast indexes, search optimization, and enforced security on document nodes.

It's similar to the C++ object-oriented DBMS debate, where the main benefit comes from assured speed (with a particular C++ OODBMS I know of, on particular hardware, we could guarantee system response of sub-5ms for operations, which was critical to winning big telecoms contracts).  XML databases are another tool.  If you find that your data is predominantly XML, then an XML database like Sonic's or the Apache group's is better than parsing through large text files.

If your data is relational, then converting it to XML is trivial, and you already get the benefits of fast indexing, sorting, searching, etc. from your relational database.  It's when you want to run an XQuery or XPath search against a 1 MB XML document that XML databases come to the fore.

But I digress, because SQL Server, Oracle, DB2, even Postgres are all gaining XML parser support as part of the language set.  The major problem with the XML support of these servers, though, is indexes.  XPath information is not generally kept as part of the indexing, so parsing a blob, while not as bad as parsing a large document, is still more expensive than a system designed for tree traversal.

[Disclaimer: I worked for Sonic when it was still eXcelon Corporation, so I have plenty of experience with these products.  Even so, relational is still usually better.]

[ Parent ]

Relational better than alternatives (none / 0) (#104)
by ocrow on Sun Sep 19, 2004 at 11:54:43 PM EST

I think it's a far better way of doing databases than people generally use now.

I think that you are saying that an XML based database system would be better than a relational database system. This is, I believe, incorrect.

The problem with XML is that while it provides a good way to mark up text, it is not sufficient for data management. There are no good ways to specify constraints on the data stored in an XML database. There are no flexible ways to query the content of the database.

Whatever methods do exist have not been proven to be complete or sufficient, for all of the types of queries and constraints that may be necessary.

For relational databases these things have been proven, mathematically. This is why we say that relational databases have a mathematical underpinning. An accurate mathematical model of the relational database was developed by E. F. Codd, and has been proven to provide correct answers, and not to introduce inconsistencies in the data. If you care about the meaning of your data, this is key.

Once you understand how a relational database models the world and the value of flexible queries and constraints, it is very difficult to come up with any alternative scheme that is equivalently powerful.

For a little more exposition of these ideas see
http://www.dbazine.com/pascal20.shtml
http://www.dbazine.com/pascal8.html

[ Parent ]

the big difference (3.00 / 5) (#6)
by clover_kicker on Wed Sep 15, 2004 at 09:35:29 AM EST

> The big difference between XHTML1.0 and HTML4.01 still misses me.

It is much easier to parse XHTML1.0.
--
I am the very model of a K5 personality.
I intersperse obscenity with tedious banality.

[ Parent ]

XML is bad as a database (3.00 / 3) (#7)
by vadim on Wed Sep 15, 2004 at 09:58:36 AM EST

XML is very useful, however, for sharing information. That's what it's good for. The simplest way of making a program that outputs some kind of data is to come up with your own crappy format. That kind of stuff inevitably ends up breaking when a field is too long, or when you need to add a new one, or when there happens to be an extra space somewhere...

XML is just a handy invention that helps you store data in such a way that it can be extended and parsed easily. It doesn't, IMHO, offer any advantage over a relational database if that's the kind of thing you're looking for, though.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

well that's fine (none / 1) (#9)
by the77x42 on Wed Sep 15, 2004 at 11:56:21 AM EST

but i've never seen it being used as such. using xml to store preferences or get simple data from the internet specific to programs (web updates, motd, etc.) can be done with databases probably much faster. xml just bypasses the need for separate database software. for sharing data between applications though... come on, no self-respecting capitalist software company is going to make their data that portable.


"We're not here to educate. We're here to point and laugh." - creature
"You have some pretty stupid ideas." - indubitable ‮

[ Parent ]
That's got little to do with databases (3.00 / 4) (#12)
by vadim on Wed Sep 15, 2004 at 12:49:05 PM EST

Yeah, you could use Oracle to store an IRC config file, but it'd be *way* overkill. People use XML because they don't need to come up with some format of their own, and can easily add new features while making it trivial to read an old config file. That's the kind of thing XML is good for.

Now, database tasks, such as storing a few hundred thousand order rows and then determining which client bought the most stuff from you, are tasks for a proper database. It can be done in XML, but it's a pain. Even MySQL will deal much better with it.

To say it another way, XML is great for serializing data. Stuff you're going to write as a big block, and then somebody will read it from top to bottom. This is great for config files, office documents, and network protocols like, say, Jabber. XML is precisely good for what "no self-respecting capitalist software company is going to make". That attitude is quite mistaken too, IMHO; there are many ways of earning money without locking your customers into your way of doing things.

On the other hand for tasks like storing large amounts of data, changing it, finding things in it, etc, XML is almost always a very bad idea.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

Hm. (3.00 / 4) (#21)
by regeya on Wed Sep 15, 2004 at 09:27:47 PM EST

This is about the time you, Carnage4Life, and every other XML zealot jump on me, because I use XML at work to store contact and inventory information. Now, to be fair, I use SAX and Python to manipulate the data (I actually dump the data into dictionaries) but that's another story.

"XML isn't for database-type operations!" Yeah, but it makes for a nice, convenient container for flexible data. You can do the database-style stuff in your program. Anyone who tells you you have to use tech such as XPath, and then clarifies by saying that XML should not, must not, be used in such a way deserves to be shot, because they're not creative enough to be good programmers and they need to get out of the way.

[ yokelpunk | kuro5hin diary ]
[ Parent ]

Hear-hear. (none / 0) (#24)
by toulouse on Wed Sep 15, 2004 at 11:39:07 PM EST

XPath and XSL are over-blown declarative mind-structs whose only purpose is to consolidate the ivory-tower tag-soup brigade's grip over network data manipulation. There's nothing you can do with either of them that can't be done (often faster) with old-fashioned imperative programming.


--
'My god...it's full of blogs.' - ktakki
--


[ Parent ]
No problem with that, really (none / 0) (#29)
by vadim on Thu Sep 16, 2004 at 04:24:14 AM EST

There's a difference between what you can do, and what you *should* do. If you want to store your data in plain text files, you certainly can do that.

Now, what I'm saying is that I consider XML to be quite clumsy and inadequate for such a task, especially when compared with a good relational database.

Of course, perhaps you don't have that much data, or do something clever with it. Out of curiosity, what do you do with it, and how much do you have?

Here I have about 150K rows of order data, and I'm pretty sure that if it was stored in XML it'd take an awful long time to do common operations on it, like finding which ones still haven't been served. Sure, I could work around that, but then I'd just be writing my own database, and there are perfectly good ones available already.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

XML is magic pixie dust (3.00 / 5) (#23)
by jolly st nick on Wed Sep 15, 2004 at 10:57:52 PM EST

XML is a very good system for representing document contents for interchange, transformation and archiving. Aside from that, there's a lot of overuse of XML for no particular reason other than that it is magic pixie dust, which leaves people scratching their heads and trying to come up with analogies like this. It's a waste of time, because the simple answer is that it's fine in its own domain, but it is way overused.

Java programmers have a lot to answer for when it comes to the overuse of XML, since they have as a group (in my experience at least) a rather morbid fondness for complexity. I once gave a Java programmer working for me a simple "parse this and put it in a database" problem and he came back with a JAXB solution that autogenerated code that serialized the XML into beans. After figuring out how to do that and creating the Ant scripts necessary, he then had to write a program which walked the tree of beans that essentially structurally mirrored the SAX document handler I expected him to write in the first place. He managed to stretch a two or three hour programming job into a week, and came up with a solution that was painfully slow and sucked memory like nobody's business.

In most cases where I've seen XML used outside of representing documents or standardized data interchange, there are better solutions. For transferring data structures over a network, ASN.1 is pretty isomorphic to XML in its expressiveness but much less bloated lexically. Many other places where XML is used can be replaced with simple property files or simple purpose-built parsers for special-purpose languages.

[ Parent ]

XML is a syntax (none / 1) (#30)
by pieroxy on Thu Sep 16, 2004 at 05:30:57 AM EST

and nothing else. It is a syntax to represent structured hierarchical data.

Anything else is not XML.

[ Parent ]

well here's my use of xml (none / 0) (#81)
by the77x42 on Fri Sep 17, 2004 at 01:57:52 AM EST

i have a simple xml file that lists all my current applications in development and then the latest version number as well as a download url.

an application will download the xml file, find itself using xpaths, and then check its latest version number.

now for simple little programs, this works really well because the MS XML Parser can go onto the internet and retrieve the xml file in only one line of code. everything else is then trivial.

i honestly can't think of any other use of xml. maybe if i were transmitting some sort of accounting data to another application... hrm.. i dunno. it's all very confusing and seems unnecessary. i mean, come on, dreamweaver's preference files are in xml, what's up with that?


"We're not here to educate. We're here to point and laugh." - creature
"You have some pretty stupid ideas." - indubitable ‮

[ Parent ]

Config files (none / 0) (#83)
by Nursie on Fri Sep 17, 2004 at 03:00:39 AM EST

We used XML as a good way of organising hierarchical data in a form readable by both machine and human. For that, XML rocked over other flat file formats (ini perhaps...) because we had a parser that returned the elements we wanted, and it was easily extendable and could even be validated with relative ease.

A database wasn't really warranted because speed of reading was not too much of an issue, reading was infrequent, and it's easier to open up a text/xml file in notepad to edit it.
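
A trimmed-down sketch of the kind of file I mean (contents invented for illustration):

<config>
  <logging>
    <level>debug</level>
    <file>/var/log/app.log</file>
  </logging>
  <network>
    <timeout>30</timeout>
  </network>
</config>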

Meta Sigs suck.

[ Parent ]
It's good for markup. (none / 1) (#85)
by ubernostrum on Fri Sep 17, 2004 at 03:32:36 AM EST

i honestly can't think of any other use of xml. maybe if i were transmitting some sort of accounting data to another application... hrm.. i dunno. it's all very confusing and seems unnecessary.

XML is for markup. That's what the 'M' stands for. It's descended from SGML, which was designed to provide a standard, generalized way to create markup languages (you get three guesses what 'SGML' stands for), and is very easy to write but devilishly hard to parse. And in addition to being tough to parse SGML has lax error handling, which makes it easy to create an incorrect SGML document without noticing (run the average web page through a validator sometime for an example; HTML, after all, is an SGML-based language).

XML attempted to solve the parsing problems by doing away with a lot of SGML's quirks; in SGML it's possible to leave out large portions of markup and let the parser infer them, but in XML all markup must be explicitly present. This feature is chiefly responsible for XML's oft-lamented verbosity. XML came at the error-handling problem by basically doing away with error handling: all errors are fatal errors in XML documents.
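
A quick illustration: thanks to its SGML heritage, HTML lets the parser infer the missing close tags in

<p>First paragraph
<p>Second paragraph

while the XML equivalent (XHTML, say) has to spell everything out:

<p>First paragraph</p>
<p>Second paragraph</p>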

As a result, XML is a system for designing markup languages which is easier to parse and much stricter than SGML. This obviously has some benefits, but the ultimate value rests in the user: XML is just a tool, and you are free to create smart, useful markup languages with it or to create stupid, useless markup languages.

Additionally, XML offers somewhat easier ways to create the markup languages in question: where SGML required the construction of ugly, hard-to-understand DTDs, XML can use a variety of systems including very friendly ones like Relax NG.

And for a real-world example of XML being useful, this article, and pretty much everything else I write, was originally produced in DocBook XML, an XML-based markup language.




--
You cooin' with my bird?
[ Parent ]
XHTML (none / 0) (#44)
by interiot on Thu Sep 16, 2004 at 04:37:06 PM EST

One big thing XHTML allows is the co-mingling of HTML data with any other XML data, in the same document. For example, MSWord documents reasonably allow mathematical formulas and vector drawings to be included alongside text, so authors can use all three together to convey certain types of information better. Now HTML can do this too.
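
For example, an XHTML document can drop a MathML formula for "x + 1" straight into its body by switching namespaces (a minimal sketch):

<div xmlns="http://www.w3.org/1999/xhtml">
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>x</mi><mo>+</mo><mn>1</mn>
  </math>
</div>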

[ Parent ]
The problem with (none / 0) (#70)
by toulouse on Thu Sep 16, 2004 at 10:42:37 PM EST

that is that it requires your XHTML pages to be delivered as mime-type application/xhtml+xml, application/xml, or even text/xml. The problem is that IE (you know? - that browser with most of the market on board) doesn't recognize the 'proper' mime type (application/xhtml+xml), so people have come up with numerous hacks to circumvent this issue.

The most common hack is to deliver the content as text/html (which XHTML 1.0 permits), but this leads to a crucial difference: As far as user agents are concerned, XHTML 1.0 delivered as text/html is not XML, it's just XML-compliant HTML, and is treated as such (i.e. a variation of HTML 4.01), so you get little material benefit from using XHTML to begin with (apart from improved 'scrape-ability') - you certainly can't start embedding non-HTML XML within it.

The other hack is to deliver it as application/xml and include an XSL stylesheet to con IE's internal parser into performing XML translations, but note that the transformed document is still essentially text/html, so no XML-imbued advantage comes of this, either.

The third hack is to determine server-side what the browser supports through content negotiation and send the right mime-type accordingly, auto adjusting the document and headers as you go, but this is little better than the sniff-for-browser antics of the late 90's.

In the short term, XHTML is essentially broken, and the IE developers have pointed out that they're not ready to support it any time soon (see this thread, or this article for the quick synopsis). Their attitude is overwhelmingly that XHTML is a Frankenstein, neither one thing nor the other, and that if you want browser-documents you should use HTML, or if you want machine-parsability of semantic content you should use XML and XSL transforms. Sure, you might be future-proofing your website, but in the near-term XHTML is essentially useless unless you decide to stop supporting IE (brave move).


--
'My god...it's full of blogs.' - ktakki
--


[ Parent ]
Fourth option: (none / 0) (#71)
by ubernostrum on Thu Sep 16, 2004 at 10:48:50 PM EST

Write XHTML which follows the HTML Compatibility Guidelines in Appendix C of the XHTML 1.0 spec, and serve as application/xhtml+xml to user-agents which indicate they can accept it, and text/html to browsers which indicate they cannot.




--
You cooin' with my bird?
[ Parent ]
Uhmmm - see 'third hack' in above comment? (none / 0) (#73)
by toulouse on Thu Sep 16, 2004 at 10:58:59 PM EST

The problem here, though, is that it renders the (very useful, I whole-heartedly agree) idea completely pointless.

For example: You can't just send an application/xhtml+xml document with embedded MathML / SVG / whatever; you've first got to establish that the agent supports it, and send either a gracefully-degraded version (libgd-based images or something), or a "Sorry, but your browser doesn't support..." page.

The second option doesn't really cut it when the "browser which doesn't support..." is the one with at least 85% market share, and the first option doesn't really cut it because, if you're going to do that anyway, you may as well send it to all browsers and not bother with XMLitude at all.


--
'My god...it's full of blogs.' - ktakki
--


[ Parent ]
You said something different. (none / 0) (#77)
by ubernostrum on Fri Sep 17, 2004 at 12:00:22 AM EST

You said:

The third hack is to determine server-side what the browser supports through content negotiation and send the right mime-type accordingly, auto adjusting the document and headers as you go, but this is little better than the sniff-for-browser antics of the late 90's.

What I described was not "adjusting the document"; the document is the same one you would have served to a browser which groks the XHTML MIME-type.

Also, this is better than the sniff-for-browser antics; it relies on the HTTP Accept header rather than sniffing the user-agent string or testing for document.all vs. document.layers or what have you. It's how content negotiation is meant to be.
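
For instance, a Mozilla-family browser advertises XHTML support with an Accept header along these lines (quoted from memory, so treat the exact value as illustrative); the presence of application/xhtml+xml is what the server keys off of:

Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,*/*;q=0.5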

And, frankly, nobody really supports embedding of other namespaces in XHTML documents yet. Yeah, Mozilla will render MathML (and there's an IE plugin which will do it, too), but MathML seems to have been developed less for use by mathematicians (who, in my experience, use LaTeX) and more for the sake of having an example of something embeddable in XHTML.




--
You cooin' with my bird?
[ Parent ]
Ermmm 2. (none / 0) (#78)
by toulouse on Fri Sep 17, 2004 at 12:27:37 AM EST

Also, this is better than the sniff-for-browser antics; it relies on the HTTP Accept header rather than sniffing the user-agent string or testing for document.all vs. document.layers or what have you. It's how content negotiation is meant to be.

What exactly did you think I meant by "content negotiation"? Determining browser ability from the Accept header is what content negotiation is, as you clearly appear to know. Weird...

You have to adjust the document in the instances where embedded XML content is being used, otherwise the browser won't treat it correctly. Adjusting the mime headers is necessary as we've already discussed. Kindly desist from putting words in my mouth because you failed to comprehend the point at the first time of asking.

I notice you can't help ducking the main issue, however. Given what has been mentioned; what is the practical point of using XHTML in the current climate? And how does it have an advantage over transforming base XML into different formats (HTML and RDF for example)? Do me the kindness of not simply pointing at an article by some hand-waving theocrat; I'm interested in practicalities here.


--
'My god...it's full of blogs.' - ktakki
--


[ Parent ]
Honestly? (none / 1) (#79)
by ubernostrum on Fri Sep 17, 2004 at 12:42:36 AM EST

The only advantage to using XHTML right now is that if, in the future, everybody goes to XML-based stuff, you'll be ready. Other than that, I recommend sticking with HTML 4.01.

Nobody uses the embeddability right now, so namespaces aren't an advantage. SGML tools can perform just as many transforms as XML tools can, so XSLT isn't an advantage. Hardly anybody supports well-formedness constraints, so strict parsing isn't an advantage. Nobody at all supports validating, so enforced validity isn't an advantage. Etc.

XHTML is supposedly "the future," though.




--
You cooin' with my bird?
[ Parent ]
In other news: Apples are the poor man's orange nt (none / 0) (#91)
by SoupIsGoodFood on Fri Sep 17, 2004 at 01:36:18 PM EST



[ Parent ]
in other news (2.50 / 4) (#4)
by forgotten on Wed Sep 15, 2004 at 06:18:35 AM EST

C++, Java, Smalltalk are also unnecessary, because objects can all be done in assembler.

It's all about making it simpler, man!

--

This does look simpler to me (nt) (none / 0) (#5)
by cburke on Wed Sep 15, 2004 at 08:57:02 AM EST



[ Parent ]
never trust example code -nt (none / 0) (#18)
by forgotten on Wed Sep 15, 2004 at 05:01:15 PM EST


--

[ Parent ]

no (none / 1) (#10)
by the77x42 on Wed Sep 15, 2004 at 11:59:51 AM EST

if it was about making it simpler, you wouldn't have 3 languages to solve one problem.


"We're not here to educate. We're here to point and laugh." - creature
"You have some pretty stupid ideas." - indubitable ‮

[ Parent ]
there is only one problem? -nt (none / 1) (#19)
by forgotten on Wed Sep 15, 2004 at 05:02:17 PM EST


--

[ Parent ]

Your Argument Makes A Counter-Point (2.87 / 8) (#14)
by DLWormwood on Wed Sep 15, 2004 at 01:18:17 PM EST

You argue that by using "forgotten" HTML concepts like profile and meta tags and whatnot we can get better metadata today. The fact that these tags are part of HTML should tell you that your ideas have already been thought of, so you gain a point there.

But the fact these features aren't used now should tell you even more. The sad fact of things is that humans have a very hard time with "meta" anything: from metaphysics to metadata. This is sometimes known as the "card catalog" problem, since librarians had early exposure to this issue. A person, when asked to describe how to categorize or describe a web page, will focus on some things and leave out others. Political views, area of expertise, life experience, and if he/she is "selling something" will bias the meaning of the data that is preserved or emphasized as well as introduce bogus or erroneous information at times. Without dictatorial control, any collective metadata system devolves into anarchy and uselessness.
--
Those who complain about affect & effect on k5 should be disemvoweled

Not necessarily. (3.00 / 2) (#20)
by ubernostrum on Wed Sep 15, 2004 at 09:19:02 PM EST

But the fact these features aren't used now should tell you even more.

Or maybe they're just buried in a fairly dense technical specification which few people ever bother reading. There are plenty of interesting and extremely useful features in HTML which don't see much use outside of a few markup wankers and uber-geeks who've actually read the specs.

The sad fact of things is that humans have a very hard time with "meta" anything: from metaphysics to metadata.

One of the reasons for this is likely the fact that attempts at metadata systems are (nearly always) all-or-nothing. If you try to create a scheme for organizing, classifying and tagging everything in existence (and usually also plenty of things which might exist but don't yet), then it's going to be too unwieldy to be of any practical use.

This is sometimes known as the "card catalog" problem, since librarians had early exposure to this issue. A person, when asked to describe how to categorize or describe a web page, will focus on some things and leave out others.

Is metadata the answer to all the world's problems? No, and I didn't say it was; there are very good reasons for thinking that the kinds of comprehensive system-of-the-world ideas behind things like RDF and OWL are just pipe dreams. But I'm not advocating the Semantic Web, and I'm not talking about RDF or OWL, yet you're bringing up arguments which really are properly aimed at them. This article is simply about the already-existing ability of HTML to easily include and use a lot of useful and interesting metadata.

Political views, area of expertise, life experience, and if he/she is "selling something" will bias the meaning of the data that is preserved or emphasized as well as introduce bogus or erroneous information at times. Without dictatorial control, any collective metadata system devolves into anarchy and uselessness.

I don't feel any great need to rebut that, but I feel I should point out that you gave yourself away in the last sentence.




--
You cooin' with my bird?
[ Parent ]
Metadata's a Problem In General (none / 1) (#34)
by DLWormwood on Thu Sep 16, 2004 at 11:47:26 AM EST

Is metadata the answer to all the world's problems? No, and I didn't say it was; there are very good reasons for thinking that the kinds of comprehensive system-of-the-world ideas behind things like RDF and OWL are just pipe dreams. But I'm not advocating the Semantic Web, and I'm not talking about RDF or OWL, yet you're bringing up arguments which really are properly aimed at them.

But some of those arguments have to be kept in mind when trying to implement any metadata system, even something as open-ended as yours. It's only going to be as useful as it is deployed. Metadata is a pain in the butt for most people to create in the first place.

It's sort of like commenting source code. Yes, it's very important. Yes, it's important to have a standard way of doing it. No, most of us programmers suck at writing them.

I guess my point is that inventing something just for metadata will always fail. Metadata is, and will always be, a form of ad-hoc information that will always be created at the last minute using the pre-existing tools at hand on a case-by-case basis. Metadata functionality is always part of "value added" work, never part of a lasting foundation.
--
Those who complain about affect & effect on k5 should be disemvoweled
[ Parent ]

Please see my top-level comment above. (nt) (none / 0) (#62)
by ubernostrum on Thu Sep 16, 2004 at 08:46:55 PM EST




--
You cooin' with my bird?
[ Parent ]
Uhmm.... (none / 0) (#90)
by DLWormwood on Fri Sep 17, 2004 at 01:03:43 PM EST

...didn't I quote your top-level comment?
--
Those who complain about affect & effect on k5 should be disemvoweled
[ Parent ]
No. (none / 0) (#99)
by ubernostrum on Fri Sep 17, 2004 at 08:45:26 PM EST

By "top-level" I mean "started its own thread." Specifically I meant this comment.




--
You cooin' with my bird?
[ Parent ]
urm (2.50 / 2) (#25)
by circletimessquare on Wed Sep 15, 2004 at 11:53:53 PM EST

what a great way to phish for info, googlebomb your rankings, or otherwise wreak havoc on the web

metadata is too open to abuse


The tigers of wrath are wiser than the horses of instruction.

Meh. (3.00 / 3) (#26)
by ubernostrum on Wed Sep 15, 2004 at 11:57:04 PM EST

Phishing, Googlebombing and general mayhem are already taking place, and I don't foresee this making it much worse.




--
You cooin' with my bird?
[ Parent ]
ok, i hear you (2.66 / 3) (#27)
by circletimessquare on Thu Sep 16, 2004 at 12:05:17 AM EST

but metadata is just too wide open for abuse, and therefore undependable and unimplementable

for better or for worse, the biggest problems facing the web today are not the goals of the metadata application this article implies, but problems of security, authentication, reputation, and relevancy

so it's all good, just kind of off the mark in terms of what web providers and users need right now, it doesn't solve any problems of import

so while you say the problems i point out with the potential use of metadata are already underway, i respond to you in inverse: this metadata application doesn't solve any problems we have now, so there is no point


The tigers of wrath are wiser than the horses of instruction.

[ Parent ]

Bah. (none / 1) (#28)
by ubernostrum on Thu Sep 16, 2004 at 01:14:21 AM EST

Far too many interesting and useful things didn't solve existing problems at the time of their creation; you're starting to sound like kitten with his "solutions in search of problems" rants.

Also, metadata can help to solve a relevant problem: how can we quickly find the information we want?




--
You cooin' with my bird?
[ Parent ]
the answer is google... (none / 0) (#31)
by circletimessquare on Thu Sep 16, 2004 at 05:52:49 AM EST

for which metadata is ignored, because it's a frequent source of abuse

enough said

and yes, i agree, it is important to research new things sheerly for the sake of pure research, because you never know what amazing unknown things you may discover that might change the world, not just because of some goal or implementation in mind: it is lacking imagination to not try new things

i understand that concept very well, thank you very much

however, you're confusing particle physics with html ;-P


The tigers of wrath are wiser than the horses of instruction.

[ Parent ]

why metadata is useful (none / 0) (#32)
by phred on Thu Sep 16, 2004 at 08:19:11 AM EST

It's as the article describes: it's data about data. To say that it can be abused is just silly; otherwise you could extend it to say that regular data could be abused. Heck, any data can be abusive. Look at spam traps on the web, the "infinite links in a webpage" thing for instance: that's abusive html produced by a cgi program, and I bet that doesn't bog google down if it happens to try spidering it.

Any time you use any data from untrusted parties, you program accordingly.

[ Parent ]

Metadata is much more vulnerable (none / 1) (#33)
by vadim on Thu Sep 16, 2004 at 11:43:17 AM EST

For example, suppose I make a decent web site, uninteresting to 99% of people, about how to upgrade the RAM on an ancient laptop, or something of that kind.

Now, in the keywords I specify something like this: "sex, porn, videos", etc. Metadata is usually invisible to users, so most of them won't have any idea of why my page comes up when looking for porn.

On the other hand, data is harder to falsify. Including stuff about porn in a site about laptops would look really weird. It's harder to make a site appear in an unrelated search if you can only change the data shown to the user.

Same happens in the real world. I can easily make a cover for a book that makes it look like a pornographic novel, but it's much harder to do that with the content.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

are you saying (none / 0) (#38)
by phred on Thu Sep 16, 2004 at 12:59:51 PM EST

that metadata is more dangerous because no human reads it?

[ Parent ]
yes (3.00 / 3) (#40)
by circletimessquare on Thu Sep 16, 2004 at 01:07:41 PM EST

the disconnect between what the metadata says the content is and what the content really is, is ripe for abuse

so it can't be depended upon on an open internet, and should be ignored and discarded, which is exactly what google does with metadata, and rightly so, for the purposes of building a more semantic, more honest web

in short: metadata on a completely open system is not useful or to be trusted

The tigers of wrath are wiser than the horses of instruction.

[ Parent ]

still doesn't make sense (none / 0) (#89)
by phred on Fri Sep 17, 2004 at 12:34:40 PM EST

What happens if the regular data itself is bogus? What's the difference?

[ Parent ]
it's shown to the user (none / 0) (#94)
by speek on Fri Sep 17, 2004 at 08:04:08 PM EST

The user can use his/her brain to determine if the data is bogus. With metadata, however, the user never sees it and what the browsers and search engines do as a result of the metadata is not an open process for the user to see.

For instance, let's say you set up your browser security to trust a particular site. The site has a link to an affiliated site, and the text next to the link says, "if you trust us, you can trust them". You decide for yourself whether or not that's true. Using metadata, your trust of the original website could be interpreted by a browser, which sees <a rel="trust"... and updates its own security settings.

There are examples of metadata that are not abusable in this way, but all those examples involve modifying the information displayable to the user in some way. Ie, rel="friend" in a link might show the friend info when you right click on the link, but it's not going to mean anything to the browser except that, here's more data to display (one way or another) to the user.

To summarize: metadata that is essentially a stylesheet type thing, good. Metadata that is essentially a browser/search engine scripting language, bad.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

bogus argument (none / 0) (#105)
by phred on Mon Sep 20, 2004 at 08:23:11 AM EST

For example, using your same logic, suppose a user runs every executable file that his browser discovers?

[ Parent ]
and? (none / 0) (#106)
by speek on Mon Sep 20, 2004 at 04:15:23 PM EST


--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Bah, humbug. (none / 0) (#45)
by Meshigene Ferd on Thu Sep 16, 2004 at 04:45:55 PM EST

This is HTML we're talking about, right? It's dead easy to bury "data" in HTML — visible to search engines but much less so to humans. A line of javascript or css is probably enough.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

It's harder, though (none / 0) (#49)
by vadim on Thu Sep 16, 2004 at 05:21:26 PM EST

And there's a fairly simple (for a search engine) workaround for it. Take an HTML engine, patch it to output a plain text version of what the user would see with JavaScript disabled, and index that. Sure it's not ideal, but at least something can be done.

Then there's the fact that most people don't bother with cheating now. When metadata was actually used, everybody cheated in one way or another, because people knew your position depended on your keywords, so people had a whole page of them, trying to include everything remotely relevant.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]

Nay. (none / 1) (#51)
by Meshigene Ferd on Thu Sep 16, 2004 at 06:09:34 PM EST

You have to index what the user would see with js enabled. This is far from trivial.

Of course now everyone uses google and this way of cheating no longer works, metadata or data.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

It works because Google ignores metadata (none / 0) (#53)
by vadim on Thu Sep 16, 2004 at 06:21:43 PM EST

Since metadata is no longer used, you don't have to bother specifying it. So there are only a few people who bother trying to fool Google, compared to how it used to be when keywords worked.
--
<@chani> I *cannot* remember names. but I did memorize 214 digits of pi once.
[ Parent ]
You can't easily fool google with keywords. (none / 0) (#60)
by Meshigene Ferd on Thu Sep 16, 2004 at 08:34:53 PM EST

In metadata or data. You can make your site show up where it normally wouldn't, but you can't make it show up high. You have to mess with pagerank somehow.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

The problem is bigger than JavaScript. (none / 0) (#86)
by ubernostrum on Fri Sep 17, 2004 at 03:44:12 AM EST

Take an HTML engine, patch it to output a plain text version of what the user would see with JavaScript disabled, and index that.

Suppose I put my "xxx hot hardcore sex" keywords in a paragraph with white text on a white background; you won't see them, but Google is blind and will not notice the colors. It will notice the text. Or suppose I use CSS to position that paragraph 20000 pixels left of the edge of the page and hide the overflow? You won't see the keywords, but Google will.
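
Both tricks take only a line or two of markup (a sketch; any style that keeps the text out of view works just as well):

<p style="color: white; background: white;">xxx hot hardcore sex</p>
<p style="position: absolute; left: -20000px;">xxx hot hardcore sex</p>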




--
You cooin' with my bird?
[ Parent ]
the internet is open to abuse /nt (none / 1) (#37)
by Xcyther on Thu Sep 16, 2004 at 12:15:39 PM EST



_________________________________________
"Insydious" -- It's not as bad as you think

[ Parent ]
Please don't let this get FP...... (2.80 / 5) (#36)
by Nursie on Thu Sep 16, 2004 at 12:06:41 PM EST

Metadata was useful before people started trying to sell much stuff online. Then advertising/marketing droids started to abuse it for their own purposes, and now it can probably never be useful again.

Unless there is some way of auto-generating and enforcing metadata from the data (in which case it is useless anyway, as it will be obvious from the data itself), it will be used to make things turn up in searches they shouldn't turn up in and places they shouldn't be.

Basically -1: Unrealistic. Relies on humans not abusing it for profit.

Meta Sigs suck.

Maybe we're solving the wrong problem. (none / 1) (#59)
by handslikesnakes on Thu Sep 16, 2004 at 08:21:53 PM EST

The only real incentive to give false metadata is to get a few more hits on your banner ads, right? Maybe I'm atypical, but if a site turns up in a search for something completely unrelated I'm heading for the back button, not my credit card.

If companies that advertise on the web could be convinced that paying for ads viewed by people who are completely uninterested in the subject is a waste then maybe they could use their leverage to discourage false metadata.



[ Parent ]
You'd think so, wouldn't you? (3.00 / 3) (#64)
by Nursie on Thu Sep 16, 2004 at 09:01:56 PM EST

But you'd also think that companies would realise that pissing people off by flooding their email accounts with spam wouldn't be a useful endeavour. Look at the reality, though: it's going up and up and up......

The problem is that if they can fool people into even seeing the banner ad, then they will get a certain (tiny) amount of clickthrough, and that makes it worth it for them.

Meta Sigs suck.

[ Parent ]
You're right (none / 1) (#68)
by toulouse on Thu Sep 16, 2004 at 10:10:41 PM EST

I remember reading a breakdown of the statistics involved (sorry - no link to hand). The key point was that the margins on this stuff are awfully slim; something like the difference between a 0.6% clickthrough rate and a 0.4% clickthrough rate makes the difference between profitability and loss.

All the abusers/marketers have to do is keep the rate above the break-even point to make it worthwhile. People like you, me, and handslikesnakes may ignore them and take our beady eyes elsewhere, but as far as market potential goes, we're essentially an irrelevant demographic minority.


--
'My god...it's full of blogs.' - ktakki
--


[ Parent ]
-1, fails to drop enough buzzwords (1.66 / 6) (#39)
by I Am Jacks Severed Testicles on Thu Sep 16, 2004 at 01:04:56 PM EST



Support our troops - buy W Ketchup!
You only think that... (3.00 / 2) (#65)
by ubernostrum on Thu Sep 16, 2004 at 09:06:57 PM EST

But if you view the source of this page, you'll see otherwise…

<meta name="keywords" content="Java, JavaBeans, J2EE, XML, Semantic Web, ROI, Enterprise-grade, B2B, xxx, sex, hardcore sex, porn, hot sex, Syndication, Metadata, Technology, Culture, Internet, Web, XHTML, Interoperability, Interchangeability">

Something for everybody in there. Long live metadata.




--
You cooin' with my bird?
[ Parent ]
won't scale (1.50 / 6) (#41)
by Black Belt Jones on Thu Sep 16, 2004 at 03:14:31 PM EST



<Meta> Tags as categories (none / 0) (#42)
by Viagra on Thu Sep 16, 2004 at 03:41:30 PM EST

Here's the real problem with Meta tags -- people. Trying to categorize things like this from the ground up never works, because there are too many people who think of their own work as "other", often without bothering to look at the list of accepted categories (which is usually too long anyway).
BRV Viagra
Thanks for not reading the article. (nt) (none / 0) (#98)
by ubernostrum on Fri Sep 17, 2004 at 08:41:38 PM EST




--
You cooin' with my bird?
[ Parent ]
Say What You Will About Metadata... (2.66 / 3) (#43)
by CheeseburgerBrown on Thu Sep 16, 2004 at 04:35:39 PM EST

...This is the simplest and downright cleanest implementation I've yet heard proposed. I like it.

Thou rockest, ubernostrum.


___
If you can read this signature clearly, you are sitting too close to your monitor.
agreed (none / 1) (#46)
by transient0 on Thu Sep 16, 2004 at 05:01:29 PM EST

i got pulled onto a government project last year and it was RDF this and Dublin Core that out the wazoo. you should have seen the buzzwords flying.

i really wanted to say: "why can't the metadata just be included in the HTML document in a reasonable and intuitive way?"

if i had had this implementation to point at then, things would have been a lot simpler. Of course none of it is really useful until the language is standardized sufficiently to be actually machine-readable (and this is where the Dublin Core people are actually doing some reasonable stuff), but this is at least practical to implement and doesn't rely on wide adoption of fringe technologies.
---------
lysergically yours
[ Parent ]

Heh. (none / 1) (#63)
by ubernostrum on Thu Sep 16, 2004 at 08:56:40 PM EST

HTML is actually full of useful stuff like this, if you're willing to spend some quality time with the spec... I remember about two years ago people started rediscovering the <cite> element and the cite attribute; it was like a revelation...
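
For anyone who missed that rediscovery, here's a from-memory sketch of the two features (the URL and text are invented): the cite attribute points at the source of a quotation, while the <cite> element marks up the title of a cited work.

<blockquote cite="http://example.com/essay">
<p>The quoted passage goes here.</p>
</blockquote>
<p>That passage comes from <cite>An Example Essay</cite>.</p>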

And if you want ideas about clean implementations, I really recommend you take a look at the WHAT-WG; their whole purpose is basically to come up with stuff like this.




--
You cooin' with my bird?
[ Parent ]
What is so great about Metadata? (3.00 / 2) (#47)
by gmol on Thu Sep 16, 2004 at 05:03:53 PM EST

Hasn't Google proven that companies who drone on and on about ontologies/XML/semantic webs etc. will fall to the company who writes software that can reliably infer data sans metadata?

No-one can do it 100% reliably (none / 0) (#55)
by greenrd on Thu Sep 16, 2004 at 06:53:40 PM EST

Hasn't Google proven that companies who drone on and on about ontologies/XML/semantic webs etc. will fall to the company who writes software that can reliably infer data sans metadata?

There is no such company. Google can do so better than most, but not 100% reliably.


"Capitalism is the absurd belief that the worst of men, for the worst of reasons, will somehow work for the benefit of us all." -- John Maynard Keynes
[ Parent ]

that's ok ... (none / 0) (#72)
by gdanjo on Thu Sep 16, 2004 at 10:55:23 PM EST

... no data is 100% accurate either.

Dan ...
"Death - oh! fair and `guiling copesmate Death!
Be not a malais'd beggar; claim this bloody jester!"
-ToT
[ Parent ]

woah, it's greenrd (none / 0) (#101)
by mikpos on Sat Sep 18, 2004 at 12:08:35 PM EST

I had just assumed you had died or something.

[ Parent ]
Understand RDF before you discard it (none / 0) (#48)
by avdi on Thu Sep 16, 2004 at 05:15:06 PM EST

RDF is not a format. RDF and OWL are not alternative metadata formats. And none of these recommendations addresses the problems that RDF and OWL address, to wit: once you have all this nice juicy metadata, how do you draw equivalencies between disparate ontologies (like the Dublin Core and XFN standards mentioned)? And how do you derive useful inferences from the information once you've managed to merge it? You can encode metadata however you like; RDF/XML is just one way of many. What RDF and its related technologies tackle is making it easier to work with and draw conclusions from that data.

--
Now leave us, and take your fish with you. - Faramir
Hi. (none / 0) (#56)
by ubernostrum on Thu Sep 16, 2004 at 08:10:31 PM EST

Read the footnotes; I readily admit that I'm giving a rough description of what RDF does, because explaining it in enough detail to satisfy a reader like you (for example, explaining the difference between RDF and RDF serialized as XML) would have been overkill for this article.

And yes, RDF is technically defined as a general-purpose format for the representation of information. But like it or not, RDF is the foundation technology of the Semantic Web's proposed metadata infrastructure. And I never claimed OWL was a metadata format; it's an ontology language, but as you point out it's perceived as necessary for the implementation of a semantic infrastructure.




--
You cooin' with my bird?
[ Parent ]
Human-created metadata is doomed. (3.00 / 4) (#50)
by ZorbaTHut on Thu Sep 16, 2004 at 05:54:43 PM EST

And it always has been. Humans don't *want* to create metadata by hand. Metadata's boring. Why write metadata when you could write data? Not only that, but humans writing metadata are, more often than not, wrong.

The only way to get metadata reliably is to set up a system where humans create data, and you extract metadata from that programmatically.

I can't think of any exceptions to this, although I'd love to see some.

The problem (none / 0) (#52)
by minerboy on Thu Sep 16, 2004 at 06:17:28 PM EST

Is that currently machine-generated metadata is worse than human-generated metadata. One solution is to severely limit the flexibility of the apps that you use to create the data, so that you force a particular organization. Microsoft LRN, for example. But it would be difficult to get people to start using these types of things - they're a real pain. Even if you do figure out how to do good natural language processing for metadata, you will still have issues down the road as language usage changes, and with domain-specific language.

So in the end, I think you'll always need humans to generate their admittedly flawed metadata.



[ Parent ]
I disagree (3.00 / 2) (#58)
by ZorbaTHut on Thu Sep 16, 2004 at 08:21:23 PM EST

Machine generated metadata is worse than correct, unspammed, carefully crafted human generated metadata, sometimes.

Note all the qualifiers.

If the humans make mistakes, the machine generated metadata is better.
If the humans are spamming, the machine generated metadata is better.
If the humans are just lazy or incompetent, the machine generated metadata is better.
And generally, if the amount of data that has to be looked at is large, the machine generated metadata is *still* better.

I've been looking at interest matching algorithms lately. Humans suck at matching up related interests. I'm on my second version of the algorithm, I have lots more ideas for improvements, and it's already generating clusters that make people say "Why did it put those together? That's completely . . . oh. Wait. That makes a lot of sense. Wow, that's a really good connection."

Humans are horrible at dealing with large amounts of similar data in any way. Let the computers do it. They're better at it.

[ Parent ]

I have yet to see (none / 1) (#75)
by minerboy on Thu Sep 16, 2004 at 11:43:56 PM EST

A program that could do even all the Dublin Core for a random webpage. If you want something beyond Dublin Core that requires analysis and interpretation, machines still don't have a chance.

The interest matching is interesting, but "interests" is a very unusual kind of metadata. Compare this to the DC Description tag - you just can't do that one by machine very well.



[ Parent ]
The real problem (3.00 / 4) (#54)
by Nursie on Thu Sep 16, 2004 at 06:32:51 PM EST

is this: if machines can generate metadata, then why bother generating it yourself?

An example to illustrate:
Jim has a web page; he runs meta-gen over it in order to add meta information to it. The metadata generated is concise and accurate.
GoogleSpider, or whatever Google uses to collect web page info, comes along and reads the metadata in order to add Jim's page to the database.
Why can't GoogleSpider just generate the meta info itself?
It can.
If metadata can be machine-generated then there is no need for it in the page, as it can be generated whenever necessary.

In fact GoogleSpider is better off generating its own, because it can trust that. What if Jim's site actually deals in shelves and brackets, but Jim decided to put a few celeb names and sex-related meta keys in to get a few more hits?


Meta Sigs suck.

[ Parent ]
Agreed (2.50 / 2) (#57)
by ZorbaTHut on Thu Sep 16, 2004 at 08:18:27 PM EST

Let the people who want the metadata generate it. You can't trust provided metadata, so you're going to have to generate it anyway. Why bother even looking at the provided metadata?

[ Parent ]
To the detractors: (2.66 / 6) (#61)
by ubernostrum on Thu Sep 16, 2004 at 08:41:45 PM EST

If you have posted something to the effect that "metadata is doomed because it's too hard/people don't care/evil spammers will abuse it", please read this. If you're going to post something to that effect, please read this first, as I'd like to make a few points:

To "metadata is too hard": yes, right now it is. That's why I'm writing an article about easier ways to do it; you did catch the word "easy" in there about a zillion times, right? Now, I'll admit that it's pretty hard to come up with a standard, unanimous way to classify absolutely everything — my degree is in philosophy, so believe me when I say I know this — but that's not what this is about. This is about common, useful everyday metadata like "Alice is my friend," which shouldn't need a half-dozen ontology and information-representation markup syntaxes to explain.

In other words, this isn't a scheme for saying "this is what this page is about, and this is how it fits into the Taxonomy of Everything." This is a scheme for saying "this is how this page relates to that one over there", and "this is what this page thinks of that one", and so on and so forth. I would hope the difference is clear.

To "people don't care": sure they do. Look what happened when XFN went public; all of a sudden all those big blogging circle-jerks had a better, shinier, more detailed method of mutual masturbation, and people went crazy with it. In large part, good metadata practice is about knowing what your target audience wants. Which, really, is what any web designer/developer with his/her salt should be looking at to begin with.

To "evil spammers will abuse it": sure they will. Evil spammers will abuse any tool you give them. Right now there are people who use white text on a white background to invisibly sneak more "hardcore porn xxx sex hot sex hardcore sex xxx" keywords into their pages; should we give up on font and background colors, then? Of course not.

Now, I know as well as you do that most search engines ignore content in <meta> keyword tags (in fact, I'm pretty sure I mentioned that in the article), but thankfully we're not talking about <meta> keywords; we're talking about actual useful things. Plus, I'd like to see a spammer abuse rel="unendorsed" to boost his PageRank. And even if someone does find a way to abuse it, no one gets screwed too hard if the idea has to be ditched, because this is just a few things sprinkled into ordinary HTML. On the other hand, if we move the whole web to an RDF/OWL/what-have-you based system and have to bail out due to abuse, we're really up a creek; the odd HTML attribute here and there can be ignored or removed, but getting rid of a core technology is something else.
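
To be concrete, such a link might look something like this (a minimal sketch, assuming the rel values discussed here; the URL is invented):

<a rel="unendorsed" href="http://example.com/some-site/">a site we link to but don't vouch for</a>

A search engine honoring the profile could simply skip links marked this way when handing out rank, which is exactly why a spammer gains nothing by adding it.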

Now, detract away.




--
You cooin' with my bird?
Re: unendorsed (none / 1) (#66)
by Nursie on Thu Sep 16, 2004 at 09:11:18 PM EST

Yes, it would be a good way to stop things you didn't endorse from getting a bit more PageRank because of your link. That's fine. But it doesn't stop people from not using it! In fact you would probably find that most businesses linking to each other would steer well clear of the "unendorsed" value as a simple courtesy to whoever they are linking to.

The FOAF thing is fine, but could happily be done in actual data rather than meta-data.

And you miss the point many of the detractors are making: people are bad at making meta-data, and it is open to abuse. Machine-generated meta is pointless to include, as it can as easily be generated by the viewer as by the server.

Also, anything that helps bloggers should be stricken from the record and the creators' eyes put out, IMHO!

Meta Sigs suck.

[ Parent ]
Missing the point. (none / 0) (#67)
by ubernostrum on Thu Sep 16, 2004 at 09:36:08 PM EST

And you miss the point many of the detractors are making, people are bad at making meta-data, and it is open to abuse.

Well, gee, I thought I just posted a comment which noted and responded to those points. But maybe I was wrong, so I'll try again.

To "people are bad at making metadata": you're still in the "metadata means the Taxonomy of Everything" mindset. The sorts of metadata I'm talking about don't require people to categorize, hierachicalize, subcategorize and so on; most people do suck at that. All this does is give people a simple way to say, in machine-readable format, the things their pages are already saying in human-readable format. For example, you could have a heading saying "Friends" and a list of links underneath, but your French friends are going to have a heading saying "Mes amis"; does a program which tries to catalogue this information need to be able to translate between all the languages on the web? Or should we use a tiny bit of metadata?

To "it's open to abuse": well, I really did respond to that. Please read my top-level comment again and tell me where you disagree with my reasoning.




--
You cooin' with my bird?
[ Parent ]
Ummm (none / 1) (#74)
by Nursie on Thu Sep 16, 2004 at 11:32:59 PM EST

All this does is give people a simple way to say, in machine-readable format, the things their pages are already saying in human-readable format.
Then it isn't meta-data, surely? It's simply the same data in another format.....

Meta Sigs suck.

[ Parent ]
No. (none / 0) (#76)
by ubernostrum on Thu Sep 16, 2004 at 11:54:43 PM EST

It's machine-readable information about the information conveyed by the document.




--
You cooin' with my bird?
[ Parent ]
that's not what you said......... (none / 0) (#80)
by Nursie on Fri Sep 17, 2004 at 01:43:00 AM EST

But I'll let this one go.

Meta Sigs suck.

[ Parent ]
No, it is what I said. (none / 0) (#82)
by ubernostrum on Fri Sep 17, 2004 at 02:15:16 AM EST

You have a heading marked "Friends" and a list of links. You put rel="friend" on each. You have presented the relationship "friends", which is information about the links in the list (hence, data about data or "meta data") in two ways: one human-readable and one machine-readable.
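
Spelled out in markup, it might look something like this (a minimal sketch; the names and URLs are invented):

<h2>Friends</h2>
<ul>
<li><a rel="friend" href="http://example.com/alice/">Alice</a></li>
<li><a rel="friend" href="http://example.com/bob/">Bob</a></li>
</ul>

The heading works for humans in whatever language the page is written in; the rel values say the same thing to a machine either way.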




--
You cooin' with my bird?
[ Parent ]
I disagree (none / 0) (#87)
by curien on Fri Sep 17, 2004 at 08:31:04 AM EST

In fact you would probably find that most businesses linking to each other would steer well clear of the "unendorsed" value as a simple courtesy to whoever they are linking to.

I don't think so. Have you noticed how often people put up disclaimers like, "These links go to other domains and their presence does not indicate our endorsement... click at your own risk" or something to that effect? (The US government, by law, must do it almost every time a .gov/.mil site links to a private site, for example.)

--
This sig is umop apisdn.
[ Parent ]

That's true (none / 0) (#93)
by Nursie on Fri Sep 17, 2004 at 02:38:12 PM EST

but I would think that friendly business practice would dictate that you help the other business get better rankings by leaving out the unendorsed thing. If they do the same for you then everyone wins. Maybe I'm wrong.

Definitely good for gov/military sites though.

Meta Sigs suck.

[ Parent ]
Not necessarily. (none / 0) (#97)
by ubernostrum on Fri Sep 17, 2004 at 08:40:50 PM EST

The ideal situation for a business would be charging to add rel="endorsed".




--
You cooin' with my bird?
[ Parent ]
Re: To the detractors: (none / 0) (#84)
by nml on Fri Sep 17, 2004 at 03:18:03 AM EST

Now, detract away.

thanks, i'll try my best ;o)

To "metadata is too hard": yes, right now it is.

actually, no it's not. It doesn't really get much easier than the <meta> tags that have been available for ages.

thankfully we're not talking about <meta> keywords; we're talking about actual useful things

The <meta> tags do describe simple, useful things. Author, keywords, etc. If we can't get them right, why would something more complex work? You've made the same mistake as the semantic web people, and assumed that metadata isn't used because our existing mechanisms are somehow inadequate (in your case you've assumed that they're inadequately documented; the semantic web people typically assume that our existing mechanisms aren't expressive enough). The existing metadata mechanisms work just fine; it's the idea that people will create metadata with enough reliability to make using it worthwhile that is flawed. Metadata doesn't work simply because people who abuse metadata are virtually the only ones who are motivated to create it (them and the adept users). No-one else cares.

To "people don't care":
To "evil spammers will abuse it"

Both your suggestions here ignore the fact that there has to be a certain critical mass of correct metadata for it to be at all worthwhile to utilise it. Just because a few bloggers start using metadata doesn't change the fact that 99% of everything else is completely missing metadata, except for the porn pages put up by the spammers.

I'd like to see a spammer abuse rel="unendorsed" to boost his PageRank

the rel="unendorsed" attribute thing sounds like it would be a good use of metadata, in that it doesn't try to achieve too much, and people creating pages actually have some motivation to use it. Much like the robot directives through <meta> tags, which people also use. The other good point about them is that they don't require everyone to use them to work. If one person marked up their data with 'unendorsed' then that would still be useful metadata. The pitiful proportion of people who are willing to insert descriptions, keywords and FOAF descriptions makes using this data uneconomic, because whenever the metadata is absent (almost always), you have to fall back on trying to interpret the data anyway. If everyone has to be able to fall back and look at my data anyway, whats my motivation for creating metadata? In this situation most metadata becomes pure overhead.



[ Parent ]
Detracting your detraction (none / 0) (#96)
by ubernostrum on Fri Sep 17, 2004 at 08:39:14 PM EST

It doesn't really get much easier than the <meta> tags that have been available for ages.

Which is a large part of what metadata profiles are good for; Dublin Core, for example, is almost exclusively implemented in <meta> tags.
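
For instance, the usual Dublin-Core-in-HTML convention looks something like this (a from-memory sketch; the content values are invented):

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="An Example Page">
<meta name="DC.creator" content="J. Author">
<meta name="DC.date" content="2004-09-16">
<meta name="DC.language" content="en">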

Author, keywords, etc. If we can't get them right, why would something more complex work? You've made the same mistake as the semantic web people, and assumed that metadata isn't used because our existing mechanisms are somehow inadequate (in your case you've assumed that they're inadequately documented; the semantic web people typically assume that our existing mechanisms aren't expressive enough).

Actually, you're making an assumption about metadata, which is that Google and services like it are the only reason for metadata to exist at all — basically, saying that if it doesn't aid a massive, world-wide search system then it's pointless. For an example of why this is wrong, ask some professional information architects whether they put <meta> keywords or something similar in sites they work on; my guess is you'll find that a lot of them do because they're extremely useful for a site's local search and navigation facilities. In that case there's no potential for abuse (all the pages are yours and under your control, so evil spammers aren't going to cram keywords and skew your results).

Metadata doesn't work simply because people who abuse metadata are virtually the only ones who are motivated to create it (them and the adept users). No-one else cares.

See my example above for a rebuttal of that point. Also, people do care about metadata; they just don't all care about the same kinds of metadata. Know your audience and tailor your metadata facilities to their tastes.

Both your suggestions here ignore the fact that there has to be a certain critical mass of correct metadata for it to be at all worthwhile to utilise it.

Again, see my point above. And stop and ask yourself why so many people with degrees in library science are making good money on the web right now by going on about things like synonym rings and controlled vocabularies. Metadata can be extremely useful on a local as well as a global scale.

the rel="unendorsed" attribute thing sounds like it would be a good use of metadata, in that it doesn't try to achieve too much, and people creating pages actually have some motivation to use it. Much like the robot directives through <meta> tags, which people also use. The other good point about them is that they don't require everyone to use them to work.

Now you're getting it ;)




--
You cooin' with my bird?
[ Parent ]
metadetraction (none / 0) (#100)
by nml on Fri Sep 17, 2004 at 09:50:06 PM EST

Actually, you're making an assumption about metadata, which is that Google and services like it are the only reason for metadata to exist at all -- basically, saying that if it doesn't aid a massive, world-wide search system then it's pointless.

My point wasn't that metadata had to be useful on a massive scale, but that it had to be useful for something. Even on smaller scales it's difficult (impossible?) to get people to enter accurate and complete metadata. There are exceptions, of course, where sufficiently motivated (typically by money) groups have produced metadata-rich information. However, as many people have discovered in contexts ranging from comments in source code to keyword entries on webpages, the most predictable thing about metadata is that it is usually missing, inaccurate or incomplete. The best source of information about some data is the data itself - metadata lies.

Again, see my point above. And stop and ask yourself why so many people with degrees in library science are making good money on the web right now by going on about things like synonym rings and controlled vocabularies. Metadata can be extremely useful on a local as well as a global scale.

Yes, in very specific conditions, controlled vocabularies and the like (and metadata) are useful. They are very specific solutions for very specific problems, though. But take a counter-example and look at the success of search engines. Search engines are the ultimate example of data being better than metadata, in that they operate almost exclusively by applying heuristics to the available data (typically not metadata) and perform tasks that are traditionally the domain of metadata. And i'm willing to guess that there's more money being made in that kind of search than in synonym rings ;o)



[ Parent ]
Metadata for the people (none / 0) (#102)
by ubernostrum on Sat Sep 18, 2004 at 03:54:40 PM EST

Even on smaller scales it's difficult (impossible?) to get people to enter accurate and complete metadata. There are exceptions of course, where sufficiently motivated (typically by money) groups have produced metadata-rich information.

Here is where I think we disagree. I look at webloggers with things like XFN and even FOAF, complicated mess that it is to produce, and I see people who are creating pretty good metadata. What that says to me is that maybe we haven't found the right motivation for the average Joe Website to create and use metadata: so many attempts at these things in the past have focused solely on things which were good for search engines like Google, and so the noise of people trying to game the system and sell stuff drowned the signal.

But when it was in their interest (i.e., helping to define and possibly expand their online social circle), bloggers rapidly adopted a couple of metadata formats, one of which is quite literally incomprehensible to most of them. I see things like XFN and FOAF as outgrowths of the "webrings" of the good old days, where the motivation was group identity.

Which, I guess, is what I've been saying in the comments and wish I'd thought to say in the article itself: when proposing that people use a metadata format, the reason for the metadata may be just as important as the format. Know your audience and give them a way to do what they want (even if they don't know they want it yet), and they'll use the format.

Search engines are the ultimate example of data being better than metadata

I think this is far from settled.




--
You cooin' with my bird?
[ Parent ]
re: Metadata for the people (none / 0) (#103)
by nml on Sun Sep 19, 2004 at 02:04:34 AM EST

Here is where I think we disagree. I look at webloggers with things like XFN and even FOAF, complicated mess that it is to produce, and I see people who are creating pretty good metadata. What that says to me is that maybe we haven't found the right motivation for the average Joe Website to create and use metadata: so many attempts at these things in the past have focused solely on things which were good for search engines like Google, and so the noise of people trying to game the system and sell stuff drowned the signal.

Yes, that's true. I look at metadata from a more historical perspective, where it's basically been a complete failure outside of confined applications. I'm sure that at least some of the bloggers are producing very high quality metadata, and the trend may continue and sweep metadata and/or the semantic web into usefulness and common use. However, I doubt it. You have to remember that blogging is still fairly early in the adoption curve - most of the blogging community are early adopters who care enough to rigorously follow standards. Increases in the popularity of blogging will bring with them people who want to make money off it, people who use it to promote causes, and people who just want to use it to write home to mum. And I doubt that they will care much about producing metadata. Another problem is that bloggers produce only a very small portion of the internet, so regardless of how well they follow standards, the majority of pages won't contain metadata.

I think this is far from settled.

of course ;o). But I am going to claim that it's true right now. I'm happy to agree to disagree about the rest - only time will tell whether metadata becomes useful.



[ Parent ]
Not quite (none / 0) (#92)
by dja on Fri Sep 17, 2004 at 01:58:57 PM EST

Nice piece, but I do have to quibble that you're not comparing like with like. For a start, there is a *lot* more information in the FOAF version than:
<a rel="friend" href="http://example.com/alice/">Alice</a>

What's more, it's not clear what the XFN version is talking about - http://example.com/alice/ is a web page, not a person. If you also said:

<a rel="dc:creator" href="http://example.com/alice/">Alice</a>

Would you be saying that Alice made you?

The argument "people can't be bothered" is a total red herring. LiveJournal creates a FOAF profile for all its users - all that "messy FOAF markup", but without any coding on the user's part.

Finally, I'm surprised you didn't mention GRDDL which is a way of extracting RDF from metadata embedded in XHTML using profiles, such as XFN.

RDF and "The tools will save us" (none / 1) (#95)
by ubernostrum on Fri Sep 17, 2004 at 08:23:23 PM EST

Please read the footnote to the FOAF example before commenting on what I did or didn't do properly with that one. I believe I've admitted that there's more going on in the RDF.

As to it not being clear what the XFN is talking about, it's quite clear: rel="friend" means that the person found at that URL is a friend of the person responsible for the document in which the link is found. This is a lot sloppier ontologically than RDF-based systems, but so are most people's lives.

And saying that tools will generate the complicated messy markup without the user needing to know how (like LiveJournal creating FOAF profiles) is, to me, dodging the problem; it's the "don't worry your pretty little head about it" mentality, and for the record I hate that. If Tim Berners-Lee had decided that it didn't matter whether people could make sense of HTML because tools would generate it for them, would the web have taken off like it did? There's a lot to be said for being able to view source and understand what's going on, which is something that the vast majority of people are never going to be able to do with RDF-based solutions.

And concerning GRDDL, I don't have any strong feelings one way or another, but I'm inclined to say that metadata inserted into HTML in this fashion could be directly extracted and processed without the need for converting it to an intermediate format such as RDF/XML.




--
You cooin' with my bird?
[ Parent ]