Kuro5hin.org: technology and culture, from the trenches

Why YAML? Why not?

By regeya in Op-Ed
Sun Oct 31, 2004 at 01:01:03 PM EST
Tags: Software (all tags)

I started a simple project a while back. All I needed to do was store some information about stuff around the office, and since it's a small office, I thought an RDBMS was overkill, so I decided to go with XML. After creating a document, I fought with XML DTDs, then with both XML DOM and XML SAX for Python, and finally settled on an approach that built dictionaries using SAX, as described here.

After messing with it for a while, I decided there had to be a better way. Almost by accident, I stumbled upon YAML.



# Introduction

While XML is buzzword-friendly, YAML (originally Yet Another Markup Language, now the GNU-style recursive acronym YAML Ain't Markup Language) is almost a non-buzzword. Mentioning it is the surest way to get a blank stare, even from fairly tech-savvy people.

That's a real shame, in my opinion, because both XML and YAML have their uses. XML, a subset of SGML, is designed to be simpler than SGML, and designed to be a standardized way of storing and sharing various types of data. It's good for a number of things; however, for every article I've read that makes me say, "Hey, that's neat!" I can find at least one rebuttal that says, essentially, "That's not what XML was designed to do!"

One wonders sometimes what XML was designed to do if it wasn't designed to do the things that people are using it for. On top of that, I've run into a number of people who have expressed concerns about XML. The biggest complaints I've seen are "too many characters" and "what makes this better than other file formats, exactly?"

My thoughts exactly.

# A look at YAML

One of the first things people try to do (including me) is compare YAML to XML. That's not really a fair comparison. While XML builds on a legacy language, YAML was designed from the start to be a data serialization language that's both powerful and human readable. The similarities end there.

Take a look at the reference card. The reference card is, essentially, a YAML document. For a more complete explanation, take a look at the YAML specification, and have a look at the YAML Cookbook (warning: the YAML Cookbook has a strong Ruby bias.)

# XML, or YAML?

When looking at these examples, bear in mind that I'm not a raving expert on either XML or YAML. Having said that, here's a modified snippet from the XML file I had created for an in-house contacts list:

<user id="babooey" on="cpu1">
    <firstname>Bob</firstname>
    <lastname>Abooey</lastname>
    <department>adv</department>
    <cell>555-1212</cell>
    <address password="xxxx">ahunter@example1.com</address>
    <address password="xxxx">babooey@example2.com</address>
</user>

Now, contrast this with what I would enter for the YAML file:

babooey:
    computer: cpu1
    firstname: Bob
    lastname: Abooey
    cell: 555-1212
    addresses:
        - address: babooey@example1.com
          password: xxxx
        - address: babooey@example2.com
          password: xxxx

I don't know what anyone else's thoughts are on the subject, but I find the YAML version to be much more readable. Python-phobes will be shocked to learn that whitespace is significant in YAML.

Another thing to notice is that YAML is designed with scripting languages (such as Python, Perl, PHP, and Ruby, among others) in mind. In practice, that means a YAML document translates easily into the data structures those languages have in common, such as lists and dictionaries.

What does that mean for you? Well, if I were to read this example YAML structure into Python, assuming that I'd created a complete YAML document containing only this information, the resulting structure would look something like this:

{'babooey': {'computer': 'cpu1', 'firstname': 'Bob', 'lastname': 'Abooey', 'cell': '555-1212', 'addresses': [{'address': 'babooey@example1.com', 'password': 'xxxx'}, {'address': 'babooey@example2.com', 'password': 'xxxx'}]}}

In other words, there's a dictionary with one key, and a dictionary as a value. That dictionary has a number of keys and values, with the "addresses" key mapping to a list of dictionaries. Crystal clear? Good.
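
To make that description concrete, here is a small sketch in present-day Python that walks the structure. The dictionary is typed in by hand from the example above rather than loaded from a file, and the variable names are only illustrative.

```python
# Hand-typed copy of the structure described above (not loaded from a file).
contacts = {
    "babooey": {
        "computer": "cpu1",
        "firstname": "Bob",
        "lastname": "Abooey",
        "cell": "555-1212",
        "addresses": [
            {"address": "babooey@example1.com", "password": "xxxx"},
            {"address": "babooey@example2.com", "password": "xxxx"},
        ],
    }
}

# The outer dictionary has one key per user ID.
user = contacts["babooey"]
full_name = user["firstname"] + " " + user["lastname"]

# "addresses" maps to a list of dictionaries, one per mail account.
mail_addresses = [entry["address"] for entry in user["addresses"]]

print(full_name)
print(mail_addresses)
```

No parser-specific objects are involved: once the document is loaded, ordinary indexing and list comprehensions are all you need.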

And finally, what's the line count and character count on the resulting files?

XML:
Line count: 245
Character count: 10110

YAML:
Line count: 289
Character count: 9447

Not a huge savings, but YAML saves itself by being readable.

Some people might be asking "Why use either standard?" Good question! My own feeling is that I should stick with a standard language simply because I want my data to be readable, but not force someone to write a new parser if they pick up my data and, say, wish to process it in Perl. If you don't have that goal in mind, or don't care, neither XML nor YAML will appeal to you.

# Reading the document into Python

There are two parsers I know of for Python: PyYAML and Syck. Syck is designed to be a fast parser for multiple languages; PyYAML is designed to be a Python parser. After experimenting with both, I had better luck with PyYAML. Unfortunately, neither parser is a complete implementation of the YAML specification; fortunately, PyYAML implements enough for my needs.

Here's what I originally used to pretty-print the data structure resulting from loading contacts.yml:

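#!/usr/bin/env python
import pprint

import yaml

# loadFile() hands back the YAML documents in the file one at a time.
datafile = yaml.loadFile("contacts.yml")
# Take the first (and here only) document.
dataset = datafile.next()

# pprint.pprint() prints by itself; wrapping it in print would also print "None".
pprint.pprint(dataset)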

And that's it!

What does this do? Well, yaml.loadFile() gives you an object that hands back the YAML documents in the file one at a time: each call to its next() method returns a data structure based on the next YAML document.
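
Later PyYAML releases spell the same idea differently. As a rough sketch against the current PyYAML API, yaml.safe_load_all() returns an iterator over the documents in a stream. The two-document text below and the names in it are made up for illustration; a real script would pass an open file instead of a string.

```python
import pprint

import yaml  # PyYAML, a third-party package

# Illustrative stand-in for contacts.yml: two documents separated by "---".
text = """\
babooey:
    computer: cpu1
    cell: 555-1212
---
ahunter:
    computer: cpu2
"""

# safe_load_all() yields one Python structure per YAML document.
documents = yaml.safe_load_all(text)
dataset = next(documents)   # first document
second = next(documents)    # second document

pprint.pprint(dataset)
pprint.pprint(second)
```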

# In closing...

I've only scratched the surface. I've not even covered enclosing multiple documents into a single YAML file, shortcuts, forcing datatypes, blocks, trailing newlines in literals, folding, aliases, and a raftload of handy features I've not even had the pleasure to need. For more on the subject, have a look at the YAML Cookbook for good examples.

I for one know what sort of format I'll be using for my next project, and it won't be XML.

# What about all the cool stuff that's been done with XML? Can YAML compete?

So what? Anything that can be done in XML can be done without XML. Do you need XML for XML-RPC? Well, yeah, but is it necessary to use XML for RPC? No. Is it necessary to use XML for RDF? No. Is it necessary to use XML for, well, anything? No. Is YAML a perfect replacement for XML? No, but in some circumstances, YAML may be more suitable than XML.

Do I think that YAML can compete with XML? Yep. Do I see YAML replacing XML? Nope. Both are great standards, and both have their place. I'd just like to help clue people in to a standard that's more suitable in situations where XML, quite frankly, stinks.


Poll
XML, YAML, or neither?
o XML 13%
o YAML 9%
o Neither 13%
o Whatever works 63%

Votes: 44



Why YAML? Why not? | 184 comments (166 topical, 18 editorial, 2 hidden)
maybe it's because I'm a heavy XML user... (3.00 / 3) (#2)
by zenofchai on Fri Oct 29, 2004 at 02:50:35 PM EST

But I find XML much more human-parseable than YAML. For example, in XML, often times the tags describe themselves; in your example, I don't know what "babooey" itself is, and apparently would likely have to rely on it being in some kind of file called: "user-ids.yaml" or something.

In short: a root document tag is a good idea.

Plus, XML is very interoperable with programming languages. Java, C, C++, and Javascript come to mind very quickly. Many of the "next gen" data formats are already in XML (SVG, XHTML).

In conclusion: I'll learn YAML when I have to use it for a project. ;]
--
The K5 Interactive Political Compass SVG Graph

XML (none / 1) (#4)
by Pxtl on Fri Oct 29, 2004 at 03:52:24 PM EST

My problem with XML is twofold.  First, there's the inconsistent and confusing distinction between the contained data and the attributes of the tag, and the minor religious wars over which to use when.  Second is the closing tags.  Why have the redundant name?  In HTML it made sense, where <b><u>foo</b></u> was acceptable, so the close tag needed to be defined to handle the overlap.  XML, on the other hand, just makes it pointlessly verbose - XML doesn't allow such overlapping tags.

Fundamentally, it's a misapplication. Why is a typesetting lexicon being used for properties files? The only massive advantage to XML that I see - the use of schema files for easy conversion from text to binary storage (for low-bandwidth transmission) and back - is hardly ever used.

[ Parent ]

I don't understand (none / 1) (#6)
by zenofchai on Fri Oct 29, 2004 at 03:58:11 PM EST

In HTML it made sense, where <b><u>foo</b></u> was acceptable

Holy crap, that kind of poor tagging is hardly acceptable, just because the spec might have allowed it or Netscape rendered it (chicken and egg as to which of these caused which).

XML Schema actually makes XML worth using again. If you are still using DTD-based XML, give XML Schema a try.
--
The K5 Interactive Political Compass SVG Graph
[ Parent ]

My point... (none / 1) (#7)
by Pxtl on Fri Oct 29, 2004 at 04:21:43 PM EST

Why is the closing tag so verbose? Why not just a simple / on its own, no text, no <> crap, just /... or any other character for that matter? The redundant name makes XML very difficult to read, imho.

[ Parent ]
Uh, no... (none / 1) (#33)
by kraant on Sat Oct 30, 2004 at 07:27:02 AM EST

You can close a tag in XML with "</>" and if your tag just needs to be on its own you can do "<foo/>".

So these are equivalent.

<foo></foo>
<foo></>
<foo/>

heh.
--
"kraant, open source guru" -- tumeric
Never In Our Names...
[ Parent ]

No, you can not (3.00 / 5) (#86)
by twanvl on Sun Oct 31, 2004 at 05:33:44 PM EST

To quote the XML 1.1 spec
[Definition: The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:]
End-tag
[42] ETag ::= '</' Name S? '>'


[ Parent ]
Wrong (3.00 / 2) (#88)
by ttfkam on Sun Oct 31, 2004 at 05:52:26 PM EST

You could close a tag that way (</>) in SGML, but not in XML. This is by design.
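
This is easy to check with a conforming parser. The sketch below feeds the three forms from the earlier comment to Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# The three spellings that were claimed to be equivalent.
for candidate in ("<foo></foo>", "<foo></>", "<foo/>"):
    print(candidate, is_well_formed(candidate))
```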

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]
Heh, (none / 1) (#109)
by kraant on Mon Nov 01, 2004 at 01:52:56 AM EST

Seems you two are right.

Serves me right for relying on the behaviour of parsers instead of reading the spec.
--
"kraant, open source guru" -- tumeric
Never In Our Names...
[ Parent ]

Because... (none / 1) (#90)
by ttfkam on Sun Oct 31, 2004 at 06:10:49 PM EST

it introduces authoring complexity?  "Say what?" I hear you say...

Let's say you could just have / as a universal end tag.  Taking a sample document based on HTML:

<table><thead><tr><th>Column One/<th>Column Two//<tbody><tr><td>Value One/<td>Value Two////

Aside from extra encoding issues -- the forward slash must now be escaped as an entity whenever it's used as content -- this is how your spec would look.  But it contains an error.  Where is it?  Unless you were passing it through a validating parser (not just well-formedness which is most commonly used today), you would not easily find out.

Let's compare to the XML version:

<table><thead><tr><th>Column One</th><th>Column Two</th></tr><tbody><tr><td>Value One</td><td>Value Two</td></tr></tbody></table>

Even a non-validating parser would help you find the error better.  Also, eyeballing where the tags don't match up is much easier.  Think this is a contrived example?  Check out nested table hell on most web pages on the net (k5 comes to mind).  Explicitly labeling the end of a tag is a good thing.
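
To see what a non-validating parser makes of the XML version above, one can hand it to Python's xml.etree.ElementTree, which only checks well-formedness. This is a sketch; the exact wording and position in the error message come from the underlying expat parser and may vary.

```python
import xml.etree.ElementTree as ET

# The XML sample from above; the <thead> element is never closed.
markup = (
    "<table><thead><tr><th>Column One</th><th>Column Two</th></tr>"
    "<tbody><tr><td>Value One</td><td>Value Two</td></tr></tbody></table>"
)

try:
    ET.fromstring(markup)
    error = None
except ET.ParseError as exc:
    error = exc

# The parser stops at the first end tag that does not match the open element.
print(error)
```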

(Incidentally, this is why S-Expressions are not as suitable for documents at large as HTML/XML.  That said, S-Expressions are noticeably better than YAML.)

Also, before you say, "If you had indented that correctly, both would be equally easy," remember that a great deal of markup is dynamically generated by scripting... aside from the fact that most people in the real world don't indent the document markup religiously.

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

sexps are easy to auto-indent (none / 0) (#102)
by Delirium on Sun Oct 31, 2004 at 09:42:09 PM EST

There's plenty of pretty-printers for S-expressions, because the syntax is so simple that it's trivial to write one. After that, they're quite readable.

[ Parent ]
S-Expressions are not sufficient to the problem (none / 0) (#119)
by ttfkam on Mon Nov 01, 2004 at 09:40:16 AM EST

S-Expressions cannot do the job of XML. In addition, its deficiencies include a lack of explicit character encoding and lack of support of Unicode. I18n and l10n effectively blown out of the water.

And for the record, there are also pretty printers for XML (and derivative) structured markup. I myself wrote one as a SAX filter only a month or two ago. Others have written them in XSLT or STM stylesheets. But I digress.


If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

there's no inherent lack of Unicode (none / 0) (#123)
by Delirium on Mon Nov 01, 2004 at 11:01:21 AM EST

Pretty much the only required piece of syntax of the S-expression is the parenthesis; what you put in between is more or less arbitrary. Why can't what's in between be Unicode? Current implementations largely don't use it, but I do believe that they're moving towards that as the standard character encoding for everything, including sexps.

[ Parent ]
What's in between!?! (none / 0) (#132)
by ttfkam on Mon Nov 01, 2004 at 12:04:40 PM EST

Please tell me you aren't advocating multiple character encoding schemes within the same text document.

Is the file UTF8?  Is it UTF16?  Is it UCS2?  Is it ISO8859-15?  Is it ASCII?  It makes a difference.

And you know what else makes a difference?  Encoding problems (as well as solutions in progress like Unicode) have been around for a long time.  I18n and l10n issues have been around even longer.  The folks behind S-Expressions haven't produced ANYTHING yet to address these.  What if you needed to work on your project now let alone two years ago?  S-Expressions can't help you.

Convert this K5 article from HTML to S-Expressions.  Let me know how much prettier it looks.  With all the necessary extra quotation marks and quotation escape sequences and such -- let alone deciding on the character encoding (showstopper!) -- you'd be hard pressed to convince me it's a significant improvement.

<drivenpoint>A whole bunch of text that can include almost all characters without needing to do anything special.  Certainly 99.9% of all characters in common prose.

The only exceptions here are less-than and ampersand symbols.  Of course in text with a great deal of these characters like programming examples, XML provides a clear demarcation of free-form text, the CDATA section.
<![CDATA[

  if (you.has_code && (you.patience_for_escape_sequences < ABUNDANT)) {
    use_of_xml = "less burden than other syntaxes";
  }

]]>

While that short content doesn't do it justice, a dozen characters gave me the ability to write volumes of content without having any significant worries about escape codes or character limitations.  What is the S-Expression equivalent or do you have to escape out each and every double-quote -- a very common character in prose -- to get the same results?  After six quotes, twelve quotation marks, S-Expressions actually becomes more expensive for the parser and readability.
</drivenpoint>

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

character encoding (none / 0) (#133)
by Delirium on Mon Nov 01, 2004 at 12:18:25 PM EST

I wasn't advocating multiple character schemes, merely a switch to One and Only One character scheme, likely UTF8. There will be no encoding-scheme specification necessary then, because all S-Expressions will be in the same encoding scheme.

[ Parent ]
You'd be screwing Asian languages [n/t] (none / 0) (#136)
by ttfkam on Mon Nov 01, 2004 at 01:06:24 PM EST


If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]
data storage (none / 0) (#137)
by Delirium on Mon Nov 01, 2004 at 01:24:47 PM EST

That's true for now, but storage capacity and bandwidth are rapidly increasing to the point where I don't think they will be much of an issue for text data, even if it has to take several bytes per character. (Video, on the other hand...)

[ Parent ]
overlapping elements... (3.00 / 2) (#26)
by MrLarch on Sat Oct 30, 2004 at 03:35:10 AM EST

That was never acceptable. But what are you going to do if you're a browser maker? Die on typos or gracefully error correct? And after a certain point you might be tempted to just interpret what people do as something else, supposing that they're too stupid to use what they really mean -- so it "just works", except for those times when somebody knew what they were doing. HTML being a tool of and rendered client-side by the incompetent masses made for a sad saga.

[ Parent ]
I hit the spam button (1.10 / 10) (#3)
by phred on Fri Oct 29, 2004 at 03:36:40 PM EST

then I'm going to -1 it. Next I'm going to modbomb the author with 0's. Then I'm going to spam his email all over slashdot.

Because whitespace is at best a delimiter, and I can't visually tell the difference between multiple spaces and a tab. So how the heck am I gonna read a python program?

white space is evil (none / 1) (#10)
by khallow on Fri Oct 29, 2004 at 04:56:18 PM EST

You don't realize how evil white space is until you use up $200 worth of shotgun shells and have to scrub to clean the splatter off your cubicle walls.

Stating the obvious since 1969.
[ Parent ]

you don't need to tell the difference (none / 0) (#54)
by Delirium on Sun Oct 31, 2004 at 01:27:36 AM EST

As long as you use a sane editor with tabs set to 8 spaces, it will work just fine. Python/etc. treat a tab and 8 spaces identically, so if your editor also has 8-spaces-per-tab, and everything looks right, it'll run correctly.
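
That describes the Python 2 interpreters of the time. Python 3 is stricter: it rejects a block whose indentation only lines up under a particular tab width. A minimal sketch, with a made-up three-line program:

```python
# Line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    outcome = "accepted"
except TabError as exc:
    # Python 3 refuses to guess how wide a tab is.
    outcome = "rejected: " + exc.msg

print(outcome)
```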

[ Parent ]
it doesn't work for me (none / 1) (#122)
by phred on Mon Nov 01, 2004 at 11:01:20 AM EST

I treat the first tab as 8 spaces, then the rest as 4 spaces for a few tabs, then reduce a bit further. The reason I can do this is no sane language enforces tabstops on me.

[ Parent ]
You don't have to read a Python program. (none / 0) (#55)
by Meshigene Ferd on Sun Oct 31, 2004 at 04:16:35 AM EST

But if I want to, I do this:

:set expandtab
:set tabstop=8
:retab

YMMV. HTH.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Duh (none / 1) (#103)
by dn on Sun Oct 31, 2004 at 10:11:55 PM EST

YAML bans tabs for indention.
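
PyYAML, for one, enforces this. A small sketch using the current PyYAML API (the key names are arbitrary):

```python
import yaml  # PyYAML, a third-party package

spaces = "item:\n    one: 1\n"   # indented with spaces
tabs = "item:\n\tone: 1\n"       # same data, indented with a tab

print(yaml.safe_load(spaces))

try:
    yaml.safe_load(tabs)
    tab_error = None
except yaml.YAMLError as exc:
    # The scanner refuses the tab at the start of the nested line.
    tab_error = exc

print(type(tab_error).__name__)
```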

    I ♥
TOXIC
WASTE

[ Parent ]

Nice. (none / 0) (#8)
by ZorbaTHut on Fri Oct 29, 2004 at 04:30:42 PM EST

I use XML as data storage for a game I'm working on - my philosophy is that, if a horrible disaster occurred and all of my source were to be deleted, I should be able to open every source data file up in an existing program and look at it there. (In other words, .png is fine because I can use Adobe, and .wav is fine because I can use Winamp. Any other source format should be XML.)

I may change it over to YAML.

. . . Or I would, but it seems there's no YAML C++ implementation. Bah. I may have to whip up a very quick back-of-the-envelope one.

I wouldn't think that'd be hard. (none / 0) (#53)
by regeya on Sun Oct 31, 2004 at 12:13:22 AM EST

I mean, after all, the world already has JavaScript and Objective-C parsers, eh? ;-D

[ yokelpunk | kuro5hin diary ]
[ Parent ]

There IS a C YAML parser. (none / 0) (#184)
by chickenkiller on Mon Feb 20, 2006 at 04:31:34 AM EST

Oren Ben Kiki has been working on a C parser for YAML. It can be found in the YAML spec CVS repository at Sourceforge, and is called libyaml. Since it works in C, it should work with C++. -Lionel

[ Parent ]
Minor detail (none / 0) (#9)
by jd on Fri Oct 29, 2004 at 04:31:52 PM EST

XML is a set of pre-defined templates within SGML. To call it a subset of SGML is not strictly correct, because XML is defined on top of SGML. (Unlike HTML, which really IS a subset, as it is implemented entirely separately.)

I like the look of YAML, and will likely look at it much more closely.

I've been working on a system of my own, though, for some time which stores data in a "folded" format, rather than the flat format most people are familiar with. I then have a second hierarchy, matching the first in structure, containing metadata on what the block is supposed to look like.

The idea is that I can then "flatten" the file into any markup I like - HTML, XML, LaTeX, etc. All I need is a mapping which tells me how to translate the metadata into the notation used, and then just do a suitable substitution.

The reason I favour this kind of approach is that I don't see why I should be telling users how they should be viewing the document. That should be up to them. I should be concerned only with the content, NOT the presentation.

Sadly, too much emphasis these days IS placed on the presentation, which means web pages (especially) are impossibly cluttered as "designers" try to look impressive, rather than be functional.

The only purpose in storing data on a computer is to deliver it. "Write Once, Read Never" approaches should be left to rot.

a shortcoming you overlooked (2.77 / 9) (#11)
by dimaq on Fri Oct 29, 2004 at 05:05:52 PM EST

there's one property of xml that is so obviously not present in your yaml - you can take a block of xml and stick it anywhere in another xml document, without any modifications.

your yaml would need whitespace (nesting) adjustment.

another bit that I find erratic in python and completely murderous in your yaml is indentation nesting - granted good code shouldn't be over-indented anyway, so python gets off the hook easily - now consider a yaml document that has, say, 200 levels of nesting - would it still look that pretty? would you even be able to edit it sensibly?

xml, by contrast, has a [semi-standard] convention on white-spaces that allows you to show/edit/store it either as indented or as one long line.

besides, just because you chose to group elements in your yaml representation differently from the xml representation, doesn't mean it's a great special property of yaml!

p.s. all this said I'd like to see something more human-readable than xml (and yaml seems to be for short data dumps) as a standardized config file format for example. perhaps even as some request format.

One long line (3.00 / 3) (#12)
by Armin Hardwood on Fri Oct 29, 2004 at 06:16:34 PM EST

From looking at the reference card and spec, it seems that yaml does allow you to store a structure on one line.

For example, instead of


item:
    one: 1
    two: 2
    list:
        - a
        - b
        - c

you could have


item: {one: 1, two: 2, list: [a, b, c]}

That might help a little with the excessive indentation problem if you have deep & narrow data structures.
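
A quick way to convince yourself that the two spellings describe the same data is to load both and compare. The sketch below assumes the current PyYAML package.

```python
import yaml  # PyYAML, a third-party package

# Block style: nesting is expressed through indentation.
block_style = """\
item:
    one: 1
    two: 2
    list:
        - a
        - b
        - c
"""

# Flow style: the same structure on a single line.
flow_style = "item: {one: 1, two: 2, list: [a, b, c]}"

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)

print(block_data == flow_data)
print(flow_data["item"]["list"][1])
```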


[ Parent ]

A closer look (2.50 / 2) (#95)
by ttfkam on Sun Oct 31, 2004 at 06:54:38 PM EST

<item one="1" two="2">
  <a/>
  <b/>
  <c/>
</item>

or

<item one="1" two="2"><a/><b/><c/></item>

That's one option.  A more verbose XML might be:

<item>
  <one>1</one>
  <two>2</two>
  <list>
    <a/>
    <b/>
    <c/>
  </list>
</item>

or

<item><one>1</one><two>2</two><list><a/><b/><c/></list></item>

----

Technical Summary...
  Your first YAML example: 73 characters with spaces; 37 with tabs instead of spaces.
  Your single line YAML: 39 characters
  My first XML example: 51 characters
  My first XML as a single line: 41 characters
  My verbose XML: 90 characters
  My verbose XML as a single line: 62 characters

Seems to have more to do with how you structure things than the markup itself from your example.  Let's move on to more interesting matters.

1.  What do you do if your element "one" contains commas, whitespace, carriage returns, etc.?
2.  How do you specify the character encoding of the YAML document?
3.  How do you merge elements from two (or more) YAML documents together without losing the distinctive semantics of each?
4.  How do you validate a YAML document for correctness?
5.  What is the easiest way to retrieve the second list item, element b, from your YAML document?
6.  XML has three common special cases -- <, " and & become &lt;, &quot; and &amp; respectively -- and only require &quot; to escape quotation marks in attributes.  How many character exceptions does YAML have?  As an exercise, convert this K5 HTML page to YAML and see how many you come across.
7.  How do you write non-Latin characters in YAML?
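
For reference on point 6, Python's standard library ships helpers for the XML escapes. The sample string is made up; escape() handles &, < and > by default, and the extra mapping adds the quotation mark needed inside attribute values.

```python
from xml.sax.saxutils import escape, unescape

raw = 'if a < b && name == "k5"'

# &, < and > are escaped by default; the mapping adds the double quote.
escaped = escape(raw, {'"': "&quot;"})

# unescape() reverses the process, given the inverse mapping.
restored = unescape(escaped, {"&quot;": '"'})

print(escaped)
print(restored == raw)
```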

Those who try to replace XML without learning its lessons are doomed to repeat old mistakes.

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

Fit (none / 0) (#173)
by neehnahw on Sat Nov 06, 2004 at 06:18:56 AM EST

consider a yaml document that has, say, 200 levels of nesting - would it still look that pretty? would you even be able to edit it sensibly?

It would look horrible with all that indentation. However, if you are sane enough you wouldn't use YAML for data with nesting of that depth. So I agree with you that YAML seems to be more suited to small-scale data dumps, and I think it is better for tasks where you already have an available YAML parser and learning DTD, Schema, XSLT or related XML technologies would be overkill. My conclusion is to use whatever format fits the task.

[ Parent ]
whitespace (3.00 / 3) (#15)
by forgotten on Fri Oct 29, 2004 at 07:12:29 PM EST

i can live with whitespace in programs.

but having whitespace significant in data files is just asking for trouble.

--

what's the difference? (none / 0) (#16)
by Armin Hardwood on Fri Oct 29, 2004 at 07:25:39 PM EST

care to elaborate?


[ Parent ]
they are two different things (none / 1) (#18)
by forgotten on Fri Oct 29, 2004 at 08:06:51 PM EST

when you are coding, you normally aim for a finished product that is concise and readable. if along the way things get complicated you break the problem down into smaller pieces so that it remains concise and readable. and its easy to do this: use separate functions, separate files, modular structure, indentation. the finished product often naturally lends itself to a whitespace-significant format, which is why most people really dont have any trouble when they try (say) python.

but with a data file you dont have those kinds of options. you have to assume that it is not readable or modular. now either it will never be edited by a human - in which case, there is certainly no point in having meaningful whitespace, or it may have to be, in which case whitespace in a large messy data file is almost noise to a human reader, can easily be overlooked, and when needed but absent may not be able to be inserted.

--

[ Parent ]

No it isn't (none / 0) (#66)
by trhurler on Sun Oct 31, 2004 at 01:35:41 PM EST

There are reasons why YAML sucks, but this isn't one of them. If you use a decent set of tools(possibly written by yourself,) to read and write these files, and otherwise treat them as opaque except for debugging purposes(which you should,) then it does not matter AT ALL what the format is as long as you can read it when you want to(again, hopefully for debugging purposes.)

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Defeats the purpose (none / 1) (#113)
by curien on Mon Nov 01, 2004 at 06:59:18 AM EST

One of the purposes of YAML (at least according to this article) is a file format that's more easily human-readable than XML. Treating it as opaque defeats that purpose entirely. And if you write your own parser, you might as well use a custom format.

--
This sig is umop apisdn.
[ Parent ]
It is also deficient (none / 1) (#120)
by ttfkam on Mon Nov 01, 2004 at 09:50:49 AM EST

...when compared against the requirements for XML.  Where was the character encoding specified again?  How suitable is it for non-Latin languages and non-Germanic languages?  How many necessary character exceptions are there in YAML?  XML has a total of three.  Can you mix and match YAML documents without losing coherence?  How do you validate the YAML file?  Do you have a choice between push, pull, and in-memory parsing options?

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]
You CAN do YAML without using indentation ! (none / 0) (#183)
by chickenkiller on Mon Feb 20, 2006 at 04:17:27 AM EST

I am not a guru about YAML, but what I know is that in order to get information about it, you should keep an eye on what Why The Lucky Stiff writes. He is a great contributor to YAML and several other very interesting projects. To get to the point, have a look at this article. You will understand that if you have a non-whitespace paranoia, there is a solution. Besides, it remains quite readable using inline collections (though I personally prefer blocks). Enjoy. -Lionel

[ Parent ]
All parsing sucks cock. (2.14 / 14) (#17)
by five volt on Fri Oct 29, 2004 at 07:59:17 PM EST

Why are people so obsessed with using text files as the between-programs representation of non-text data?

The only easily readable way to do it is to put every item on its own line, and throw in some more whitespace every time you want to nest. Like your YAML. Then it looks sorta okay in a text editor. You just need to page up and down to see what you're nested under.

But it's still retarded to make a computer store its data in semi-readable human format. Why not store it in a format that computers are good with (binary with pointers), and then do quick conversions when, perchance, a human wants to look at/manually edit it? Hey computer scientists, stop inventing new languages for a minute and make binary tools.

In a sensible computing system, a core program should not need to care about the contents of a string. Filenames suck cock. XML sucks cock. URLs suck cock. Text config files suck cock. Text protocols suck cock.

The only reason I choose to live with the pain is that whoever designed UNIX decided that standard programs should talk to each other in a mishmash of English and line noise. WAY TO GO, GENIUSES. So, for every one line of real code (you know, crunching numbers - what computers were originally made for), I seem to have five lines of cooksocking parse code - reading some stupid config file, figuring out what the fuck the text is saying, making sure user didn't put a string in where I expect a floating point number, and dying gracefully if anything was wrong. And then I have to output my numbers into a text file so that gnuplot can read it.

Blah.

--
Ruthlessness kicks ass.

There are good reasons (3.00 / 6) (#19)
by ZorbaTHut on Fri Oct 29, 2004 at 10:17:47 PM EST

I used to agree. Then I read The Art of Unix Programming. Specifically, this chapter.

Summary:

  • Text files are far easier to modify at a later date
  • Text files are far easier to inspect for correctness
  • Text files are rarely significantly larger, especially if you pass over them with gzip compression (and often smaller!)
  • Text files aren't that hard to parse, and if your parsing code is the majority of your code, you're not doing anything complicated anyway
  • Computers are extremely fast and have lots of hard drive space, and chances are that text parsing and disk reading aren't the bottleneck anyway for even slightly interesting programs

Seriously, all you really need for text parsing is a little bit of code and a "hell, something's gone wrong" system - if you're using C, that's setjmp/longjmp, if you're using C++ or Java (or probably Python) that's exceptions. All of which are near-trivial in this case, since file parsing/loading code with side effects is inexcusable.
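
As a rough illustration of the "little bit of code plus exceptions" approach, here is a minimal Python sketch. The `key = value` format, the `ConfigError` class and the `parse_config` function are all made up for this example; the point is only that the parser builds a fresh dict and raises on the first bad line, so a failed load has no side effects.

```python
class ConfigError(Exception):
    """Raised for any malformed line; the message carries the line number."""


def parse_config(text):
    """Parse 'key = value' lines into a new dict of floats.

    Nothing outside this function is modified, so a failed parse
    leaves the caller's state untouched.
    """
    result = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ConfigError(f"line {lineno}: expected 'key = value'")
        try:
            result[key.strip()] = float(value)
        except ValueError:
            raise ConfigError(
                f"line {lineno}: {value.strip()!r} is not a number"
            ) from None
    return result


settings = parse_config("gain = 1.5\n# comment\noffset = -2\n")
```

The caller wraps one `parse_config` call in a single `try`/`except ConfigError` and either gets a complete dict or a message naming the offending line.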

[ Parent ]

hmm (none / 0) (#22)
by five volt on Fri Oct 29, 2004 at 11:57:02 PM EST

I suppose what I'm really looking for is a better overall computing system.

Honestly, I would like a system where I can pass a complex data structure or a function to a program, as a 'command-line option', and the program can use that input as if it were internal data.

Object editors would be pretty common in a system like that, as common as the ASCII-byte editors we have today (eg: Notepad).

I can see that unless there's a total overhaul, it would just be best to stick with text, even though text is the most awkward format a computer can deal with.

Glue code still sucks.

--
Ruthlessness kicks ass.
[ Parent ]

Not entirely hard (none / 0) (#31)
by ZorbaTHut on Sat Oct 30, 2004 at 07:21:10 AM EST

Java has built-in object serialization. Obviously there's no particular reason you couldn't serialize to stdout, then pipe it to another program that reads it back from stdin.
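
The same idea can be sketched in Python, with the standard `pickle` module standing in for Java serialization (this is a substitution for illustration, not the Java API). The `record` contents and the inline child program are invented for the example, and `pickle` should only ever be fed data from a trusted source.

```python
import pickle
import subprocess
import sys

# A structured value to hand to another program (contents are illustrative).
record = {
    "name": "Bob Abooey",
    "addresses": ["babooey@example1.com", "babooey@example2.com"],
}

# Child program: unpickle one object from stdin and report what it received.
child_code = (
    "import pickle, sys\n"
    "obj = pickle.load(sys.stdin.buffer)\n"
    "print(obj['name'], len(obj['addresses']))\n"
)

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input=pickle.dumps(record),  # serialized bytes go to the child's stdin
    stdout=subprocess.PIPE,
    check=True,
)
output = completed.stdout.decode().strip()
```

The child gets a real dict and list rather than text it has to parse, but both ends must agree on the serialization format, which is the trade-off being argued about in this thread.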

Most of the XML libraries I know are a bit difficult to work with, but with a bit of framework around them you could make C++-data-structure-to-XML similarly convenient.

Functions are a bit harder, of course, unless you have an interpreted language where you can pass in code (Python, Lisp, Perl), or the function is already in a dynamically linked library, or you can shell out to the system - possibly using XML for variable passing - and run an independent program as your "function" (though it clearly won't be in your memory space, and sending large amounts of data could add a prohibitive amount of overhead).

.NET supports passing complex data structures natively between different languages, as well as functions and callbacks and so forth in various languages. (Which is severely cool, if you've never used it. I have a tool written half in C# and half in C++, with the C++ part also linked to several non-.net utilities.)

The problem with anything other than text is that, inevitably, anything simpler than binary is also more restrictive. Text at least has the advantage that it's relatively easy to write and parse buglessly, in the sense that you can trivially inspect the output to check it yourself.

[ Parent ]

Not to mention easier to test (none / 0) (#34)
by Trevasel on Sat Oct 30, 2004 at 12:15:51 PM EST

In our game engine, we need a file format that can be hand-edited, logged, and CVS-merged, and that is also very high-performance.

We have a structured text file format (more FORTRANny than YAML or XML, to simplify merges). There is a 1:1 correspondence between features in the text and binary formats. The text format has a lot more redundancy and error checking, and can be logged and visually inspected in the debugger while implementing serialization functions. We can ensure that our file readers and writers are compatible and that different tools using the same files interoperate, and still compile to a binary format read using the same code.

-- That which does not kill you only makes you stranger - Trevor Goodchild
[ Parent ]

I sort of agree.. (2.66 / 3) (#21)
by paxman on Fri Oct 29, 2004 at 11:37:03 PM EST

I think XML is way overhyped. I've even heard people indicate that it will replace relational databases. Yeah, right. I agree that human readable forms are not that important, so long as it serves the purpose of XML, which is to have a universally understood language for data exchange. Further, I think people abuse XML. I have seen 60 MB files of XML. WTF? It took an hour just to load the friggin' thing using extant XML parsing tools. XML, SMxml

[ Parent ]
Age old problem (none / 0) (#61)
by pyro9 on Sun Oct 31, 2004 at 12:04:10 PM EST

Give a man a hammer and the whole world will look like a nail.

Other things that would 'save the world' and make everything interoperate in perfect harmony include .NET, Java, CORBA, SGML, and OOP. Note that since XML is a subset of SGML, it's a re-run.

See also TQM, ISO9000, eXtreme/agile/whatever Programming.


The future isn't what it used to be
[ Parent ]
Except that it can... (none / 0) (#146)
by ckaminski on Tue Nov 02, 2004 at 12:18:29 AM EST

I have seen XML databases perform nearly on par with standard relational databases. Granted, SQL Server is faster, and has much better searching and agglomeration features, but XML databases can be fast. Hence the massive support you see in EVERY RDBMS for XML, and the relatively cumbersome integration XML plays in these systems.

XPath and XQuery might have some maturity issues, and ANSI SQL2005 might improve RDBMS+XML co-existence, but to say that XML DBs are useless is disingenuous. Aside from tool maturity and optimization, RDBMSs have very little on a modern XML DB like the Sonic XML Data Server.

XML DBs in flat files are, however, stupid.

[ Parent ]

well.. (none / 0) (#172)
by paxman on Sat Nov 06, 2004 at 12:06:25 AM EST

XML was never meant to be a database. It was meant for data exchange only. Otherwise there would be provisions for indexes, etc. I have never seen an XML database perform worth squat. It is adequate for a few hundred records, maybe, but once we enter the thousands, that is stretching the limits a tad, I'd say. I saw a 65 MB XML file take over an hour to load into a viewing application. That's absurd, and an abuse of XML. JAXB v2 and XMLBeans v2 are just now starting to find ways to accommodate fast querying through alternate representation in memory, specifically StAX. Furthermore, the XML Schema specification is just notoriously unfriendly.

[ Parent ]
"Cooksocking"? (2.40 / 5) (#25)
by pwhysall on Sat Oct 30, 2004 at 02:57:27 AM EST

You just earned your 3.
--
Peter
K5 Editors
I'm going to wager that the story keeps getting dumped because it is a steaming pile of badly formatted fool-meme.
CheeseBurgerBrown
[ Parent ]
Binary data (3.00 / 3) (#43)
by pyro9 on Sat Oct 30, 2004 at 04:41:14 PM EST

What size is an int? What endian is it? Will that be true next year? When you send the binary data from your PC to someone else's Mac, will it work? answer: Not without byte swapping. If you're going to have to byte swap, you might as well just use atoi.

That doesn't mean storing binary data is never the right thing, but when it might reasonably be moved to another machine (or back and forth), it is usually the wrong thing.
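
A small Python sketch of the problem, using only built-in `int` methods. The value 1000 and the 4-byte width are arbitrary choices for the example.

```python
value = 1000

# Writer stores the integer as 4 raw bytes, least significant byte first.
little = value.to_bytes(4, "little")

# A reader that assumes the opposite byte order silently gets garbage.
misread = int.from_bytes(little, "big")

# A reader that knows the byte order recovers the value.
reread = int.from_bytes(little, "little")

# The decimal text form has no byte order to get wrong.
as_text = str(value).encode("ascii")
from_text = int(as_text.decode("ascii"))
```

Here `misread` comes out as 3892510720 rather than 1000, with no error raised anywhere, which is why the byte order has to be written down somewhere outside the data itself.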


The future isn't what it used to be
[ Parent ]
nonsense (none / 0) (#125)
by phred on Mon Nov 01, 2004 at 11:15:32 AM EST

jpegs for instance are pretty portable. So are tar / gz files. There are standards. A 16 bit integer signed two's complement stored little-endian-wise for instance is fully specified and unambiguous.
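
For what "fully specified" looks like in practice, here is a short sketch with Python's standard `struct` module, where the format string `<h` means exactly "little-endian signed 16-bit integer". The name `RECORD` and the sample value are just for illustration.

```python
import struct

# "<" pins the byte order to little-endian; "h" is a signed 16-bit integer.
RECORD = struct.Struct("<h")

encoded = RECORD.pack(-2)            # two's complement 0xFFFE, low byte first
decoded = RECORD.unpack(encoded)[0]  # same result on any host CPU
```

Because the byte order is part of the format string rather than inherited from the machine, the same two bytes decode to the same number everywhere.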

I'm not disagreeing with the benefits of text files, I'm just mentioning that binary files can be read by different computers.

[ Parent ]

Binary anyway (none / 0) (#142)
by pyro9 on Mon Nov 01, 2004 at 05:01:12 PM EST

Both file formats you mentioned are well documented, and darned near intrinsically binary anyway. There's little chance anyone with a text editor could do much good with either an archive or a full-color image file. I do note that there are several somewhat incompatible versions of the tar format out there, and that does cause some people trouble. However, formats like these are, in part, why I said usually, not always, the wrong thing.

I do note that xbm is a text image format, and I have edited and created icons in vi. That was quite handy.

A zillion years ago when I had to deal with Windows where binary config files are rampant, they caused no end of pain and frustration.

Internet transports (SMTP, HTTP, NNTP, POP, others) are text based. That's probably why support for them is so nearly universal.


The future isn't what it used to be
[ Parent ]
For internal data I agree. (none / 0) (#51)
by porkchop_d_clown on Sat Oct 30, 2004 at 10:26:50 PM EST

For communicating between applications - particularly between arbitrary machines that may have different CPU architectures, and for self-explanatory data files, XML and YAML can be very useful.

I'll tell you why I don't listen. I can only read so much of your stupid a-- b--- s--- before I lose all faith in the future of humanity and start sort
[ Parent ]
You are wrong (3.00 / 3) (#67)
by trhurler on Sun Oct 31, 2004 at 01:43:20 PM EST

Film at 11.

First of all, Unix programs communicate in the way they do because of lessons learned. Once upon a time, it was done the way you want to do it. The result was unmaintainable, incomprehensible (it gets quite large, you see), had poor scaling characteristics (obviously a related problem), and was very, very inflexible (do YOU want to write a new program every time you want a different combination of bits and pieces?).

Pipes were basically the original modular software architecture. For those of us who USE them that way instead of just saying "neat, I can use the more program to page the output of a program in addition to just paging a text file!", they're fucking fantastic.

As for filenames, you cannot be serious. Have you noticed that usually those filenames are input by a user? Is the user going to input some binary gibberish? No? Ok then. You already have a function that turns a filename into some binary gibberish (it is called open), so what's your fucking problem?

Regarding config files: some of us have no trouble modularizing config files away into a corner of our programs and forgetting about them, but maybe you're just an incompetent whiny retard. In any case, the whole point of standardized file formats is that it allows you to spend LESS time writing code to parse said formats, so one would think you'd be happy about them.

The whole point of a computer system is that we USE it. It isn't there for its own sake. It isn't there to provide you with an enjoyable programming experience. It is there to be USED, and it has therefore to be USABLE. This means that yes, computers have to deal with loosely formatted human-readable data in many contexts. That fact is not going to change, so get over it.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Ahem (none / 0) (#143)
by awgsilyari on Mon Nov 01, 2004 at 05:05:20 PM EST

Hey computer scientists, stop inventing new languages for a minute and make binary tools.

Bush doesn't listen to the little people, and neither do we.

Now get back to work programming in the perfect universe we have defined for you.

--------
Please direct SPAM to john@neuralnw.com
[ Parent ]

Bah, doesn't solve size and XML just as readable (2.50 / 2) (#24)
by jongleur on Sat Oct 30, 2004 at 12:36:31 AM EST

.. with the right tool. If you have a computer to edit a file, you can find some viewer that can peel off the tags and indent it to look like YAML anyway. Piddly idea IMO. But I'll abstain.
--
"If you can't imagine a better way let silence bury you" - Midnight Oil
This looks like .plist structure in NeXT/Apple n/t (2.00 / 2) (#30)
by israfil on Sat Oct 30, 2004 at 07:18:53 AM EST


i. - this sig provided by /dev/arandom and an infinite number of monkeys with keyboards.
Ummm.... (none / 0) (#50)
by porkchop_d_clown on Sat Oct 30, 2004 at 10:24:27 PM EST

plists are XML.

I'll tell you why I don't listen. I can only read so much of your stupid a-- b--- s--- before I lose all faith in the future of humanity and start sort
[ Parent ]
plists & XML (none / 1) (#74)
by naomi385 on Sun Oct 31, 2004 at 02:57:45 PM EST

In fact, XML is just one way of formatting plists. There is also a binary plist format and an ASCII plist format.

Propaganda. Questionable Intelligence. The Visitations.


[ Parent ]
Ummm.... (none / 0) (#115)
by kraant on Mon Nov 01, 2004 at 07:24:30 AM EST

Only if you were born yesterday.
--
"kraant, open source guru" -- tumeric
Never In Our Names...
[ Parent ]
bottom's up gay (1.09 / 11) (#35)
by Requiem for a Dream on Sat Oct 30, 2004 at 01:56:15 PM EST

no comment

generic lisp-bigot response (3.00 / 9) (#36)
by Delirium on Sat Oct 30, 2004 at 02:32:56 PM EST

(babooey
   (computer cpu1)
   (firstname Bob)
   (lastname Abooey)
   (cell 555-1212)
   (addresses
     (address babooey@example1.com (password xxxx))
     (address babooey@example2.com (password xxxx))))

LISP Smackdown! (3.00 / 2) (#39)
by Peahippo on Sat Oct 30, 2004 at 03:35:37 PM EST

(address babooey@example2.com (password xxxx))))

You do realise that LISP is an acronym that stands for:

Lots of Irritating Single Parentheses


[ Parent ]
Even better (3.00 / 3) (#42)
by pyro9 on Sat Oct 30, 2004 at 04:25:15 PM EST

(person babooey
   (computer cpu1)
   (firstname Bob)
   (lastname Abooey)
   (cell 555-1212)
   (addresses
     (address babooey@example1.com (password xxxx))
     (address babooey@example2.com (password xxxx))))

If person is a defined function, you can just feed the whole thing to lisp (or anything lisp-like) and it will do the right thing.

If building lisp in to the app (or just using lisp) is overkill, the small, fast s-expression library is a good choice. I found it fairly easy to cobble up a Python binding for it so that S-expression goes in, structure of dicts and arrays comes out.
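
To give a feel for "S-expression goes in, nested structure comes out", here is a minimal hand-written reader in Python. It is not the sfsexp library or its Python binding, just a sketch that handles bare atoms and parentheses (no quoted strings, no comments), and the function names are invented for the example.

```python
def tokenize(text):
    """Split an S-expression into '(' , ')' and bare atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens for one expression; return an atom or a nested list."""
    token = tokens.pop(0)
    if token == ")":
        raise SyntaxError("unexpected )")
    if token != "(":
        return token  # a bare atom
    items = []
    while tokens and tokens[0] != ")":
        items.append(parse(tokens))
    if not tokens:
        raise SyntaxError("missing )")
    tokens.pop(0)  # discard the closing ")"
    return items


def read_sexp(text):
    tokens = tokenize(text)
    if not tokens:
        raise SyntaxError("empty input")
    expr = parse(tokens)
    if tokens:
        raise SyntaxError("trailing tokens after expression")
    return expr


source = """
(person babooey
   (computer cpu1)
   (firstname Bob)
   (addresses
     (address babooey@example1.com (password xxxx))))
"""
tree = read_sexp(source)

# Flat two-element entries map naturally onto a dict.
fields = {
    item[0]: item[1]
    for item in tree[2:]
    if len(item) == 2 and isinstance(item[1], str)
}
```

Everything comes back as strings and lists; deciding that `cpu1` is a hostname or that `addresses` is a list of records is left to the calling program.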


The future isn't what it used to be
[ Parent ]
If (none / 0) (#44)
by trane on Sat Oct 30, 2004 at 04:58:15 PM EST

you didn't have a second address say, or a phone, would you have to include place-markers for those in the lisp code, or could you leave them out as in xml? I'm thinking, it would depend on how you defined the "person" function...but my lisp is not advanced enough to figure out what that would look like.

[ Parent ]
leave them out (none / 0) (#45)
by Delirium on Sat Oct 30, 2004 at 05:34:54 PM EST

The skeleton parsing code would be something like this (using Scheme syntax):

(define person
   (lambda attribute-list
      (let loop ((attribute (car attribute-list))
                   (rest (cdr attribute-list)))
         (set-attribute-value (car attribute) (cdr attribute))
         (if (not (null? rest))
            (loop (car rest) (cdr rest))))))  

[ Parent ]

That's pretty awesome. (none / 0) (#46)
by trane on Sat Oct 30, 2004 at 05:56:44 PM EST

The text file would just be a lisp list, and to parse it you would just eval the list (assuming the format with person), then you would have a bunch of attribute-value pairs to play with? If that's it, it seems a lot cleaner than the xml parsing i've done in java...

[ Parent ]
Duh (none / 0) (#47)
by trane on Sat Oct 30, 2004 at 06:03:14 PM EST

I just reread pyro9's post, and he answers my question.

If person is a defined function, you can just feed the whole thing to lisp (or anything lisp-like) and it will do the right thing.

[ Parent ]

Can sfsexp do non-tree structures? /nt (none / 0) (#56)
by Meshigene Ferd on Sun Oct 31, 2004 at 05:25:12 AM EST


--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Yes (none / 0) (#59)
by pyro9 on Sun Oct 31, 2004 at 11:13:52 AM EST

It can parse any arbitrary valid S expression.


The future isn't what it used to be
[ Parent ]
From the sorry excuse for documentation: (none / 0) (#62)
by Meshigene Ferd on Sun Oct 31, 2004 at 12:47:06 PM EST

"An s-expression is an expression composed of elements that are either atoms or s-expressions."

Do you want a letrec with that?
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Recursion is intrinsic (none / 0) (#68)
by pyro9 on Sun Oct 31, 2004 at 01:43:48 PM EST

The recursive definition is intrinsic to S expressions. That's why Lisp and friends are so predisposed to recursive solutions.


The future isn't what it used to be
[ Parent ]
Can't you read? (none / 0) (#75)
by Meshigene Ferd on Sun Oct 31, 2004 at 03:16:17 PM EST

Let's start over, slowly. How to encode a cyclic data structure using the sexpr syntax?
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Here's how to create sexp cycles (none / 1) (#84)
by Tayssir John Gabbour on Sun Oct 31, 2004 at 05:29:16 PM EST

Sure, you can have something like:
#1=(a b c #1#)

Where the sexp is its own last element.

There's more here. I admit it could be seen as Lisp advocacy; just mentioning it if you're interested. I'm not interested in advocating.

[ Parent ]

That's not how sexp is defined (none / 0) (#87)
by Meshigene Ferd on Sun Oct 31, 2004 at 05:38:56 PM EST

by the library in question. I can easily imagine that I can do more or less whatever I want if I write my own parser for my own sexp-like language. Which is precisely what I want to avoid.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

If you're dealing with a feature-light sexp lib (none / 0) (#97)
by Tayssir John Gabbour on Sun Oct 31, 2004 at 07:41:20 PM EST

If the sexp library in question cuts out features of modern sexps, then maybe you can discuss on the project's mailing list whether they're willing to support/maintain it.

I previously thought you guys were talking about Common Lisp, or one of the Lisp predecessors like the LispMachine dialects.

[ Parent ]

Ah, you were talking about "sfsexp" (none / 0) (#98)
by Tayssir John Gabbour on Sun Oct 31, 2004 at 07:43:11 PM EST

That's what I get for skimming, only looking at your last post.

[ Parent ]
I think you can annotate them (none / 0) (#85)
by Delirium on Sun Oct 31, 2004 at 05:30:07 PM EST

Common Lisp at least lets you annotate parts of your sexp with labels that you can refer to in other parts. Something like (a :label node-a b c node-a) to make an a->b->c->a circular structure, although I don't recall whether the :label thing is the exact syntax.

[ Parent ]
that's not the same as circular though (none / 0) (#76)
by Delirium on Sun Oct 31, 2004 at 03:27:41 PM EST

A simple recursive definition as "an S-expression is composed of elements that are either atoms or other S-expressions" doesn't allow for non-tree datatypes, because you can't refer "upwards" on the tree to make cycles; all you can do is refer "downwards" to the wholly contained sub-expressions. For example, you might want to make a graph with 3 nodes, a, b, c, with edges a->b->c->a.

To allow cycles, you have to define it as "an S-expression is composed of elements that are either atoms or references to other S-expressions". This would allow them to either be new S-expressions or previously-used ones that are being referred to.

Lisp is implemented in the latter form, because an S-expression is a cons pair, consisting of an atom and a reference. However, while you can write out tree-form s-expressions, like <texttt>'(blah (foo bar) zoom)</texttt>, you have to "construct" cyclical ones by setting the cdr field of a cons pair to an already-extant reference (although I think Common Lisp has some way of referring upwards in the tree so you can actually write it out).

So, to summarize, whether S-expressions can represent non-tree data types depends on whether they're allowed to contain references for their sub-expressions, or are required to actually contain the sub-expressions themselves "by value".
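
The "by value" versus "by reference" distinction can be sketched in Python. Here the standard `json` module plays the part of a tree-only notation, and a made-up `name -> name` edge list plays the part of a notation with references; the `Node` class and `load_graph` function are hypothetical helpers for this example only.

```python
import json

# "By value": a nested-list notation has no way to write a list that
# contains itself, so the serializer refuses the cycle.
loop = ["a", "b", "c"]
loop.append(loop)
try:
    json.dumps(loop)
    tree_format_rejected_cycle = False
except ValueError:
    tree_format_rejected_cycle = True


# "By reference": each node is named once and later referred to by name.
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def load_graph(text):
    """Build nodes from 'src -> dst' lines, reusing nodes already seen."""
    nodes = {}
    for line in text.splitlines():
        src, dst = (part.strip() for part in line.split("->"))
        for name in (src, dst):
            if name not in nodes:
                nodes[name] = Node(name)
        nodes[src].next = nodes[dst]
    return nodes


nodes = load_graph("a -> b\nb -> c\nc -> a")
a = nodes["a"]
```

After loading, `a.next.next.next` is the very same object as `a`, which is the a->b->c->a graph from the example above.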

[ Parent ]

bah, formatting (none / 0) (#78)
by Delirium on Sun Oct 31, 2004 at 03:32:26 PM EST

Pretend that <texttt> was a <tt>. Too used to \LaTeX...

[ Parent ]
That's what I'm saying. /nt (none / 0) (#80)
by Meshigene Ferd on Sun Oct 31, 2004 at 04:21:38 PM EST


--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

references (none / 0) (#99)
by pyro9 on Sun Oct 31, 2004 at 08:45:56 PM EST

References are a function of the language, not the expression, thus:

(def alpha (a b c alpha))

is perfectly parsable by the lib; it's up to the calling program to have a valid definition of def in this example.

Since the library is not an implementation of lisp, it doesn't define lambda or anything else itself.


The future isn't what it used to be
[ Parent ]
but that's not very simple (none / 0) (#100)
by Delirium on Sun Oct 31, 2004 at 09:15:43 PM EST

I think the issue in question was whether you could describe non-tree data structures with an sexp. Apparently modern Common Lisp variants let you do something like #1=(a b c #1#), which would do the trick. If you have to actually basically write a program that needs to execute in order to define your data structure, that's more tricky.

[ Parent ]
Agreed there (none / 0) (#107)
by pyro9 on Sun Oct 31, 2004 at 10:46:30 PM EST

I'll agree that it gets trickier. However, depending on how you're doing it, it needn't be that tricky. For example, if you allow the S expression to name the objects that will be created and dereference already created objects, (alpha (a b c alpha)) will be all you need. In the given example, you will end up with 2 objects, alpha and a, where a is an attribute of alpha and alpha is an attribute of a.

If it starts getting more complicated than that, it might be worthwhile to link in a full lisp.


The future isn't what it used to be
[ Parent ]
XML can do this too (none / 0) (#135)
by ttfkam on Mon Nov 01, 2004 at 12:48:06 PM EST

Even back in early DTD days you had ID and IDREF attribute types.  XML Schema and RelaxNG both improved upon this.  S-Expressions have no advantage in their serialized form.  The cyclical references are only made possible by the API.

The XML Document Object Model API doesn't handle cyclical references, but this is not XML.  It is an API to manipulate XML.  There is nothing stopping anyone from writing a high level API that treats cyclical references as an automatically dereferenced pointer (which is really what you're talking about).  In fact, people have.

API.  File Syntax.  Two different things.
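
A sketch of that split using Python's standard `xml.etree.ElementTree`. The `id` and `next` attribute names and the `follow` helper are invented for the example, and since no DTD is declared here they are ID/IDREF only by convention: the file holds plain attribute strings, and it is the small layer on top that turns them into pointers.

```python
import xml.etree.ElementTree as ET

# File syntax: a flat list of elements that name each other by id.
doc = """
<graph>
  <node id="a" next="b"/>
  <node id="b" next="c"/>
  <node id="c" next="a"/>
</graph>
"""

root = ET.fromstring(doc)

# API layer: index the elements by id so references can be dereferenced.
by_id = {node.get("id"): node for node in root.iter("node")}


def follow(node):
    """Return the element that this node's 'next' attribute refers to."""
    return by_id[node.get("next")]


start = by_id["a"]
back_at_start = follow(follow(follow(start))) is start
```

The serialized form is still a tree; the cycle a->b->c->a exists only once the references are resolved.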

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

Huh? (none / 0) (#112)
by Meshigene Ferd on Mon Nov 01, 2004 at 04:05:11 AM EST

References are a function of the language, not the expression

An expression is always in some language. Perhaps you mean "a function of the semantics, not the syntax". The library implements no semantics, or perhaps only a trivial semantics of mapping sexprs to trees. Too bad, because graphs with cycles are common and I'd like to have a library that abstracts away their mapping to some external representation.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Free Software (none / 0) (#114)
by pyro9 on Mon Nov 01, 2004 at 07:21:26 AM EST

I can see the value of that for some uses. If all else fails, it's Free Software, you can have the source...


The future isn't what it used to be
[ Parent ]
The question is. (none / 0) (#116)
by Meshigene Ferd on Mon Nov 01, 2004 at 07:45:32 AM EST

Why should I fiddle with sexprs if there are other tools that do what I want? Like, um, Scheme. Or XML.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Fine (none / 0) (#121)
by pyro9 on Mon Nov 01, 2004 at 10:18:05 AM EST

If the other tools do what you want, I'd say you should use them! Note that I suggested the lib for situations where including lisp or similar was overkill. Apparently, your situation doesn't meet that criterion.


The future isn't what it used to be
[ Parent ]
Hey, that's neat (none / 0) (#182)
by regeya on Tue Dec 07, 2004 at 03:27:27 PM EST

I might have to consider that...

I hated Lisp in college, but I've grown to like it a bit more now.

The thing is, I'm not sure there's any advantage to using this over YAML in Python. Thanks to PyYAML and Syck, I can write a file that's very close to the Python data structure, and read it in to Python and have a nice data structure. And I can use that file in Perl, Ruby, Objective-C, or whatever.

I am intrigued by the idea, though.

[ yokelpunk | kuro5hin diary ]
[ Parent ]

repeat after me (2.00 / 3) (#37)
by the sixth replicant on Sat Oct 30, 2004 at 02:51:59 PM EST

encoding encoding encoding

how does YAML deal with it? I bet you it's just as stupid as XML

Ciao

A subset of Unicode (none / 0) (#105)
by dn on Sun Oct 31, 2004 at 10:22:06 PM EST

YAML 1.0 spec: Encoding
4.1.2. Encoding

A YAML processor must support the UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the encoding shall be UTF-8. Otherwise the encoding shall be UTF-16 (LE or BE) or UTF-8, as signaled by the byte order mark. Since YAML files may only contain printable characters, this does not raise any ambiguities. For more information about the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.

[2] c-byte-order-mark ::= #xFEFF /* unicode BOM */
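
Read literally, that rule amounts to a few lines of code. The following Python sketch uses the BOM constants from the standard `codecs` module; `decode_stream` is a hypothetical helper written for this example, not part of any YAML library.

```python
import codecs

# Byte order marks the quoted rule cares about, with the codec each implies.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_stream(data):
    """Decode raw bytes to text, choosing the encoding from the BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)  # strip the BOM itself
    return data.decode("utf-8")  # no BOM: the rule says UTF-8


plain = decode_stream("name: Bob".encode("utf-8"))
big = decode_stream(codecs.BOM_UTF16_BE + "name: Bob".encode("utf-16-be"))
little = decode_stream(codecs.BOM_UTF16_LE + "name: Bob".encode("utf-16-le"))
```

All three inputs decode to the same string, so the rest of a parser never needs to know which encoding was on disk.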


    I ♥
TOXIC
WASTE

[ Parent ]

So how does this compare with old file structures (none / 1) (#49)
by lukme on Sat Oct 30, 2004 at 07:33:27 PM EST

like the ones used to store files on tape.


-----------------------------------
It's awfully hard to fly with eagles when you're a turkey.
Why I stopped worrying and learned to love XML (2.66 / 3) (#57)
by xL on Sun Oct 31, 2004 at 07:37:41 AM EST

When XML started to become one of those New Things, I went into righteous denial. The concept of using XML formatting for data exchange seemed, well, bloated for most purposes. XML-formatted structured data, at first and second glance, looks harder to parse both for humans and machines. For a long time, I worked with a standard text format that served me well for structured data, something along the lines of:

foobar {
  quux="bar"
  foo="baz"
  wobble {
    wewp=42
  }
}
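
A format like this takes very little code to read. The following Python sketch is a guess at the rules (one entry per line, `name {` opens a nested block, `}` closes it, values are kept as strings with surrounding quotes removed); the function names are invented for the example and it does no error reporting.

```python
def parse_block(lines):
    """Read entries from a shared line iterator until a closing '}'."""
    block = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == "}":
            return block
        if line.endswith("{"):
            # Nested block: recurse on the same iterator to consume its body.
            block[line[:-1].strip()] = parse_block(lines)
        else:
            key, _, value = line.partition("=")
            block[key.strip()] = value.strip().strip('"')
    return block


def parse(text):
    return parse_block(iter(text.splitlines()))


sample = """
foobar {
  quux="bar"
  foo="baz"
  wobble {
    wewp=42
  }
}
"""
data = parse(sample)
```

What the sketch cannot express is exactly what the rest of this comment runs into: there is nowhere to put a type, an attribute, or a repeated child.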

Then came the point that I ran into situations where I had to communicate with software that used XML data. This is where things started to get complicated. A lot of my time started to go into tedious transformation code getting the external XML data and putting it into an internal representation.

I started to get exposed to more and more XML formats over time and the transformation code became more and more sophisticated. Up to the point that the internal representation of data started to follow most of the concepts of XML (keyed dictionaries or arrays of objects with attributes). A sensible text format for storing such data would be quite tricky: it would have to distinguish the following characteristics for objects:

  • Some form of indication of an object's type or class
  • Optional and/or mandatory attribute values
  • Child nodes with a key attribute or as part of an array
It dawned on me that whatever text format I could come up with to cover all bases would be just as hard to parse in software as XML, and not necessarily easier for humans to parse either. That's when I decided to just stop horsing around and use XML natively (with a backdoor option to use binary structured storage in situations where it's important that the machine can parse things easily and legibility for humans is not important).

YAML looks sensible. It's indeed a bit easier to read than regular XML. But, already, it's not a free ride for machine parsing. And, even if you embrace it, you will run into situations where you have to talk to software that expects XML. Keeping code around for two ways of storing and parsing the data is not going to make your life easier.

What would make you happy? (none / 0) (#130)
by BuddasEvilTwin on Mon Nov 01, 2004 at 11:37:56 AM EST

I've written code in the past similar to your key/value pair encoding scheme, and I'm very interested in your opinions on how one could modify some of the key concepts of YAML to address your concerns.

Problem 1: Some form of indication of an object's type or class.

I like this idea, and the first thing that came to mind was C++'s scope operator ::, but changing it to a pipe | which would produce something like this:

employee|bob:
  name:  Bob Jones
  position:  Clerk

Problem 2: Optional and/or mandatory attribute values

Are you suggesting a YAML equivalent of a DTD or a YAML Schema with this objection, or did you have something else in mind?

Problem 3:  Child nodes with a key attribute or as part of an array

Does solving Problem #1 help address this problem, or were you suggesting something more in the direction of a YAML Schema?

I'd be very interested in your thoughts, especially how you would go about addressing these problems.


[ Parent ]

If I knew, I'd use it (none / 0) (#158)
by xL on Tue Nov 02, 2004 at 03:24:57 PM EST

There's certainly a way to work a type indication into an ASCII scheme. There comes a point, though, where the resulting format becomes just as painful to read as xml. Not only do you have to work in a type indicator, you also need to address the distinction between attributes and child nodes. So then you would get something like:

employee|bob(recordowner:hr):
  givenname|name: Bob Jones
  givenname|alias: Thomas Jones
  longint|employee-id: 201847283714

As you can see, it is not impossible to encode this information into any consistent scheme you want, but now compare that to an XML equivalent:

<employee id="bob" recordowner="hr">
  <givenname id="name">Bob Jones</givenname>
  <givenname id="alias">Thomas Jones</givenname>
  <longint id="employee-id">201847283714</longint>
</employee>

The XML is actually easier to read and more self-evident. Both for humans and machines.

[ Parent ]

XML Configuration Files. (3.00 / 5) (#60)
by bhearsum on Sun Oct 31, 2004 at 12:00:23 PM EST

XML has its uses, I won't deny that. But this article made me think of XML configuration files.

Ugliest. Files. Ever.

I remember when I wanted to stream some audio to a couple friends I decided to try out icecast2, which is a nice piece of software. Now, normally for me to configure something like this I just open up the config file and browse through it, simple as that. XML makes the ugliest fucking files ever. It's redundant and silly, and one line usually doesn't fit in a standard terminal which is an even greater annoyance.

Let's compare,

<icecast>
    <limits>
        <clients>100</clients>
        <sources>2</sources>
        <threadpool>5</threadpool>
        <queue-size>102400</queue-size>
        <client-timeout>30</client-timeout>
        <header-timeout>15</header-timeout>
        <source-timeout>10</source-timeout>
    </limits>
</icecast>

with,

# Limits
clients = 100
sources = 2
threadpool = 5
queue-size = 102400
client-timeout = 30
header-timeout = 15
source-timeout = 10

I find myself squinting to find the value for each variable. It's ridiculous.

</off-topic-rant>
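
For what it's worth, on the program's side the two layouts end up as the same thing. Here is a Python sketch that reads a shortened version of each into a dict, using the standard `xml.etree.ElementTree` for the XML; the two function names are made up, and this is not how icecast itself reads its configuration.

```python
import xml.etree.ElementTree as ET

xml_text = """<icecast>
    <limits>
        <clients>100</clients>
        <sources>2</sources>
    </limits>
</icecast>"""

flat_text = """# Limits
clients = 100
sources = 2
"""


def limits_from_xml(text):
    """Collect each child of <limits> as name -> integer."""
    limits = ET.fromstring(text).find("limits")
    return {child.tag: int(child.text) for child in limits}


def limits_from_flat(text):
    """Collect 'name = value' lines, skipping blanks and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = int(value)
    return result


from_xml = limits_from_xml(xml_text)
from_flat = limits_from_flat(flat_text)
```

Both calls produce the same dict, so the difference between the two files is mostly about who has to read them by eye.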

Well you could open it in some xml viewer nt. (none / 1) (#108)
by trane on Mon Nov 01, 2004 at 01:37:55 AM EST



[ Parent ]
Why not a sensible XML structure? (3.00 / 3) (#128)
by ttfkam on Mon Nov 01, 2004 at 11:19:01 AM EST

<icecast>
<limits
  clients="100"
  sources="2"
  threadpool="5"
  queue-size="102400"
  client-timeout="30"
  header-timeout="15"
  source-timeout="10"
/>
</icecast>

or

<icecast>
  <limits
      clients = "100"
      sources = "2"
      threadpool = "5"
      queue-size = "102400"
  />
  <timeouts
      client = "30"
      header = "15"
      source = "10"
  />
</icecast>

or another variation.  If all you need is a property list and that's all you will ever need, knock yourself out.  Then again, your example list has hierarchy (limits and timeouts), but that hierarchy is suppressed and mangled -- allowing arbitrary ordering of unrelated items and redundant suffixes like "timeout".

And of course the other advantage: if you learn how to write one XML document, you know the formatting and encoding rules for all XML documents. The only thing that changes is the element names. Attributes are always in quotes. Elements are always closed by well-defined rules. And there are only three common exceptional character cases: < " &. And the double quote needs to be escaped only in attributes. Everything else is simply treated as is. Even multi-line strings, which have been a constant source of irritation for users of flat property lists.
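
Those escaping rules fit in a couple of lines. A short Python sketch using `escape` from the standard `xml.sax.saxutils` module; the `attr` helper is made up for this example. Note that `escape` also rewrites `>` as `&gt;`, which is harmless but goes slightly beyond the three cases listed above.

```python
from xml.sax.saxutils import escape


def attr(value):
    """Quote a string for use as a double-quoted XML attribute value."""
    return '"' + escape(value, {'"': "&quot;"}) + '"'


# Element content: only & and < (and, in this helper, >) are rewritten.
content = escape("a < b & c")

# Attribute value: the double quote has to be escaped as well.
attribute = attr('say "hi" & go')

# Multi-line text needs no special treatment at all.
multiline = escape("line one\nline two")
```

The same two helpers work for any XML document, whatever its element names are.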

How about a version that includes the version number of the configuration format or program?

<icecast version="2.0">
 ...
</icecast>

This would make transformation targets easier to manage.  Why not make it so that it can be validated before deployment?

<icecast version="2.0" xmlns="http://www.icecast.org/2004/Server/2.0">
 ...
</icecast>

Or how about including icecast script/plugin configuration info that is guaranteed not to have name conflicts?

<icecast version="2.0" xmlns="http://www.icecast.org/2004/Server/2.0">
  ...
  <add-on code="3rdparty.so" xmlns:plugin="http://www.3rdparty.org/2002/icecast/widget"
    plugin:numclients = "30"
    plugin:sources = "http://www.3rdparty.org/sources"
  />
</icecast>

Or even simpler and more modular:

<icecast version="2.0" xmlns="http://www.icecast.org/2004/Server/2.0" xmlns:xi="http://www.w3.org/2001/XInclude">
  ...
  <xi:include href="3rdparty.xml"/>
</icecast>

The nice thing here is that add-ons are possible without hosing the parser or existing logic.  This is possible because everyone's following the same fundamental rules, and when it comes down to brass tacks, XML ain't that bad.
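As a rough illustration of how little code the attribute-based variation needs on the reading side, here is a minimal Python sketch using the standard library's ElementTree. The element and attribute names are taken from the second variation above; the function name and the conversion of every value to an integer are assumptions made for the example, not anything icecast itself does.

```python
import xml.etree.ElementTree as ET

# Config text modelled on the second variation above.
CONFIG = """
<icecast version="2.0">
  <limits clients="100" sources="2" threadpool="5" queue-size="102400"/>
  <timeouts client="30" header="15" source="10"/>
</icecast>
""".strip()

def load_settings(text):
    """Return {section: {name: int}} for every child element of the root."""
    root = ET.fromstring(text)
    return {
        child.tag: {name: int(value) for name, value in child.attrib.items()}
        for child in root
    }

settings = load_settings(CONFIG)
```

The grouping into limits and timeouts survives parsing as nested dictionaries, so `settings["timeouts"]["client"]` is reachable without any string splitting on suffixes.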

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

Same here .. (none / 0) (#154)
by Highlander on Tue Nov 02, 2004 at 07:18:19 AM EST

I think you will have to show the baby geeks < and > and indented XML files from age zero before the human mind can parse that data.

I can read HTML files, which have a similar format, but the difference from XML data is that HTML files have lots of text or attributes, and fewer tags.

Someone pointed out that there are good editors for XML, but that just proves the point that it is a format where it can be hard to find what you are looking for because there's so much wrapping and so many pages to scroll. I mean, if you had an editor, it could as well be binary. Well, maybe one day every editor will have XML-edit-mode, then all would be cool. Actually, I think I'm mostly bitching about the closing tag, and that < isn't a very eye-friendly character.

OT: does anyone know whether there is a way to fix the erratic indenting behavior of the "Context" editor?

Moderation in moderation is a good thing.
[ Parent ]

Not a fair example (none / 0) (#157)
by c960657 on Tue Nov 02, 2004 at 12:56:46 PM EST

Such a simple example is not a fair comparison, at least not for other than simple uses.

A fair example would contain things like

  • very long value strings
  • "reserved" characters in values (e.g. =, space and newline for YAML, and < and & for XML)
  • characters other than 7-bit ASCII characters, e.g. East-Asian characters
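To make the list above concrete on the XML side, here is a small Python sketch that pushes a value containing reserved characters, a newline, and non-ASCII text through escaping and parsing. The sample value and the element name `setting` are invented for illustration; `escape` is the standard library helper from `xml.sax.saxutils`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical value: reserved characters, East-Asian text, and a newline.
value = 'a < b && name = "東京"\nsecond line'

# escape() rewrites &, < and > as entities; quotes are left alone because
# they are only special inside attribute values.
escaped = escape(value)
document = "<setting>" + escaped + "</setting>"

# Parsing the document gives back the original string, newline included.
round_tripped = ET.fromstring(document).text
```

A fair comparison would run the same value through the other format's quoting rules and see which result a person can still read and edit by hand.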


[ Parent ]
Misses the point(s) (3.00 / 2) (#63)
by danharan on Sun Oct 31, 2004 at 01:06:39 PM EST

First of all, you shouldn't be starting off with DTDs for a new project, but XML Schema.

YAML could be a neat format for storing one application's data. Heck, people still use good old-fashioned key=value pairs in text files. You can even just serialize objects... Whatever works.

Against XML though, YAML just can't compete. How do you communicate the acceptable structure of a document? This is essential for programs to be able to write valid files- and other programs to verify that what they are reading is indeed valid. And perhaps the coolest thing about XML is XSL- I have a hard time seeing how YAML could offer a robust alternative in that domain.

you have a point (none / 0) (#64)
by xutopia on Sun Oct 31, 2004 at 01:20:51 PM EST

XSL is what makes XML so important. So many people miss the point and believe that XML is the best way to make something readable. I hate it when XML is used for config files when a simple key-value pair file could be used instead.

[ Parent ]
Silly argument (3.00 / 2) (#69)
by trhurler on Sun Oct 31, 2004 at 01:53:27 PM EST

YAML sucks, but the motive for using XML is generally NOT because it communicates the structure of a document. That capability isn't very useful most of the time. The reason is simple: a program that "knows" what the data means presumably also knows the format(or might as well know it,) and doesn't need a DTD or a Schema or whatever the toy format specification of the week is. On the other hand, a program that DOESN'T know what the data is is basically useless as anything but a viewer to make it prettier to read or as part of an XML toolchain anyway, so in 99.999% of cases, it will never be FED that data, and hardly needs to know about its structure or validate it or anything like that.

As with many things, when you get down to it, the claims for XML are overblown. It is a nice format for certain things, and nothing more. It does not solve data interchange problems(which are much more complicated than your average XML fanatic seems to comprehend,) it is not the philosopher's stone, capable of turning your crusty data into gold, it is not a Star Trek universal translator, and honestly, it isn't even all that readable(which may be why the first crop of XML programs were basically prettyprinters.)

I certainly like XML as a slightly more sophisticated way to pass data around between programs than whatever someone might have used before, IF efficiency is not a concern(it is NOT the way to pass around OLTP data, for instance, unless your OLTP application is very small by industry standards.) It is ok for storing configuration and so on, but only ok - not great. Storing your application data as XML is a viable idea, although storing it in a database and using the database's native capacity for imposing structure probably makes more sense. If you accept reality and use XML for what it is good for, that's fine. But don't pretend it is the answer to everything. It isn't.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Agreed. (none / 1) (#71)
by jolly st nick on Sun Oct 31, 2004 at 02:19:15 PM EST

I think XML's greatest strength is as an archival format (or rather a system for creating archival formats). This is a subset of its strengths; it is also good as an interface between programs where there is loose or no coordination between the organization producing the data and the organization consuming the data (e.g. RSS).

However, as an interprogram communication medium, it sucks in every way except self-documentation. Any place where XML is used over a network, ASN.1 would be a roughly isomorphic tool with far greater efficiency.

[ Parent ]

Yes, well (none / 1) (#72)
by trhurler on Sun Oct 31, 2004 at 02:35:30 PM EST

ASN.1 was invented before most of these kids got into kindergarten, so you can't expect them to use THAT!:)

Seriously though, it is amazing how many dipshits there are who turn 18 and suddenly think they know everything because they spent a few weeks becoming buzzword compliant.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
You are forgetting something. (none / 1) (#81)
by Meshigene Ferd on Sun Oct 31, 2004 at 04:24:21 PM EST

For instance, XSLT.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

XSLT is not fundamentally interesting (2.33 / 3) (#93)
by trhurler on Sun Oct 31, 2004 at 06:40:14 PM EST

It does not "solve" the unsolvable problem of embedding semantics. It merely provides a standard way in which you can agree with someone to avoid the problem - if he will also agree with you - or perhaps a standard way in which you can do something that you could just as easily and just as beneficially have done in a nonstandard way.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Heh. (2.50 / 2) (#110)
by Meshigene Ferd on Mon Nov 01, 2004 at 03:32:46 AM EST

just as easily and just as beneficially have done in a nonstandard way.

Ever tried to take two third-party apps and make them share data? In a nonstandard way my ass.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Hehe (none / 1) (#144)
by trhurler on Mon Nov 01, 2004 at 07:56:37 PM EST

Um... you realize that XSLT is very nearly useless for that purpose, right? It isn't NEARLY messy enough to handle slightly differing semantics in the actual values and so on. On the other hand, a very small perl script will usually do the job for those of us with half a clue; that's a big part of the job I'm at right now. I mean, sure, if all applications that stored temperature data always did it in Kelvin, that'd be great, but guess what? Not only will some use Celsius or Fahrenheit, but some will use some in-house invention instead of a standard scale, and some of them will insist on storing decimal strings while others will store IEEE floating point and still others will want COBOL style decimal. Even this is simple compared to real world problems, but XSLT can't handle this. Ergo, BAH.
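The kind of small script being described might look like the sketch below. It is written in Python rather than Perl, and the unit labels, the table of converters, and the choice of decimal strings as input are all assumptions for illustration; it handles only the three standard scales, not in-house inventions or COBOL-style decimals.

```python
from decimal import Decimal

# Hypothetical converters keyed by an invented unit label.
TO_KELVIN = {
    "K": lambda t: t,
    "C": lambda t: t + Decimal("273.15"),
    "F": lambda t: (t - Decimal("32")) * Decimal("5") / Decimal("9")
    + Decimal("273.15"),
}

def to_kelvin(raw_value, unit):
    """Convert a decimal string in the given unit to Kelvin."""
    return TO_KELVIN[unit](Decimal(raw_value))

# Three applications reporting the same temperature in different scales.
readings = [("25.0", "C"), ("77", "F"), ("298.15", "K")]
kelvin = [to_kelvin(value, unit) for value, unit in readings]
```

The conversion itself is trivial; the hard part in practice is knowing which unit and which encoding each source actually uses, and that knowledge lives outside both the script and the markup.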

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
If actual values need be converted (none / 0) (#152)
by Meshigene Ferd on Tue Nov 02, 2004 at 03:27:24 AM EST

then obviously you need some kind of real programming language. Still, I'd rather task myself with converting actual values instead of messing with (usually poorly documented, if at all) formats, encodings, escape sequences, and all this crap. If a very small perl script works for you, fine, go ahead and use it. Where I happen to still hold a job, nothing small works, ever.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Hmm (none / 1) (#161)
by trhurler on Tue Nov 02, 2004 at 07:40:14 PM EST

So the problem is your employer:)

In any case, XSLT does the easy part of the job, leaving the hard part yet to be done, and to me, that just isn't very useful.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Or RelaxNG [n/t] (none / 0) (#129)
by ttfkam on Mon Nov 01, 2004 at 11:20:41 AM EST


If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]
Your first impulse was correct (2.66 / 3) (#65)
by daviddisco on Sun Oct 31, 2004 at 01:29:59 PM EST

The whole thing would have been so easy if you had just stuck the data in a simple database. A database! They are for storing and retrieving data. They have been around a long time and are very mature. There are free databases. Every language has a data access API. XML is great for moving data and messages between disparate systems. Maybe YAML has a use for human-edited files (mostly config files). They are great, but know when to use'em, know when to lose'em
##I run a geography related site at globalcoordinate.com##
they're not really well-suited though (3.00 / 5) (#89)
by Delirium on Sun Oct 31, 2004 at 05:54:02 PM EST

The relational model isn't that well-suited for simple applications with data of this sort. If you've got a person and an email address, you can have "email address" be a data field. But if people can have multiple addresses, you need a separate table for them, which maps person IDs to email addresses so you can have multiple rows for one person. As you end up with lots of those sorts of variable-number fields, you end up with a messy proliferation of tables for relatively simple data.
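A minimal sketch of the extra table being described, using Python's built-in sqlite3 module and an in-memory database. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Separate table mapping person IDs to any number of addresses.
    CREATE TABLE email (
        person_id INTEGER NOT NULL REFERENCES person(id),
        address   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO email (person_id, address) VALUES (?, ?)",
    [(1, "alice@example.com"), (1, "alice@example.org")],
)

# One result row per address: the join is the price of the variable-length field.
rows = conn.execute(
    "SELECT p.name, e.address FROM person p "
    "JOIN email e ON e.person_id = p.id ORDER BY e.address"
).fetchall()
```

Every additional variable-number field (phone numbers, locations, and so on) needs another table and another join of the same shape.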

[ Parent ]
Not so poorly-suited as you may believe (none / 0) (#117)
by mcherm on Mon Nov 01, 2004 at 09:24:47 AM EST

I notice that daviddisco was defending the use of a database, not a relational database. Perhaps an object database would be a better fit for the sort of task you are describing.

-- Michael Chermside
[ Parent ]
Perhaps... (none / 0) (#127)
by SoupIsGoodFood on Mon Nov 01, 2004 at 11:17:44 AM EST

daviddisco was talking about relational databases and simply used the word database instead?

[ Parent ]
that would be understandable (none / 0) (#131)
by Delirium on Mon Nov 01, 2004 at 11:59:26 AM EST

As nobody uses the other kind.

[ Parent ]
Sure... (none / 0) (#148)
by ckaminski on Tue Nov 02, 2004 at 12:43:45 AM EST

Why not use an object database?  Good idea.  Several exist, in fact.  Objectstore, for example.

Problem:  Object databases typically support only one or two languages.

Problem:  Most businesses have data processes in multiple languages, some of which are inaccessible by OODBMS (VB is a prime example, perl is another).

Solution:  Native XML database.  Xindice.  Sonic XML Data Server.

If you are not in a relational model, and you're working with the web, OODBMS is a poor choice.  A native XML database implementing XQuery/XPath internally and supporting multiple languages and XML-RPC is a fundamentally better choice.  

I'd have been happy if SQL could have implemented lists or arrays as fields.  I think it's the number one failure of the language.

Something like the following:

company
  location
     billto
     shipto
  employee
    address
    email
      email1
      email2

becomes at least 3 queries, possibly more, which introduces more round-trips which introduces more latency

whereas

xpath("/company[(name|id)='value']");

is simpler, IMHO.

It frees me from having to know the data relationships (which arguably an enterprise level RDBMS will enforce through data models).
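For comparison, here is what a single path query over nested data can look like in Python. This is only a sketch: it uses the standard library's ElementTree, which supports a limited XPath subset, not a native XML database or XQuery, and the document contents are invented to match the outline above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document following the company/employee/email outline.
DOC = """
<company>
  <name>Acme</name>
  <employee>
    <name>Alice</name>
    <email>alice@example.com</email>
    <email>alice@example.org</email>
  </employee>
  <employee>
    <name>Bob</name>
    <email>bob@example.com</email>
  </employee>
</company>
""".strip()

root = ET.fromstring(DOC)

# One expression walks the hierarchy and collects the repeated field.
emails = [e.text for e in root.findall("./employee[name='Alice']/email")]
```

The repeated `email` elements play the role of the list-valued field that plain SQL tables cannot hold directly.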

[ Parent ]

Wow, YAML really sucks (2.50 / 2) (#70)
by trhurler on Sun Oct 31, 2004 at 02:05:44 PM EST

First of all, what does it offer that I don't get with a recursive descent parser that'll take about ten minutes to write(after all, using indentation like that makes it obvious that this isn't intended for anything particularly complicated?)
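For a sense of scale, a recursive descent parser for a tiny indentation-based format could be sketched as below. This is emphatically not YAML: it is an assumed toy grammar with `key: value` lines, nesting by leading spaces, and string values only, with no lists, quoting, anchors, or error reporting.

```python
def _indent(line):
    """Count the leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))

def _parse_block(lines, pos, indent):
    """Consume lines at exactly `indent`; recurse for deeper children."""
    block = {}
    while pos < len(lines) and _indent(lines[pos]) == indent:
        key, _, value = lines[pos].strip().partition(":")
        pos += 1
        value = value.strip()
        if not value and pos < len(lines) and _indent(lines[pos]) > indent:
            # Empty value followed by deeper lines: parse a child mapping.
            value, pos = _parse_block(lines, pos, _indent(lines[pos]))
        block[key.strip()] = value
    return block, pos

def parse_indented(text):
    """Parse 'key: value' lines nested by leading spaces into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    tree, _ = _parse_block(lines, 0, _indent(lines[0]))
    return tree

example = "limits:\n  clients: 100\n  sources: 2\ntimeouts:\n  client: 30\n"
parsed = parse_indented(example)
```

Whether ten minutes covers the remaining cases (multi-line strings, sequences, typed values, sensible errors on bad indentation) is the real question behind the comparison.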

Second, this automated translation into data structures sounds neat, until you realize that you can write something that does that with any number of data formats in a few minutes' time in any of those scripting languages you're talking about.

Third, your final conclusion vs XML is that it is about the same size, but that you think it is more readable. Er... I hate to bust your balls on this one, because I think I already know the answer, but have you noticed that there are XML tools that will take a DTD or a Schema, let you edit the data in its intended structure without having to mess with tags much at all, make sure you don't do anything that isn't allowed, and so on? I can't imagine that being less readable than this. Also, although it isn't typically used, whitespace is mostly ignored in XML, so you CAN format XML to be very readable if it is important, even with the tags.

--
'God dammit, your posts make me hard.' --LilDebbie

Well put. XML can be prettied up. (none / 0) (#92)
by rs170a on Sun Oct 31, 2004 at 06:28:08 PM EST



[ Parent ]
Which tools? (none / 0) (#118)
by mcherm on Mon Nov 01, 2004 at 09:27:06 AM EST

Have you noticed that there are XML tools that will take a DTD or a Schema, let you edit the data in its intended structure without having to mess with tags much at all, make sure you don't do anything that isn't allowed, and so on?

Could you please enlighten me? What tools have you used that do this smoothly? I certainly understand that there could be such tools, but I haven't used good ones yet. What should I be trying?

-- Michael Chermside
[ Parent ]

You could google.... (none / 0) (#149)
by ckaminski on Tue Nov 02, 2004 at 12:46:50 AM EST

But I'll save you the trouble.

Stylus

Probably the #1 validating parser/editor.

Yes, I used to work for the company that made the product, and I've had extensive experience using it.  There's a free 30 day trial, so have a look-see to realize that I'm not really that biased.

But it is expensive ~$300US IIRC.

[ Parent ]

Biggest mistakes (3.00 / 2) (#73)
by lookout on Sun Oct 31, 2004 at 02:43:01 PM EST

XML: attributes and elements

There is a subtle semantic difference between

<person sex="male">
    <name>John</name>
    <surname>Doe</surname>
</person>

and

<person>
    <sex>male</sex>
    <name>John</name>
    <surname>Doe</surname>
</person>

but was it really necessary to mess up the syntax to encode that semantic difference ? IMO attributes should have been left out.
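The practical cost shows up in the consuming code, which has to know which of the two forms the document author chose. A short Python sketch with the standard library's ElementTree, using the two documents above; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="male"><name>John</name><surname>Doe</surname></person>'
)
as_element = ET.fromstring(
    "<person><sex>male</sex><name>John</name><surname>Doe</surname></person>"
)

# Same fact, two different access paths in the program reading it.
sex_from_attribute = as_attribute.get("sex")
sex_from_element = as_element.findtext("sex")
```

Both calls return the same string, but neither works on the other document: `as_element.get("sex")` and `as_attribute.findtext("sex")` both come back as `None`.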

YAML: whitespace significance

Bad, bad. If you've worked in worldwide teams where everyone insists on his/her own tab settings, you'd know. We've left behind the fixed line layout of Fortran and punched cards a long time ago.


YAML forbids tabs. (none / 1) (#82)
by Rhinobird on Sun Oct 31, 2004 at 04:55:41 PM EST

YAML forbids tabs.
http://www.yaml.org/faq.html
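Since tabs in indentation are rejected rather than guessed at, a file mangled by an editor can be caught before it reaches a parser. Here is a small Python sketch of such a check; the function name and the sample text are made up, and it only looks at leading whitespace.

```python
def lines_with_tab_indent(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            bad.append(number)
    return bad

# Hypothetical file where an editor turned one indent into a tab.
sample = "limits:\n  clients: 100\n\tsources: 2\n"
offending = lines_with_tab_indent(sample)
```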


"If Mr. Edison had thought more about what he was doing, he wouldn't sweat as much." --Nikola Tesla
[ Parent ]
No tabs. (none / 1) (#94)
by lookout on Sun Oct 31, 2004 at 06:41:52 PM EST

Again, in the real world there are all kinds of people using editors that haphazardly replace a variable number of spaces (3, 4, or 8, depending on personal preferences) by tabs and vice versa. When their stuff is exchanged, it may look weird and disorganized on screen, but at least it will compile and one can do a block autoformat to clean up.

IMO significant whitespace still is a bad design decision; sorry. Don't worry though, even disabled formats have been known to live a fulfilling life.
 

[ Parent ]

So? (3.00 / 2) (#106)
by dn on Sun Oct 31, 2004 at 10:43:12 PM EST

Again, in the real world there are all kinds of people using editors that haphazardly replace a variable number of spaces (3, 4, or 8, depending on personal preferences) by tabs and vice versa.
Randomly rewriting a file without the slightest understanding of its contents is a sin. People who do it should—and will—burn in hell forever. Whitespace-significant formats are a holy service because they automatically punish such sinners. The tab character was put in ASCII by God Himself.

Rules to live by:

  • Code is 100% syntax. Anything that does not serve syntax is an abomination against God and his prophet Knuth.
  • Layout is 100% phototypesetting. Attempting to do layout with a character cell font is an abomination, also against Knuth. Just use LaTeX -- your soul will thank you.

I ♥ TOXIC WASTE

[ Parent ]

Elements are for data. (none / 0) (#83)
by Meshigene Ferd on Sun Oct 31, 2004 at 05:28:20 PM EST

Attributes are for metadata.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

I know... (none / 1) (#91)
by lookout on Sun Oct 31, 2004 at 06:25:47 PM EST

...but this distinction is hard to make in practice, and almost never clear-cut.

In the first example 'male' is the sex-attribute of the person, in the second example it is supposed to be data. While a given DTD determines whether a certain value is an attribute or data, for the programs exploiting the XML it may be just the other way round.

Syntax should play no role in semantics.


[ Parent ]

No, it's almost always clear-cut. (3.00 / 2) (#111)
by Meshigene Ferd on Mon Nov 01, 2004 at 03:48:15 AM EST

If it describes things it's data. If it describes data it's metadata. If you have a bunch of records about persons, record IDs are metadata and should be attributes. Person attributes like sex are data and should always be elements.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

i hate attributes too (none / 0) (#147)
by maccha on Tue Nov 02, 2004 at 12:22:15 AM EST

The distinction between tea and coffee drinkers is almost always clear cut. You wouldn't want that information in your e-mail address book though.

For the added complexity and hassle, attributes give very little value IMO.


(Or am I just talking a load of crap?)


[ Parent ]
I do happen to want this information in my (none / 0) (#151)
by Meshigene Ferd on Tue Nov 02, 2004 at 02:57:39 AM EST

address book, lol. Value is in the eye of the beholder; I'd hate to litter the data with irrelevancies like display styles or record IDs. The general rule is that if I strip all the markup I should be left with all the data and nothing but the data.
--
‮‫אַ גויישע קאָפּ!‮


[ Parent ]

Sooo lesseee (none / 0) (#166)
by kraant on Wed Nov 03, 2004 at 08:30:57 PM EST

Data is stuff that people view.

Metadata is stuff that programs use to organise data?
--
"kraant, open source guru" -- tumeric
Never In Our Names...
[ Parent ]

Boooooooo (none / 1) (#77)
by sethadam1 on Sun Oct 31, 2004 at 03:28:28 PM EST

Unfortunately, what most people use XML for could be accomplished in .ini files.  In fact, other than RSS feeds, I don't know many who use XML who shouldn't either be using simpler text files or a database.

Not true (none / 1) (#134)
by ttfkam on Mon Nov 01, 2004 at 12:35:04 PM EST

More people write document files than configuration files.  That means the XML/HTML family of markup.

As for INI files, it works if you are willing to (sometimes arbitrarily) flatten your configset to no more than one level of depth.

Why do you think no one writes INI files for Windows programs anymore?  It wasn't just because some wizbang new configuration tool showed up.  It was in large part because INI files suck for all but the most trivial of configuration files.

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

INI files don't suck that much. (none / 0) (#145)
by HereticMessiah on Mon Nov 01, 2004 at 08:48:01 PM EST

Seriously, it's a decent format. And the reason why people stopped using them is the dreaded registry. They just use the API that's there sitting for them.

--
Disagree with me? Post a reply.
Think my post's poor or trolling? Rate me down.
[ Parent ]
INI files suck ass. (3.00 / 2) (#150)
by ckaminski on Tue Nov 02, 2004 at 12:53:29 AM EST

[MYAPP.Global]

[MYAPP.INSTANCE1.controldata]

[MYAPP.INSTANCE1.database1]

[MYAPP.INSTANCE1.database2]

Fucking abysmal.

<myapp>
  <instance1>
     <database1/>
     <database2/>
  </instance1>
  <instance2/>
</myapp>

is much better.

Hell, I can implement XML as really bad regexps if I need to.  :-)
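To show what the dotted section names cost the program reading them, here is a Python sketch using the standard library's configparser. The section names are the ones above; the keys inside them (`loglevel`, `host`) and their values are invented for the example.

```python
import configparser

INI = """
[MYAPP.Global]
loglevel = info

[MYAPP.INSTANCE1.database1]
host = db1.example.com

[MYAPP.INSTANCE1.database2]
host = db2.example.com
"""

parser = configparser.ConfigParser()
parser.read_string(INI)

# The format is flat, so the hierarchy has to be rebuilt by hand
# by splitting each section name on dots.
tree = {}
for section in parser.sections():
    node = tree
    for part in section.split("."):
        node = node.setdefault(part, {})
    node.update(parser[section])
```

The nesting exists only as a naming convention, so nothing stops a typo such as `MYAPP.INSTANCE1database2` from silently creating a new branch.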

[ Parent ]

Along with YACC? (1.50 / 2) (#79)
by Sen on Sun Oct 31, 2004 at 03:43:07 PM EST

Sorry, the pretentious "yet another" turns me off to both. I used JavaCC, by the way--very good.

What?! (none / 0) (#141)
by awgsilyari on Mon Nov 01, 2004 at 05:00:00 PM EST

Sorry, the pretentious "yet another" turns me off to both.

I think you need a dictionary:

pretentious (adj.): Claiming or demanding a position of distinction or merit, especially when unjustified.

Ah. I guess that explains why they partook in a little self-deprecation in naming it simply "Yet another compiler-compiler..."

Anyway, Yacc was the first program (to my knowledge) to use the "Yet another" nomenclature. It's a little harsh to pick on the trendsetter for being trendy, don't you think?

--------
Please direct SPAM to john@neuralnw.com
[ Parent ]

Yes, pretentious (1.50 / 2) (#159)
by Sen on Tue Nov 02, 2004 at 04:50:25 PM EST

You have heard of irony and sarcasm? To me, "yet another ___" is sarcastic--and pretentious.

[ Parent ]
Been there, done that (1.00 / 3) (#96)
by rtmyers on Sun Oct 31, 2004 at 07:15:26 PM EST

Hey, before you went to all the trouble to spend your exhausting 5ms thinking this stuff up, why didn't you look to see if someone else had already done it, since they have? I'll leave it up to you to find the details, since you obviously need practice doing research on the net, but I'll give you hint: Google for "python-like XML".

IHBT, YTH, HAND (none / 0) (#139)
by regeya on Mon Nov 01, 2004 at 02:09:06 PM EST

LOL

Did you stumble upon this site, thinking that it was Slashdot? If so, having completely failed to read the article, no doubt, and not having followed any links, you no doubt took the absence of a line reading "BTW, I came up with YAML all on my own, yessiree" to mean that, indeed, I did come up with YAML on my own.

I did not. All I did was use YAML.

Why YAML rather than, say, SLiP? Because someone's put some thought into YAML since 2002, while SLiP seems to be a dead project. Ditto with SOX and any other similar project.

HTH.

[ yokelpunk | kuro5hin diary ]
[ Parent ]

STAR format (3.00 / 2) (#104)
by Thought Assassin on Sun Oct 31, 2004 at 10:19:19 PM EST

There's a format called STAR that's been used by various scientists for donkey's years. The files look very similar to YAML, but it already has strong support for a form of schema (known as dictionaries) that is a bit more powerful but sometimes a bit clunkier (numerous legacy tags) than XML schema.

Personally, I think all three schema languages have a long way to go, although there are better (but by no means perfect) alternatives out there for XML, and that the expressiveness of your schema language is far more important than the actual format. So although I favour the terser STAR/YAML style, I use XML personally because the chances are that's where

Disclaimer: I worked on the STAR project for a while. So I guess I'm partly to blame if the dictionaries don't live up to what I wish they were. I hope one day I'll have a chance to work on that stuff again.

human readability (2.50 / 2) (#153)
by maccha on Tue Nov 02, 2004 at 07:04:05 AM EST

Why is it that XML / YAML or whatever all insist on being readable and editable with a plain text editor?

As someone else pointed out, no-one who uses XML seriously would try to get by without tools for efficient viewing, editing and validation. And if you're committed to using tools anyway, why not go for the compactness and efficiency of binary formats?

It's a funny world where entire applications get written in C and C++, but nobody minds the resources involved in storing and parsing those start and end <namespace:tag attr="value"> tags </namespace:tag>.


(Or am I just talking a load of crap?)


I've always wondered this myself (none / 0) (#155)
by squigly on Tue Nov 02, 2004 at 08:21:16 AM EST

But I really dislike config files generally.  Variables in applications should have a default, and if they're changed, they should be changeable by the application.

There are so many toolkits available that I can't really see a reason not to have a graphical configuration tool.  

[ Parent ]

Config files have lots of advantages (3.00 / 2) (#164)
by zakalwe on Wed Nov 03, 2004 at 09:53:03 AM EST

There's nothing about config files that prevents you having a GUI editor as well (though you can lose some functionality by doing so).

On the other hand, there are a lot of advantages to config files over binary / gui only configuration:

  • They are editable external to the application, so automatic configuration can be scripted (eg. for a company or distribution-wide policy)
  • They are self documenting - comments, example values can all be embedded right next to where the value is set
  • They are searchable. In conjunction with the above documentation, it is much easier to find the setting you're looking for. I spent 15 minutes today trying to figure out how to disable Word's autocapitalization. After looking through the 11 tabs on the "Tools/Options" menu, the 3 on "tools/Customize", I eventually had to google to find I should instead go to the "Format/Autoformat" menu, press the options button, and go to the Autocorrect tab, and deselect the "Capitalize first letter of sentences". With a text file, I would just have to search for "Capitalize"
  • They are portable. If I move to a new computer / freshly install an application, all I have to do to get my settings back is copy the config file. For gui-only configuration, I have to remember the exact settings, and laboriously reapply them
  • They can be backed up and old configurations can be easily reverted to.
  • They are editable by standard tools, so the method of changing them is consistent regardless of the application. You also gain many of the advantages of text editors (eg. cut and paste) that guis don't provide
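The first and third points in the list above (scriptable, searchable) can be sketched in a few lines of Python. The config text, the option name `autocaps`, and both helper functions are hypothetical; the point is only that plain text needs nothing beyond string handling.

```python
import re

# Hypothetical config file with an inline comment documenting the setting.
CFG = """\
# Capitalize the first letter of sentences automatically.
autocaps = 1
queue-size = 102400
"""

def search(config_text, word):
    """Return every line mentioning `word`, comments included."""
    return [ln for ln in config_text.splitlines() if word.lower() in ln.lower()]

def set_option(config_text, name, value):
    """Rewrite `name = ...` lines in place, leaving everything else intact."""
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + value, config_text)

hits = search(CFG, "capit")
updated = set_option(CFG, "autocaps", "0")
```

Here the search finds the setting through its comment even though the option name itself never contains the word, and the same `set_option` call could be run across every machine in a company.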


[ Parent ]
Those aren't necessarily advantages (none / 0) (#168)
by squigly on Thu Nov 04, 2004 at 12:48:38 PM EST

They are editable external to the application, so automatic configuration can be scripted (eg. for a company or distribution-wide policy)

Perhaps.  

They are self documenting - comments, example values can all be embedded right next to where the value is set

But a decent application doesn't need documentation.  All the documentation is in the configuration interface.

They are searchable. In conjunction with the above documentation, it is much easier to find the setting you're looking for. I spent 15 minutes today trying to figure out how to disable Word's autocapitalization. After looking through the 11 tabs on the "Tools/Options" menu, the 3 on "tools/Customize", I eventually had to google to find I should instead go to the "Format/Autoformat" menu, press the options button, and go to the Autocorrect tab, and deselect the "Capitalize first letter of sentences". With a text file, I would just have to search for "Capitalize"

Well, I sort of see your point, but I think the problem here is that Word has a really sucky configuration interface, that's even worse than a text config file.  Of course, you have to know you want "capitalize" as opposed to "capitalise", "AutoCaps", or "FirstLetterUpperCase".  Or maybe it isn't there at all.  So you need to look in the documentation to find out you actually need to add "NoAutoCapitalize = 1".

They are portable. If I move to a new computer / freshly install an application, all I have to do to get my settings back is copy the config file. For gui-only configuration, I have to remember the exact settings, and laboriously reapply them

A binary file can be copied as well.  There may be issues with upgrading, but really that just means you need a versioning system.

They can be backed up and old configurations can be easily reverted to.

Agreed.  This is a useful feature.  

They are editable by standard tools, so the method of changing them is consistent regardless of the application. You also gain many of the advantages of text editors (eg. cut and paste) that guis don't provide

This is what I see as a problem rather than a benefit.  I don't want to edit them with standard tools.  I want to edit them in a tool that will tell me what everything does.  I rarely want to copy and paste with a config file.  

A good config file can be better than a bad configuration interface, but I find that the usual cause of bad configuration interfaces is that they're too dependent upon the config file.

[ Parent ]

Advantages (none / 0) (#178)
by zakalwe on Mon Nov 08, 2004 at 11:11:13 AM EST

But a decent application doesn't need documentation. All the documentation is in the configuration interface.
Yes, that's what I mean. In a text file, it will be inline comments, in a GUI: tooltips, or onscreen descriptions. The text file version has the advantage that the documentation, as well as the values, are searchable.

Of course, you have to know you want "capitalize" as opposed to "capitalise", "AutoCaps", or "FirstLetterUpperCase". Or maybe it isn't there at all. So you need to look in the documentation to find out you actually need to add "NoAutoCapitalize = 1".
No, generally all I'd do is type control-s capit, and I'd be there (or at least at some setting involving capitalisation - repeat the control-s till I'm at the right section). Incremental search is another great feature of text editors. Even if the configuration value is called "FirstLetterUpperCase", there should almost certainly be a comment mentioning capitalise nearby.
A binary file can be copied as well. There may be issues with upgrading, but really that just means you need a versioning system.
True, but if a user only ever sees the configuration through the GUI, will they even know where the file is stored? Binary files also tend to be much more fragile across different platforms and program versions.
I don't want to edit them with standard tools. I want to edit them in a tool that will tell me what everything does. I rarely want to copy and paste with a config file.
I guess we have very different habits then. I couldn't live without copy & paste, at least for any reasonably complicated file (eg. the apache config file). I always find it a pain making tedious changes in IIS that I know I could do with a few dozen keystrokes in apache just by pasting in the right values. Reverting, and saving off useful snippets is similarly easy.
A good config file can be better than a bad configuration interface, but I find that the usual cause of bad configuration interfaces is that they're too dependent upon the config file.
Even a bad config file is a lot better than a bad configuration interface, thanks to the tools that are available. In my opinion, a good config file also beats out even a good configuration interface. I have yet to see any GUI editor that matches the usefulness of the facilities my text editor provides.

[ Parent ]
no because... (none / 0) (#174)
by maccha on Sat Nov 06, 2004 at 07:56:45 AM EST

The advantages which you described mean that it's desirable to have a standard data format, rather than each program storing data in a proprietary manner. But everyone knows that already.

A standard binary format could have standard editors with all the features you describe. And it would be able to present them in a manner appropriate to the data's structure (probably a tree) as opposed to the left-to-right and top-to-bottom browsing which is only useful for text.


(Or am I just talking a load of crap?)


[ Parent ]
Right (none / 0) (#177)
by zakalwe on Mon Nov 08, 2004 at 10:49:06 AM EST

But right now, text is as close to a standard data format as you're likely to get. Given the diversity of applications, I wouldn't hold my breath for a better one arising anytime soon. Attempts to create the "one true data format" generally end up creating just one more incompatible standard.

Also, even if you do manage to get some standard format, there's no reason it couldn't be built on top of a text format. There's nothing stopping you from writing tools that read and write text files and present them in your desired format. This way you preserve the ability to read and write them with the lowest common denominator of tools.

Don't knock the text file's "left to right, top to bottom interface" either - it has many advantages that I've yet to see in any GUI configuration screen. I've mentioned some in my original post - cut and paste being the main one. GUIs tend to be much more finicky about what you can cut and paste - you can't copy just half of one section's settings from a different file, comment out a few lines, and then change the names of a few of the settings. Other big ones are incremental searching, regular expression based search/replace and macros. Show me even one existing GUI based configuration with that kind of power.

Admittedly, GUIs do tend to be easier to use for novice users, who don't tend to use such features anyway, but it is a fundamentally bad idea to throw away all the benefits text gives to those who do.
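
To make the "comment out a few lines, then rename a few settings" point concrete, here is a minimal sketch in Python. The config text and the setting names (AutoCaps, TabWidth, AutoIndent) are invented for illustration, and the two helper functions are hypothetical, not part of any real tool.

```python
import re

# Hypothetical config text; the setting names are made up for illustration.
config = """\
[editor]
AutoCaps = 1
TabWidth = 8
AutoIndent = 1
"""

def comment_out(text, name):
    """Prefix the line that assigns `name` with '# '."""
    pattern = re.compile(rf"^({re.escape(name)}\s*=.*)$", re.MULTILINE)
    return pattern.sub(r"# \1", text)

def rename_setting(text, old, new):
    """Rename a setting at the start of a line, leaving its value untouched."""
    pattern = re.compile(rf"^{re.escape(old)}(\s*=)", re.MULTILINE)
    return pattern.sub(lambda m: new + m.group(1), text)

# Comment out one setting, then rename another, in two regex passes.
edited = rename_setting(comment_out(config, "TabWidth"), "AutoCaps", "AutoCapitalize")
print(edited)
```

The same two edits in an editor are an incremental search plus a regex replace; the point is only that plain text makes them possible with generic tools.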

[ Parent ]

Biggest advantage for me: (none / 0) (#179)
by warrax on Thu Nov 11, 2004 at 07:05:27 AM EST

Version control actually works without any special support. This is a huge boon if you're co-managing a server configuration (because there is an audit trail, and isolating configuration problems becomes much easier with a VCS to help you).
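
A rough illustration of why this works: a line-based diff of two revisions of a text config pinpoints the change. This sketch uses Python's standard difflib rather than a real VCS, and the file contents and revision labels are invented for the example.

```python
import difflib

# Two hypothetical revisions of a server config, as lists of lines.
old = ["Listen 80\n", "KeepAlive On\n", "MaxClients 150\n"]
new = ["Listen 8080\n", "KeepAlive On\n", "MaxClients 150\n"]

# unified_diff yields the same kind of output a VCS shows for a text file.
diff = list(difflib.unified_diff(old, new,
                                 fromfile="httpd.conf@r1",
                                 tofile="httpd.conf@r2"))
print("".join(diff))
```

With an opaque binary blob, a generic tool can usually only report that the two revisions differ.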

-- "Guns don't kill people. I kill people."
[ Parent ]
plain text (none / 0) (#156)
by Viliam Bur on Tue Nov 02, 2004 at 09:57:31 AM EST

Why is it that XML / YAML or whatever all insist on being readable and editable with a plain text editor?

Because some people really do edit this with a plain text editor. Even if you do not want to edit a whole document, the ability to easily "search and replace" using Notepad/KWrite is nice.

As someone else pointed out, no-one who uses XML seriously would try to get by without tools for efficient viewing, editing and validation.

Maybe my favourite text editor has some advantages over the "serious" XML editor. For example, Notepad is part of the standard Windows installation. Also, M$IE does a relatively good job of viewing XML.

And if you're committed to using tools anyway, why not go for the compactness and efficiency of binary formats?

Perhaps because there is a difference between situations where you can use specialized tools and situations where you have to use them.

[ Parent ]

What's wrong with human readability? (2.50 / 2) (#160)
by regeya on Tue Nov 02, 2004 at 04:56:48 PM EST

Can someone explain to me why it's bad to be able to read a file format without either needing a detailed specification or a Godlike talent to interpret what one sees in a hex editor?

[ yokelpunk | kuro5hin diary ]
[ Parent ]

obviously... (none / 1) (#162)
by maccha on Wed Nov 03, 2004 at 07:59:37 AM EST

Everyone wants readability, which is why, if you need to examine more than a tiny amount of data, you have no choice but to use a dedicated tool.

If you don't understand what I mean, try opening a couple of megabytes of XML using notepad.


(Or am I just talking a load of crap?)


[ Parent ]
ridiculous example (none / 0) (#163)
by regeya on Wed Nov 03, 2004 at 09:07:45 AM EST

And besides, throwing a largish set of data into an XML file is stupid. Can anyone tell the class what a DOM-based solution will do with that file? Now let's expand that file to the point that it's 10x the size it is now. Can anyone tell me what a DOM-based solution will do with that file?
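
For reference, a DOM parser builds the whole document tree in memory before you can touch any of it, so memory use grows with the file. A streaming parse avoids most of that. The following is a minimal sketch using Python's standard xml.etree.ElementTree.iterparse; the function name and the element names in the sample are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

def count_entries_streaming(source, tag):
    """Count <tag> elements while parsing incrementally."""
    count = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's children and text once handled. The emptied
            # element itself stays attached to its parent, so memory use is
            # much lower than a full tree but not constant.
            elem.clear()
    return count

# Tiny stand-in for a large file; the element names are made up.
sample = io.BytesIO(b"<dict>" + b"<entry><word>x</word></entry>" * 1000 + b"</dict>")
print(count_entries_streaming(sample, "entry"))
```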

[ yokelpunk | kuro5hin diary ]
[ Parent ]

not so ridiculous (none / 0) (#167)
by maccha on Thu Nov 04, 2004 at 03:50:57 AM EST

And besides, throwing a largish set of data into an XML file is stupid.

Whether XML data sets should (in your opinion) be large or not is less important than whether people are actively creating huge XML files and (even more bizarrely) apps that read them. Check out Jim Breen's JMDict if you think I'm making this up.

IMO, XML was a bad solution to a very real need... and now all sorts of people are using XML even if they didn't have the need in the first place.


(Or am I just talking a load of crap?)


[ Parent ]
I see. (none / 0) (#170)
by regeya on Thu Nov 04, 2004 at 08:57:27 PM EST

So having people create ultra-large XML files is the best example you can come up with for why human-readable formats are a bad idea?

Oh, please...

If you're going to argue against human-readable files, at least come up with a good example.

[ yokelpunk | kuro5hin diary ]
[ Parent ]

It's ugly and bloated (none / 0) (#169)
by squigly on Thu Nov 04, 2004 at 01:01:56 PM EST

Text files take up a lot of space, require a lot more code to parse, and are more likely to contain errors that may cause a crash or undesirable behaviour.  Look at how different web browsers handle broken HTML.

The idea with binary formats is you don't need to use a hex editor.  You have tools that can edit the data.  Instead of standard text formats, you use standard binary formats.  You have editors that can edit that format.  

[ Parent ]

Ease-of-use (none / 1) (#165)
by baloo on Wed Nov 03, 2004 at 05:12:49 PM EST

The main reason IMHO for human-readable file formats is that, despite there being wonderful parser implementations available for a binary format, many people for various reasons will not be able to use them and will have to implement their own parsers. If you've ever tried to write parsers for both paradigms, you quickly learn how incredibly much faster it is to create a basic parser for a human-readable format. This makes the human-readable standard more readily used, which will yield larger general acceptance - which in many ways is the purpose of any standard.

[ Parent ]
no way! (none / 1) (#175)
by maccha on Sat Nov 06, 2004 at 08:07:53 AM EST

Knocking up a .ini file parser from scratch is pretty easy in a scripting language, but what's the point since there's always one in the standard library?

On the other hand, writing a conformant XML parser is a mammoth job. No, I mean really really large. Making it efficient is even harder. You would be crazy not to use an existing package like expat.

BTW, I do have a fair amount of experience in writing parsers, and generally I find binary much easier to parse because of the fixed field sizes and fewer worries about text-encoding.
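
Both halves of that claim can be sketched in a few lines of Python: the standard library's configparser handles the .ini case, and struct handles a fixed-field binary record. The record layout here (a 16-bit port plus an 8-byte padded name) and the setting names are invented purely for illustration.

```python
import configparser
import struct

# Text side: the standard-library .ini parser.
ini_text = """\
[server]
port = 8080
name = office
"""
parser = configparser.ConfigParser()
parser.read_string(ini_text)
port_from_ini = parser.getint("server", "port")

# Binary side: a made-up fixed-size record, little-endian:
# an unsigned 16-bit port followed by an 8-byte NUL-padded name.
RECORD = struct.Struct("<H8s")
blob = RECORD.pack(8080, b"office")
port_from_bin, raw_name = RECORD.unpack(blob)
name_from_bin = raw_name.rstrip(b"\x00").decode("ascii")

print(port_from_ini, port_from_bin, name_from_bin)
```

The binary side is short because every field has a known size and offset; the cost is that the layout has to be documented somewhere outside the file.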


(Or am I just talking a load of crap?)


[ Parent ]
Ah, but you're forgetting (none / 0) (#180)
by baloo on Thu Nov 11, 2004 at 10:22:18 PM EST

that parsers come in different varieties. :-)

I myself (also) have extensive experience writing software that translates one format into another. A couple of times a year I find myself in a situation where there are no suitable parsers available. In those cases I often end up writing a task-specific mini-parser that does what I need it to do (and nothing more).

For instance, I once wrote a (not fully compliant*) SGML parser that had a decent degree of fault tolerance and excellent error reporting (which at the time was more or less impossible to get from existing libraries). But I agree that it was a big undertaking.

* In fact it really didn't need to be fully compliant, since the target SGML domain was rather narrow and the (not so few) errors were introduced by slippery human hands.

When it comes to the issue of whether parsers are easier to write for human-readable or non-human-readable formats, I guess we've encountered different real-world situations.

[ Parent ]
Let me tell you why (none / 0) (#171)
by danb1974 on Fri Nov 05, 2004 at 12:45:53 AM EST

Ever had a 2 meg .doc file that got completely screwed because of 1 bloody flipped bit? And because it is a binary proprietary format, you can kiss it goodbye? That's the point where your wonderful graphic interface is completely useless, with that wonderfully painted window telling you the file cannot be opened - and you start dreaming of human-readable files...

[ Parent ]
your argument is insane (none / 0) (#176)
by maccha on Sat Nov 06, 2004 at 08:27:17 AM EST

Sorry, I just don't have the energy to explain why at the moment. Please swot up on Shannon and information entropy.


(Or am I just talking a load of crap?)


[ Parent ]
YAML must die (1.00 / 3) (#181)
by aminorex on Sun Nov 14, 2004 at 08:50:21 PM EST

Anything where whitespace is significant is worse than nothing at all.

Why YAML? Why not? | 184 comments (166 topical, 18 editorial, 2 hidden)