Kuro5hin.org: technology and culture, from the trenches
Time for XML rcfiles

By phinance in Technology
Thu Jan 11, 2001 at 01:10:36 PM EST
Tags: Software

It's time to switch to XML as the standard rc-file format for UNIX services being provided by free/open source software. This would reduce duplicated work, reduce feature bloat of applications, and make writing Linux administration tools a lot easier. I'll use mail filtering as an example, but you're free to keep your own pet peeve in mind as you read ;)


XML in 69 Words

XML formatted files are structured, and their structure is described in a standard (also structured) file called a Document Type Definition, or DTD. All this structure, put simply, means that the files can be processed by a computer program automatically (using, incidentally, freely available libraries like expat). The important implication here is that the files can be processed by any interested program, as long as the DTD is available.

Mail filtering
procmail filters mail very well. KMail -- which I love and use and am grateful for -- also filters mail, but its filtering is not nearly as powerful as that of procmail. Why? Because the KMail filtering was written from scratch as part of the development of KMail when the developers had a million other KMail-related problems to solve.

A better solution might have been to write a configuration dialog (or perhaps a configuration wizard!) for procmail and not provide any filter services. My guess is that the formidable task of parsing and rewriting someone's procmail run control file (.procmailrc) without ruining it (or having to add comments like "Don't change this file by hand! You must forever use the KMail configuration dialog! Sorry we didn't mention that earlier!") was the obstacle to this approach.

If .procmailrc were an XML file there would be no such obstacle. KMail would be able to read and write the file, changing only the options that were currently supported by its configuration dialog. All of the filtering work would be done by (tested, powerful, and preexisting) procmail. This new configuration dialog would enhance the usability of procmail and the feature set of KMail, and save KMail developers the time it took to write the mail filtering code. This kind of cooperation -- at least, the potential for it -- is what makes open source software so great.
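As a rough sketch of what that could look like, assuming a purely hypothetical XML layout for .procmailrc (procmail has no such format; every element and attribute name below is invented for illustration), a front end could change the one setting it understands and write everything else back untouched:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of a .procmailrc; all names here are invented.
RCFILE = """<procmailrc>
  <rule id="lists">
    <match header="To" pattern="kde-devel"/>
    <deliver folder="kde"/>
    <lock timeout="30"/>
  </rule>
</procmailrc>"""

def set_folder(xml_text, rule_id, folder):
    """Change the delivery folder of one rule and leave the rest alone."""
    root = ET.fromstring(xml_text)
    for rule in root.iter("rule"):
        if rule.get("id") == rule_id:
            rule.find("deliver").set("folder", folder)
    return ET.tostring(root, encoding="unicode")

updated = set_folder(RCFILE, "lists", "kde-devel")
print(updated)
```

The `<lock>` element, which this imaginary dialog knows nothing about, survives the round trip. One caveat: ElementTree's default parser discards comments, so a real tool would need a parser that preserves them.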

So what are people waiting for? Are there problems with XML that I haven't thought of?

Time for XML rcfiles | 110 comments (109 topical, 1 editorial, 0 hidden)
New stuff (2.86 / 15) (#1)
by trhurler on Thu Jan 11, 2001 at 12:02:40 PM EST

Don't get me wrong. I sincerely hope XML changes the world for the better, makes my life easier (I'm a programmer), and all that. However, you must excuse me for being somewhat skeptical. I seem to recall that the same promises have been made about every new technology developed in software for the last 20 years or so, and yet the easiest, best way for me to do my work is still a mix of old scripting languages (perl, shell) and C, and the easiest way to deal with other programs is still to kludge them together. It could happen, don't get me wrong - but I try not to be a hype bandwagoneer. There are enough ZDnet and Slashdot hacks for that job - mine is to write code that works.

Even relative newcomers to computing should remember the hype machine that surrounded Java. We were told it'd do everything, including our laundry and sexual favors. Somehow, that didn't quite happen, and these days, Java is merely another buzzword compliant way to obscure programmatic logic using half-assed APIs and proprietary tools. Whee. See why I'm a bit cynical?:)

--
'God dammit, your posts make me hard.' --LilDebbie

Don't even get me started on this Java thing. (1.75 / 4) (#45)
by darthaya on Fri Jan 12, 2001 at 12:07:51 AM EST

Geez, and I thought that, being a programmer, you wouldn't be so closed-minded. Guess I was wrong. You are not being cynical, you are just showing your ignorance to the general public without censoring it.

[ Parent ]
You should read more carefully... (3.60 / 5) (#59)
by trhurler on Fri Jan 12, 2001 at 11:27:28 AM EST

I didn't say Java was any worse than most other languages, which is obviously what you seem to think I said. Yes, Java sucks. So does every other programming language. C sucks less, but it still sucks in a lot of ways. As it happens, Java is probably better than most, but it isn't as good as C. Fortunately, it IS better than C++, which is an abomination loosed upon mankind. However, the main thrust of my previous claim, which happens to be true, is that Java's hype far outstripped its ability to deliver the goods. The fact that most of the claims hyped never came true is the proof. Have a nice day:)

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
Simplicity (4.07 / 13) (#2)
by Kaa on Thu Jan 11, 2001 at 12:09:11 PM EST

Simplicity is a virtue. What's wrong with a simple "key = value" ASCII text file? It, too, can be processed by any interested program and doesn't need any extra libraries.
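To make the "no extra libraries" claim concrete, here is a minimal sketch of such a reader. The comment convention (`#`) and the sample keys are assumptions for the example, not something the format itself dictates:

```python
def parse_key_value(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

config = parse_key_value("""
# hypothetical settings, invented for this example
foo = bar
port = 8080
""")
print(config)
```

Everything comes back as a string; deciding what `port` means is left to the program, which is the semantics question raised below.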

XML is very verbose, and, depending on the DTD, can be very complex. It is appropriate in some domains, but I would argue strongly against making *every* config file XML -- why introduce unnecessary complexity?

Your example of KMail rewriting .procmailrc is somewhat optimistic. First, changing it by hand has as much probability of screwing up in XML as in the current format. Second, you assume that if another program can understand the format, it can understand the semantics. That is not true. Let's say that your program read my XML config file and correctly figured out that I set variable foo to value "bar". So? Without understanding what it means, this information is useless. In your example KMail still has to understand how procmail works and what changing certain symbols in its config file actually *means*. Parsing the file is not a problem -- understanding it is, and here XML is not much of a help.

Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.


XML is simplicity (3.55 / 9) (#6)
by 0xdeadbeef on Thu Jan 11, 2001 at 12:24:57 PM EST

That is, unless you count the "library" that is every hare-brained parser written into every application that needs configuration information. If name-value pairs were sufficient then we'd be using nothing but environment variables. Coding semantics is trivial compared to coding a useful parser.

The issue of semantics exists regardless of the storage format. But with XML, the syntax is well defined, and you don't have to reinvent the wheel to parse it. The ease at which graphically-based, domain-specific configuration tools can be constructed when the underlying storage is in XML solves the complexity problem to a large degree. And even if there is no configuration tool, there are many tree-view-based XML editors.

About the only thing that Microsoft has done that impresses me is the fact that they standardized on a common configuration format, first with the .ini file, then with the system registry. We can do the same with XML, and we don't even have to use a global database to do it.



[ Parent ]
Standards (3.85 / 7) (#22)
by Kaa on Thu Jan 11, 2001 at 01:39:41 PM EST

Coding semantics is trivial compared to coding a useful parser.

I doubt it.

If your semantics are trivial, then coding them is trivial, too, but then your syntax can be trivial as well. Parsers are well-understood, there are ready-made parsers available to read a whole bunch of various syntaxes, and tools to make new parsers less painful exist. Semantics, on the other hand, is purely application-dependent. Unless I am trying to be weird, I can pick a ready-made syntax/parser pair for a great deal of what I need. Unfortunately, this doesn't work for semantics.

The ease at which graphically-based, domain-specific configuration tools can be constructed when the underlying storage is in XML solves the complexity problem to a large degree.

That's a myth. Complexity is very rarely "solved" by providing graphical tools to manipulate it. Granted, bad tools can increase the perceived complexity greatly, but in the final analysis the problem itself has an irreducible level of complexity that the best tools in the world will not do much about.

As an example, Photoshop provides a nice set of graphical tools for manipulating bitmap images. Yet the complexity of, say, color-correcting a photograph remains. People who think that it's all a matter of clicking a few buttons usually find quite rapidly that it's not quite that simple.

Common configuration format is a good thing, but trying to force a single one on applications that need just the values of a couple of variables, and on applications that expect complete rulesets from their config files will probably be very ugly.

The system registry, by the way, wasn't such a great idea. Any stupid program can mangle it, and trying to fix a broken registry (other than reloading from a backup) is usually hopeless. I am willing to bet that subtle and not-so-subtle registry errors are responsible for many of Windows' stability problems. I know that on my machine reloading a known-to-be-decent registry once in a while has a huge effect on machine stability.

Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.


[ Parent ]

Logic is easy, good structure is hard (3.50 / 4) (#30)
by 0xdeadbeef on Thu Jan 11, 2001 at 03:47:11 PM EST

The semantics don't change with the syntax of your configuration files, XML or otherwise. The semantics of the typical configuration file are trivial. The semantics of configuration in general are far simpler than those involved in an application's execution. You're mixing apples and oranges here.

Besides, as my subject alludes, most applications are far more complex in their data organization than they are in their execution, and the semantics are more easily expressed in that organization. As one early software engineering disciple put it: "forget the flowcharts, show me the tables". Photoshop is an unlikely example.

And if parsing is so trivial, why is writing parsers still considered a "black art" of programming? Why do custom languages and hand-rolled parsers always suck so bad? (And why do so many programmers ignore the tools that already exists?) I actually have some experience in this area, and while I find it straightforward, I don't consider it trivial. It certainly isn't something the typical VB programmer is capable of, yet they seem to have no trouble configuring their applications with the registry.

The distribution of Unix's configuration among different files in different locations, and the textual nature of XML, would prevent the sort of problems caused by corruption of the Windows registry. The only critical piece is the part that tells you where files are located, and that can be fixed with a file-system crawler and a standard way of identifying to which program a configuration belongs.

[ Parent ]

parsing as black art (none / 0) (#78)
by mikpos on Fri Jan 12, 2001 at 06:40:46 PM EST

And if parsing is so trivial, why is writing parsers still considered a "black art" of programming?

Because it's not? Every serious language has (or should have) at least a good BNF or (more likely) BNF-like tool available. It's only a black art if you go at it without any design or any idea of where you're going with it, as with any other aspect of computer science.

And why do so many programmers ignore the tools that already exists?

I haven't found this to be the case. Unless it's a simple key-value pair style rcfile (in which case a scanf-style approach would probably be best), people almost always use a YACC grammar, usually in conjunction with Lex. If you already know what syntax you want, and you're comfortable with BNF grammar, coding a parser truly is trivial (though dealing with the semantics is another story entirely).

[ Parent ]

What a not-so-silly idea! (3.11 / 9) (#3)
by djkimmel on Thu Jan 11, 2001 at 12:13:23 PM EST

My first reaction to this was "What a silly idea!", but after reading your example, I agree, to an extent.

The advantage of XML is, of course, that parsers are already available and let the application writer concentrate on writing the application, not the silly little things like a config file parser.

However, I think that XML should NOT be used for every configuration file. For example, the boot process in FreeBSD is controlled by /etc/rc.conf. This is just a shell script that sets variables, such as sendmail_enable="YES". To handle editing, the configuration program (/stand/sysinstall) simply appends changes to the end. So if it appends sendmail_enable="NO", then that overrides the sendmail_enable="YES" that appeared before.
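The "last assignment wins" behaviour is easy to mimic outside the shell. The following is only a sketch: it assumes the file contains nothing but plain NAME="value" lines, whereas a real rc.conf is a shell script and may contain anything the shell accepts.

```python
import shlex

def read_rc_conf(text):
    """Return the effective variables of an rc.conf-style file.

    Only plain NAME="value" assignments are understood here.
    """
    variables = {}
    for line in text.splitlines():
        # shlex strips the quotes and drops # comments for us.
        for token in shlex.split(line, comments=True):
            name, sep, value = token.partition("=")
            if sep:
                variables[name] = value  # a later line overrides an earlier one
    return variables

rc_conf = 'sendmail_enable="YES"\n' + 'sendmail_enable="NO"\n'
effective = read_rc_conf(rc_conf)
print(effective["sendmail_enable"])
```

An editing tool built this way never has to rewrite existing lines, which is the property that makes the append-only approach safe.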

The advantage of this format is that it is easily handled by shell scripts, since it is a shell script itself. I don't know how easy or practical it would be to use XML for this particular task.

I do like the idea of using it for programs that provide a "common" feature that can be used in different applications, such as procmail and mail filtering. Your example clearly shows the advantages of this.
-- Dave
Why I believe this man! (2.83 / 6) (#4)
by Dries on Thu Jan 11, 2001 at 12:20:13 PM EST

I did a:
    wc -w
(alias "word count") on his "XML in 69 Words" paragraph and must admit that the title is 100% justified: it is 69 words no matter what.
-- Dries
Ha! (3.00 / 1) (#9)
by phinance on Thu Jan 11, 2001 at 12:48:48 PM EST

Thanks for the vote of confidence. ;)

Dave
Read, annotate, and discuss open source documentation.
Andamooka: Open support for open content.
[ Parent ]
The hype (4.10 / 20) (#5)
by evvk on Thu Jan 11, 2001 at 12:22:40 PM EST

What's this hype with using XML for everything it was certainly not meant for? Programming language, configuration language, everything.
Some examples to consider:

Text formatting -- clearly a programming (or configuration) language doesn't suit this:

XML-syntax:
This is an <em>example</em> of <b>formatted</b> text.

LaTeX (nicer):
This is an {\em example} of \textbf{formatted} text.

Programming language:
output("This is an ");
emphasize("example");
output(" of ");
boldface("formatted");
output(" text.");


Programming -- clearly XML sucks at this:

Hypothetical XML:
<for variable=i value=1:10>
   <assign variable=x value=i^2>
   <print value=x>
</for>

Programming language:
<pre>
for i=1:10
   x=i^2;
   print(x);
end
</pre>


Configuration file -- just look at how oververbose XML is:

XML:
<kpress>
<key>F2</key>
<function>exec</function>
<argument>xterm</argument>
</kpress>

One possible configuration language:
kpress "F2", "exec", "xterm"

and in another, more extendable way:
kpress {
   key "F2"
   action "exec", "xterm"
}

The point is, use the language for the purpose it was meant for. XML is meant for embedding metadata in data, LaTeX is great for writing documents, a programming language for programming, etc. I do, however, see that a more standard configuration language is needed, but it should be something that is nice to edit by hand.


Dammit (2.50 / 2) (#7)
by evvk on Thu Jan 11, 2001 at 12:28:55 PM EST

Wasn't thinking that XML has the slash at the end of the closing tag, not the beginning. Well, shouldn't affect the point anyway. (And those extra <pre>s... must. try to check for mistakes..)


[ Parent ]
The slash _is_ at the start of the closing tag (none / 0) (#37)
by reeses on Thu Jan 11, 2001 at 07:01:22 PM EST

Wasn't thinking that XML has the slash at the end of the closing tag, not the beginning.
Actually, if you have matched-pair tags <foo>blah blah</foo>, then it's just like HTML: the slash is the first character inside the brokets.
If you have a monatomic tag <foo arg="blah blah" /> then you can close it with a slash as the last character inside the brokets. <foo arg="blah blah" /> is equivalent to <foo><arg>blah blah</arg></foo>.
(I'm sure someone will correct me, but for the issue I'm quoting, the example is sufficient.)

[ Parent ]
Not so, grasshopper... (3.60 / 5) (#11)
by Whizard on Thu Jan 11, 2001 at 12:56:04 PM EST

While I agree with you regarding programming languages, I'm forced to disagree with you here:
XML:
<kpress>
<key>F2</key>
<function>exec</function>
<argument>xterm</argument>
</kpress>

One possible configuration language:
kpress "F2", "exec", "xterm"

What would be so wrong with:

<keypress key="F2" function="exec" argument="xterm"/>

? That's perfectly valid XML. I personally think that having a common parser available that all apps could use to parse their config file would be a wonderful thing...just think about all the redundant work done by each application on your system that has to parse a different configuration file...what if each of those applications could just call a common XML parser library, and be done with it?

Oh, and by the way, it's only non-closed tags that have the '/' at the end. i.e. <b> </b> is a valid pair, and <p/> is valid.


--
So Lawrence Lessig, John Perry Barlow, Rusty, and Prince are having dinner...
[ Parent ]

Multiple arguments (3.75 / 4) (#14)
by evvk on Thu Jan 11, 2001 at 01:10:14 PM EST

> <keypress key="F2" function="exec" argument="xterm"/>

How about unspecified number of arguments? You can't really do it that way, can you? And still, the syntax is far from nice to edit or read IMHO.

> just think about all the redundant work done by each application on your system that has to parse a different configuration file...

Yes, but why would the "standard" configuration file format have to be just XML? A better alternative can certainly be devised for that purpose. I see that by starting to using XML, we could as well start using binary configuration files and lose nothing and win by the "parsing" being more efficient.


[ Parent ]
Except hand-editing... (3.40 / 5) (#20)
by Whizard on Thu Jan 11, 2001 at 01:24:55 PM EST

Ok, then go with:

<keypress key="F2" function="exec">
<argument>xterm</argument>
</keypress>

I don't find that bothersomely ugly to type once in a while...and if it were something that needed to be edited frequently, it's very easy to write an interactive config editor in XML. I did one for the application I'm doing for work in about 2 days, and the config file for it is not pretty.
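For what it's worth, reading an open-ended argument list out of that layout takes very little code. A sketch using Python's standard ElementTree module; the extra `-bg` and `black` arguments are made up to show more than one child:

```python
import xml.etree.ElementTree as ET

SNIPPET = """<keypress key="F2" function="exec">
<argument>xterm</argument>
<argument>-bg</argument>
<argument>black</argument>
</keypress>"""

binding = ET.fromstring(SNIPPET)
# Any number of <argument> children is fine; they come back in document order.
arguments = [arg.text for arg in binding.findall("argument")]
print(binding.get("key"), binding.get("function"), arguments)
```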

Yes, but why would the "standard" configuration file format have to be just XML? A better alternative can certainly be devised for that purpose.

XML is here. XML is more than adequate for the purpose. Why would we want yet another standard? Perhaps we should embrace and extend XML! ;->

I see that by starting to using XML, we could as well start using binary configuration files and lose nothing and win by the "parsing" being more efficient.

Except that with binary config files you lose the ability to edit them by hand. Despite what people seem to think, XML is very easy to edit by hand. Ever written HTML by hand? It's the same thing!


--
So Lawrence Lessig, John Perry Barlow, Rusty, and Prince are having dinner...
[ Parent ]

Missing the point (3.83 / 6) (#23)
by evvk on Thu Jan 11, 2001 at 01:42:53 PM EST

> it's very easy to write an interactive config editor in XML

But my plain old text editor is much better, if just the syntax were good.
Practically requiring a specialized editor is just the same as with binary configuration files. I could edit them with a hex editor (or any editor having converted to ascii/hex code), but it is not very nice.

> Ever written HTML by hand? It's the same thing!

If you think so, you have missed one of the points of my original post. In HTML documents (with some information content; not just graphics and scripting crap) most of the characters are data (text) and there are very few tags compared to that. That's why a tagged language suits the purpose. In configuration, as in programming languages, there really isn't any data, just the commands, and thus tags are cumbersome to use.

[ Parent ]
re: Missing the point (3.75 / 4) (#28)
by ScottBrady on Thu Jan 11, 2001 at 03:00:25 PM EST

>> Practically requiring a specialized editor is just the same as with binary configuration files. I could edit them with a hex editor (or any editor having converted to ascii/hex code), but it is not very nice. <<

I see you making the same incorrect assertions again and again. Are you certain you understand what XML really is?

XML is:

  • Not Compiled.
  • Stored in ASCII.
  • A standard data format that has a definable structure.

If you want a standard format that is extensible, you need a way to define content classifications. Yes, that does create code bloat, but it is much less obtuse than a program-specific implementation.

Case in point, CRON:

0 1 2 3 4 /foobar

What the hell are those first five values? Not only does a human have no idea what that means but software does not know either unless the data format is hard coded in.

XML'ed version:

<job>
    <minute>0</minute>
    <hour>1</hour>
    <dayofmonth>2</dayofmonth>
    <month>3</month>
    <dayofweek>4</dayofweek>
    <exec>/foobar</exec>
</job>

The above is definable with a DTD and parsable with widely available XML engines. Also, in the absence of configuration software (or lack of desire to use such software), it can be hand edited without reading the man page (not that you should ever do such a thing).
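As an illustration of "parsable with widely available XML engines", here is a small sketch that turns the job above back into a classic crontab line using Python's standard ElementTree module. The function name `to_crontab_line` is invented for the example:

```python
import xml.etree.ElementTree as ET

JOB = """<job>
    <minute>0</minute>
    <hour>1</hour>
    <dayofmonth>2</dayofmonth>
    <month>3</month>
    <dayofweek>4</dayofweek>
    <exec>/foobar</exec>
</job>"""

# The classic crontab field order, spelled out by element name.
FIELDS = ["minute", "hour", "dayofmonth", "month", "dayofweek", "exec"]

def to_crontab_line(job_xml):
    """Flatten one <job> element into a traditional crontab entry."""
    job = ET.fromstring(job_xml)
    return " ".join(job.findtext(name) for name in FIELDS)

line = to_crontab_line(JOB)
print(line)
```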

--
Scott Brady
"We didn't lie to you... the truth just changed."
YHBT. YHL. HAND.
[ Parent ]

You are missing the point (3.00 / 2) (#29)
by evvk on Thu Jan 11, 2001 at 03:19:09 PM EST

No, it is you who does not understand, or you are just too blinded by the hype to see anything.

> <job>
> <minute>0</minute>
> <hour>1</hour>
> <dayofmonth>2</dayofmonth>
> <month>3</month>
> <dayofweek>4</dayofweek>
> <exec>/foobar</exec>
> </job>

job {
   minute 0
   hour 1
   dayofmonth 2
   month 3
   dayofweek 4
   exec "/foobar"
}

See! Nothing cryptic about that, options in clear English. Now, which one is easier to read? Both have a standard format; in addition, the "config format" distinguishes strings from numbers.
There's no sense in having tags (and especially end tags) for defining values to variables when they're not embedded in data.
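For comparison, a reader for the brace format is not much code either. This is a sketch under assumptions: the grammar is guessed from the single example above (one block, no nesting, one setting per line), and shlex is used only to split quoted strings.

```python
import shlex

def parse_block(text):
    """Parse 'name { key value... }' into (name, {key: [values]})."""
    name, settings = None, {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens or tokens == ["}"]:
            continue  # blank line or end of block
        if tokens[-1] == "{":
            name = tokens[0]  # block header such as 'job {'
        else:
            settings[tokens[0]] = tokens[1:]
    return name, settings

name, settings = parse_block("""job {
   minute 0
   hour 1
   dayofmonth 2
   month 3
   dayofweek 4
   exec "/foobar"
}""")
print(name, settings)
```

Note that this shortcut strips the quotes, so the string/number distinction is lost; a real parser for the format would need its own tokenizer to keep it.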

> can be hand edited without reading the man page (not that you should ever do such a thing).

And where am I supposed to find all the options? The manual page or comments in the file/DTD?


[ Parent ]
what's the difference? (3.33 / 3) (#34)
by speek on Thu Jan 11, 2001 at 05:41:35 PM EST

So what's the difference here? You like one way, others like the XML way. Both are striving to make config files easier to deal with. The computer couldn't care less which way you do it. To humans, both of the above are fine. The XML has more "stuff", but I don't see that as a big deal. The benefit to XML is that it's more extensible, and it's a well-defined standard with lots of tools out there already in existence.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Or (3.00 / 3) (#35)
by ucblockhead on Thu Jan 11, 2001 at 05:41:55 PM EST

Perfectly valid XML:

<Job
Minute=0
Hour=1
Dayofmonth=2
month=3
dayofweek=4
exec="/Foobar"
/>


-----------------------
This is k5. We're all tools - duxup
[ Parent ]
Just a minor detail... (4.00 / 3) (#42)
by bigdan on Thu Jan 11, 2001 at 09:54:36 PM EST

What you posted is not valid or well-formed XML.

Valid or well-formed XML must have the attribute values in quotes. To turn that into well-formed XML it would have to be

<job
minute="0"
hour="1"
dayofmonth="2"
month="3"
dayofweek="4"
exec="/Foobar"
/>
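This is easy to check with any conforming parser. A quick sketch using Python's standard ElementTree module (which sits on top of expat); the helper name `is_well_formed` is made up for the example:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

unquoted = '<Job Minute=0 Hour=1 exec="/Foobar" />'
quoted = '<job minute="0" hour="1" exec="/Foobar" />'
print(is_well_formed(unquoted), is_well_formed(quoted))
```

This only tests well-formedness; checking validity would additionally require a DTD and a validating parser.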


[ Parent ]
Ugly (1.00 / 1) (#48)
by evvk on Fri Jan 12, 2001 at 02:52:58 AM EST

This is ugly and has the problem described earlier in this thread: it is not possible (or convenient) to have multiple attributes of the same name.
<foo bar=baz bar=quk>

[ Parent ]
Not quite (5.00 / 1) (#64)
by costas on Fri Jan 12, 2001 at 02:52:47 PM EST

You're not thinking of XML in the proper sense; try to think of it as an OOP data structure. If 'foo' was a class, you couldn't really have two attributes named 'bar'. What you could do in OOP (and it's valid XML) is to either have two children of type 'bar' or make the attribute 'bar' a list. I.e.:
1) <foo><bar>baz</bar><bar>quk</bar></foo>
2) <foo bar="baz quk" />

The choice between the above two is a design compromise. If 'bar' is a plain text/numeric attribute, you might as well use (2) and have a cleaner layout. If, OTOH, bar may be expanded in the future --i.e. instead of a 'basic' type, it has to be an 'object'; note the OOP terminology which doesn't necessarily correspond to XML-- you should use (1). If (1) is too verbose, you can make it less so:
3) <foo><bar id="baz" /><bar id="quk" /></foo>

Which can actually let you do way more than (1). E.g. in (3) you can force 'id' to be unique across all 'bar's in the XML file (sort of like a DB key or a memory address in a 2-3GL or a dictionary key in a 4GL).
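A sketch of how a program might read the three layouts, using Python's standard ElementTree module. The function names are invented, and because ElementTree does not validate against a DTD, the uniqueness check for form (3) is done by hand here rather than by the parser:

```python
import xml.etree.ElementTree as ET

def bars_from_children(xml_text):
    """Form (1): one <bar> child element per value."""
    return [bar.text for bar in ET.fromstring(xml_text).findall("bar")]

def bars_from_list_attribute(xml_text):
    """Form (2): a single whitespace-separated attribute."""
    return ET.fromstring(xml_text).get("bar").split()

def bars_from_ids(xml_text):
    """Form (3): <bar id="..."/> children; reject duplicate ids by hand."""
    ids = [bar.get("id") for bar in ET.fromstring(xml_text).findall("bar")]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate bar id")
    return ids

form1 = bars_from_children("<foo><bar>baz</bar><bar>quk</bar></foo>")
form2 = bars_from_list_attribute('<foo bar="baz quk" />')
form3 = bars_from_ids('<foo><bar id="baz" /><bar id="quk" /></foo>')
print(form1, form2, form3)
```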

XML is really way more than structured, hierarchical data. Besides the above, you can enforce rules, apply mixed namespaces (w/ Schemas, not DTDs), etc. It's exceptionally powerful and it's here *now* with a lot of accompanying tools in a lot of languages on almost any platform. I don't see the hang-ups here guys, besides (justified) buzzword aversion. But sometimes, buzz *is* justified.


memigo is a news weblog run by a robot. It ranks and recommends stories.
[ Parent ]
Leave my crontab alone! (1.00 / 2) (#46)
by tmoertel on Fri Jan 12, 2001 at 12:15:02 AM EST

Case in point, CRON:
0 1 2 3 4 /foobar

Not to let reality interrupt your discussion, but maybe we ought to consider what a crontab file is and who uses it before trying to determine what the best format for it is. To wit: No admins I know would want to enter a bunch of entirely unnecessary open and close tags in order to add a new cron job.

In my opinion, the existing crontab format is near optimal. It's simple, easy to understand, and easy to change. It's "just the facts, ma'am," pure and simple.

If you find it too cryptic, try this out:

# min hr day month weekday commandline...

It's called a comment. See, now how hard is it to remember the entry format?

Please, before we go deciding to bloat up every possible configuration file with extraneous garbage, let's think about whether it's really necessary.

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
Not human editable (5.00 / 2) (#67)
by costas on Fri Jan 12, 2001 at 03:07:33 PM EST

You guys (anti-XML fans :-) are forgetting one big thing: these files should also be machine-readable, not just human-readable. Why? because then I could have a linuxconf that *works* instead of the mess we got now. Because network management could work seamlessly instead of trying to parse, edit and then rewrite a gazillion different rc formats.

As for human readability/editability: editing an XML file in emacs/vi should be a *last resort*, something that follows a system failure. Under normal operating conditions you should be changing cron through a dedicated editor (dedicated to XML, not cron; which is huge) or through a script.

Let me show you something... if this was a crontab snippet:
<job id="apachelog" hour="5" minute="10" command="/root/run_analog.sh" />

it would literally take me 5 minutes to throw together a Python script that would do this (I do this already though, so I got the code library ready :-), 2 hrs maybe from scratch):
print "Apache logs run at: " + cron.apachelog.hour + ":" + cron.apachelog.minute

(you can override '.' in Python, and since the id is guaranteed to be unique, you can do some neat stuff :-)
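A minimal sketch of how the dotted access could be wired up, in current Python. The `Node` wrapper class and the enclosing `<crontab>` element are assumptions made for the example; they are not part of the snippet above:

```python
import xml.etree.ElementTree as ET

class Node:
    """Expose XML attributes, and child elements by id, via dotted access."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _element itself is safe.
        if name in self._element.attrib:
            return self._element.attrib[name]
        for child in self._element:
            if child.get("id") == name:
                return Node(child)
        raise AttributeError(name)

CRONTAB = """<crontab>
  <job id="apachelog" hour="5" minute="10" command="/root/run_analog.sh" />
</crontab>"""

cron = Node(ET.fromstring(CRONTAB))
message = "Apache logs run at: " + cron.apachelog.hour + ":" + cron.apachelog.minute
print(message)
```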



memigo is a news weblog run by a robot. It ranks and recommends stories.
[ Parent ]
XML is one tool of many (4.00 / 1) (#90)
by tmoertel on Sat Jan 13, 2001 at 01:02:20 AM EST

You wrote:
You guys (anti-XML fans :-) [...]

First off, I'm not anti-XML. I'm a big fan of SGML, XML, and even -- heck! -- XSL's big, nasty older brother DSSSL. Go search old archives of comp.text.sgml or the DSSSList. You'll find me there.

When I say that I don't think that XMLizing every configuration file is a good idea, I'm speaking from experience.

XML, like just about everything else, is a tool. Where it makes sense, use it; where it doesn't, don't. If we're talking about complex configuration files that cannot have a simple structure, say, for Apache, SGML/XML is a great way to go. Do it. But for crontab? No way.

Let me show you something... if this was a crontab snippet:
<job id="apachelog" hour="5" minute="10" command="/root/run_analog.sh" />

If that were a crontab snippet, you would have chopped cron's legs off because you've made command an attribute. Because of XML's attribute-value normalization rules, you have made multi-line commands impractical (unless you consider one-insanely-long-line to be practical), and that's A Bad Thing. XMLizing some of these configuration files isn't as straightforward as it might seem.
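The normalization is easy to observe. In this sketch (the two-line command is invented), the same text is stored once as an attribute and once as element content, then read back with Python's standard ElementTree module:

```python
import xml.etree.ElementTree as ET

# A hypothetical two-line command.
COMMAND = "cd /tmp;\n./cleanup.sh"

# Same text, once as an attribute value and once as element content.
as_attribute = ET.fromstring('<job command="' + COMMAND + '" />').get("command")
as_content = ET.fromstring("<job>" + COMMAND + "</job>").text

print(repr(as_attribute))
print(repr(as_content))
```

The literal line break survives in the element content but not in the attribute value, where the parser replaces it with a space.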

Finally:

As for human readability/editability: editing an XML file in emacs/vi should be a *last resort*

Not a chance. Please don't forget about all the environments in which visual tools are not possible and where simple text editors may be all that's available. Single-disk distros, installations gone wrong, embedded systems, low-memory boxes, remote configuration via modem, older boxes used in developing nations, and so on.

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
Lightweight XML editors (4.50 / 2) (#95)
by costas on Sat Jan 13, 2001 at 11:25:10 AM EST

OK, you're correct on the XML snippet above; command should have been the node's contents --that's me making a bad example, not an XML shortcoming, though.

As for XML not being practical for embedded systems, etc.: well, I can see a C/C++-based, ncurses XML editor that would fit fine in a single-floppy distro or in an embedded system. And in the long run, the code savings from not having every little utility do its own config parsing, etc. might actually make this scheme *leaner* than what we have now.

memigo is a news weblog run by a robot. It ranks and recommends stories.
[ Parent ]
there's the rub (3.14 / 7) (#21)
by speek on Thu Jan 11, 2001 at 01:38:22 PM EST

One possible configuration language:

Exactly - one possible. And another possible, and yet another possible - as many possibles as there are developers. There's value in agreeing on a standard way of doing such things - regardless of what the chosen standard is. XML works and it's informative to read - unlike your other examples.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Read it again, the last sentences too... (2.25 / 4) (#25)
by evvk on Thu Jan 11, 2001 at 01:50:15 PM EST

... and XML is no more informative to read than the last example and yet the last one is a lot easier to piece together.

[ Parent ]
I disagree (3.50 / 4) (#32)
by speek on Thu Jan 11, 2001 at 05:33:28 PM EST

The XML is more informative - in plain English it tells me what each part is (exec is the function, xterm is an argument, F2 is the key pressed). Your last example attempts to do the same, but it doesn't have any info on what "xterm" is doing there. And I don't see how the last one is easier to piece together. By the computer? By a human? For a human, the XML has more info. For the computer, the XML seems easier, particularly since the parsers are already written.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Some of us like to be easy to read (5.00 / 1) (#49)
by evvk on Fri Jan 12, 2001 at 03:15:28 AM EST

The other format can have all the same information, just put it like:

kpress {
   key "F2"
   function "exec"
   argument "xterm"
}

(Although I prefer the one with just action.) That is still much more readable than the XML version -- a _lot_ higher signal-to-noise ratio. It is like listening to a badly tuned radio station versus a CD. And with this kind of syntax, one could have "arguments arg1, arg2, ..." instead of something as hard to read as "<argument>arg1</argument><argument>arg2</argument>...".

Consider

   key value

vs

   <nonsensekey>value</keynoensense>

Like I've said n times, tags suit embedding metadata (information) in data. Most configuration files are just key-value pairs and the others, for highly configurable programs, are code (S-lang, lisp, etc.). There's just no need to use a complicated, tagged syntax.
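To make the "simple syntax" side concrete, here is a minimal sketch of a reader for the brace-delimited block format shown above. It is only an illustration: the function name `parse_blocks` is made up, it assumes one `key "value"` pair per line with no comma-separated lists, and it leans on Python's standard `shlex` module for the quoting.

```python
import shlex

SAMPLE = '''
kpress {
   key "F2"
   function "exec"
   argument "xterm"
}
'''

def parse_blocks(text):
    """Parse 'name { key "value" ... }' blocks into (name, dict) pairs.

    Hypothetical helper: assumes one key per line, whitespace-separated
    values, and no nesting.
    """
    blocks = []
    current = None  # (name, settings) while inside a block
    for raw in text.splitlines():
        tokens = shlex.split(raw, comments=True)  # handles "quoted values"
        if not tokens:
            continue  # blank or comment-only line
        if current is None:
            if len(tokens) != 2 or tokens[1] != "{":
                raise ValueError("expected 'name {', got: " + raw)
            current = (tokens[0], {})
        elif tokens == ["}"]:
            blocks.append(current)
            current = None
        else:
            key, *values = tokens
            current[1][key] = values
    if current is not None:
        raise ValueError("unterminated block")
    return blocks

print(parse_blocks(SAMPLE))
```

It is short, which supports the point about low overhead, but note that every program adopting such a format still needs its own copy of something like this.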


[ Parent ]
code to represent data? (none / 0) (#52)
by speek on Fri Jan 12, 2001 at 07:48:35 AM EST

Regarding more complicated configs being actual code, I don't understand. Isn't config information just data? Why do I want to use executable code to represent data? Wouldn't it be better to have just the bare data so that any third party could come around and use it?

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Extension languages (4.00 / 1) (#57)
by evvk on Fri Jan 12, 2001 at 11:22:34 AM EST

> Regarding more complicated configs being actual code, I don't understand. Isn't config information just data? Why do I want to use executable code to represent data?

This means programs that have an extension (scripting) language. In most cases, there is no point in using a separate config file format when you can use the same language for configuration parameters as for the extensions. Jed uses S-lang. Emacs uses lisp. Some programs use python, etc. Of course, it is difficult for external programs to use the settings (or for that program to save them), but where would you draw the line between settings and extensions? Take keybindings, for example. Users can define their own routines to handle keys, so it is, in my opinion, better to create the bindings from the script and not set them in a special configuration file.


[ Parent ]
is this executable code or configuration data? (5.00 / 1) (#58)
by sayke on Fri Jan 12, 2001 at 11:24:36 AM EST

maximize="right"
minimize="right"
close="left"
rootmenu=menu(
    title="Root Menu",
    items=(
        menu(
        title="Editing",
            items=(
            menuitem(
                title="GNU Emacs",
                exec="emacs",
                icon="/usr/share/emacs.png"
            ),
            menuitem(
                title="gvim",
                exec="gvim",
                icon=""
                )
            )
        )
    )
)

heh. see? no meaningful difference. config information is generally a bunch of key-value pairs, and it's only when you have values that are key-value pairs (sometimes of named types) themselves do things get slightly funky, and then things start looking more like a programming language and less like configuration data...


sayke, v2.3.1 /* i am the middle finger of the invisible hand */
[ Parent ]

data (3.00 / 2) (#63)
by speek on Fri Jan 12, 2001 at 02:39:22 PM EST

That's config data ... until someone writes an interpreter for your well-defined syntax that enables one to pass the file to the interpreter and "run" it.

Let me ask you this - you've probably written multiple programs in your life: do you tend to use the same underlying structure for your configuration files (maybe even what you presented in your post), or do you invent a new one for every program you make - one that's especially well suited for that program? My guess is, you reuse the same structure and hence reuse a lot of your old code for reading it. If not, if you create brand new config-file-parsing code each time...

I leave it to the reader to relate this to the discussion on XML.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

nope. =) (5.00 / 1) (#73)
by sayke on Fri Jan 12, 2001 at 04:49:04 PM EST

the example i wrote is actually *this* close to being gen-yoo-wine python ("[]", not "()" are the list operators in python; and i don't know what python would think of my use of whitespace) code. see, menu() and menuitem() are basically constructors, and in the little pythonic pseudolanguage i used for the example, parens are both argument list delimiters, and general list delimiters. it's basically lisp, but without prefix notation... and if there's one thing that lisp did very well, it was erase the distinction between programs and data. see this definition; particularly the line about "the interpretation of code as data and vice-versa".

see, if i write a config file for program foo, foo is the config file's interpreter. all config files have interpreters; if they didn't, there would be no need for a structured config file format of any kind, as only humans would read them.

i do try to reuse the config-file parsing code i write... heh. actually, i use python for just about all my config file needs, as it's very easy to interface to, from c. thus, my config files (well, the ones i've made recently-ish) are actually valid python scripts. i'm sure the same could be done with perl, although i dunno how hard it would be... spiffy, huh? =)


sayke, v2.3.1 /* i am the middle finger of the invisible hand */
[ Parent ]

Learn something everyday (1.00 / 1) (#75)
by speek on Fri Jan 12, 2001 at 05:01:37 PM EST

I had no idea Python looked that awful. And I've actually been thinking of learning it! Of course, everything looks awful the first time you see it, even XML :-)

So, you use Python, and yes, data and code are easily intertwined, especially since interpreters can hardly be separated from that which they interpret. The fact that you've chosen python vs XML for your config needs represents personal choice, no more or less, IMHO. So, if we all sat around and decided there was value in everyone using a standard, then some people won't get their way. Others will. No big deal, I would think, but some here are getting very wound up that XML might win, and I just don't get it.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

well, actually... (5.00 / 2) (#94)
by sayke on Sat Jan 13, 2001 at 11:12:10 AM EST

my example there, had it been python, would have been very lisp-ified python, which, in some circles, is a very bad thing. however, the fact that it's doable is certainly a plus for python; it really comes in handy sometimes... regardless, my example was basically a couple of key-value pairs, and then a very unrolled constructor, that included in its arguments other constructors. all i did was carefully add some whitespace (and change "[]" to "()") to the following valid python:

maximize="right"
minimize="right"
close="left"
rootmenu=menu(title="Root Menu", items=[menu(title="Editing", items=[menuitem(title="GNU Emacs", exec="emacs", icon="/usr/share/emacs.png"), menuitem(title="gvim", exec="gvim", icon="")])])

see what an opaque block of code that is? notice how clumsy it is, all on one line like that. by whitespacing it the way i did, i made it a lot easier to visualize. however, there are other ways to do it; for example, it looks a lot different when you don't unroll the constructor so completely, like so:

maximize="right"
minimize="right"
close="left"
submenuitem1=menuitem(title="GNU Emacs", exec="emacs", icon="/usr/share/emacs.png")
submenuitem2=menuitem(title="gvim", exec="gvim", icon="")
submenu=menu(title="Editing", items=[submenuitem1, submenuitem2])
rootmenu=menu(title="Root Menu", items=[submenu])

by adding a few extra names to the mix, it becomes fairly clear that menuitem() and menu() are just constructors, and i'm just constructing some menuitems and stashing em in the menus. do you find that method to be more to your liking? hehe... in all programming languages, There's More Than One Way To Do It. however, i'd say that in python, unlike perl, There's Few Obfuscatory Ways To Do It... or something along those lines. ;p

getting back to your point, i chose and choose python over xml because python is vastly more powerful. xml is a markup language, nothing more; and python has aspects of both functional and procedural languages, and some spiffy other stuff. for example: with python, if i decide my config file is going to need the ability to check for a firewall around the end-user's box, i can include an extra little itty bitty script in my distrib, and bwow, my config file has access to a competent networking library - which i can easily call from my config file in one line. can xml do that? not even close. it's not meant to be a programming language! it's out of its league. so, in essence, i think xml should stick to being used for markup, and python (or one of the languages in its class, but i prefer python) should be used for config, and everything else in which programmer time and human-readability are more important than a few milliseconds of execution speed... except markup. ;p
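For anyone wondering what "the config file is a valid python script" might look like on the loading side, here is a rough sketch under some assumptions. The `menu()` and `menuitem()` constructors and the `load_config()` helper are hypothetical stand-ins, not anything from a real program, and the keyword argument named `exec` only works on Python 3 (it was a reserved word in Python 2).

```python
def menuitem(title, exec, icon=""):
    """Hypothetical constructor: a menu entry is just a plain dict here."""
    return {"type": "menuitem", "title": title, "exec": exec, "icon": icon}

def menu(title, items):
    """Hypothetical constructor: a menu holds a list of items or submenus."""
    return {"type": "menu", "title": title, "items": list(items)}

# The config file contents, inlined as a string to keep the sketch self-contained.
CONFIG_SOURCE = '''
maximize = "right"
minimize = "right"
close = "left"
submenuitem1 = menuitem(title="GNU Emacs", exec="emacs", icon="/usr/share/emacs.png")
submenuitem2 = menuitem(title="gvim", exec="gvim", icon="")
submenu = menu(title="Editing", items=[submenuitem1, submenuitem2])
rootmenu = menu(title="Root Menu", items=[submenu])
'''

def load_config(source):
    """Run the config as Python and return the names it defined.

    Emptying __builtins__ trims what the config can call, but this is
    NOT a security sandbox; a config file loaded this way is trusted code.
    """
    provided = {"__builtins__": {}, "menu": menu, "menuitem": menuitem}
    namespace = dict(provided)
    exec(source, namespace)
    return {k: v for k, v in namespace.items() if k not in provided}

config = load_config(CONFIG_SOURCE)
print(config["close"], config["rootmenu"]["items"][0]["title"])
```

The flip side, which is the point being argued in this thread, is that a third-party tool cannot rewrite such a file without effectively understanding Python.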


sayke, v2.3.1 /* i am the middle finger of the invisible hand */
[ Parent ]

OT: I'm confused (none / 0) (#96)
by speek on Sat Jan 13, 2001 at 11:35:02 AM EST

I always thought the reason I separated "config" data out from my programs was so that users could have more control over how it ran (because they could easily edit the config file[s]). It sounds as though you are arguing that if we just use Python for code and config, users will have complete control, because python is so human readable.

What if my config data is actually data, and not something I expect to execute? What if I need to give a printshop a means of describing all the finishing options it has to offer customers? And additionally, they want to specify which selections are mutually incompatible (like transparency and duplex)? And additionally which selections require human intervention and which can be passed straight to the printer without human oversight? This is descriptive information. Is that what you call "markup"? If so, then maybe we agree more than not, just that I happen to think that most complex applications could benefit from some "descriptive" information.
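A small sketch of what that kind of purely descriptive data could look like, and how little code it takes to consume it. Only the transparency/duplex conflict comes from the description above; the other option names, the `FINISHING` layout, and `check_order()` are invented for illustration.

```python
# Descriptive data only: nothing in here is meant to be executed.
FINISHING = {
    "options": ["transparency", "duplex", "lamination", "staple"],  # mostly hypothetical
    "incompatible": [("transparency", "duplex")],                   # from the example above
    "needs_operator": ["lamination"],                               # hypothetical
}

def check_order(selected, rules=FINISHING):
    """Return (conflicts, needs_human) for a customer's selections."""
    chosen = set(selected)
    unknown = chosen - set(rules["options"])
    if unknown:
        raise ValueError("unknown options: " + ", ".join(sorted(unknown)))
    conflicts = [pair for pair in rules["incompatible"] if chosen.issuperset(pair)]
    needs_human = any(opt in chosen for opt in rules["needs_operator"])
    return conflicts, needs_human

print(check_order(["transparency", "duplex"]))
print(check_order(["duplex", "lamination"]))
```

The same table could be written as XML, an .ini file, or a Python literal; what matters for the argument is that other programs can read it without running it.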

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

parse = execute; config data is part of program (5.00 / 1) (#107)
by sayke on Mon Jan 15, 2001 at 01:51:01 AM EST

yup, separating config data out of the compiled code and into something interpreted is often used to give users more control over how programs run; same with command-line options. in fact, i'd go so far as to say that config files and commandline options are really both variations on the "function parameter" theme, but that's another mess completely... ;)

python doesn't work very well where execution speed is of immense importance; its designers opted to trade execution time for programmer time. in those situations, i'd say go with c. of course, c and python work together very, very well, and making a c program see and use data parsed from a python script is easy...

but yea, i'm arguing that python works very well in traditional configish situations, because they tend to be a bunch of key-value pairs, which python deals with in a clean, easily-implementable, human-readable, cross-platform (not .ini files, not shell scripts), standardized way.

xml does not work so well for lots of key-value pairs. see, markup languages don't work well for simple assignment lists; they look all clumsy when ya try to make em do so. however, for things like formatting conventions (html), they work great; that's their niche, and i think they should stick to it.

things written in a markup language are generally meant to be parsed and processed by tools built from more than one codebase. html can be rendered tolerably by ie, netscape, konqueror, lynx, ad infinitum... markup languages are designed to decouple the information being marked up from the parser of the information. markup languages also tend to deemphasize assignment and equality, and emphasize description and metadata. they are used more to structure data, rather than define it, i think... i'm sleepy.

i admit confusion about your "data != stuff i execute" distinction. is your data going to be processed by many programs by different vendors? is it going to be processed by a program at all? or is it only going to be read by humans? if it's only going to be read and used by humans, ya don't need a structured format to store the data in; you can say "foo must be either transparent or duplex" just fine in english. perhaps i'm misunderstanding... hm. tell ya what: give me a good description of the printshop problem, and i'll see what i can make of it... right now, though, it sounds like it might be a job for english, or a markup language...


sayke, v2.3.1 /* i am the middle finger of the invisible hand */
[ Parent ]

Darned right! (3.00 / 3) (#26)
by wiredog on Thu Jan 11, 2001 at 02:07:50 PM EST

We should never use a markup language for text formatting!

The idea of a global village is wrong, it's more like a gazillion pub bars.
Phage
[ Parent ]

eh? Problems with your examples (3.20 / 5) (#33)
by /dev/niall on Thu Jan 11, 2001 at 05:39:40 PM EST

I have a few problems with the examples you have provided. While I feel you've got the right idea (ie. XML is not the answer to everything) I don't think you are doing it justice. Configuration files are not scripts, and they're not text formatted for display. They're information, pure and simple.

LaTeX (nicer)

Says you. ;)

Programming -- clearly XML sucks at this:
Hypothetical XML:
<for variable=i value=1:10>
<assign variable=x value=i^2>
<print value=x>
</for>

Indeed it does suck... because it's absolutely not intended for this. Nobody who understands XML would suggest otherwise, so this is a poor example.

Configuration file -- just look at how oververbose XML is:
XML:
<kpress>
<key>F2</key>
<function>exec</function>
<argument>xterm</argument>
</kpress>

One possible configuration language:
kpress "F2", "exec", "xterm"

and in another, more extendable way:
kpress {
key "F2"
action "exec", "xterm"
}

It is not oververbose. It's descriptive. ;)

The problem with your first non-XML configuration file is there is no way of telling what each of the fields are (sure, most folks know "key", "action", "program", but how would another application know this?)

Your second non-XML example is better, because it solves this problem. However, I don't see how it's any more or less verbose than XML, and I don't remember ever seeing a standard for it.

XML isn't just a way of storing data, it's a way of storing the description of that data along with it. Armed with a DTD/Schema and an XML file, 3rd party applications can make use of this data in a standardized fashion, always knowing what it can or cannot do, and -- better yet -- using the same APIs and logic to play with other data sources.

It's not the answer to everything, but I really feel you do it a disservice with the examples you have chosen.


--
"compared to the other apes, my genitals are gigantic" -- TheophileEscargot
[ Parent ]

XML-based programming languages (4.00 / 1) (#108)
by KnightStalker on Mon Jan 15, 2001 at 12:18:45 PM EST

Indeed it does suck... because it's absolutely not intended for this. Nobody who understands XML would suggest otherwise, so this is a poor example.

For an example of exactly *why* it sucks, see XSLT... which was created by people who do understand XML and yet suggested otherwise. Sure, it's powerful and helpful, but the people who understand XML will stop at nothing.

Example...

<xsl:if test="@attribute &lt; 3">
  <xsl:value-of select="$value1"/>
  <xsl:value-of select="$value2"/>
</xsl:if>

That's about as simple as it gets... yeesh :-)

[ Parent ]

XSLT is a different beast (4.00 / 1) (#109)
by /dev/niall on Mon Jan 15, 2001 at 02:13:39 PM EST

For an example of exactly *why* it sucks, see XSLT... which was created by people who do understand XML and yet suggested otherwise. Sure, it's powerful and helpful, but the people who understand XML will stop at nothing.

XSLT is a different beast than what many consider "typical" programming languages, like C, C++, Basic etc., which are sequential. XSLT is declarative in nature... you don't provide a sequence of steps to complete, you describe what you want to happen. It's perhaps better to compare it with SQL.

The problem a lot of programmers have with XSLT when they first start using it is they look for similarities with languages they have experience with (that's the geek way, apply what you know to what you don't know). If you only have experience thinking in a sequential language it's very easy to say "This sucks!!".


--
"compared to the other apes, my genitals are gigantic" -- TheophileEscargot
[ Parent ]

Declarative/procedural XSLT (4.00 / 1) (#110)
by KnightStalker on Mon Jan 15, 2001 at 02:37:57 PM EST

For simple XSLTs, the declarative nature works okay, but in my experience, more complex scripts need the procedural elements as well if there are any conditionals. Variable handling is absolutely insane in XSLT. I wrote an XSL/FOP translator to turn an XML-based phone directory into a PDF suitable for printing (using Apache's xalan/fop tools) and let me tell you, it got hairy. Especially since I had to write Java extensions for all the stuff the programmers didn't add until right after I got done, like dot leaders, widow/orphan control, etc. :-) But it was still faster to do it that way than with a completely procedural language, or (God forbid) in Word.

[ Parent ]
Not ready for this yet (4.23 / 13) (#8)
by itsbruce on Thu Jan 11, 2001 at 12:41:32 PM EST

If .procmailrc were an XML file there would be no such obstacle. KMail would be able to read and write the file, changing only the options that were currently supported by its configuration dialog. All of the filtering work would be done by (tested, powerful, and preexisting) procmail. This new configuration dialog would enhance the usability of procmail and the feature set of KMail, and save KMail developers the time it took to write the mail filtering code. This kind of cooperation -- at least, the potential for it -- is what makes open source software so great.

And how easy would it be to hack these files in Vi? To create them from the output of a script or grep them for needed information? How many different DTDs would a *nix hacker have to remember just to edit a few config files? Or are you hoping to get a standard Unix configuration DTD agreed (in which case I'll see you next millennium, sometime)?

Even if you learn the DTDs off by heart, you've lost the ability easily to build and manipulate these files in scripts or from the CLI. Until there is a standard set of command-line tools that give the same power over XML that grep, sed, cut, awk and the like give over plain text, what you ask is not practical.

Beyond that, I'm not even sure that it would be practical or desirable even if those tools were available. Universal config files would have to be very verbose and full of redundancy if they were to cope with every utility that might use them.

So what are people waiting for? Are there problems with XML that I haven't thought of?
  • Procmail (usually) stops as soon as it finds a matching rule. Suppose an alternative utility applied every rule unless you told it to stop. How would you write a config file to fit both? I can think of hundreds of examples where two apps do the same job with totally different approaches, making it very difficult to create a set of configuration settings that would suit both. Sometimes one app just has key/value settings where another parses a set of rules. Writing one config set-up to cover both would be like trying to fit a giraffe into a square hole.
  • What happens when two utilities overlap slightly in function but one or both have a unique subset of functions- as with procmail and kmail? Do they share a config file or does kmail (with the largest function set) read from a whole set of files?
  • How do utilities know which file or files to read for their config - unless you create one big unified config system (windows registry here we come)? That would fix the previous point but create a whole new raft of problems.
  • What do you do if you want to install two equivalent utilities and use one in one situation, one elsewhere and have completely different configurations for both?
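The first bullet is easy to demonstrate. Below is a rough sketch of two toy filter engines reading the same rule list, one stopping at the first match (roughly how procmail is described above) and one applying every matching rule. The rule format, header names, and function names are all made up for illustration; this is not procmail's or KMail's real behaviour or syntax.

```python
# One shared, hypothetical rule list.
RULES = [
    {"header": "List-Id", "contains": "kde-devel", "folder": "kde"},
    {"header": "Subject", "contains": "patch", "folder": "patches"},
]

def matches(rule, message):
    """True if the named header contains the rule's substring."""
    return rule["contains"] in message.get(rule["header"], "")

def deliver_first_match(rules, message, default="inbox"):
    """Stop at the first matching rule."""
    for rule in rules:
        if matches(rule, message):
            return [rule["folder"]]
    return [default]

def deliver_all_matches(rules, message, default="inbox"):
    """Apply every matching rule."""
    folders = [rule["folder"] for rule in rules if matches(rule, message)]
    return folders or [default]

message = {"List-Id": "kde-devel.kde.org", "Subject": "[patch] fix filter"}
print(deliver_first_match(RULES, message))
print(deliver_all_matches(RULES, message))
```

Same file, same syntax, different results. A shared format settles how the rules are spelled, not what they mean, so the semantics would still have to be agreed on separately.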

Maybe some of those questions would be answered by a standard config DTD and good CLI tools. But I bet some of the answers would be ugly.

The glue that holds a Unix system together is the skilled human being who understands how it works. The virtue of Unix is that it is consistently understandable and configurable by anyone with a certain minimum skillset. Plain text is the foundation of that. The advocates of XML have to do some hard work to prove that they have something better. Since the configs of most apps aren't that hard to learn, they also have to prove that it's worth it.


--

It is impolite to tell a man who is carrying you on his shoulders that his head smells.
Re: Not ready for this yet (3.57 / 7) (#15)
by phinance on Thu Jan 11, 2001 at 01:13:57 PM EST

And how easy would it be to hack these files in Vi?
They're plain text, so no more difficult than a regular config file. You might even understand more since XML tags are usually English words.

To create them from the output of a script or grep them for needed information?
How is this different from config files today? With XML you could grep on tag names so that you can effectively search by "meaning". Not all config file formats are so nice.

How many different DTDs would a *nix hacker have to remember just to edit a few config files?
How many config file formats do you have to remember? Don't you usually look at a config file and follow the example of what's already there? I contend that since the XML structure is standard (and simple, and you already know it) and the tags are in English, this process should be at worst just as easy.

Even if you learn the DTDs off by heart, you've lost the ability easily to build and manipulate these files in scripts or from the CLI. Until there is a standard set of command-line tools that give the same power over XML that grep, sed, cut, awk and the like give over plain text, what you ask is not practical.
Since XML files are plain text, you haven't lost this ability. That's one of the reasons people like XML so much.
If you use Perl, you can use a module such as XML::Parser and make your life even easier.
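The same idea can be sketched in Python with the standard `xml.etree.ElementTree` module: a script changes one setting and leaves the elements it does not know about alone. The `<procmailrc>` layout below is entirely hypothetical (procmail has no XML format), and `set_folder()` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rc file; the element and attribute names are invented.
RC = """<procmailrc>
  <rule name="lists">
    <match header="List-Id">kde-devel</match>
    <folder>kde</folder>
    <lock timeout="30"/>
  </rule>
</procmailrc>"""

def set_folder(xml_text, rule_name, new_folder):
    """Change the target folder of one rule; leave everything else alone."""
    root = ET.fromstring(xml_text)
    for rule in root.iter("rule"):
        if rule.get("name") == rule_name:
            rule.find("folder").text = new_folder
    return ET.tostring(root, encoding="unicode")

print(set_folder(RC, "lists", "kde-lists"))
```

One caveat: with its default parser, ElementTree drops XML comments when reading, so a real tool that wants to preserve hand-written comments would need to take extra care.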



Read, annotate, and discuss open source documentation.
Andamooka: Open support for open content.
[ Parent ]

XML _is_ text! (3.57 / 7) (#16)
by Whizard on Thu Jan 11, 2001 at 01:15:10 PM EST

And how easy would it be to hack these files in Vi? To create them from the output of a script or grep them for needed information? How many different DTDs would a *nix hacker have to remember just to edit a few config files? Or are you hoping to get a standard Unix configuration DTD agreed (in which case I'll see you next millennium, sometime)?
Um... XML is text. I edit the configuration for the application I'm currently developing by hand with vi every day. It's an XML configuration file. And look! grep works too!

jcfergus@shaun - 12:04:13 - /wrist/production/config$ grep "address" cert.xml
<address>host1.here.com</address>
<address>host2.here.com</address>
<address>host3.here.com</address>
<address>host4.here.com</address>

How many different config file formats do you have to remember just to edit your config files now? Does a bind configuration file have anything in common with your sendmail.cf file?

Beyond that, I'm not even sure that it would be practical or desirable even if those tools were available. Universal config files would have to be very verbose and full of redundancy if they were to cope with every utility that might use them.

I don't believe the original author was talking about universal configuration files, that contain configuration information for everything under the sun, but about a common format for configuration files, so that software could make use of a common configuration file parser, and not have to use a different parser for each configuration file that it needs to deal with. XML fits this bill perfectly, because it's a common parser with a (nearly) completely flexible format.

The glue that holds a Unix system together is the skilled human being who understands how it works. The virtue of Unix is that it is consistently understandable and configurable by anyone with a certain minimum skillset. Plain text is the foundation of that. The advocates of XML have to do some hard work to prove that they have something better. Since the configs of most apps aren't that hard to learn, they also have to prove that it's worth it.

Again, the point you miss is that XML is plain text. It's no harder to learn than those file formats that are already out there, and contains tons of advantages over most of them for parsing. Was HTML hard for you to learn?


--
So Lawrence Lessig, John Perry Barlow, Rusty, and Prince are having dinner...
[ Parent ]

One reply for 3 (3.80 / 5) (#31)
by itsbruce on Thu Jan 11, 2001 at 04:16:58 PM EST

Since you're all saying much the same.

Again, the point you miss is that XML is plain text.

No, it isn't. It's text-based. It has a minimum structure on top of that. The basic unit is not the character but the tag. For XML to be manipulated in the same flexible and generic (above all generic) way as the multitude of current config files whose basic unit is the alphanumeric character requires either new tools or more work with the old plain-text tools. Considerably more work if you're going to have to deal with any number of tags and have to look-up/validate against multiple DTDs.

Was HTML hard for you to learn?

No. SGML wasn't too hard either. I've written documentation and articles in LinuxDoc and DocBook. They are appropriate tools to that complex task. They are an unnecessary overhead for simple configuration tasks.


--

It is impolite to tell a man who is carrying you on his shoulders that his head smells.
[ Parent ]
More work? (5.00 / 2) (#102)
by roystgnr on Sat Jan 13, 2001 at 05:45:05 PM EST

Considerably more work if you're going to have to deal with any number of tags and have to look-up/validate against multiple DTDs.

Oh, darn, you'll have to make sure a config file is valid before you save it?

Wait, you have to do that anyway, with non-XML files, assuming you want the program being configured to still work! The difference is that with the XML file you have a definitive schema to tell you what makes it valid; with most config files you get to pray that the author wrote an adequate man page.

[ Parent ]

Characters != structure (none / 0) (#105)
by Jim Dabell on Sun Jan 14, 2001 at 01:20:51 PM EST

For XML to be manipulated in the same flexible and generic (above all generic) way as the multitude of current config files...

Sorry, but there isn't a "flexible and generic" way to manipulate the current set of config files. That is what this discussion is about. Yes, XML specifies a minimum structure - that doesn't mean that config files have no structure, or that the structure is common across different files/apps. So you can manipulate the files as characters - what will that get you without knowing, at the very minimum, the syntax?



[ Parent ]
Change is scary, I'm here to help (3.87 / 8) (#17)
by 0xdeadbeef on Thu Jan 11, 2001 at 01:16:17 PM EST

And how easy would it be to hack these files in Vi? To create them from the output of a script or grep them for needed information? How many different DTDs would a *nix hacker have to remember just to edit a few config files? Or are you hoping to get a standard Unix configuration DTD agreed (in which case I'll see you next millennium, sometime)?

XML is text. I use VI to make every document I write. How much harder is the syntax than HTML? Sure, it is verbose, but if you were overtly concerned about that, you'd be using a graphical editor.

Go read up on XPath. Then evaluate its power with XML compared to using grep, sed, and awk to extract information out of non-trivial config files. And there are already command line tools for processing XML documents.
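As a taste of path-based queries, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The `<config>` document and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical config; element and attribute names are made up.
CONFIG = """<config>
  <server name="mail"><address>host1.here.com</address><port>25</port></server>
  <server name="web"><address>host2.here.com</address><port>80</port></server>
</config>"""

def addresses_for(root, name):
    """Addresses of the servers whose name attribute equals `name`."""
    path = f"./server[@name='{name}']/address"
    return [node.text for node in root.findall(path)]

def all_ports(root):
    """Every <port> anywhere in the document, as integers."""
    return [int(node.text) for node in root.findall(".//port")]

root = ET.fromstring(CONFIG)
print(addresses_for(root, "web"))
print(all_ports(root))
```

The query selects by structure ("the address inside the server named web") rather than by which line the text happens to sit on, which is the part that gets awkward with grep, sed, and awk.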

How many config file formats do you remember? How is that different from remembering the same structure in an XML document?

All of your concerns apply just as much to the typical zoo of config files as they would to the same zoo in XML. However, a consistent framework, made possible by the unified syntax, could alleviate those problems.

XML makes it easier to make things easy for the people who want it that way. These people do not touch config files if they can help it. XML also makes it easier for those of us who aren't scared by complexity to do some really powerful things that no shell script can easily accomplish, even using the whole plethora of small Unix tools. A unix that is smartly configured with XML could make something like linuxconf a weekend hack!

[ Parent ]

XML might be useful, but not in *NIX config files (3.60 / 5) (#10)
by gullevek on Thu Jan 11, 2001 at 12:49:29 PM EST

Well, I think XML might be very useful for some cross-OS stuff, like lightweight transfer of data. It's very useful if you have plain text and want to format it for output in a web browser and use the same text (with header, body, etc. styling) for output as a printable (e.g. PostScript) file.

But why put so much overhead into a config file? Perhaps if ALL *nix programs used the same style for configuring something, like key mapping. But the thing about a config file is that it can have just about anything in it. So you would need an extra description file for every config file, and you would need to remember every single XML command.

Well, I prefer simple files, which I can read fast and easily. At the moment there is a big XML overflow. Almost everything wants to use it (from SOAP to Cocoon, etc.). But after looking at it, you ask yourself why write 10 times as much to do DB access when there is no advantage after all ...

Another possibility for XML comes just into my mind. Log files from the system or eg Apache. This might be a point where XML could be quite useful, wouldn't it ?
--
"Die Arbeit, die tüchtige, intensive Arbeit, die einen ganz in Anspruch nimmt mit Hirn und Nerven, ist doch der größte Genuß im Leben."
  - Rosa Luxemburg, 1871 - 1919
Log files (4.00 / 2) (#12)
by evvk on Thu Jan 11, 2001 at 01:03:03 PM EST

> Another possibility for XML comes just into my mind. Log files from the system or eg Apache. This might be a point where XML could be quite useful, wouldn't it ?

Yes, that or just about any log file might be useful to have in XML. There's the data (the message) and there's the metadata (date, source, other conditions). The data and metadata most likely take many more characters than the "extra" tags. That is just what a tagged language is good for, as opposed to a config/programming language where the commands are the most part.
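A quick sketch of the data-plus-metadata idea, using Python's standard `xml.etree.ElementTree`. The `<entry>` element, its attribute names, and `log_entry()` are invented here; no real logging daemon is claimed to write this format.

```python
import xml.etree.ElementTree as ET

def log_entry(timestamp, source, level, message):
    """Build one log record: metadata as attributes, the message as text."""
    entry = ET.Element("entry", {"time": timestamp, "source": source, "level": level})
    entry.text = message  # special characters are escaped on output
    return ET.tostring(entry, encoding="unicode")

line = log_entry("2001-01-11T13:03:03", "httpd", "error",
                 "File not found: /docs/a&b <draft>.html")
print(line)

# Reading it back recovers the original message and metadata.
parsed = ET.fromstring(line)
print(parsed.get("source"), parsed.text)
```

The markup is small next to the payload, and awkward characters in the message survive a round trip, which is usually harder to guarantee with whitespace-delimited log lines.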

[ Parent ]
You're overblowing it (2.00 / 1) (#44)
by Brandybuck on Thu Jan 11, 2001 at 10:59:54 PM EST

But why put so much overhead into a config file?

What overhead? XML is scalable, and that means in the downward direction as well. An XML config file can be as simple as the following:

<preferences application="fooblarg" version="0.1.2" >
<option key="DefaultName" value="John Doe" />
<option key="AppDir" value="/usr/local/share/freeblarg/" />
<option key="Style" value="Motif" />
</preferences>

And if you're talking about the extra code needed to process XML files, odds are that it's already there. GNOME and KDE have XML support built in, so why not use it? libxml or expat ain't that big for non-DE applications. And if you have application data files to read in, why not use a single format for both the data and the rcfile?
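To give a feel for how little code the reading side takes when a parser is already available, here is a sketch that loads the preferences file above with Python's standard `xml.etree.ElementTree`. The `load_prefs()` helper is an illustrative name, and the XML is inlined as a string to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

PREFS = """<preferences application="fooblarg" version="0.1.2" >
<option key="DefaultName" value="John Doe" />
<option key="AppDir" value="/usr/local/share/freeblarg/" />
<option key="Style" value="Motif" />
</preferences>"""

def load_prefs(text):
    """Return (application, version, {key: value}) from a preferences document."""
    root = ET.fromstring(text)
    options = {opt.get("key"): opt.get("value") for opt in root.findall("option")}
    return root.get("application"), root.get("version"), options

app, version, options = load_prefs(PREFS)
print(app, version, options["Style"])
```

No tokenizer, no quoting rules, no escaping logic: the application only decides which element and attribute names it cares about.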

[ Parent ]

Apple thinks it's a good idea too. (4.20 / 5) (#18)
by thedward on Thu Jan 11, 2001 at 01:18:40 PM EST

Current developer releases of MacOS X already use XML for many of their configuration files. You can edit them with a text editor, or you can use their "property list editor" that ensures that the file is at least syntactically valid. You can find more details over at Ars Technica.

Eww. No thanks. (3.22 / 9) (#19)
by simmons75 on Thu Jan 11, 2001 at 01:20:35 PM EST

XML is a current buzzword. That's all. It might sound great because so many people extol the virtues of a unified config file format, and "obviously" the way to do this is XML, but all I have to say is "no thanks."

Personally, I prefer the (eep!) Microsoft .ini file format.

[Display]
GraphicsCard=foo;

to

<SECTION="DISPLAY">
<GraphicsCard="foo">

but that's just me. It's easier to read, IMHO, and easier to deal with in a text editor. One of the things that keeps me on a Linux box (and possibly a BSD system soon :-) is the ability to configure many apps using little more than a shell and vi.
poot!
So there.

More like (none / 0) (#36)
by ucblockhead on Thu Jan 11, 2001 at 05:43:50 PM EST

<Display Card="Foo"/>
<Sound Card="Bar"/>
<Display Card="Blargh"/>

etc.
-----------------------
This is k5. We're all tools - duxup
[ Parent ]
Complexity (3.33 / 3) (#24)
by slaytanic killer on Thu Jan 11, 2001 at 01:47:28 PM EST

I wonder if people are having a backlash against XML because of its possible complexity. XML scales in complexity much higher than normal config files; and perhaps the important point is that at some point, we might be saddled with systems that are no longer so quick-and-dirty. What if the complexity spirals upwards to the point where system administration becomes a study in managing complex systems?

I have no idea about this; I am very tired ATM. But the main point in favor of XML is that it's much more self-documenting than text configs; you are not using punctuation and whitespace to delineate tokens.

Real world example ... (3.60 / 5) (#27)
by retinaburn on Thu Jan 11, 2001 at 02:32:44 PM EST

We use XML files for all of our /complex/ configuration files.

We have a restart manager on 5 different servers; on startup it reads and parses a rather large XML config file (7k is the smallest). It's all placed in a nice hierarchical structure with dependencies, parameters and the like. The best thing is that it's readable in source form, and even better when you read it in an XML viewer with collapsing structures, etc. I can't see the use for small config files that are readable as they are, but for nice big complicated ones, go for XML.


I think that we are a young species that often fucks with things we don't know how to unfuck. -- Tycho


Perhaps a common config format that isn't XML (3.91 / 12) (#38)
by extrasolar on Thu Jan 11, 2001 at 07:16:20 PM EST

There seems to be the idea that the only options are an XML format or what we have now. But perhaps a unified Unix configuration format would be a better idea. Reading the discussion here, it seems that XML has at least the following disadvantages:

  • Verbose syntax
  • DTD, XLink, XPath, etc., complexities
  • XML format may not be completely suited for all tasks. For example, should every element of a config file need to be read? With XML it has to be, I think.
  • Doesn't scale. Some applications don't need complicated data structures like trees or graphs.
  • Some applications need or desire a program for configuration, such as Emacs. XML isn't as suitable for programming as a real programming language.

What we have now isn't so great either. Here are some disadvantages from reading the discussion:

  • Inconsistent formats between applications. Knowledge of one format doesn't help at all with another. The user must relearn the configuration format for each application used.

It's a single item, but it's a big one. Configuration is a user task, believe it or not. It shouldn't be a difficult task.

So it seems if you look at all the configuration files we have now and consolidate them into a new standard somehow, then we get the same advantages as an XML format with none of the disadvantages.

I see something like this.

Every config file has a one line header at the top of the file that contains some information. It could be something like #?RCL:key-value; or #?RCL:language(scheme); or #?RCL:hierarchial;. RCL could stand for Resource Configuration Language. I don't know, I'm just making it up. Then we have a standard for each heading.

Example 1

#?RCL:key-value;
nameserver=209.43.556.2
phone-number="(520) 639-4462"
user="klh"
password="omicron"

Example 2

#?RCL:hierarchial;
maximize="right"
minimize="right"
close="left"
root-menu={
  title="Root Menu"
  submenu={
    title="Editing"
    menuitem={
      title="GNU Emacs"
      exec="emacs"
      icon="/usr/share/emacs.png"}
    menuitem={
      title="gvim"
      exec="gvim"
      icon=""}
  }
}

Anyway, you get the idea.



What's the difference? (4.00 / 8) (#40)
by bjrubble on Thu Jan 11, 2001 at 09:25:57 PM EST

Sorry, I'm in a quibbling mood after reading all the other anti-XML posts...

XML "disadvantages" :

Verbose syntax -- I still can't figure out why people care so much about this. Is it really so much trouble to write that bit of extra verbiage around your configuration parameters? I think about Apache, which has a bunch of SGML-like stuff in its configuration. I've never resented the fact that I have to type out 'VirtualHost' or 'Directory' -- in fact I'd say httpd.conf is my favorite configuration file; it's extremely readable and nearly self-documenting.

should every element of a config file need to be read? -- Is there some need to be able to change parameters by repeating them, or short-circuit the config parser? These seem like hacks to address shortcomings in the parser. It's trivial to comment out chunks of XML, how does this not accomplish the same thing?

Some applications don't need complicated data structures like trees or graphs. -- This strikes me as like saying, "this application doesn't need color, so I'm going to find a monochrome display for it." If your config doesn't need complex structures, don't use them. This isn't like choosing between C and Perl to write an app -- if the config processing is slightly slower, I for one care very little. I'm similarly happy to compile programs that spend more time in 'configure' than 'make' -- I'd much rather let my computer work an extra 30 seconds on each compile than spend half an hour manually tracking down problems on the oddballs.

I'd also elaborate on your single disadvantage of the current anarchy, and say that config formats are also inconsistent among versions. I think one of the primary problems with GUI config tools is that they're frozen at a certain version of their target program, and not only can't take advantage of new features but may actually break on them. XML offers the potential for not just cross-program compatibility, but cross-version compatibility as well. Not to mention GUI-manual compatibility, which I consider the biggest potential win.

Finally, your proposed config format, while undoubtedly more efficient than XML for the examples you gave, doesn't IMO offer anything really substantial over XML. And it lacks a DTD-like key, an API available across all major languages, and a cottage industry of "For Dummies" books.

[ Parent ]
inventing something new for the heck of it (3.20 / 5) (#41)
by speek on Thu Jan 11, 2001 at 09:45:13 PM EST

You're trying to invent something that accomplishes what XML already accomplishes. So I have a hard time understanding why it's so compelling. I think all of your disadvantages of XML boil down to just the first one - verbosity. Some people just can't stand it. But they're missing what you get for it - clarity. There is no ambiguity in an XML document, and in the long run, that is worth every keystroke.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

It's the S/N ! (3.00 / 5) (#47)
by tmoertel on Fri Jan 12, 2001 at 12:47:20 AM EST

I think all of your disadvantages of XML boil down to just the first one - verbosity. Some people just can't stand it. But they're missing what you get for it - clarity. There is no ambiguity in an XML document, and in the long run, that is worth every keystroke.

No, what many of us are concerned about is that certain folks seem to be blinded by the hype surrounding XML and proposing that we XML-ize everything, even if doing so would bloat up extremely important human-readable configuration files and lower their signal-to-noise ratio dramatically. Tags take up space, and when they don't communicate useful information, or when they're largely redundant, they are noise.

S/N matters, especially to humans, who happen to read and edit a lot of these files.

For example, somebody used crontab files as an example of a perfect candidate for XMLizing. I couldn't disagree more. XMLizing a crontab file would cut its S/N to a third of its former value. (Try it. Get a big crontab file and convert it into XML.) To find the entry that kicks off at 4:35 on Fridays, I'd have to wade through block after block of <entry> elements looking at <minute> and <hour> and <dayofweek> tags instead of, with the old format, just reading down columns of nicely aligned numbers to find the one I wanted.

Remember, machines aren't the only things that read and use configuration files.

--
My blog | LectroTest

[ Parent ]
XML-editor (2.50 / 2) (#50)
by Steeltoe on Fri Jan 12, 2001 at 04:48:40 AM EST

You could use a program to edit the XML files, just like you use a program to edit ordinary files. By reading the DTD it could present the file as plainly as possible, even in a common plaintext format.

- Steeltoe
Explore the Art of Living

[ Parent ]
S/N assertion wrong (3.75 / 4) (#51)
by speek on Fri Jan 12, 2001 at 07:39:35 AM EST

.. certain folks seem to be blinded by the hype surrounding XML

Meaningless rhetoric that adds nothing to the discussion. Either my arguments make sense or they don't.

To find the entry that kicks off at 4:35 on Fridays, I'd have to wade through block after block of <entry> elements looking at <minute> and <hour> and <dayofweek> tags...

Actually, no - you'd have the benefit of knowing exactly what to search for. Even vi can handle that.

And frankly, that disproves your implied assertion that tags don't communicate useful information. They are part of the signal.

Have you ever seen a property file that looked something like this:

argument = long_ass_value_1 \
long_ass_value_2 \
long_ass_value_3 \
etc....

Is that particularly readable? You realize that the \'s are there to indicate that the value did not in fact end, as is normally assumed by an end of line in these sorts of files. XML has no such ambiguities and exceptions. You know when a value ends and when it begins. You know the value's place in the whole scheme (or should I say schema?) of things. And that was a simple property file of key-value pairs! It only gets worse as your configuration data gets more complicated.
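For anyone who hasn't had to write it, the special case being described looks roughly like this in a reader for such a file. This is only a sketch in Python under assumed rules (a trailing backslash joins the next line, `#` starts a comment, the first `=` splits key from value); real property-file formats differ in the details, and the name `read_properties` is made up.

```python
def read_properties(text):
    """Parse key = value lines, joining lines that end in a backslash."""
    props = {}
    pending = ""  # text carried over from lines that ended in a backslash
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # The value has not ended yet: remember it and keep reading.
            pending += line[:-1].strip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

SAMPLE = (
    "argument = value_1 \\\n"
    "value_2 \\\n"
    "value_3\n"
    "other = plain\n"
)
props = read_properties(SAMPLE)
print(props["argument"])
```

Note that the end-of-line rule and its exception both live in the reader, not in the file, which is the point under dispute in this thread.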

Change is always hard, but this is a piece of cake in comparison to most.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

S/N assertion *right* (4.33 / 3) (#56)
by tmoertel on Fri Jan 12, 2001 at 11:10:08 AM EST

First, thanks for the response. I'd like to apologize for inserting that bit of rhetoric at the beginning of my earlier post. It was uncalled for and, ironically, lowered the S/N of the discussion. Now, let us continue...

You entitled your reply "S/N assertion wrong," and then largely discussed other things. Maybe you're not familiar with the concept, and so I'll share with you that S/N is inversely proportional to redundancy. Most of the XMLized configuration files I've seen floating around here are rife with redundancy, hence my S/N comment.

The only time you really addressed the S/N issue head on is when you claim that the tags in the XMLized crontab example provide useful information themselves, but in that example they clearly do not. (Proof: All crontab job entries follow the same structure, and so it's not necessary to reassert their structure again and again for each entry. Doing so once is sufficient, and this one time can be done outside of the file itself (which is exactly how crontab files are today, i.e., the structure is implicit). Hence, reasserting the structure via XML tags provides no additional information and is therefore noise.)

Another way to see this is to take the crontab challenge. Here's a real-world crontab file. Convert it into XML:

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
 
# m hr day mon wkday
01 * * * * root run-parts /etc/cron.hourly
14 2 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

Then consider: Now how big is it? Where did the extra bloat come from? Does the XML version communicate any additional information to either human or machine? Answer: No. (If you disagree, prove it. Tell me what useful information exists in the XML version that cannot be found in the original.) So, the signal is the same in both versions. However, the XML version is massively larger; the extra size is noise; hence, its S/N is lower.
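For readers who would rather run the challenge than do it by hand, here is one possible sketch in Python. The element and attribute names (`crontab`, `env`, `entry`, `minute`, and so on) are invented for illustration, since no crontab DTD exists in this thread, and the sixth `user` column is assumed because the sample is a system crontab. Other encodings would give other sizes.

```python
import xml.etree.ElementTree as ET

CRONTAB = """\
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# m hr day mon wkday
01 * * * * root run-parts /etc/cron.hourly
14 2 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
"""

# Hypothetical attribute names, one per leading column of a job line.
FIELDS = ("minute", "hour", "day", "month", "weekday", "user")

def crontab_to_xml(text):
    root = ET.Element("crontab")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are dropped
        if line[0].isdigit() or line[0] == "*":
            # Job line: six columns, then the command takes the rest.
            parts = line.split(None, len(FIELDS))
            entry = ET.SubElement(root, "entry", dict(zip(FIELDS, parts)))
            entry.text = parts[len(FIELDS)]
        else:
            # Environment line: NAME=value
            name, _, value = line.partition("=")
            ET.SubElement(root, "env", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")

xml_text = crontab_to_xml(CRONTAB)
print(len(CRONTAB), "characters originally,", len(xml_text), "as XML")
print(xml_text)
```

Printing both lengths lets each side of the argument check the size claim against a concrete encoding instead of an imagined one.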

Before you respond, please take the time to try to see the opposing viewpoint. Take the crontab challenge. Experience the bloat for yourself. The original fits on half of an 80x24 screen and is easy to read, understand, and edit. Its structure is readily apparent visually. Its S/N is extremely high.

Can you see why some people might legitimately be opposed to XMLizing configuration files like crontab?

Thanks for your time and consideration.

--
My blog | LectroTest

[ Parent ]
I do understand (4.50 / 2) (#62)
by speek on Fri Jan 12, 2001 at 02:16:12 PM EST

I do understand your viewpoint. I understand your crontab file is shorter than an XML version. I realize that in discussing any one particular config file that currently exists, I will not be able to demonstrate any advantage to using XML vs. using a homegrown, focused alternative because there is none.

The advantage to XML is that it could work for everyone's config needs, and everyone could share not only code, but potentially other programs' data with less work. It eliminates ambiguity in the data, which is helpful anytime different people have to work together and understand what was meant by another's hacking. That extra "bloat" is responsible for the increase in clarity, and therefore, it is not noise.

XML is not redundant. The start and end tags are giving information that is needed. In your crontab file, you don't need it, because there are assumptions being made about the file by external agents. For instance, it is assumed that each line represents a single cron job - that there are no lines referring to anything else. It is also assumed that each line is in the correct format. In XML, you might have a <cron> tag that delimits each cron job, eliminating the need to assume that each line specifies a cron job, and not some other arbitrary piece of data (which raises an interesting question - how does the cronjob interpreter know that the first few lines you gave are not cronjob entries?). The cronjob file doesn't need this data in the file because all agents who read the file are required to assume this information. XML simply moves those assumptions from the reader into the data file - where some of us think it belongs.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Missing the point: Humans matter, S/N matters! (4.00 / 1) (#71)
by tmoertel on Fri Jan 12, 2001 at 04:28:38 PM EST

You say:
The advantage to XML is that it could work for everyone's config needs, and everyone could share not only code, but potentially other programs' data with less work.

But you seem to ignore that there is a cost to the XMLization of everything, and this cost must be borne -- time and time again -- by the humans who must read and edit these files as part of their daily lives.

Please, take the Crontab Challenge I proposed in my earlier post. From the human perspective, is the XML version easier to read, understand, and change? If you think so, prove it. Be prepared to explain:

  • why having to deal with a file > 2X its original size is easier and more manageable,
  • why it's easier to hunt down and read the XML DTD to determine the correct element and attribute names for the XMLized crontab than it is to look up the original crontab format via man(1), and
  • why it's easier to express common command-line characters like & and < as &amp; and &lt; rather than to use the original characters themselves?
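The escaping point in the last bullet is easy to see with a shell command that uses redirection. A small sketch in Python, using the standard library's `xml.sax.saxutils.escape`; the command itself and the `<command>` element name are made-up examples:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# A made-up job command full of characters that XML reserves.
command = "sort < /var/log/messages > /tmp/sorted && mail -s done root < /tmp/sorted"

# What a human would have to type into an XML crontab instead.
escaped = escape(command)
print(escaped)

# Pasting the raw command into an element is not well-formed XML.
try:
    ET.fromstring("<command>" + command + "</command>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# The escaped form parses, and the parser hands back the original text.
round_trip = ET.fromstring("<command>" + escaped + "</command>").text
print(raw_is_well_formed, round_trip == command)
```

A CDATA section would avoid the entities, at the cost of one more rule for the person editing the file to remember.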

You go on to say:

XML is not redundant. The start and end tags are giving information that is needed.

Are you joking? XML is redundant by design because the redundancy makes XML parsers simpler. Don't you recall this being one of the design goals for XML w.r.t. SGML? As such, XML documents are considerably larger than well-minimized SGML documents because XML eliminates many of SGML's markup minimization features such as SHORTREF and optional tags. And XML documents are downright bloated when compared to tailored-to-the-task formats like the original crontab format. What do you think that bloat is?

It's 100% pure, certified redundancy. There's no getting around it. It's not needed; much of the information provided by the tagging is provided elsewhere already, and providing it again is redundant. Get it? If you can do away with something, it's redundant. If it's redundant, it's noise.

Noise is bad.

--
My blog | LectroTest

[ Parent ]
you ignored my points (5.00 / 1) (#72)
by speek on Fri Jan 12, 2001 at 04:48:19 PM EST

You complained previously that I didn't respond to your S/N argument. So, I responded very specifically, explaining why XML is not redundant. I explained exactly why it is not redundant, why it would give needed information missing from the crontab format. I explained that crontab gets around this missing information by coding it into the parsers. I explained that moving that information into the data file makes sense to me, and that, because the information has to be somewhere, it is not bloat. You choose to put it in your parsers, I prefer putting it into the file.

I think bloat is duplicate information. Since I don't think XML is duplicating information, I don't think it's bloated. The fact that XML would make the file larger is not proof of bloat. Because your particular example of the crontab would not ever use 90% of the capabilities of XML, you could reasonably argue that it's overkill, and, as I indicated previously, in the context of just one application and its config file, I could hardly but agree. In the context of thousands upon thousands of applications, I see things differently.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

No, it's that your points are self contradictory (3.00 / 1) (#77)
by tmoertel on Fri Jan 12, 2001 at 05:38:32 PM EST

I don't know how to respond to your points because your arguments are logically inconsistent. For example, you readily admit that the XML version is larger, yet claim that it is not redundant, despite the fact that the XML version provides no information that cannot be found in the original. If it's larger yet provides no new information, it's redundant. By definition.

Next you say, "The fact that XML would make the file larger is not proof of bloat." Really? Then what new information does the extra size provide? Can you show why the same information cannot already be found in the much-smaller original version? If you can't, it's bloat.

If you want to argue that the XML versions are not redundant and bloated, you'll have to show the following:

  1. That the extra size of the XML version provides additional information.
  2. That the additional information cannot be found in the original.

So far, you haven't done either.

--
My blog | LectroTest

[ Parent ]
clarity (5.00 / 1) (#80)
by speek on Fri Jan 12, 2001 at 07:25:47 PM EST

That the XML contains additional information that cannot be found in the original is as plain as the names of the tags and attributes.

To you, the current crontab is very clear, because the hidden assumptions are well known to you. They aren't to me, so I don't find it very clear. I would much prefer the XML. Unix is filled with stuff like this - vi is so easy for those who happen to know the commands. For those who don't know them, it's utterly useless.

Verbosity, the bane of the Perl programmer, can be an excellent quality in both a programming language and a data-markup language. The cost of typing in the extra characters is more than worth the gains you get in clarity and communicative power. I do not expect to convince many Linux hackers of this, however. We'll just have to see who wins the war, eh? :-)

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

XML verbosity != clarity (4.00 / 1) (#81)
by tmoertel on Fri Jan 12, 2001 at 08:34:10 PM EST

Why do you persist in equating XML's verbosity with clarity? Please tell me how switching to XML changes the need to look up and understand a config file's format? Is it easier to hunt down and read the XML DTD to determine the correct element and attribute names for the XMLized crontab than it is to look up the original crontab format via man(1)? Is it easier to remember when XML requires character entity references and CDATA sections than to just type plain characters?

You ignore the fact that in addition to verbosity (i.e., redundancy) XML brings a lot of human complexity to the party for the sake of making it easier for parsers to process XML documents.

Please spend some time acquainting yourself with the design goals for XML. Please pay particular attention to goals (4) and (6):

4. It shall be easy to write programs which process XML documents.
6. XML documents should be human-legible and reasonably clear.
(Emphasis mine.)

Notice how the design goals favor ease of parsing over ease of human understanding? XML must be easy for programs to parse, yet only legible and reasonably clear for humans. These words were chosen carefully by the XML design team and debated for considerable time. Please do not try to equate "human-legible" and "reasonably clear" with "easy for humans." If ease of human understanding was the actual goal, that's how it would have read.

Also, please consider design goal 10, which explains why XML is more verbose than SGML and to-the-task formats:

10. Terseness in XML markup is of minimal importance.

Please understand that I'm not trying to be argumentative. What I am trying to do is help you see the following:

  1. Humans must be able to read, understand, and edit configuration files. The easier, the better.
  2. XML, by the nature of its design, makes configuration files larger and lowers their S/N. Hence it makes configuration files more difficult for humans.
  3. In order to understand the format of an XML configuration file, humans must study and understand its DTD. This is not trivial for most humans and certainly no easier than consulting a man(1) page. (Visual XML editors aren't a good solution for this problem because configuration files must often be hand edited in tight environments. Consider single-disk distros, installations gone wrong, and so on.)
  4. Therefore, it is likely that a wholesale conversion of configuration files to XML-based formats will make the lives of the humans considerably more difficult.

Thus, in the end, I must conclude that such a conversion is not a good idea.

--
My blog | LectroTest

[ Parent ]
verbosity != redundancy (none / 0) (#83)
by speek on Fri Jan 12, 2001 at 08:57:25 PM EST

Verbosity is verbosity. Is it redundant to name my variable "finishingOptionList" rather than "x"? No, it's just more verbose.

I know you're not just being argumentative. It's ok - we just disagree on what is "easy". Of your last 4 points, I disagree with points 2, 3, & 4. XML files are easier (for me), I do not have to study the DTD (what for? if the XML isn't self-explanatory, there'd better be a plain-English explanation somewhere), and no, it wouldn't make things more difficult for humans. Considering the significant failure of Linux to make life easier for me compared to Windows, I'd have to assume we can do better than what we have currently.

All that said, I certainly appreciate the lesson about crontab, and I'm gonna go put my new knowledge to use now. I'll talk to you some other time on some other topic. Ciao.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

One last thing... ;-) (none / 0) (#88)
by tmoertel on Fri Jan 12, 2001 at 11:38:51 PM EST

On whether you need to use the DTD, you wrote:
I do not have to study the DTD (what for? if the XML isn't self-explanatory, there'd better be a plain-English explanation somewhere)...

One reason that you'll need DTDs for config files, when for so many other uses of XML you need only well-formed documents, is that config files are meant to be edited by humans (as well as machines). Humans often make mistakes, and so you would need to use a validating parser and a DTD for each config file to make sure that the file satisfies its obligations.

Another reason is that many times when you edit a config file, there's nothing in it yet. It's blank. Without a template of conforming XML to guide you, you'll need to determine what the document's structure ought to be. Imagine an empty XML crontab. What's the structure again? Is minute an element or attribute? I can't remember...

Cheers,
Tom

--
My blog | LectroTest

[ Parent ]
Nope, that's noise (2.00 / 1) (#82)
by tmoertel on Fri Jan 12, 2001 at 08:45:08 PM EST

You wrote:
That the XML contains additional information that cannot be found in the original is as plain as the names of the tags and attributes.

No, that information can be found in the original. The information that the minute attribute provides, for example, can be found in the original crontab file positionally: The fact that a number falls into the first column indicates that it is a minute value. Additionally, in the original no tags are required, and hence the original has a much higher S/N value.

In other words, the tags are noise.

--
My blog | LectroTest

[ Parent ]
bold (5.00 / 1) (#84)
by speek on Fri Jan 12, 2001 at 09:12:06 PM EST

I noticed you bolded the word "minute". I'm assuming you used a tag to do that. So, was that noise? Shouldn't the browser have known by the positioning and importance of the word that it needed to bold it?

Positioning only tells you something if you already happen to know that. Not knowing it, you'd have to agree that that information isn't really in the crontab file. But I know you know that, because, in your examples, you've added a comment line to communicate these positional facts. However, my crontab (on my machine) has no such comment, nor can the accuracy of the comment be verified without experimentation.

Curiously, I learned something about crontab from that other fellow's attempt to take your crontab challenge that wasn't made clear by your comment. I've never used crontab or read its man pages or anything, so the whole thing is a mystery to me. Your comment line makes clear what each number is referring to, but it doesn't make clear how they would be used. From reading the XML the other guy wrote, I understood that, if cron.hourly was specified, only the minute value was needed, and the rest was ignored. For daily, the hour and minute number are needed. etc. Seems obvious, but I had no idea from the original crontab file. The XML is just plain more informative to a human than the bare file. I think you're having a hard time seeing this because you know too much, and the tags aren't telling you anything you don't already know. But for me, they are.

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Great example! (5.00 / 1) (#89)
by tmoertel on Sat Jan 13, 2001 at 12:05:23 AM EST

First, glad to have your response. I'm glad somebody's reading this stuff. ;-)

Okay, now on to point numero uno:

I noticed you bolded the word "minute". I'm assuming you used a tag to do that. So, was that noise?

Nope, it wasn't noise. Because in that context there was no other way to determine the same information, and so the tagging provided new, useful information. This is in contrast with the crontab example, where every job entry has the same structure, and so the repetitious tagging of the entries is redundant. (You'll find that the same applies to any tabular data when it is expressed in XML.)

Curiously, I learned something about crontab from that other fellow's attempt to take your crontab challenge that wasn't made clear by your comment.... From reading the XML the other guy wrote, I understood that, if cron.hourly was specified, only the minute value was needed, and the rest was ignored. For daily, the hour and minute number are needed. etc. Seems obvious, but I had no idea from the original crontab file. The XML is just plain more informative to a human than the bare file.
(My emphasis)

This is a perfect example of the low S/N I was talking about. You see, what you inferred from the XML encoding of the crontab file was wrong. The tagging misled you. (If you don't believe me, man 5 crontab.)

The "specification" of cron.hourly is not a specification at all and has nothing to do with whether minutes, hours, and so on are ignored. In fact, it's the other way around. The numbers on the left specify when to run the command on the right (such as cron.hourly). So, the "cron.hourly" part of the line is not interpreted by cron at all; it is passed straight through to a shell whenever the time specification on the left is satisfied.
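The direction of that relationship can be sketched in a few lines of Python. This is a deliberately simplified matcher, not cron's real algorithm: it assumes each time field is either `*` or a single number, assumes the system-crontab layout with a user column, and ignores ranges, lists, steps, and cron's special handling when both day fields are restricted. The function names are made up.

```python
from datetime import datetime

def field_matches(spec, value):
    # "*" matches anything; otherwise the field must equal the number.
    return spec == "*" or int(spec) == value

def is_due(line, when):
    """Return (due, command) for one system-crontab job line."""
    minute, hour, day, month, weekday, user, command = line.split(None, 6)
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0
    due = (field_matches(minute, when.minute)
           and field_matches(hour, when.hour)
           and field_matches(day, when.day)
           and field_matches(month, when.month)
           and field_matches(weekday, cron_weekday))
    # The command is never inspected; it is just text to hand to a shell.
    return due, command

WEEKLY = "22 4 * * 0 root run-parts /etc/cron.weekly"
sunday = datetime(2001, 1, 14, 4, 22)
friday = datetime(2001, 1, 12, 4, 22)
print(is_due(WEEKLY, sunday))
print(is_due(WEEKLY, friday))
```

Renaming `/etc/cron.weekly` to anything else would not change when the job fires, only what gets run.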

See what I mean about noise? ;-)

--
My blog | LectroTest

[ Parent ]
Damn that "other fellow"! (4.00 / 1) (#92)
by speek on Sat Jan 13, 2001 at 08:31:49 AM EST

Damn him! Damn him straight to hell!

Heh. Oh well, although I did misunderstand the XML, I didn't understand the plain file any better. I was misled by the fact that each example was using a cron.* as its command line, leading me to think that those were the only options. In essence, I thought the cron.* were enumerated values, rather than just "strings". It's surprising how much information is missing from these files. In your other post, you argue that config files would require a DTD to validate against, but why doesn't that apply to your bare crontab file?

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

Good Q (none / 0) (#93)
by tmoertel on Sat Jan 13, 2001 at 10:45:21 AM EST

You axed:
In your other post, you argue that config files would require a DTD to validate against, but why doesn't that apply to your bare crontab file?

The reason the XML-based crontab file requires a DTD and the original to-the-task file doesn't is because XML parsers are general and must be told the legal grammar of the documents they're asked to parse if they're to catch grammatical errors in those documents. Without the DTD, which provides the grammar's definition, the XML parser would be forced to accept any well-formed XML document as valid, even if its contents (owing to a human author's error) did not represent a legal specification of cron jobs. For example, you could leave out the command portion of every single job entry and the parser would be none the wiser.

The hand-coded parser for the original to-the-task crontab doesn't have this requirement because it accepts only one grammar, the one that was hand-coded into it.

So, in order to ensure that documents are valid w.r.t. a particular grammar, you'll need to go one of two routes:

  1. create a dedicated parser that embodies the grammar (which is what the original crontab does), or
  2. teach a general parser the grammar via an external grammar specification (which would be the XMLized crontab route)
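
To make the two routes concrete, here is a minimal Python sketch. It is not from the thread: the element names `job`, `time`, and `command` and the helpers `parse_cron_line` and `is_valid` are made up for illustration. Python's `xml.etree.ElementTree` is a non-validating parser, so it stands in for "a general parser that was never taught the grammar".

```python
import xml.etree.ElementTree as ET

# Route 2 without the grammar: a general parser only checks well-formedness.
# The <job> below has no command at all, yet it parses without complaint.
# (Element names are hypothetical, chosen just for this example.)
doc = ET.fromstring("<crontab><job><time>01 * * * *</time></job></crontab>")
jobs_without_command = [j for j in doc.iter("job") if j.find("command") is None]

# Route 1: a dedicated parser that embodies the grammar.
def parse_cron_line(line):
    """Split a system crontab job line into five time fields and a command."""
    fields = line.split(None, 5)
    if len(fields) < 6:
        raise ValueError("expected 5 time fields and a command: %r" % line)
    return fields[:5], fields[5]

def is_valid(line):
    """True if the line satisfies the grammar built into parse_cron_line."""
    try:
        parse_cron_line(line)
        return True
    except ValueError:
        return False
```

The XML side accepts the broken job silently, while `is_valid("01 * * * *")` is false because the command is missing. A validating parser given a DTD would catch the XML case too, which is exactly the second route.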

Cheers,
Tom

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
out of curiosity (none / 0) (#97)
by speek on Sat Jan 13, 2001 at 11:36:34 AM EST

Do you think using XML is ever warranted? If so, when?

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

XML is a good tool, when it's the right tool (5.00 / 1) (#99)
by tmoertel on Sat Jan 13, 2001 at 01:14:56 PM EST

Yes, there are many good uses for XML -- DocBook, MathML, lightweight RPC, and serialized objects, for example. XML is even a great idea for some configuration files. But it's just one tool in a kit of many. For other uses, XML places too high a burden on the end user of the documents. See this post where I elaborate a bit on my personal thoughts about XML.

I would say that when the configuration file's grammar is complicated enough that a parser for it can't be created in less than a page or two of code, it might be time to consider XML. (Note that I'm talking about the parser only, not the code that walks the abstract syntax tree and validates/performs the actions it represents. The AST code would be necessary in either hand-coded or XML-based parsers.) Or, when the grammar is similar to a markup language, by all means XML is your tool of choice.

But, to beat our crontab horse into dust, the basic crontab parser requires only a few lines. Here's an overly simplistic rendition:

#!/usr/bin/perl -w

while (<>) {
    chomp;
    if (/^\s*\d/) {
        my ($min, $hr, $day, $mon, $wkday, $job) = my @a = split ' ', $_, 6;
        print("new job: time=(", join('-',@a[0..4]),"), job=[$job]\n");
    } elsif (/^\s*\#/) {
        print("comment\n");
    } elsif (/=/) {
        my ($var, $value) = split /\s*=\s*/, $_, 2;
        print("env var: $var = $value\n");
    }
}

Let's try it out:

# ./crontab.pl /etc/crontab
 
env var: SHELL = /bin/bash
env var: PATH = /sbin:/bin:/usr/sbin:/usr/bin
env var: MAILTO = root
env var: HOME = /
comment
new job: time=(01-*-*-*-*), job=[root run-parts /etc/cron.hourly]
new job: time=(02-4-*-*-*), job=[root run-parts /etc/cron.daily]
new job: time=(22-4-*-*-0), job=[root run-parts /etc/cron.weekly]
new job: time=(42-4-1-*-*), job=[root run-parts /etc/cron.monthly]

Just ten statements, three of which are AST stuff (the print statements), but it does the job.

A lot of configuration files are this simple. For them, XML is probably not a good tool. For other configuration files that are already complicated, converting them to XML doesn't increase their complexity significantly -- editing them by hand is going to be nasty either way -- but it does allow you to use XML editing tools, which might make your life easier. For them, go ahead and use XML. I would. ;-)

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
I'm with you (4.00 / 1) (#100)
by speek on Sat Jan 13, 2001 at 01:31:38 PM EST

Makes me wonder if XML couldn't support a repetitive tabular format better than it does. Something like:
<tabFormat>
<field implicitTagname="minutes"/>
<field implicitTagname="hours"/>
<field implicitTagname="days"/>
<field implicitTagname="months"/>
<field implicitTagname="command"/>
<data separator="whitespace">
1 * * * run-part cron.hourly
</data>
</tabFormat>

This would require some additional logic added to, say, a SAX parser that essentially used the information in the field tags to add implicit tags to the white-space separated data elements. Admittedly, still not making you happy for use with crontab, but if you have a list of 100's of items, it wouldn't hurt much at all.
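
A rough Python sketch of that "additional logic", for what it's worth. It uses `xml.etree.ElementTree` rather than a SAX parser, the `<tabFormat>` vocabulary is just the made-up one from this post, the second data row is added for illustration, and `read_table` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the tabular style proposed above; <tabFormat>,
# <field implicitTagname=...> and <data> are this post's invention, not a standard.
DOC = """
<tabFormat>
  <field implicitTagname="minutes"/>
  <field implicitTagname="hours"/>
  <field implicitTagname="days"/>
  <field implicitTagname="months"/>
  <field implicitTagname="command"/>
  <data separator="whitespace">
1 * * * run-part cron.hourly
2 4 * * run-part cron.daily
  </data>
</tabFormat>
"""

def read_table(xml_text):
    """Return one dict per data row, keyed by the declared field names."""
    root = ET.fromstring(xml_text)
    names = [f.get("implicitTagname") for f in root.findall("field")]
    rows = []
    for line in (root.findtext("data") or "").splitlines():
        if not line.strip():
            continue  # skip the blank lines around the data block
        # The last field soaks up the rest of the line, so commands may contain spaces.
        values = line.split(None, len(names) - 1)
        rows.append(dict(zip(names, values)))
    return rows

rows = read_table(DOC)
```

Each row comes back as a dict such as `{"minutes": "1", ..., "command": "run-part cron.hourly"}`, which is the implicit tagging described above done after the parse instead of inside it.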

--
al queda is kicking themsleves for not knowing about the levees
[ Parent ]

SGML is pretty cool about this... (5.00 / 2) (#103)
by tmoertel on Sat Jan 13, 2001 at 08:22:47 PM EST

Not to get r-e-a-l-l-y off topic, but you wrote, "Makes me wonder if XML couldn't support a repetitive tabular format better than it does. Something like: [Tabular XML where header is defined first and then a data section provides raw content]."

And, to answer your implied question, you betcha. You could absolutely do that. However, you've basically built a special-case parser for tabular data.

Now, SGML, bad boy that it is, can take care of this stuff for you. For example, I wrote the following document type definition:

<!DOCTYPE CRONTAB [

<!ELEMENT CRONTAB     o o (ENV*,JOB*)>
<!ELEMENT ENV         - - ((VAR,VALUE)*)>
<!ELEMENT (VAR,VALUE) o o (#PCDATA)>
<!ELEMENT JOB         - o (MIN,HOUR,DAY,MONTH,WKDAY,COMMAND)>
<!ELEMENT (MIN|HOUR|DAY|MONTH|WKDAY) o o (#PCDATA)>
<!ELEMENT COMMAND     - o (#PCDATA)>

<!ENTITY  stVAR     STARTTAG "VAR">
<!ENTITY  stVALUE   STARTTAG "VALUE">
<!ENTITY  stHOUR    STARTTAG "HOUR">
<!ENTITY  stDAY     STARTTAG "DAY">
<!ENTITY  stMONTH   STARTTAG "MONTH">
<!ENTITY  stWKDAY   STARTTAG "WKDAY">
<!ENTITY  stCOMMAND STARTTAG "COMMAND">

<!SHORTREF mpENV    "&#RS;"   stVAR>
<!SHORTREF mpVAR    "="       stVALUE>
<!SHORTREF mpMIN    " "       stHOUR>
<!SHORTREF mpHOUR   " "       stDAY>
<!SHORTREF mpDAY    " "       stMONTH>
<!SHORTREF mpMONTH  " "       stWKDAY>
<!SHORTREF mpWKDAY  " "       stCOMMAND>

<!USEMAP   mpENV     ENV>
<!USEMAP   mpVAR     VAR>
<!USEMAP   mpMIN     MIN>
<!USEMAP   mpHOUR    HOUR>
<!USEMAP   mpDAY     DAY>
<!USEMAP   mpMONTH   MONTH>
<!USEMAP   mpWKDAY   WKDAY>

]>
which can teach an SGML parser to parse an existing crontab file, with just a few minor modifications. Here is the crontab file from the Crontab Challenge, with a few small tweaks (the added <ENV>, </ENV>, and <JOB> tags, shown in bold in the original post) to make it legal SGML according to my DTD:
<ENV>
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/</ENV>

<JOB>01 * * * * root run-parts /etc/cron.hourly
<JOB>02 4 * * * root run-parts /etc/cron.daily
<JOB>22 4 * * 0 root run-parts /etc/cron.weekly
<JOB>42 4 1 * * root run-parts /etc/cron.monthly

That's all it takes and it parses just fine:

CRONTAB
  ENV
    VAR
      SHELL
    VALUE
      /bin/bash
    VAR
      PATH
    VALUE
      /sbin:/bin:/usr/sbin:/usr/bin
    VAR
      MAILTO
    VALUE
      root
    VAR
      HOME
    VALUE
      /
  JOB
    MIN
      01
    HOUR
      *
    DAY
      *
    MONTH
      *
    WKDAY
      *
    COMMAND
      root run-parts /etc/cron.hourly
  JOB
    MIN
      02
    HOUR
      4
    DAY
      *
    MONTH
      *
    WKDAY
      *
    COMMAND
      root run-parts /etc/cron.daily
  JOB
    MIN
      22
    HOUR
      4
    DAY
      *
    MONTH
      *
    WKDAY
      0
    COMMAND
      root run-parts /etc/cron.weekly
  JOB
    MIN
      42
    HOUR
      4
    DAY
      1
    MONTH
      *
    WKDAY
      *
    COMMAND
      root run-parts /etc/cron.monthly

Cheers,
Tom

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
XML Will Reduce Total Bloat (5.00 / 1) (#85)
by LukeyBoy on Fri Jan 12, 2001 at 09:41:05 PM EST

Yes, using XML will reduce total application bloat. So what if the config file increases in size? Take an application in which one portion requires a single key-value pair from an enormous config file. Which is smaller? Writing a whole parser for your own little non-standard format .rc text file? Or calling a one line function from the libxml2 library using XPath to refer to the correct node value (e.g. "//screen-saver-conf/spiders/spider-color")?
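
As a rough illustration of the one-line lookup, here is a sketch using Python's standard `xml.etree.ElementTree` instead of libxml2 (it supports only a subset of XPath). The element names follow the path quoted above; the values in the sample config are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical config; the element names follow the XPath quoted above,
# the values are made up.
CONFIG = """
<screen-saver-conf>
  <spiders>
    <spider-color>green</spider-color>
    <spider-count>12</spider-count>
  </spiders>
</screen-saver-conf>
"""

root = ET.fromstring(CONFIG)
# The root element is <screen-saver-conf>, so the path is relative to it.
spider_color = root.findtext("spiders/spider-color")
```

The lookup itself is the single `findtext` call; everything else is the sample data.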

I can already hear you quoting the size of the XML libraries and their corresponding headers and their readme files and so on. But when you think about a dozen or more applications sharing the parser library, you're removing tons of APPLICATION SPECIFIC CONFIGURATION PARSERS.

When you write an application from scratch and you know there's a good chance the XML parsers will be on the target system, it's exponentially easier to use DOM/SAX interfaces to find configuration info than designing a text parser. Escaping of special characters? Already handled! Case-sensitive ambiguities in node names? Defined and nailed down. I conclude that XML saves you space in the long run, and is a clean balance between machine-readable and human-readable.



[ Parent ]
No, no, no... not the binary, the file itself (3.00 / 1) (#87)
by tmoertel on Fri Jan 12, 2001 at 11:28:43 PM EST

You wrote:
So what if the config file increases in size? ... [Your application code will be smaller because you can rely on XML libraries to take care of the parsing for you.]

Please re-read this thread from the beginning. I'm not sure what you think my concern is, but it has nothing to do with the bloat of application binaries. It's the config files. No more, no less. More specifically, the problem is that taking simple config files and XMLizing them will decrease their S/N ratio and bloat them up, making the lives of the people that must read, understand, and edit them more difficult.

Since you readily admit that XMLized config files would be larger (in my words, bloated), I don't think we have any disagreement.

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
crontab challenge (4.66 / 3) (#65)
by YellowBook on Fri Jan 12, 2001 at 02:53:14 PM EST

I'm not the biggest advocate of XML rc files, but here goes. I apologize for the nasty formatting, which is K5's fault, not mine. It looked great in Emacs ;)

<?xml version="1.0"?>
<!DOCTYPE crontab SYSTEM "/etc/dtd/crontab.dtd">
<crontab>
<environment>
<var name="SHELL" value="/bin/bash" />
<var name="PATH" value="/sbin:/bin:/usr/sbin:/usr/bin" />
<var name="MAILTO" value="root" />
<var name="HOME" value="/" />
</environment>

<events>
<event minutes="01" user="root" command="run-parts /etc/cron.hourly" />
<event minutes="02" hours="4" user="root"
command="run-parts /etc/cron.daily" />
<event minutes="22" hours="4" days-of-week="sunday"
user="root" command="run-parts /etc/cron.weekly" />
<event minutes="42" hours="4" days-of-month="1"
user="root" command="run-parts /etc/cron.monthly" />
</events>

</crontab>

It's not that bad. "Bloat" is 682 bytes, vs. 299 bytes, so a bit more than twice as big (and I could have made it a little bit smaller by dropping the environment and events containers). On the other hand, I find that I have to consult the man page every time I edit a crontab file (can never remember the order of the fields), and this would save me from that. Most of the increased size is not from the formatting bits, but from the field names, which do provide extra information.

[ Parent ]
Very close... but the cigar must be withheld (4.66 / 3) (#70)
by tmoertel on Fri Jan 12, 2001 at 03:41:38 PM EST

First, my hat's off to you for taking the now-infamous Crontab Challenge.

Second, your XMLization seems reasonable except that the use of an attribute for the command-line portion of a crontab entry will prevent multi-line commands from being scheduled (owing to XML's attribute-value normalization rules, which will convert newlines into spaces). Thus, you'll need to make the command portion an element unto itself, and that makes the XMLized crontab considerably more bloat-riffic, degrading S/N further.

And that's why the Crontab Challenge makes a particularly good example of why not everything should be XMLized.
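
The attribute-normalization point above can be checked with a few lines of Python. This is only a sketch using the standard `xml.etree.ElementTree` parser; the `<event>` element and `command` attribute follow the XMLized crontab under discussion, and the two-line command is invented.

```python
import xml.etree.ElementTree as ET

multi_line = "echo one\necho two"

# Command stored in an attribute: the literal newline is normalized to a space.
as_attr = ET.fromstring('<event command="%s"/>' % multi_line)
# Command stored as element content: the newline survives.
as_text = ET.fromstring("<event>%s</event>" % multi_line)

attr_value = as_attr.get("command")
text_value = as_text.text
```

After parsing, `attr_value` is a single line while `text_value` still contains the line break, which is why the command has to move out of the attribute.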

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
the crontab challenge (4.50 / 4) (#74)
by YellowBook on Fri Jan 12, 2001 at 04:49:08 PM EST

Well, it was just a first go. Probably you want to make the <event> element non-empty to fix this, so:

<event minutes="01" user="root" command="run-parts /etc/cron.hourly" />

would become:

<event minutes="01" user="root">
run-parts /etc/cron.hourly
</event>

Yes, it's less concise than the original. I'm not sure that's terribly important (at least to me). In the case of crontabs, I think the greater clarity is worth it (it might not be in some, or even many, other cases).

I also think that the real benefit of XML rcfiles would be in network effects. An XML rcfile won't be better than an ideal rcfile format for a given application (look at fetchmail's config file for an example). However, using XML rcfiles for everything would be better on average than using different formats for every rcfile for every application (easier for scripting changes, easier for authors to do right). XML isn't inherently better for this than any other standardized rcfile format for which decent libraries exist (windows 3.x .ini, NeXT PropList, etc); it's just the one that's most likely to happen. That is to say, universal XML rcfiles have a snowball's chance in Florida of taking hold, rather than a snowball's chance in Hell.

[ Parent ]
But machines do read them (4.66 / 3) (#61)
by 0xdeadbeef on Fri Jan 12, 2001 at 01:53:50 PM EST

... so why make it hard on both of us?

Heh, you assume the table is ordered, and that the columns line up.

Here's an XPath expression that will do it, regardless of the order and neatness of the file:

//entry[hour=4 and minute=35 and dayofweek=5]

Dang, that was easy. Now let's see you do that with grep and awk.

And the cool thing is that if you had a decent XML editor (like XMLSpy), you could tell it to table-ize the sequence of entry nodes, so that every entry becomes a row and every sub-element's value becomes a cell. There's your visual representation. Sort? Ok, on which column?
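
For anyone who wants to try the order-independent lookup, here is a small Python sketch. The sample crontab and its commands are invented; only the element names come from the XPath above. Python's `xml.etree.ElementTree` implements a subset of XPath with no `and` operator, so the three conditions are written as chained predicates, which select the same entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML crontab; element names follow the XPath above.
# The child elements are deliberately in a different order in each entry.
CRONTAB = """
<crontab>
  <entry><minute>35</minute><hour>4</hour><dayofweek>5</dayofweek><command>backup</command></entry>
  <entry><command>rotate-logs</command><dayofweek>0</dayofweek><hour>4</hour><minute>22</minute></entry>
</crontab>
"""

root = ET.fromstring(CRONTAB)
# ElementTree has no "and" operator, so the three tests are chained predicates.
matches = root.findall(".//entry[hour='4'][minute='35'][dayofweek='5']")
commands = [e.findtext("command") for e in matches]
```

Only the first entry matches, no matter where its `hour`, `minute`, and `dayofweek` children sit inside it.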

[ Parent ]
Raise your hand if you understand XPath (3.00 / 3) (#69)
by extrasolar on Fri Jan 12, 2001 at 03:31:03 PM EST

Raise your hand if you understand XPath. Or XLink. Or the syntax for a DTD. I sure don't know any of the above, and I've been trying!

If I were writing an application right now, I wouldn't use these technologies because I don't understand them. I heavily doubt that I am alone. In fact, I would say that even given a snowball effect of standardizing on XML, maybe a third of the free software projects won't use XML. And this loses the benefit of a standardized configuration format in the first place.

The means to kill the end.

[ Parent ]
Right here! (4.00 / 2) (#86)
by LukeyBoy on Fri Jan 12, 2001 at 09:53:43 PM EST

My hand is raised. I'm a Java developer, and lead a team of programmers from fresh-from-university juniors to seasoned veterans, and we all use every technology you just quoted on a day to day basis. I'm guessing (I apologize if I'm wrong) that you're not heavily involved in the software development cycle. XML, DTDs, and XSLT have saved me an extraordinary amount of coding time. New programmers at my company enter with the usual schooling in C/C++/Java and pick up the uses of XML within two weeks of simply reading on their own.

The whole idea of a standard data containment format and the support systems to go with it (such as XLink) was designed with real-life software engineering needs in mind. And it's not only sweet for corporate-oriented production of software - XML is also simple to work with for home projects. Freeciv configuration data and game rules stored in XML? Why not?



[ Parent ]
(no subject) (2.00 / 1) (#91)
by extrasolar on Sat Jan 13, 2001 at 02:00:20 AM EST

True, I am not heavily involved in the software development cycle. In fact, my own programming projects are small, smattered, and rarely complete or useful. So I consider myself more of a user.

So, no offense taken :-)

[ Parent ]
No, that's not it at all (none / 0) (#68)
by extrasolar on Fri Jan 12, 2001 at 03:19:47 PM EST

The idea isn't to reinvent the wheel.

Just because there is this generic file format (XML) doesn't mean it needs to be used for all types of data.

The idea behind that example format that I wrote simply for discussion is two-fold:

Levels of Complexity

In order to get the advantages of a common format, you need to persuade as many application developers as you can to use it. Remember, different applications have varying needs for their configuration files. Some need a simple key-value file. My example was the resolv.conf file that many users already edit by hand anyway. The GUI tools are simply (relatively) complex tools for doing the same thing. Others need a tree-like structure. My example was defining menus for a window manager. Yet others require or want an extension language, like Emacs or the Gimp. They use Scheme. These are different levels of complexity that vary with the application. In my example, the RCL header signifies this level of complexity. I think that greater levels of complexity should "inherit" simpler levels, just for a common interface.

I think the XML format imposes a specific level of complexity that for some applications is too high and for others is too low. This is probably the leading reason for resistance to the XML idea right now. Granted, XML does allow for a lot of configurability in this aspect as well. For instance, DTDs can require a flat namespace for applications that only need key-value pairs. On the other hand, I know of no way of incorporating an existing programming language into XML.

Also, for applications with simple needs, the application should be able to parse the format within itself without the need to call outside functions. This is just a simple truth. Simple things should be simple. Vi shouldn't have a dependency on libxml to install. GNOME might, but its scope is much larger.

Consolidating Existing Configuration Formats

Perhaps my example isn't a good example of this, but it was in my mind when I wrote it.

Take the many different configuration formats that already exist, and learn from them. The standardization of XML didn't consider this at all. XML was based upon studying markup, that is, written works. It is derived from SGML---it was made for documents. Someone has already given the example of procmail, which doesn't read an entire file. With XML, you have to. It's part of the standard.

Now, the same thing will inevitably happen with any format we create. But we already have a vast base of formats to base our standards on. We know that there exist certain applications that would rather short-circuit some of the file. So we can indicate this in the header...or not specify this at all. Just as in my example there is #RCL:key-value; there could be #RCL:filter; for anything that needs filtering or #RCL:language(bash); for .bashrc.

The idea is to take what we already have and consolidate and make it into a standard so that you only need to learn a single format to configure your operating system.

I understand why you people advocate XML for this. But I think it is far more important for there to be a consistent format for configuration than for a specific format to be used. I have taken quite a bit of time studying what XML has to offer, and I still don't understand it all.

Verbosity is a problem. It takes longer to write and it looks inelegant to a lot of people. There is no ambiguity for my format either. Or any other configuration language.

Other problems in XML make it somewhat ambiguous as well. For example, which do you choose: element or attribute? It has already been asked, people will continue to ask it, and there is no good answer for it. It is one of those things that is left to the whims of personal preference.

XML isn't the answer, it is an answer. And the only real benefit you can give to XML as opposed to my format or any other standard format is that it already exists. And by consolidating the configuration formats we already have, you lose that benefit as well.

Best Regards,
Kevin Holmes



[ Parent ]
inertia's gonna getcha (4.00 / 8) (#39)
by clover_kicker on Thu Jan 11, 2001 at 09:20:50 PM EST

I've seen a lot of people floating this idea about "XML config files everywhere" for the last few months.

I'll come out and say that I don't think this would provide much benefit. For the sake of argument we'll just pretend that I like the idea.

The big practical problem here is that you've got a helluva lot of code to rewrite before we see any payoff.

Lots of folks will disagree with you about the value of rewriting $application's config files in XML. Are you prepared to fork (and maintain!) a version of cron-XML and named-XML and ssh-XML and sendmail-XML and qmail-XML and xntpd-XML and $deity knows what else?

That's just the big "server" stuff, what about all of my trivial "desktop" apps (slrn, xscreensaver, xmms, whatever) ?

I'm sure there are a ton of brain-dead apps/scripts that manually parse /etc/hosts or /etc/fstab or /etc/passwd instead of going through the appropriate system calls; you'll be breaking them too. Some of those brain-dead proggies are going to be big $$$ commercial apps whose vendors won't want to change, I almost guarantee it.

There is soooo much inertia here that it's going to be very hard to get the ball rolling. Certainly not impossible, but hard.

Hey, prove me wrong. Get in there and start coding. Next time I see you, I'll buy you a beer for every app/syscall that you XML-ize. :)



--
I am the very model of a K5 personality.
I intersperse obscenity with tedious banality.

I've already done it... (3.66 / 3) (#43)
by Brandybuck on Thu Jan 11, 2001 at 10:43:25 PM EST

I've already done this with one of my applications, and am in the process of doing it to another. I'm a big Qt fan, so as soon as it introduced QDom, I started switching over.

Two big advantages: 1) You only have to use one file format. Your rcfile and your data files can both be XML. 2) It can handle complex data better than the traditional "name/value" rcfiles.

It doesn't have to be XML. Just as long as it's human editable, suitable for both rcfiles and app data, and can handle complex data, it will work for me. That XML has some small measure of popularity is a plus.

Strategy design pattern? (3.25 / 4) (#53)
by slaytanic killer on Fri Jan 12, 2001 at 08:44:59 AM EST

Staring at the posts, it seems that people hate XML for the small tasks, and like it for the larger-scale ones. I wonder if people are at work on a translation layer for common & stable UNIX config programs, that will allow the program to read from either XML or plain text configs. That might be an interesting way to make Linux more "Enterprise-ready," whatever that buzzword means.

More and more overhead... maybe this could be done in a way that all the overhead is shifted to the case where there's an XML file to read.

Hmm, I will rephrase that (none / 0) (#60)
by slaytanic killer on Fri Jan 12, 2001 at 12:37:57 PM EST

This is for the person who modded me down; this is a response to what I guess to be his reply.

In reality, people will not add complexity to a unix util by making it agnostic WRT the config files it reads. Unix has a more quick-n-dirty philosophy. So, if you know something would scale rather large and messily, you might use XML config files; the overhead is probably worth it. If you want it to scale down, you'd use plain text.

However, when it is not clear how a user might have it scale... I once used jakarta.apache's Ant for XML makefiles. It was developed because of a number of perceived deficiencies in make, including the "dreaded tab problem." It scales very well; and a lot of text editors written in Java support 1-click Ant templates. With crontab, perhaps it's not that useful. But with applications that have data meaning different things depending on all sorts of factors, XML is far better at delineating things WRT meaning.

[ Parent ]
Why XML? (4.00 / 5) (#54)
by job on Fri Jan 12, 2001 at 08:46:32 AM EST

Give me one good reason to use XML instead of a readable format, like any of the ones proposed here. Microsoft INI is much easier both to parse and read, if your needs are limited to key-value settings (which is quite often the case).

The advantage of a standard (shared!) parser amounts to almost nothing, since the format will always be some sort of lowest common denominator -- too complex for some applications, and much too limited for some.



A couple half-decent reasons ;) (3.75 / 4) (#55)
by 0tim0 on Fri Jan 12, 2001 at 10:32:29 AM EST

I think INI is a good format for simple things. But once you get to a point where your application is complex, you need something more structured. XML allows you to nest configuration and have lists of items (where, in INI, lists of anything more complex than an atomic item are difficult).

I work for a commercial software company. Our core product always used an INI file for configuration. It was great at first. Now the complexity of the product (and its configuration) has grown and the INI file is the single point of confusion for our customers. The next release will switch to all XML.

The reason I like XML is that it is powerful enough for complex tasks. I see, in the long run, having a single tool for configuration of your system. The configurator would be able to edit XML in a structured way for the degenerate case. But most programs could include another XML file with a schema that would give the configurator the information necessary to create a custom gui. The schema would also include structured help, so that the tool could tell you what is required, where and why.

Someday ;)
--tim

[ Parent ]

Configuration via S-expressions (4.00 / 2) (#76)
by noc on Fri Jan 12, 2001 at 05:06:41 PM EST

I think S-expressions (ie, the format of lisp programs) serve this purpose wonderfully. If all you need is a set of key-value pairs, it's no more difficult than an INI file:

foo: some thing
bar: 131
baz: no-doodles
vs.
(foo "some thing")
(bar 131)
(baz no-doodles)

One thing that you might notice, however, is that reading in S-expressions generally involves creating data structures, rather than just text. So in the above example, "some thing" is a string, 131 is a number, and no-doodles is a symbol. You can trivially make more complicated data structures:

(foo-translations) ; this defines a null set of translations
(bar-translations (("foo" "bar")
                   ("baz" "bar")
                   (any-metasyntactic-name "bar")))
(baz-translations custom-set-defined-elsewhere)

Note that a file of S-expressions is *not* like your .emacs file: that is a lisp program that is read in, then evaluated. A configuration file of S-expressions would be read in, creating a bunch of data structures, and that's it. I was quite amused to find the lispreader library <http://www.complang.tuwien.ac.at/~schani/lispreader/> because I've been keeping my own code around to read in files of S-expressions from C, and it hadn't occurred to me that others might want to do the same thing.
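
To show how little is involved in "read in, create data structures, and that's it", here is a minimal reader sketched in Python. It is not the lispreader library or anyone's actual code; `read_all`, `to_atom`, and `Symbol` are names invented for this example, and it deliberately skips things a real reader would handle (escape sequences in strings are kept as written, floats are treated as symbols, stray quote characters are ignored).

```python
import re

class Symbol(str):
    """A bare word such as no-doodles, kept distinct from quoted strings."""

# A token is a ;comment, a paren, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'(;[^\n]*)|([()])|"((?:[^"\\]|\\.)*)"|([^\s()";]+)')

def to_atom(word):
    """Turn a bare word into an int if it looks like one, else a Symbol."""
    try:
        return int(word)
    except ValueError:
        return Symbol(word)

def read_all(text):
    """Read every top-level S-expression in text into nested Python lists."""
    stack = [[]]                      # stack[0] collects the top-level forms
    for m in TOKEN.finditer(text):
        comment, paren, string, atom = m.groups()
        if comment is not None:
            continue                  # comments carry no data
        if paren == "(":
            stack.append([])          # start a new nested list
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        elif string is not None:
            stack[-1].append(string)  # quoted text stays a plain str
        else:
            stack[-1].append(to_atom(atom))
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

CONFIG = '''
(foo "some thing")
(bar 131)
(baz no-doodles)
(bar-translations (("foo" "bar") ("baz" "bar")))  ; nested lists work too
'''
config = read_all(CONFIG)
```

Nothing is evaluated: `config` is just a list of lists in which `"some thing"` is a string, `131` is an integer, and `no-doodles` is a `Symbol`.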

Seriously, though, I'd encourage people to consider this as a configuration file format. It's pretty human-readable and can make arbitrarily simple or complex data structures. And balancing parens is not a problem in good editors (Emacs does this, of course, and I know that some vi clones can).

[ Parent ]
As good as? (2.00 / 4) (#66)
by darthaya on Fri Jan 12, 2001 at 03:05:26 PM EST

Depends on how you define "good programming language", and "suck less".

Java and C/C++, like countless other programming languages, have their advantages and disadvantages, and they can be good or bad depending on what purpose you use them for. What you were doing is just putting your point of view above everyone else's.

What are you talking about? (4.00 / 2) (#98)
by DickBreath on Sat Jan 13, 2001 at 12:27:37 PM EST

Your post seems completely incomprehensible. If you're trying to make a point, I'm just not getting it.

The article has nothing to do with C++, Java or any other programming language.

The article is talking about using XML as a config file format. A good thing IMO. Isn't re-usability of software a worthy goal? He gives an excellent example (procmail/KMail). Many other examples could be pointed out.

As things stand today all the different tools have their own custom config file format. All of these different formats are hard to parse. XML is easy to parse, and powerful in its capability to represent different things. Many different programming languages already have libraries to parse and create XML.

It seems to me that if (pick any random tool, let's say bind 9) used XML as its config file format, then tools written in any programming language could easily understand, manipulate, and re-write its config files. Just imagine how nice it would be if all the config files in the /etc directory were in XML format. You could still edit them with a text editor, but you could also use more powerful general purpose tools.

[ Parent ]
fwbuilder (4.33 / 3) (#79)
by krokodil on Fri Jan 12, 2001 at 07:04:05 PM EST

My friend is doing a GPLed project related to this. He's an admin who got fed up with configuring several firewalls using different tools and different file formats. So he created a GUI and an XML representation of the config. Now you create your config, save it as XML, and compile it into your firewall's config files.

You can help by refining the XML format and the GUI, or by writing compilers for more firewalls (he has ipfilter and ipchains ready, and iptables in the works).

Firewall Builder Home page

ARRRRGH (4.20 / 5) (#101)
by ksandstr on Sat Jan 13, 2001 at 02:42:05 PM EST

You're forgetting the first rule of everything: "Always use the best tool for the job".

XML isn't the best tool for ALL configuration files, ALL protocols, ALL data - just as Java isn't the best tool for embedded systems, real time rendering of 3D polygonal views or anything with strict resource consumption or reliability requirements. Certainly you could wrap everything in XML, but what would be the point in that? XML is supposed to be written and read by programs, not people - the only reason that I can see for it being in plain-text format is so that you can still write a decent parser for a document format without reading the DTD. (ISTR the W3C is working on a standard for compiling XML into a binary format so that it wouldn't take up as much space - IMO gzip(1) takes care of that well enough already...)

Certainly some configuration files might benefit from using XML. Not all. For example, reading a simple 'name = "value"' type configuration file in perl would take all of five lines:

my %config;
while (<CFG>) {
    next unless /^\s*(\S+)\s*=\s*"(.*)"\s*$/;   # skip lines that aren't name = "value"
    $config{$1} = $2;
}

And that's it. No elaborate parse trees, no nothing. For lower level languages such as C there are very capable configuration file parsing libraries that will take care of such things with minimal hassle.

Some have even implemented protocols (XML-RPC, for instance) in XML and HTTP. The truth of the matter is, if you care at all about performance you're going to use the XML+HTTP version of your protocol for testing and development only (reading XML for a human is pretty easy, so it's good for debugging when combined with Ethereal or some such) and write a binary packet-oriented version of the same protocol (without the XML semantics, obviously) for the production version.

In conclusion, my opinion on XML is that it's good for interchangeable file formats (as long as you also gzip(1) the file, so it won't take up too much bandwidth and/or disk space), and that's it. It has already been hyped like there's no tomorrow, and I'm starting to see instances of the "ooh, it's using XML so it's gotta be good!" mentality where I'm working (in the suits, mostly).

(I might add that I'm no stranger to XML - I designed and implemented a custom 3D mesh format with it for an old 3d engine project that I've given up hope on. The format had quite sophisticated surface texturing parameters, all in XML. It was kind of nice, but the C++ class that converted the XML into an internal representation of the mesh was ugly to say the least.)



Fin.
Good idea, BUT... (3.00 / 2) (#104)
by WWWWolf on Sun Jan 14, 2001 at 08:09:51 AM EST

XML as the RCfile format is pretty good. In fact, I love XML and I have found it to be very convenient for many uses.

I just shiver when I need to implement anything that parses XML.

I have done, over years, precisely two "projects" that use XML (well, the first one was made before XML came about, so it uses SGML - not a big deal). The reason for this? XML/SGML parsing is difficult to understand, even when using a library to do the job.

In simpler formats (for example, when using Windows *.INI-style files) getting configuration information is easy: "Find section FOO and get key BAR value". In XML? "Parse tree, find branch that has section FOO and find sub-branch that has key BAR." There. Sounds harder, and is harder in practice too!
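Here is roughly what the two lookups look like side by side in Python, using only the standard library. The file contents and the section/key element names are assumptions made up for the comparison.

```python
import configparser
import xml.etree.ElementTree as ET

# The same setting in two hypothetical formats.
INI_TEXT = "[FOO]\nBAR = 42\n"
XML_TEXT = '<config><section name="FOO"><key name="BAR">42</key></section></config>'

# INI: find section FOO and get key BAR.
ini = configparser.ConfigParser()
ini.read_string(INI_TEXT)
ini_value = ini["FOO"]["BAR"]

# XML: parse the tree, find the FOO branch, then the BAR sub-branch.
root = ET.fromstring(XML_TEXT)
xml_value = None
for section in root.findall("section"):
    if section.get("name") == "FOO":
        for key in section.findall("key"):
            if key.get("name") == "BAR":
                xml_value = key.text

# A path expression shortens the walk, but you still need to know the layout.
path_value = root.findtext("./section[@name='FOO']/key[@name='BAR']")
```

The path version takes some of the sting out, though it only helps once you know how the file is structured.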

-- Weyfour WWWWolf, a lupine technomancer from the cold north...


Why? (4.00 / 2) (#106)
by jack doe on Sun Jan 14, 2001 at 06:43:54 PM EST

As far as I can tell, what you're addressing is the inability of programs to reliably use each other's configuration files. With arbitrary text formats, there's no easy way for one program to parse another's.

XML appears to fix this, I suppose, by storing the data in a commonly parseable format - but that's not enough. ALL the owner program's invariants and assumptions about its configuration files must be known to the other programs if they want to operate on those files without breaking them.

I suppose it might, in some cases, be possible to encode those assumptions in a DTD, but it'd be a whole lot of work for the maintainers of both programs. This also forces all configuration information into a rigid and verbose format which often wouldn't match the nature of that information very well.
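As an illustration of an invariant that structure alone does not capture, here is a Python sketch over a made-up filter format. The rule element, its match attribute, and the "catch-all must come last" rule are all assumptions; the point is only that the file can be perfectly well-formed and still break the owner program.

```python
import xml.etree.ElementTree as ET

def check_invariants(xml_text):
    """Return a list of problems a structural check would not notice."""
    rules = ET.fromstring(xml_text).findall("rule")
    problems = []
    for index, rule in enumerate(rules):
        # Assumed invariant: a catch-all rule shadows everything after it.
        if rule.get("match") == "*" and index != len(rules) - 1:
            problems.append("catch-all rule is not last; later rules never run")
    return problems

good = '<filters><rule match="list@example.org"/><rule match="*"/></filters>'
bad = '<filters><rule match="*"/><rule match="list@example.org"/></filters>'

good_problems = check_invariants(good)
bad_problems = check_invariants(bad)
```

Both documents contain the same elements and attributes, so a second program that only knows the structure could easily write the bad one.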

You might get better mileage by agitating for popular programs to modularize their configuration parsers for use by others. You could read and write .procmailrc as well as procmail could if you were using its actual parsing code, perhaps kept in a shared library!
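A toy Python sketch of what such a shared module might look like follows. RcFile and its methods are hypothetical, and it only understands simple NAME=value lines, nowhere near a real .procmailrc; the idea is just that the owner program and a GUI tool would import the same code and that untouched lines survive a rewrite.

```python
class RcFile:
    """Hypothetical shared parser for simple NAME=value rc files."""

    def __init__(self, text):
        self.lines = text.splitlines()

    def _find(self, key):
        """Return the index of the line assigning key, or None."""
        for index, line in enumerate(self.lines):
            name, sep, _ = line.partition("=")
            if sep and not line.lstrip().startswith("#") and name.strip() == key:
                return index
        return None

    def get(self, key):
        index = self._find(key)
        if index is None:
            return None
        return self.lines[index].partition("=")[2].strip()

    def set(self, key, value):
        index = self._find(key)
        if index is None:
            self.lines.append(f"{key}={value}")
        else:
            self.lines[index] = f"{key}={value}"

    def dump(self):
        # Comments and untouched lines are written back exactly as read.
        return "\n".join(self.lines) + "\n"

original = "# hand-written comment\nMAILDIR=$HOME/Mail\nLOGFILE=$MAILDIR/log\n"
rc = RcFile(original)
rc.set("LOGFILE", "/var/log/procmail.log")
```

A configuration dialog built on something like this could change one setting without touching the hand-written comment above it.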



Time for XML rcfiles | 110 comments (109 topical, 1 editorial, 0 hidden)