Kuro5hin.org: technology and culture, from the trenches

HTML Standards?

By dannygene in Meta
Thu Nov 15, 2001 at 09:14:23 AM EST
Tags: Internet

It seems many people are fighting for the web to move to a fully standardized HTML definition. The question is, what standard should we be migrating to?


In the past, my websites were nothing more than a hodgepodge of HTML code, thrown together so that the browsers I cared about would render them somewhat properly. Lately, though, I've been trying to conform to the standards to make my pages as universal as possible. I've been running them through the W3C Validator, with mixed results: I've discovered a lot of syntactical errors I never knew about, but what it flags changes depending on the DOCTYPE I use.

HTML 4.01 is the easy standard to use. But it looks like more and more people are moving to XHTML 1.0. But why? What are the advantages? I've read through much of the spec, but don't really understand what all the fuss is about.

Ok, so I take it for granted that I should use XHTML 1.0 because it's the latest and greatest, but will there be any advantage to recoding all those old websites to the standard? How many sites will actually even try to fully support HTML 4.01, much less XHTML?

Finally, what is the deal with the strict, transitional, etc. types of both HTML 4.01 and XHTML? It seems to me that there are about four or five different "standards" that everyone should conform to, but if you fully support one, you're breaking another. What is the k5 community's take on all of this?


Poll
What standard to use?
o HTML 4.01 Transitional 24%
o HTML 4.01 Strict 10%
o XHTML 1.0 23%
o XHTML 1.1 12%
o HTML 1.0 8%
o Yo Mama 20%

Votes: 89

Related Links
o W3C Validator
o spec


HTML Standards? | 41 comments (32 topical, 9 editorial, 0 hidden)
it's really a Catch-22 (4.25 / 8) (#4)
by sfischer on Wed Nov 14, 2001 at 01:12:14 PM EST

Excellent question and I hope it gets discussed appropriately.

First, as far as technical differences between HTML 4.0 and XHTML 1.0 go, I believe they are identical specifications, except that XHTML 1.0 is encoded in XML.

I'm a bit confused why you ask "how many sites will ... support" when a better question is "how many browsers will support" or "how many sites will adhere to". As the corporate desktop catches up with more recent browser versions, those browsers will be able to take advantage of the more recent standards, which will implicitly enforce greater adherence to the syntactic specifications. But on the flip side, the corporate desktop tries to stick with browser versions that can view the most pages. Both have to come up to speed together, and that's the problem.

So while it's nice to be syntactically correct, there's little or no benefit for a site in adhering to standards, and actually some strong reasons not to adhere to the more recent ones. Pick something in the middle, like HTML 4.01 Transitional, and it will be readable by almost everyone.

-swf

Right (3.66 / 3) (#7)
by dannygene on Wed Nov 14, 2001 at 01:17:50 PM EST

I think that's what I was getting at: which browsers AND sites will move to this? Basically, is it worth it to try to adhere to the standard when it's going to be made obsolete in a few months? Case in point: XHTML 1.1 removes the "name" attribute, which XHTML 1.0 deprecated, and which was a cornerstone of HTML 4.01! Argh!

Life is too serious to be taken too seriously.
[ Parent ]

Web markup standards (4.36 / 11) (#6)
by Prominairy on Wed Nov 14, 2001 at 01:17:04 PM EST

    I think the web markup standard to follow at the moment is XHTML 1.0, because of the following reasons:
  • XML will be part of a lot of Internet implementations eventually (personal opinion only).
  • XHTML 1.0 is XML compliant, in that any document conforming to the XHTML 1.0 standard is also a well-formed XML document (except for the definition as such).
  • XHTML 1.1 (which is much like XHTML 1.0 with support for modules) will be much more dynamic with regards to what content designers can do, but with the rigid standards of well-formed XML.
    I'd personally go for XHTML 1.0 Strict, in order to better separate layout from contents.

    The reason there are several flavors of markup standards is that there essentially need to be: no single standard allows for creating every type of document without breaking the standard or submitting to redundancy. For example, XHTML 1.0 Strict is meant for documents that don't themselves deal with layout (if you want layout anyway, you can add it using CSS), while XHTML 1.0 Transitional supports a lot of layout-specific elements. When to use XHTML 1.0 Frameset should be obvious. This is quite similar to HTML 4.01's "Strict", "Transitional" and "Frameset" versions.
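    To make the Strict/Transitional distinction concrete, here's a minimal sketch (the class name and colors are invented). Transitional lets you write presentation inline:

<p><font color="#cc0000" size="4">Watch out!</font></p>

    Strict markup carries only structure:

<p class="warning">Watch out!</p>

    with the presentation moved into a stylesheet:

.warning { color: #cc0000; font-size: larger; }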

-~-~-~-~--~-~-~-~--~-~-~-~--~-~-~-~-
"Work like you don't need the money.
Love like you've never been hurt.
Dance like nobody's watching."

Strict vs. Transitional (4.00 / 2) (#10)
by SlydeRule on Wed Nov 14, 2001 at 02:31:20 PM EST

The problem with going with "strict" can be summarized quite simply: Netscape 4.

There are a number of areas where Netscape 4 will ignore the CSS style sheet, and you have to resort to "transitional" attributes in HTML to have the same effect. The most glaring one is that NS4 ignores img {border: 0}.

Once you have made the decision to go "transitional", you are also freed up to do some basic things for older, non-CSS browsers, like setting colors and background images. Those settings are overridden by the CSS file for CSS-compliant browsers, so they do not really hurt much.
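For example (a sketch; the file and image names are invented), the "transitional" belt-and-suspenders approach looks like this:

<link rel="stylesheet" type="text/css" href="site.css">
<!-- bgcolor and border="0" are the transitional fallbacks for NS4
     and other non-CSS browsers -->
<body bgcolor="#ffffff">
<a href="home.html"><img src="logo.gif" alt="Home" border="0"></a>

while site.css carries the rules that CSS-compliant browsers will use instead:

body { background-color: #ffffff; }
img  { border: 0; }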

[ Parent ]

Netscape Blah (3.00 / 2) (#12)
by dannygene on Wed Nov 14, 2001 at 02:52:12 PM EST

I gave up on supporting Netscape 4 a long time ago. Even my friends who used to be avid netscape-only coders have ditched it, because it's not flexible enough anymore. I say we destroy all copies of Netscape 4! Use no browser released before Y2K! Bwhahaha!

Life is too serious to be taken too seriously.
[ Parent ]

unix problems (3.00 / 1) (#14)
by Delirium on Wed Nov 14, 2001 at 03:31:43 PM EST

I think the problem is UNIX, particularly less-modern UNIX boxen, and particularly ones at academic institutions. My school, for example, has a setup with NEC thin clients talking to a Solaris server. Netscape 4 runs tolerably on this setup; Netscape 6 and IE/Solaris do not, so despite the availability of those options nearly everyone uses Netscape 4. Hell, you have to ask to get your default memory quota (50 MB, I think) increased just to get Netscape 6 to start.

This isn't a problem in the Win2k labs though, which run IE6 fine on ~400 MHz computers.

[ Parent ]

This is the sticky wicket. (none / 0) (#30)
by static on Thu Nov 15, 2001 at 06:36:30 PM EST

Netscape 4 can almost be ignored. But not quite. There are still platforms where Netscape 4, for all its warts, is the best there is. This is something the Mozilla developers should try to face at some point.

Alternatively, how much memory does Opera take on Solaris?

Wade.



[ Parent ]
OS/2 (3.00 / 1) (#29)
by Bad Harmony on Thu Nov 15, 2001 at 04:31:36 PM EST

The problem is that Netscape 4.XX may be the only browser available for some platforms. I use it on my OS/2 systems.

5440' or Fight!
[ Parent ]

Re: OS/2 (none / 0) (#40)
by Moghedien on Mon Nov 19, 2001 at 10:34:44 AM EST

Mozilla is available for OS/2, as is Opera.

---
[57 68 6F 20 63 61 72 65 73 2E]


[ Parent ]
Netscape 4 and HTML (5.00 / 4) (#18)
by jesterzog on Wed Nov 14, 2001 at 09:41:39 PM EST

I've given up on explicitly supporting Netscape 4.0 in the pages I make now. (I'm not a commercial designer, so take it in that respect.)

Especially since I hand-code most of my pages without CGI, I just got sick of annoying, hacky workarounds to make things compatible with buggy browsers. These days I pretty much stick with XHTML Strict, and do any formatting or layout with CSS.

This cuts down dramatically on page size and transfer bandwidth, too: on today's web, something like 70% of the volume of any commercial page is kludgey formatting crap, repeated on every page of the site. This way the formatting information gets transferred once, after which browsers cache it, and every later request is just for the information itself.

It would work perfectly okay, except that another of Netscape 4's stupidities is that it acts as if it supports CSS, then completely screws things up, breaking the page if it doesn't crash first. I got around this by denying Netscape 4 any style sheets, so Netscape 4 users don't see much more than headings, paragraphs, and other basic markup - which is often more useful in any case.

At first glance, there's no nice way (apart from CGI) to tell Netscape to ignore the style sheets. I got around this by discovering another Netscape bug: if you use the 'media' attribute of the link tag and specify several media types, comma-separated, Netscape 4 ignores the link altogether. So I just went through and made all the links media="screen,print", or something similar.
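In other words (a sketch of the trick just described; the file name is invented):

<!-- Netscape 4 ignores a link element whose media attribute lists
     more than one medium, so it never loads the stylesheet -->
<link rel="stylesheet" type="text/css" href="style.css" media="screen,print">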

So voila. Popular browsers that actually do CSS relatively correctly get the CSS formatting, while Netscape 4 and other browsers get nicely marked up information. You're not blocking anyone from the information - only pretty formatting, which they don't really need anyway. Unfortunately I'm sure this wouldn't sell as easily to some people who want commercial websites.


jesterzog Fight the light


[ Parent ]
My view of the matter (3.00 / 3) (#8)
by boxed on Wed Nov 14, 2001 at 02:20:17 PM EST

I personally use XHTML Transitional; the reason I don't use Strict is... well, I haven't got around to it. XHTML is a Good Idea though, since it's XML, which is really easy for a program to parse in a nice way. That's basically what enables browsers on low-end devices.

XHTML separates syntax from semantics (4.54 / 11) (#11)
by jesterzog on Wed Nov 14, 2001 at 02:49:46 PM EST

HTML 4.01 is the easy standard to use. But it looks like more and more people are moving to XHTML 1.0. But why? What are the advantages? I've read through much of the spec, but don't really understand what all the fuss is about.

To me, it's similar to HTML and style sheets. Just as CSS and other style sheet languages help to separate the formatting from the information, XHTML helps to separate the HTML syntax from the semantics.

In general it is so much easier and simpler to write a parser for XHTML (or anything XML) than it is to write for HTML 4 or below. You don't have to worry at parsing time about silly semantic rules specifically to do with the language, like whether the start of a ul element means the end of the most recent p element.

Instead, you can read it in with any pre-existing XML parser and then check the object model to make sure it doesn't absolutely violate the DTD. (Like having an LI directly inside a paragraph.) Or even easier, there's XML parser code available that will just validate it with the XHTML DTD automatically. When it's this easy, it makes it much easier to write programs and utilities that deal with HTML and web pages, which is generally a good thing if you ask me.

This is also especially useful now that external formatting (including CSS) has been brought in. It makes it far more obvious to the browser whether the "header" class on that paragraph tag includes all of the elements following it, or if it stops at the next element. It says somewhere deep in the specs what it should do, but how often do most people bother to read the specs?

Also, because it's all rigorously defined by the person writing the document (and that's forced), it's easier to be sure that whatever reads your document will objectively be able to understand its semantics. XHTML has effectively done away with lots of ambiguities that, while defined somewhere in the depths of the specification, really caught lots of people out because they just weren't clear.

Those ambiguities caused several browser manufacturers to begin ignoring standards in the interest of trying to display "what people meant" instead of what they actually said. In the long run that just leads to more ambiguity, because nobody really knows what's going to happen when they design a page. This makes everything longer and more frustrating for everyone, whether they're writing a page, writing a web browser, or trying to interpret a page that's been badly displayed.


jesterzog Fight the light


SGML parsers? (4.00 / 1) (#19)
by driptray on Thu Nov 15, 2001 at 02:54:22 AM EST

Instead, you can read it in with any pre-existing XML parser and then check the object model to make sure it doesn't absolutely violate the DTD. (Like having an LI directly inside a paragraph.) Or even easier, there's XML parser code available that will just validate it with the XHTML DTD automatically.

I don't understand this. How is this easier than using an SGML parser (like James Clark's SP package) to validate your HTML 4 (or 3.2, or whatever) whenever you want?


--
We brought the disasters. The alcohol. We committed the murders. - Paul Keating
[ Parent ]
Yep (4.00 / 1) (#21)
by jesterzog on Thu Nov 15, 2001 at 05:12:19 AM EST

I don't understand this. How is this easier than using an SGML parser (like James Clark's SP package) to validate your HTML 4 (or 3.2, or whatever) whenever you want?

Thanks -- I'll concede this. I didn't think specifically about using an SGML parser, and you're right that the code's also out there to do it. I guess if the DTD is there, then you can validate your older HTML easily enough. I hope the rest of what I said about XML being generally simpler to parse, understand, and use still stands, though.


jesterzog Fight the light


[ Parent ]
A counterpoint: XHTML *is* HTML (4.00 / 2) (#31)
by tmoertel on Fri Nov 16, 2001 at 01:39:48 AM EST

First, XHTML doesn't separate syntax from semantics any more than does HTML. XHTML is HTML; it's merely expressed as an XML application instead of an SGML application, that's all. (In fact, the subtitle of the XHTML specification is "A Reformulation of HTML 4 in XML 1.0.") As such, XHTML maintains all of HTML's unfortunate semantic problems: Content and presentation are still structurally intertwined to the same degree. You must still shoehorn your content into a one-size-fits-all structure.

Second, the benefits you claim to accrue to XHTML by virtue of its XML foundation all exist for HTML by virtue of its SGML foundation. For example:

Instead [of worrying about silly syntactic rules when parsing a document], you can read it in with any pre-existing XML parser and then check the object model to make sure it doesn't absolutely violate the DTD.
Likewise, you can parse any HTML document using any pre-existing SGML parser. And the parser will also validate your document against the appropriate HTML DTD, flagging any structural errors it may have, just as a validating XML parser would do for an XHTML document.
This is also especially useful now that external formatting (including CSS) has been brought in.
CSS works just as well with HTML as it does with XHTML.
It makes it far more obvious to the browser whether the "header" class on that paragraph tag includes all of the elements following it, or if it stops at the next element.
No it doesn't. The grammar rules that determine what a paragraph element is allowed to contain are identical for both XHTML and HTML. The extent of a paragraph tag (and any style class applied to it) is just as unambiguously specified by HTML markup as it is by XHTML markup.

The only time that ambiguity creeps into the picture is when the markup is invalid. And it's just as easy for one lazy author to create invalid XHTML markup as it is for another lazy author to create invalid HTML markup. Likewise, it's just as easy to ensure that HTML markup is valid as it is to ensure that XHTML markup is valid -- just run them both through validating parsers or use one of the many online validators.

HTML doesn't cause ambiguities or bad markup; lazy authors do. XHTML doesn't do away with lazy authors, nor does it make them less lazy. The problem exists equally on both sides of the fence.

XHTML has effectively done away with lots of ambiguities that, while defined somewhere in the depths of the specification, really caught lots of people out because they just weren't clear.
What ambiguities are you referring to? What lack of clarity do you mean? The grammars for XHTML and HTML are virtually identical, and they're both specified in the same way. The same rules that make an LI element illegal inside of a P element in XHTML markup make it illegal in HTML markup as well. Compare the DTDs: XHTML-1.0 Transitional DTD and HTML-4.01 Transitional DTD. What does XHTML let an author get away with that HTML does not? and vice versa? How is it any easier to validate an XHTML document against the first DTD than it is to validate an HTML document against the second DTD?
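As a rough illustration (the declarations below are paraphrased from memory, so check the actual DTDs), both grammars give the paragraph element essentially the same content model; the SGML version merely adds tag-omission flags:

<!-- HTML 4.01: "- O" means the end tag may be omitted -->
<!ELEMENT P - O (%inline;)*  -- paragraph -->

<!-- XHTML 1.0: same content model, no omission flags -->
<!ELEMENT p %Inline;>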

You're giving HTML a bum rap. XHTML does not solve HTML's problems. As the world migrates slowly from SGML to XML, SGML applications become XML applications. HTML becomes XHTML. That's it. Nothing less, nothing more.

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
I disagree (none / 0) (#33)
by jesterzog on Fri Nov 16, 2001 at 05:10:38 AM EST

What ambiguities are you referring to? What lack of clarity do you mean? The grammars for XHTML and HTML are virtually identical, and they're both specified in the same way.

You might have missed my point. I've already replied to the SGML issue that was brought up by driptray's earlier comment.

Other than that, the point I was trying to make was that the strictness of XML's syntax makes it more obvious to XHTML authors and XHTML parser writers exactly what tree the document should be parsed into, without the complication of having to look at the DTD to get semantic information.

With HTML, there can be lots of ways to get the same tree, and because part of it is built on HTML-specific semantics it's not immediately obvious what the tree is supposed to be. Instead, you have to understand the meaning of each element, and its own specific properties of where it starts and ends in relation to other specific elements.

For example, a paragraph ends when a new paragraph is seen, when a list begins (I think), when a header is seen, and so on. It doesn't end when an image is inserted - even though, looking at the page, it might intuitively appear that it does. Just by looking at the source, is the text after an image going to be in the same block as the text before it? In HTML this isn't visually obvious, but in XHTML the specification requires it to be explicit. These are essentially semantic rules built on element-by-element special cases, and they make HTML's syntax more complicated. They're not obvious without knowing the exact HTML specs, which makes it generally more difficult to write an HTML document that's marked up the way it's supposed to be.
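For instance (a minimal sketch), these two fragments parse to the same tree, with the image and the trailing text inside the paragraph, but only the second - the form XHTML requires - makes that explicit:

<p>Text before. <img src="pic.gif" alt="a picture"> Text after.

<p>Text before. <img src="pic.gif" alt="a picture" /> Text after.</p>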

Obviously this can be figured out by looking at the HTML DTD, and an SGML parser can and will validate an HTML document that way. Any experienced author will know to check the DTD or the specs, and they probably will. Lots of authors won't, though.

With XHTML not having extra semantic rules determining the syntax, an author is required to specify exactly what tree they want. If they weren't required to, as with ordinary HTML, the parser might not return an error when they made a mistake - it could still accept the document but use a semantic-based assumption to build an incorrect document tree. IMHO this makes miscommunication between everyone involved less likely in XHTML, i.e. less ambiguity.


jesterzog Fight the light


[ Parent ]
Are you sure? (3.66 / 3) (#34)
by tmoertel on Fri Nov 16, 2001 at 09:34:23 AM EST

... [T]he point I was trying to make was that the strictness of XML's syntax makes it more obvious to XHTML authors and XHTML parser writers exactly what tree the document should be parsed into, without the complication of having to look at the DTD to get semantic information.
And, again, no it doesn't -- because in order to know whether the document is actually correct (highly dubious in today's web-authorship world), both authors and parsers must refer to the DTD. (BTW, DTDs don't provide semantic information. They provide syntax -- grammar. Semantics for SGML and XML applications are specified by human-readable documentation. That's why, for example, we have the XHTML DTDs and the XHTML documentation. Syntax is specified by the former; semantics by the latter.)

Further, are you suggesting that HTML doesn't offer the same degree of markup explicitness as does XHTML? Can't authors just as easily close tags explicitly in HTML? SGML's markup minimization is optional. For example, if I'm the kind of author that finds this HTML markup confusing:

<p>Bob had three complaints:
<ul>
  <li>Dogs are stinky.
  <li>Dogs are noisy.
  <li>And dogs are trouble.
</ul>

I can just as easily write:

<p>Bob had three complaints:</p>
<ul>
  <li>Dogs are stinky.</li>
  <li>Dogs are noisy.</li>
  <li>And dogs are trouble.</li>
</ul>

XML's lack of markup minimization does not change the element grammar specified by the XHTML and HTML DTDs. Regardless of which DTD is used, it's still the author's job to understand the grammar before writing documents of that type. And the XHTML and HTML grammars are virtually identical. The same level of understanding is required for both.

With XHTML not having more complicated semantic rules determining the syntax, an author is required to specify exactly what tree they want. If they weren't required to as with ordinary HTML, the parser might not return an error when they made a mistake - it could still accept it but use a semantic-based assumption to create an incorrect document tree.
And just how is this supposed to occur? How is an SGML parser going to "infer" a bogus parse from a document that contains markup minimization? Markup minimization is specified in the DTD, i.e., it's part of the grammar. If an SGML parser notices and uses minimization, it can only do so in the context of generating a valid parse fragment w.r.t. the grammar. In other words, minimization cannot cause an "incorrect document tree."
IMHO this makes it less likely in XHTML that there will be mis-communication between everyone involved.
Continuing the example, if, as you suggest, an XHTML author can be "more obvious" about "exactly what the document should be parsed into", can he make it more obvious that he wants his list to be subordinate to his paragraph?

<p>Bob had three complaints:
  <ul>
    <li>Dogs are stinky.</li>
    <li>Dogs are noisy.</li>
    <li>And dogs are trouble.</li>
  </ul>
</p>

Is this what you mean? If so, then I must admit that XHTML allowed the author clearly to specify the precise parse tree that he wanted.

Too bad that the parse tree is illegal according to the XHTML grammar. Nevertheless, an XML parser will happily slurp up the bogus markup and attempt to use it for real work -- unless the parser happens to be a validating parser and the author happens to have supplied a doctype declaration.

And there's my point. XHTML makes it no easier to create correct documents than does HTML. Contrary to your claim, XHTML authors are still required to understand the grammar specified in the DTD in order to author documents properly. The authoring "win" that you ascribed to XHTML doesn't exist.

Likewise, the practical reality of XHTML is that the parsers must also read the XHTML DTDs to parse documents. Why? Because web-site authors are notorious for producing bogus markup. Without using the DTD, XML parsers cannot validate the markup and instead are forced to trust that the authors knew what they were doing. I hope we can agree that such trust is misplaced.

Thus there's no "win" on the parsing front, either -- DTDs are still required.

So where's the big win? I'm having trouble seeing it.

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
win = simplicity/efficiency (none / 0) (#38)
by kubalaa on Sun Nov 18, 2001 at 06:25:54 AM EST

XML parsers can be much faster and easier to understand than SGML parsers. That lets us add cool things like XSLT to the mix. It's sort of like the difference between plaintext and Word .doc: the latter has more expressive power for authors, but the former can be easily and efficiently processed by a wide variety of freely available tools.
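For instance, here is a minimal XSLT sketch (the title element is hypothetical) that turns a semantic element into HTML -- something that only works because the input is guaranteed to be well-formed XML:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- render each title element as an HTML heading -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
</xsl:stylesheet>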

[ Parent ]
easier to understand THAN sgml parsers (none / 0) (#39)
by kubalaa on Sun Nov 18, 2001 at 06:29:32 AM EST

I will Preview before Posting.
I will Preview before Posting.
I will Preview before Posting.
I will Preview before Posting.
I will Previe...

Teacher, can I go home yet?

The sooner you finish, the sooner you can go home.

[ Parent ]

Personally... (2.33 / 3) (#13)
by Delirium on Wed Nov 14, 2001 at 03:27:14 PM EST

...I use hand-coded HTML that I learned in some "how to code HTML" book I checked out from the library in 1996 and photocopied some pages out of. W3C claims that it's not valid HTML, but I assume they're retroactively changing the standards or something. I suppose it's reasonably close to HTML 3.0 + Netscape extensions.

XHTML as the wedge to ruin XML (4.57 / 7) (#20)
by driptray on Thu Nov 15, 2001 at 04:54:01 AM EST

I worry about XHTML. I have nothing against it in principle, but I fear that it will be used to break XML.

Currently, if an XML parser finds an invalid X(HT)ML document, it must refuse to do anything with it.

Compare that to what a web browser does when confronted with an invalid HTML document (approx 99.7% of the web). It does its best to "understand" it, and then renders it.

My prediction is that web authors will be about as careful about ensuring their XHTML validates as they currently are with their HTML.

And so we will have a web filled with invalid XHTML. And that means that the dream of being able to use a standard XML parser as the basis of a web browser will inevitably die.

Why? Parsers are required to reject invalid X(HT)ML, but there will surely be a lot of pressure on parser developers to begin accepting it rather than let all those web developers' work get tossed in the bit bucket just because it doesn't validate. As soon as one mainstream XML parser (think Microsoft) begins allowing invalid XML, the pressure on the others will be impossible to resist. After all, the Microsoft parser will "work" with all those pages that your parser doesn't "work" with. And so all parsers will inevitably allow invalid XML.

And this will be the death of XML.

Trying to shoehorn the current HTML mess into XML seems like something that is doomed to failure. Better to just begin using XML and begin the process of relegating HTML to the history books.


--
We brought the disasters. The alcohol. We committed the murders. - Paul Keating
why xhtml (none / 0) (#37)
by kubalaa on Sun Nov 18, 2001 at 06:19:16 AM EST

I completely disagree.
  • XHTML will be widely used -- why? XHTML does not replace HTML, it is parallel to it, so there's no reason to write HTML content in XHTML unless you specifically want features like enforced validity. I can't imagine Joe Blow deciding spontaneously to code his site in XHTML by hand.
  • XHTML, when used, will be broken -- Fortunately (or not), XHTML is rather more tedious to enter by hand than HTML. And I can't imagine anyone doing so. This means almost all XHTML will be authored indirectly and computer-generated. If an authoring tool can't produce minimally syntactically-valid XHTML, then it's not going very far.
  • There will be pressure to accept invalid XHTML -- At worst, the browser will simply recognize that the author is clueless and instead interpret the document as HTML. Invalid XHTML is indistinguishable from barely-valid HTML. By your argument, therefore, the existence of HTML should already have caused the "death" of XML.
All of the above is, I believe, derived from a misunderstanding of what XHTML and XML are for. I will illustrate by example. I have a content management system which does its work by parsing content into XML and then passing this through an XSLT pipeline. XSLT can spit out any format it wants, but it can only input valid XML. In a perfect world, the content would be devoid of pseudo-formatting markup like HTML. But in the real world, I probably have a lot of legacy HTML-formatted content that I still want to get through this XSLT pipeline. So what do I do: translate the HTML into XHTML. I still have my legacy formatting, but the XSLT won't choke on it.

Simply put, that is the only reason for XHTML. In a sense you were right: it tries to shoehorn HTML into valid XML. But sometimes this is necessary, and for those who don't care, HTML will be around as a separate language for a long time.

Not to mention that XML, as a concept, has been around for a long time and has no special attachment to XHTML or the web. It's kind of like saying that the bad spelling evident on "Kwik Mart" signs signals the impending death of written language.

[ Parent ]

XHTML 1.1 here (3.50 / 2) (#22)
by dorward on Thu Nov 15, 2001 at 05:26:03 AM EST

I try to keep my pages to the XHTML 1.1 spec (even though it isn't finalised at the moment). I've found that because it forces you to separate style and content, it becomes quite a bit easier to maintain a site (especially if you generate most of the XHTML on the fly with PHP).
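For reference, an XHTML 1.1 document declares itself with this DOCTYPE:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">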

I use HTML 2.0 (4.50 / 4) (#24)
by antizeus on Thu Nov 15, 2001 at 11:16:24 AM EST

I generally use HTML 2.0 for my pages, and run everything through validator.w3.org like all good-hearted people.

Sure, 2.0 may be simple, but I tend to use hypertext as a means to insert hyperlinks into text. I really don't see any point in jamming a bunch of fancy layout crap into my documents (though in a few rare cases I can see the need for something like a table).

I wish more people thought like I do. It would make for a cleaner, faster, more universally enjoyable (think of Lynx and small devices) web.


-- $SIGNATURE
What is it that you're writing in HTML? (none / 0) (#41)
by daveq on Tue Nov 20, 2001 at 10:23:08 PM EST

Tables in rare cases? It sounds like you make sites with no more than 10 pages and not very many types of content. The fact is that information is not one-dimensional, and the best way to convey it is telepathy. Failing that, you could try something that allows people to present content/anything else of interest in an effective and aesthetically/psychologically acceptable way -- books, the latest versions of (X)HTML, PDF, etc.

[ Parent ]
HTML 4.01 or XHTML 1.x (3.75 / 4) (#25)
by WWWWolf on Thu Nov 15, 2001 at 11:41:54 AM EST

My general rule (that I often break by mixing up things) is to use XHTML when I'm doing static pages, and "normal" HTML with dynamic pages.

Why, you may ask? Since the "normal" HTML is based on SGML, and SGML is somewhat more flexible than XML (used in XHTML), I can leave tags open and still keep pages valid HTML. Coding end tags in dynamic code is not for the Lazy...

But these days, my general tendency is to move entirely to XHTML, because that's the Future (...and all that blabber...)

Oh, I wonder how much trouble I'm causing the glorious Implementors of Scoop and K5 with this comment, in regard to moving to XHTML and away from "tag soup" HTML parsers - all <p> tags in this comment are unterminated, but are in the correct places SGML-wise =)

You know, I think life would be so much easier if the browsers would have only supported strict and valid HTML documents since the beginning...

-- Weyfour WWWWolf, a lupine technomancer from the cold north...


I suppose you're right (3.00 / 2) (#28)
by titivillus on Thu Nov 15, 2001 at 01:01:11 PM EST

I mean, without giving new users a lot of leeway to do bad HTML, the web wouldn't have grown the way it did. We wouldn't have New Media. We wouldn't have cheap hardware. We wouldn't have cheap DSL and cable modems. We wouldn't have had the internet bubble. It would've been a whole lot different.

[ Parent ]
Ah, come *on*... (5.00 / 1) (#35)
by WWWWolf on Fri Nov 16, 2001 at 09:39:42 AM EST

Come *on*. What I was saying was that the browsers would just not have displayed non-valid HTML pages at all.

Would you buy programs from a company that says "Our programs are not written in 'good' C++, we don't use delete() because that's for lazy fools"... or "We ran these Java .classes through 10 obfuscators - be glad that they run at all, fool. Your credit card, please."

And how do you fix an invalid HTML page? Make sure the documents are valid: keep tags properly balanced, don't use constructs that are completely invalid according to the DTD... and use good design principles.

HTML isn't hard. Valid HTML isn't hard either.

Even the dumbest HTML users would know something is wrong when their page can't be seen... and would quickly learn to use the validators and fix the issues with their markup. The fact that you don't need programming expertise has always been true of HTML. If you're going to learn HTML, just learn to do it properly while you're at it.

You know, fixing bad HTML isn't hard - much easier than fixing bugs from program code! Nowadays there's even automated tools to do that...

-- Weyfour WWWWolf, a lupine technomancer from the cold north...


[ Parent ]
I like good HTML (4.00 / 1) (#36)
by titivillus on Fri Nov 16, 2001 at 02:49:38 PM EST

I used to do it for a living. I had a script make a link for each page to be run through validator.w3.org. But some of the tricks we use now started as non-valid HTML, tables for layout being one. If the only thing you could do with HTML 1.0 browsers was view 100% valid HTML 1.0 pages, then there'd never have been HTML 2.0 or later.

My aunt's ugly purple GeoCities page with the 17 pictures of her cats should be valid HTML. My desk should be clean. They're not. They're good enough, though. She should HTMLTidy her HTML. I should put my O'Reillys back on their shelves and not leave one open book on top of another open book on top of another open book in front of my monitor. It would be quick and easy for us to do. We probably will do something else instead. And I don't find a thing wrong with that.


[ Parent ]
It's all about compatibility (4.33 / 3) (#26)
by Canthros on Thu Nov 15, 2001 at 12:53:12 PM EST

XHTML is about forwards compatibility: it's XML, so documents written in XHTML now should remain readable by future browsers even if the standard changes drastically, since XML itself seems likely to stick around for some time. The Transitional DTDs are about backwards compatibility: they're intended to allow easier migration of documents which rely on features that are deprecated or absent in the strict HTML 4/XHTML 1 standards. The strict DTDs are intended to be the current, preferred format. Use them in tandem with stylesheets, and you'll probably be in good shape.
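A minimal sketch of that combination (the file name is invented) -- a Strict document with all presentation delegated to a stylesheet:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
    <link rel="stylesheet" type="text/css" href="site.css" />
  </head>
  <body>
    <p>Structure lives here; presentation lives in site.css.</p>
  </body>
</html>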

So, there you go.



--
It's now obvious you are either A) Gay or B) Female, or possibly both.
RyoCokey
My take through personal experience (4.20 / 5) (#27)
by ttfkam on Thu Nov 15, 2001 at 01:00:33 PM EST

The first line of a valid XHTML document is

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

or

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

or

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

The first line of a valid XML document is

<?xml version="1.0"?>


If valid and non-valid were easy to mix up, there might have been an issue. But as you can see, it would be quite difficult for someone to accidentally pretend they were aiming for XML/XHTML compliance if they indeed were not.

Browsers will know the difference from that first line and can switch to a different parser depending on the content they find. If someone uses a WYSIWYG editor that pretends to support the standard but fails, it is the fault of that one editor, not the spec or the browser support.

If someone uses a WYSIWYG editor and then makes changes to the document by hand which breaks it, I have a less than impressive amount of sympathy. This would be akin to someone writing most of their document in MS Word and then opening up the document in Notepad to change a sentence or two. People should learn to realize that they're playing with fire.

Will people en masse use XHTML religiously and forget their past HTML transgressions? Probably not. Will the quality of markup get better? Probably so, due to the increased ease of making a compliant editor (strict standards are easier to implement than vague standards), because, let's face it, most people don't want to spend a day or two getting a basic understanding of HTML. They throw up their hands, say, "this is too tough -- I am not a computer person," and rush off to the nearest computer shop to buy FrontPage, DreamWeaver, etc.

If your page is X(HT)ML, other people can view your content more effectively. XML feeds are more and more popular for this reason; individuals can share data with each other with a minimum of effort. HTML is a pain and a half, requiring creative use of a regular expression engine to get anywhere close to useful for syndicated content.

If you use HTML, your page will be useful to view with a particular group of (or only one!) browsers and only from the site from which it came. If you use transitional XHTML, the site becomes more efficient for most browsers and content syndication becomes more of a reasonable option. If you use strict XHTML, all new browsers should handle it (Netscape 4 can kiss my ass -- it's way past time for the rest home) and syndication becomes even easier. If you use a dedicated XML schema for the job at hand, content syndication becomes laughably easy, your content contains semantic information (you know what it means as opposed to just knowing how it looks), and you still have the option of transforming it to XHTML later if necessary (or SVG or VoxML or MathML or XSL:FO or...)
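As a sketch of that last case (the element names are invented for illustration), a dedicated schema carries meaning rather than appearance:

<?xml version="1.0"?>
<review>
  <product>Acme Widget</product>
  <rating scale="5">4</rating>
  <summary>Solid, but noisy.</summary>
</review>

The presentational equivalent -- something like <b>Acme Widget</b> 4/5 <i>Solid, but noisy.</i> -- forces a syndicator to scrape the rating back out of formatting tags.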

Bottom line: If you want your site to be the most useful and flexible, you use XML. If you don't care about it being as useful, you use HTML. XHTML is the middle ground.

If you want to see the big deal and "what the fuss is about," check out an XML-based publishing engine such as Apache Cocoon 2 (http://xml.apache.org/cocoon2/). See what they accomplish, and then try to envision how the same would be accomplished with plain HTML.

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
Keep it clean and simple. And do it right. (4.77 / 9) (#32)
by tmoertel on Fri Nov 16, 2001 at 03:01:09 AM EST

What standard to use? All of them.

I'm not kidding. If the information that you are trying to communicate is important, be prepared to express it in a variety of media -- HTML, XHTML, ASCII, and printed pages, for starters. It's not as hard as it sounds. The key is to retain the full potential of your information.

First, don't you dare throw away your information's true meaning by shoehorning it into a web-presentation medium like XHTML or HTML. Instead, create an SGML or XML application -- DTDs, schemas, and associated semantics -- to capture the full extent of your information's meaning. If you preserve its precious semantics and retain its value, then your information will always be available to convert into the media of the day, be it the HTMLs and XHTMLs of today or the UltimoMLs of tomorrow.

Once captured in its full glory, your information can be converted into a variety of media via straightforward processes. Even if you're targeting only one output medium, say, web media, this is the Right Thing To Do. Not only do you buy insurance against change, but you also gain the benefits of application-specific correctness (via validation against your DTDs and schemas) and of automation, which provide high returns on the initial investment.

Okay, enough with the Do-The-Right-Thing sales pitch. If you're only interested in the web of today, here's what I recommend:

  • Use HTML 4.01 Transitional. But keep it simple and use the more-recent features only when you have no choice.
  • Write your HTML by hand. Do not use a graphical HTML "authoring tool." They create unspeakably horrible HTML that looks pretty in common web browsers but destroys the structure of your information.
  • Keep your markup clean. Structured content belongs in HTML markup. Presentation rules belong in stylesheets; see the sketch after this list. (But make sure that your site still makes sense when stylesheets are ignored or unsupported by user agents.) Javascript doesn't belong at all. If you must use it, make sure that your site still works when browsed with user agents for which Javascript is disabled or unsupported.
  • Let your "success yardstick" be compliance with the subset of the W3C standards that are widely adopted. Do not use the "does it look good in IE6" test as your yardstick. If you stick to the fundamental standards, your HTML will look good -- in almost all browsers.
  • Validate. Everything. Don't put anything online until you know it's valid. The W3C has online validators for HTML, XHTML, and CSS. Use them. When it's this easy to check your work, there is no excuse for putting bogus code online.
  • Please think about people with disabilities. Navigation bars, tabbed interfaces, rollovers, and most of the eye-candy navigational crap that is so trendy with hip design folk these days is not only worthless but also obfuscating to people who can't use modern interactive browsers. Guess how "cool" a tabbed graphic-rollover navbar is to the blind girl who is forced to sit through forty seconds of worthless ALT tags being spoken to her as a text-to-speech browser reads them off as she visits each and every page of your web site: "Nav tab", "nav tab", "nav tab", ... And don't get me started on Flash. Do the right thing. Check out Usability.gov's accessibility resources for some great starters. Validate with Bobby to check for accessibility problems.
  • Doublecheck your site in Lynx. If your site doesn't work in a text-only browser, something is wrong. You screwed up. You violated some standard or ignored some accessibility concern. If you made a mistake, fix it. If the violation is "by design," ask yourself if your design is worth it. Are you sure that your design isn't locking out more users than you think? If your site breaks in Lynx, there's a good chance it will break in some GUI browsers as well, especially those for which Javascript, styles, or other features have been disabled. Don't forget that many corporations have security policies that mandate disabling these kinds of features on browsers company wide.
  • Doublecheck your site with cookies off. Do you require cookies? Why? Are they really needed for access to all portions of your site?
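As a concrete sketch of the markup/stylesheet split above (the class names and file name are invented):

<link rel="stylesheet" type="text/css" href="style.css">
<h1 class="pagetitle">Annual Report</h1>
<p class="abstract">A summary of the year's results.</p>

with the presentation kept in style.css:

h1.pagetitle { font-family: sans-serif; color: navy; }
p.abstract   { font-style: italic; }

If the stylesheet never loads, the page is still a heading and a paragraph -- which is exactly the point.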
In a nutshell, that's what I suggest. It may sound like a lot, but if you keep your code simple, validate every step of the way, and work with accessibility in mind, it's surprisingly easy to get all of it right.

--
My blog | LectroTest

[ Disagree? Reply. ]

