Kuro5hin.org: technology and culture, from the trenches

Qualities of a good URL

By Holloway in Technology
Tue Mar 13, 2001 at 04:54:34 AM EST
Tags: Internet (all tags)

URLs should be simple, concise, and designed to last forever - reflecting the page's content and hiding the implementation. The days of an URL mapping directly to a file are gone. So instead people treat the URL like a command line - passing variables to a script that assembles a page - ending up with a bloated, confusing, and forgettable URL. Like, for example, the one you're looking at now.

Background: I've been working on a government site and have found guidelines on URLs to be entirely shoddy (so I wrote some). It's probably incomplete. I'd appreciate feedback.

Filename extensions
Modern content managers realise that, as browsers ignore filename extensions (.php, .html, .asp), it is unnecessary and detrimental to use them on the web (webservers use MIME types instead). The URL becomes a legacy to uphold when users and search engines expect to find pages at that URL - especially when the URL is bound to a piece of software that may no longer be in use (such as .php, .asp, or even .html). Changing the backend system then involves breaking the legacy or building a convoluted redirection scheme... to the new technology that - in the future - you'll have to redirect from again. A good URL, I think, should abstract away the technical implementation.

Apache's mod_rewrite URL Rewriting Engine can map external URLs to a different internal file (/about/tauranga to /about/tauranga.html). This allows you to hide the file extension. In three years XHTML will be much more popular and I'm sure it will have a successor - so serving .html is really missing the point. The technology has nothing to do with the content so remove it from the URL.
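A minimal sketch of such a rule, assuming Apache with mod_rewrite enabled in a per-directory (.htaccess) context - the paths are illustrative:

```apache
RewriteEngine On
# If the requested path plus ".html" names a real file,
# serve that file internally - the extension never appears in the URL
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*)$ $1.html [L]
```

With this in place a request for /about/tauranga is answered by /about/tauranga.html, and if the backend later moves off .html only the rule changes - the public URL stays put.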

Link rot
Designing permanent URLs is a forethought most webmonkeys don't bother with. A good example of instituted link rot is Slashdot that keeps only the last month or two's live content before archiving it elsewhere. So the URL changes - bookmarks fail - search engines lose track. Again, redirection to the new location is an option. It's always an option. But a sound URL foundation saves future work (and irritates fewer users).

Simple and concise
KISS: Keep it simple, stupid. A short URL is better than a long one. A popular example is to use '/job' over '/employment'. Domain names are well chosen and consist of a word or two, perhaps an acronym - but elsewhere on the same server something like 0092115-The_Movie_Troll_Character_Harry_Potter.html is often served.

The URL isn't a command line
Often people use URLs like a command line and pass variables to a script that assembles a webpage. Currently I'm looking at http://www.kuro5hin.org/?op=submitstory - and that's on a good day. When bad it can stretch to http://www.kuro5hin.org/?op=comments&tool=post&cid=2&sid=2001/3/10/04346/3713#here or even longer to view a story's comments as nested - ordered by highest rating. One couldn't speak that URL out over the phone and it's not very readable. Having variable names there is useful if you want to rearrange the order - you know... for fun - but it doesn't actually do anything.

These URLs can be simplified. Take kuro5hin's link to a page on IRC, ?op=special;page=irc when /irc is enough of a unique identifier. It's more readable and far easier to remember. A story's URL might be /?op=displaystory;sid=2001/3/12/162340/228 when /2001/3/12/[random?]/[PID] is much cleaner. A user should be able to edit the URL to /2001/3/12 and get a list of all stories posted that day. The URL can become a UI.
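As a sketch of how mod_rewrite could express such a scheme (the day.pl script name is invented for illustration; only the ?op=... query strings are Scoop's real ones):

```apache
RewriteEngine On
# /irc -> the existing special page
RewriteRule ^irc/?$ /?op=special;page=irc [PT]
# /2001/3/12 -> a hypothetical listing of that day's stories
RewriteRule ^([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /cgi-bin/day.pl?year=$1&month=$2&day=$3 [PT]
# /2001/3/12/162340/228 -> one story, via the existing displaystory op
RewriteRule ^([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/([0-9]+)/([0-9]+)/?$ /?op=displaystory;sid=$1/$2/$3/$4/$5 [PT]
```

The user-visible URLs stay clean while the script behind them keeps receiving the same variables.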

Misspelling and synonyms
A good webserver should catch misspellings or synonyms - taking you to the appropriate content or offering a list of near matches. Above I used /employment as an example of a bad URL, but one should catch requests for /employment and redirect them to /job. Similarly, I don't know of any site's URL structure that isn't annoyingly brittle... where a /job/hamilton will work but /jobs/hamilton will 404.
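The redirect half of this is a one-liner per synonym in mod_rewrite (a sketch; /job and /jobs are the hypothetical paths from above):

```apache
# Send the plural and the synonym to the canonical URL with a visible 301
RewriteRule ^jobs(/.*)?$ /job$1 [R=301,L]
RewriteRule ^employment(/.*)?$ /job$1 [R=301,L]
```

Genuine typos (like /jbo) are better left to something fuzzy such as Apache's mod_speling.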




URLs are
o Uniform Resource Locators. 35%
o used by Windows users, I type IP addresses... WITH port numbers. 12%
o fine as they are. 20%
o doomed like all hierarchies. Node based websites are ThE FuTuRe. 8%
o doomed so why patch them 3%
o irrelevant with future schemes brewing. 3%
o not what they were intended to be. They were meant to be hidden. 11%
o Did you know "Ed." is a rip-off of "Doug"? 6%

Votes: 98
Results | Other Polls

Related Links
o Slashdot
o Kuro5hin
o mod_rewrite URL Rewriting Engine
o Slashdot [2]
o Also by Holloway

Qualities of a good URL | 89 comments (81 topical, 8 editorial, 0 hidden)
Software can help... (3.50 / 2) (#2)
by skim123 on Mon Mar 12, 2001 at 08:17:49 PM EST

There's a product for ASP/IIS sites, XBuilder (among others), that will take a data-driven site, one that's entirely built from a database, and turn it into static HTML pages. This is good not only for perf reasons, but also because then you can get rid of things like:


(where 34732 may be the ID field for the Computers category of products being peddled) and replace it with something like:


Of course this would be a bit difficult to implement in a site like k5, where you want to keep up with the discussion (as opposed to having to rebuild the static pages, say, every two minutes to keep the content fresh).

Anywho, perhaps rather than attacking the problem from the developer side you could look at providing a similar software option that helps map simple URLs to confounding URLs?

Money is in some respects like fire; it is a very excellent servant but a terrible master.
PT Barnum

mod_rewrite (3.00 / 1) (#3)
by enterfornone on Mon Mar 12, 2001 at 08:29:39 PM EST

I think this is what mod_rewrite does. However I've never seen an example of it in practice so I wouldn't know. Perhaps someone could do an article on applying it to a specific situation (such as K5 URLs).

efn 26/m/syd
Will sponsor new accounts for porn.
[ Parent ]
yep. (4.00 / 1) (#17)
by Holloway on Mon Mar 12, 2001 at 11:01:21 PM EST

Yes, this is what mod_rewrite does. It distinguishes between internal and external URLs rather than just passing the URL to the script as a command line. It rewrites an external URL into an internal one based on some rather simple rules.

I've been using mod_rewrite for a few months now. I'm no expert but I'll do some more reading and write an article on that (and suggest some things for the k5 URL).

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

I'd love to see that article (none / 0) (#77)
by Luke Francl on Thu Mar 15, 2001 at 02:37:59 AM EST

Hi, I've been enjoying your responses to this article.

I'd love to see an article on mod_rewrite. I tried to use it a while back, and did some cool stuff with it, but couldn't get it loaded on our webserver for political reasons. *sigh*

If you included some performance numbers (I couldn't find *any*!) that would be wonderful. I had one rule in particular which seemed to bring our developmental server to a crawl!

[ Parent ]
Is it that big of a deal? (2.14 / 7) (#4)
by spacejack on Mon Mar 12, 2001 at 08:48:31 PM EST

How much of the web content out there is intended (or even desired) to be "permanent"? I'd guess very, very little. Is searching /. accurately with an external search engine all that important? :)

People generally click on links rather than typing them in. I almost never type them; even when testing my own scripts, I create desktop shortcuts after about the 2nd test. The only thing I really want to be concise about is the entrance page, after that, it's up to my design skills to help people find stuff.

If I want to send a bookmark to someone, I can automatically "send-to-recipient" on any competent OS. Or be a geek and cut and paste manually. The only consideration might be the overall length of the link, so they aren't broken up by the email formatting (use small variable names :).

BTW, I noticed recently that another way of hiding a long address or CGI parameters is to put it in a frame target. :)

I kind of like the idea of being able to run a whole website (or sub-section) from a single entry point.. creating additional directories just to avoid parameters seems superfluous. It really becomes an application then instead of a bunch of static pages and locations. No matter how hard you try to design it "right" from the start, you're probably going to need to change your layout at some point.

Just my $0.02. Anyone have any more justifications for highly simplified URLs?

Well, (4.00 / 1) (#20)
by Holloway on Mon Mar 12, 2001 at 11:36:15 PM EST

Sure, people do click on urls more than they type them. But people do write urls - you do see them in many places - and this can make them better.

I'm not sure what you mean about creating directories. There's no need to do this as URLs have never directly mapped to the file system of the webserver (the webserver is in the middle and it translates the request). The URL can be rewritten for internal use so as to make user-friendly public URLs. Most URLs are forgettable and bloated and aren't that user-friendly.

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

ok (none / 0) (#45)
by spacejack on Tue Mar 13, 2001 at 02:56:27 PM EST

Sure, people do click on urls more than they type them. But people do write urls - you do see them in many places - and this can make them better.

Yeah just to be clear I do think that easy-to-remember URLs have their place -- entry points are the big one.

I goofed up on the directories thing though, that's true. I'm a bit of a web programming newbie and this isn't the first time I got confused by differences between the address and the server file system :)

[ Parent ]
one example :) (3.50 / 2) (#5)
by spacejack on Mon Mar 12, 2001 at 08:56:50 PM EST

I built a gallery viewer this weekend. Now try this link:

(a gallery I put up for a friend)

I just used frames to throw together the above script and an index script, hiding the ugly URLs while at the same time providing him with an easy-to-remember URL to pass along.

Still, the nice thing about explicit parameters is that you can direct someone to a specific piece of info -- you can't do that from my "framed" version.

By the way (none / 0) (#10)
by Anonymous 7324 on Mon Mar 12, 2001 at 09:55:55 PM EST

... this is OT, but those oil paintings rock! :)

[ Parent ]
he thanks you (none / 0) (#11)
by spacejack on Mon Mar 12, 2001 at 10:23:34 PM EST

I'm sure. I'll pass along the compliment. :)

[ Parent ]
Definitely. (none / 0) (#13)
by vectro on Mon Mar 12, 2001 at 10:30:17 PM EST

Very much agreed. He should get an online store!

“The problem with that definition is just that it's bullshit.” -- localroger
[ Parent ]
The URL is too a command line (4.75 / 4) (#8)
by anewc2 on Mon Mar 12, 2001 at 09:40:39 PM EST

Personally I like the idea of writing a Python script to encode a database query into a URL and getting back exactly the data that I want. Yahoo, for example, maintains a free, public database of historical stock market prices that can be accessed in exactly this way and I use it often. How will your brave new world meet my needs?

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
You're right. (4.00 / 1) (#19)
by Holloway on Mon Mar 12, 2001 at 11:33:17 PM EST

By all means put queries in the URL when you're 'querying away', and I should have said that. But for general use this current URL /?op=comments&tool=post&cid=8&sid=2001/3/12/20643/1807#here is excessive and bloated.

Could you show me where the historic stock market prices website is so I might see the types of URLs it deals with? Or, better yet, some example query urls.

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

For example (4.00 / 1) (#26)
by anewc2 on Tue Mar 13, 2001 at 12:48:23 AM EST


gets daily (g=d) open, high, low, close and volume for Microsoft (s=msft) from 12/9/2000 (a/b/c) to 3/12/2001 (d/e/f) in comma-separated-value (x=.csv) format.

Or, check on current market action with


which gets last trade, change, daily and yearly ranges and other stuff, together with a small 5-day intra-day graph (d=2b) for NASDAQ composite, S&P 500, DJ 30, NASDAQ volume and NYSE volume (s=...).

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]
Slightly more readable. (3.00 / 1) (#31)
by Holloway on Tue Mar 13, 2001 at 05:15:04 AM EST

I'm not familiar with stock quotes and which values each page will require. The .csv filename here is for immediate download (not for browsing or use online). There is an HTTP redirect before a download though, I think. If so the table.csv bit can go (if not, the x=.csv is redundant). I assume every daily value would need a start and end date - so we could incorporate that (although I don't think this would be a good idea). I think the URL is so abbreviated that it stops being simple - the variable names are confusing and kinda unreadable. That leaves g=d&q=q&y=0 (which, aside from g=daily, I don't understand well enough to know whether there's redundancy), the date, and the export format. Anyway,


Or slightly more readable,


The second example is a little trickier, even more so as I'm not familiar with stock market notation and where the URL has redundancy. It's not a parent/child relation like your previous example and each stock bears no relation to the next. /stocksymbol/stocksymbol/ wouldn't be appropriate as the initial stocksymbol isn't a parent of the last stocksymbol. These stocksymbols are best left alone as variables. The "q" is pointless so that can go. As for the last trade, change, daily, yearly ranges "and other stuff" and the five-day intra-day graph... I can only guess as to what the cryptic URL's values mean. Is the %5E a stock-related value (five days?) or a character? (they all seem to have a redundant "%5E") Are the last trade, change, daily, yearly etc. there by default or has the URL selected them?

If I were to guess the meaning of the URL I could simplify it as,


== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

Greatly less parsable (none / 0) (#50)
by anewc2 on Wed Mar 14, 2001 at 02:38:11 AM EST


table.csv names the cgi script that I am requesting to run. Not the best name, I grant you, but you need something to identify the script as I am sure there is more than one available on that server.

Putting a date into three variables instead of one makes things a little more complicated for the client, and a little simpler for the server, which doesn't have to parse 12,9,2000 into three strings. You seem to think this is a bad thing. But it is the server that is the bottleneck, not the client, and by offloading processing to the client, the server runs more smoothly for everyone. In this case it probably doesn't make much difference, but the principle -- let the client do as much of the work as possible -- is well-established by now.

The g option has four possibilities: daily, weekly, monthly, yearly, encoded as d, w, m, y.

The y option is used to cut off the table y days before the end date. I have no idea why this would be useful, but I assume it is in there because someone wanted it.

About the q option I am qlueless.

The z option seems to keep track of the previous stock you got a table for. Again, I don't know the reason for it, but there probably is one.

The x=.csv option in fact seems to be redundant. I think I tried x=.tsv to get a tab-separated file, without success. But all this playing around was several months ago so some of these details I may have wrong.

Earlier you complained about URLs that were too long. Now you are complaining that they are too short. What defines the middle ground where it is just right? Shouldn't this depend on the tradeoff between mnemonic power and name collisions, rather than one person's aesthetic sensibility?

The original URL is superior to your alternatives because it conforms to a simple pattern:


This is easy to parse, and to generate, automatically. Your alternative omits the script name, puts one parameter in the place of the script name, and another parameter in the place of the directory name. It's all too arbitrary.

You want URLs to be a human interface, but they will still also need to be a machine interface as well. How will the server deduce from your URL what script it is supposed to run and what the parameters should be? What are your rules for deciding how URLs should be formed? Who will write the programs to interpret the rules?

Nobody needs this URL to be a human interface. I wrote my own human interface for the symbol, frequency and dates. Translating this into a URL was the easiest part -- took a couple of minutes -- precisely because URLs have a rigidly defined structure. (Plus some really nifty work by the Python library authors.) Most people generate these URLs by filling out HTML forms on Yahoo's site, and letting Javascript construct the actual URLs. Only the developer needs to know the details of the URLs. The user will never see it unless he casts his eye to the top of his browser window. Nothing's broken here. No need to fix it.

Similarly for the URLs here on Kuro5hin. I rely on Scoop to show me the page I want based on my mouse clicks. If I want to save the URL I can bookmark it, or copy it and paste it into an <A> tag somewhere else. This is independent of the length of the URL.

If I want to go to the site of Evil Inc. I type www.evil.com (arguably too short due to a misguided attempt to map domain names onto trademarks), or use a search engine.

I fail to see the problem you are trying to solve.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]
NS (none / 0) (#55)
by Holloway on Wed Mar 14, 2001 at 05:27:06 AM EST

"How will the server deduce from your URL what script it is supposed to run and what the parameters should be? What are your rules for deciding how URLs should be formed? Who will write the programs to interpret the rules?"

Um, this is no new initiative by me with new rules for submitting variables. Since the beginning of time people have organised parent/child relationships by deeper and deeper /'s (sometimes it maps to a directory, sometimes it's values for content to be pulled out of a database). These would be others' rules on what should go in the blah/blah/blah bit and how flat data should be ?blah=1&blah=2.

The advantages of a blah/blah are that they are significantly more usable and hackable than a ?b=1&h=2 (even if only to just shorten it). Most folk - upon getting a 404 - have chopped off the end of the URL to see if there's anything remaining of the site. So I don't think that calling the URL an interface is going too far.

When using a script a person needs to know the variable names and appropriate values. When using a blah/blah scheme, instead of knowing the variable names you just need to know the order.

Any singular parent/child (such as K5's /year/month/day/SID/ and possibly a PID) benefits from being a blah/blah/blah for all the usability reasons I and others have stated.


Shorter is better than longer, but too short isn't simple. Keep It Simple, Bob Saget. Having variables and values that are too short makes them cryptic. I believe all URLs should be readable and understood so they may be used by.. users... for usability reasons... use use use.


Nobody needs that URL to be a human interface? Nobody? Really?

Assuming the URL is well-defined - what is lost by making it a more human interface?

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

assume makes an ass out of u & me (none / 0) (#64)
by anewc2 on Wed Mar 14, 2001 at 11:38:14 AM EST

Assuming the URL is well-defined ...
There indeed is the sticking point. I have seen nothing yet to justify such an assumption.
what is lost by making it a more human interface?
Simplicity at the server end is what is lost. When you make the client end looser and more user-friendly, you necessarily make the server work harder. The whole net will slow down if servers have to add a whole new layer of processing to translate these new URLs into the format they already understand.

The world is not going your way. XML chucked out those parts of SGML that made concessions to human readers, and is explicitly not a human interface. What will you do when these complicated URLs you hate become even more complicated (but more powerful) XML-RPC calls instead?

Even the W3C knows enough to try and build URNs on top of URLs, rather than try to make a single syntax do double duty for both humans and machines.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]

Oh... we disagree! (none / 0) (#68)
by Holloway on Wed Mar 14, 2001 at 05:42:22 PM EST

The entire net would slow down with my new fangled ideas? Oh please - rewriting a string takes less time than starting a process. Unless you've got some stats to back up your claims, I'd think you were just scaremongering.

More to the point (and I seem doomed to say this) it's not a new idea. Many sites have a directory structure that's actually dynamic. A directory structure doesn't necessarily mean a filesystem - it's always been an abstract way of sectioning data into parent/child relationships.

I don't "hate" complex URLs. But these complex URLs better justify being complex or they're complex for no good reason.

The world isn't going my way? Bah... inane, inane (show us your URNs then) - and rather than posturing over evidence - I'm sure we could both find a lot... hell, it's the internet, any opinion has pages of data to back it up - let's make this interesting: $1 US, payable in five years (at its value in five years). So, game? We each pick five popular sites (commercial or not) and see whether they modify their URLs to be more human-usable (or more computer-usable).

Are you saying that an URL like slashdot or k5's is good - you never did clarify that.

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

No redundancy (none / 0) (#51)
by anewc2 on Wed Mar 14, 2001 at 02:44:53 AM EST


Here q is the name of the script to be run, a mnemonic for current Quotes. Not pointless at all, unless you want to limit all servers to exactly one cgi script.

%5E is the URL encoding for the caret (^). It is there because Yahoo uses it to identify an average or an indicator as opposed to a stock, and I am looking for the symbols ^IXIC, ^SPX, ^DJI, ^TV.O and ^TV.N. Stock symbols are assigned by the markets, but there are no standards for averages and indicators, so every data source makes up its own. You can't use something that could be a stock symbol, because tomorrow one of the exchanges could list a new stock with exactly that symbol.

d=2b identifies the output format out of a list of eight or ten possibilities (big and small graphs over different time periods).

I think this URL is minimal as it stands.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]
Not to compete with Yahoo, but... (none / 0) (#53)
by kellan on Wed Mar 14, 2001 at 03:19:32 AM EST

This is a very good URL, however, none of what we are discussing is new to Yahoo. Jerry wasn't the only guy with a list of links, Yahoo got to be where it is by grokking the web at a level most people don't.

You could make this URL more readable, and more guessable, but, you're right, it would require making it longer. After all, that URL is packing a lot of information!




this preserves a short URL, while being eminently more guessable.

I also question displaying the carets in the URL. It seems to me that this is an internal implementation detail and not one the user should have to worry about. I could, however, be mistaken on this, not being a user of the site.


[ Parent ]
guessable is not the point (none / 0) (#54)
by anewc2 on Wed Mar 14, 2001 at 04:10:56 AM EST

Why do some parameters look like parameters, while others look like directories? And how is the poor server to tell the difference? Guessability is fine for humans, but I don't want the server to have to guess what I am asking for. I want it to know.

The URL spec is an artificial language designed to communicate with machines. You can play games with URLs all you want. But until you can specify a syntax that a server can use to interpret these free-form URLs of yours, playing games is all you are doing.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]

Usability ain't (just) a game. (none / 0) (#56)
by Holloway on Wed Mar 14, 2001 at 05:54:51 AM EST

The order - or rather, the parent items (like a directory tree) - defines the variable names for the script. I have done this with the URL rewriting software for Apache that I linked to in the article (and, I'm told, the same can be done on MS platforms).

The overhead of this rewriting/translation is not much more than processing any string against a set of rules. It's very minimal.

There's a reason why pseudo-directories are preferred over ?year=2001&month=1&day=1, but only for certain types of data structures (particularly singular parent/child - for sections and their items - which occurs often).

Similarly ?a=1&b=2 has its place, but it is often overused - Scoop and Slashcode are obvious examples.

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

"She's a tree" (none / 0) (#57)
by Holloway on Wed Mar 14, 2001 at 06:22:49 AM EST

OK, ignore the parent items bit - that's confusing. It's all to do with the order. By assuming an order you can assign values to the appropriate variable (or 404).

k5 is incredibly simple to do. If it begins with a number then it's a story. If it begins with a letter it's a special page (like /IRC or /FAQ).
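That first-character test could be written as two mod_rewrite rules - a sketch only, reusing Scoop's existing query strings as the internal targets:

```apache
# Starts with a digit: treat the whole path as a story sid
RewriteRule ^([0-9].*)$ /?op=displaystory;sid=$1 [PT]
# Starts with a letter: treat it as a special page like /irc or /faq
RewriteRule ^([A-Za-z][A-Za-z0-9_-]*)$ /?op=special;page=$1 [PT]
```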

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

What about missing variables? (none / 0) (#59)
by anewc2 on Wed Mar 14, 2001 at 10:45:43 AM EST

Passing variables by position is inferior to passing them by name in the case where you want to omit a variable. Suppose there's some script somewhere that wants to distinguish between ?a=foo&b=&c=bar (which you would translate to /foo//bar) and ?a=foo&c=bar (which you would translate identically). That script would break under your scheme.

You want to enforce the use of the empty string as the default value for a missing variable. Even if that had been a good idea at the beginning (and I would argue that it was not), it is certainly a bad idea to change the rules now.

Passing variables by position is also inferior in the case where you have a large number of them. In your way, I have to count them from the beginning to find out what a particular value means; a tedious and error-prone process if it is after the fifth or sixth place on the list. When the variables are named right there in the URL, I don't. There is more to readability than length. Explicitness is good too.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]
stop before you hurt yourself (none / 0) (#63)
by kellan on Wed Mar 14, 2001 at 11:32:27 AM EST

H, you were doing so well, and then that last line seems to imply that you are guessing what to do based on the first letter of a variable. <shudder>

Part of the point of this style of URL building is that it is more deterministic than some CGI with a 255-character string of args passed into it, not less.

BTW, do you really put arguments, and then the application, in your URLs? That seems kind of backwards. Putting the app first in the directory structure is what people expect. Think about it: if you don't find what you want at the end of a URL and you start manipulating it, you start at the end and work up. Therefore the variables most subject to change should also be at the end of the URL.

Also, I occasionally use ProxyPass to send some applications to an entirely different machine, and again it really helps to have the most fixed, most descriptive part of the application up front.


[ Parent ]
*wink* (none / 0) (#69)
by Holloway on Wed Mar 14, 2001 at 05:51:09 PM EST

All I can say is that I was very tired last night. I was drunk. I just broke up with my girlfriend. They're going to take my house ... oh,

The number/letter would work with all the pages on K5 - I assume that's what I meant. It's not something you would ever implement - but it would work.

Think about it, if you don't find what you want on the end of a url, and you start manipulating it, you start at the end, and work up. Therefore the variables most subject to change should also be at the end of the URL.

How is what I do inconsistent with this? Take K5... again... /year/month/day/sid/[pid/]?settings Or were you perhaps talking about my muddled stock URLs?

do you really do arguements, and then application in your urls?

I'm a little confused here, so far as I see it there is no application in the URL. When it comes to URLs everything's an argument - the application is chosen depending on the arguments in the URL (it's what webservers do).

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

web servers are smart (none / 0) (#61)
by kellan on Wed Mar 14, 2001 at 11:25:58 AM EST

Well there were 2 points in Holloway's original story. One was that it is important to hide implementation details, which Yahoo does - if not completely, then at least sufficiently. The second was that it increases usability.

URLs can go back to being machine-readable line noise when we get URNs, which is to say never. TBL didn't anticipate everything, and the web has to evolve to match how it is used.

You are giving the server way too little credit. It is trivial to parse this URL into the original URL you submitted or into something like:

Perhaps this is where communication is breaking down. Take a look at a site like advogato.org: they have URLs in the form of "http://www.advogato.org/proj/Subversion/". This is a dynamically generated page which maps internally to advogato.org/proj.cgi?name=Subversion
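An Advogato-style mapping is a single mod_rewrite rule (proj.cgi is taken from the comment above; its /cgi-bin location is a guess):

```apache
# /proj/Subversion/ -> the CGI that actually builds the page
RewriteRule ^proj/([^/]+)/?$ /cgi-bin/proj.cgi?name=$1 [PT,L]
```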

Who said free-form? I'm not in any way suggesting free-form URLs. It is not up to the server to guess what the user is trying to accomplish, but up to the user to understand the interface provided by the URL. I claim that mine would be easier to understand, and iterate over.


[ Parent ]
web servers are busy (none / 0) (#65)
by anewc2 on Wed Mar 14, 2001 at 12:24:04 PM EST

You are giving the server way too little credit. It is trivial to parse this url into the original URL you submitted
Trivial to do it once, maybe. How many times per second will it have to happen?
Who said free-form?
I did, based on your example which gave semantically identical items differing syntax (parameters as directories and parameters) and gave semantically different items the same syntax (script name and parameters as directories).

If your variant is really well-formed and won't add significantly to server load, then I won't object, provided I can still use the old URLs, which I still find simpler and clearer (and more powerful) than your alternatives.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig
[ Parent ]

no, i meant trivial (none / 0) (#66)
by kellan on Wed Mar 14, 2001 at 01:53:41 PM EST

First, a caveat: all my comments are based on the assumption that you are running Apache. There are some good reasons for running one of the other web servers (Zeus, khttpd), but not many, and they are few and far between.

Given that you are running Apache, this really is trivial - no different from Apache's own internal mechanisms for munging URLs into file names.

semantically identical items differing syntax (parameters as directories and parameters)

You think of these items as semantically identical because you know the underlying implementation. However, the different parameters describe very different things. One set of parameters describes the presentation of the data, while the second set describes the actual data.

I choose to interpret the first set of parameters, those that describe the layout, as describing different applications, and the second set, the stock tickers, as parameters that could be passed to a number of applications.

How is the information "I'm looking up stock quotes", "I want to display 5 days at a time", and "I want to display info for the S&P index" identical? They seem to be very different concepts to me.


[ Parent ]
Different issues (3.50 / 2) (#9)
by adamsc on Mon Mar 12, 2001 at 09:55:03 PM EST

Avoiding link rot should be mandatory. Using mod_rewrite to make URLs friendlier for search engines and less likely to self-destruct if a script changes is also good (esp. if it keeps the length under 80 characters, to make the URL easy to send in email or remember). However, I don't see any point in trying to turn the URLs into an alternate interface. Making easy links for a story is a good idea (if nothing else, it makes it easier to send in email) but I don't see why anyone needs to care about the URLs of pages used in the process of posting a comment. Similarly, unless you use Apache's mod_speling, I doubt it's worth the time to map /jobs to /job - there are simply too many more important things for an admin to do.

(nod of the head, pinochio style) (4.00 / 2) (#33)
by jxxx on Tue Mar 13, 2001 at 06:36:54 AM EST

Trying to cover synonyms is asking for endless work. If you encourage people to abstract individual words into concepts, translate, and feed them into a computer, shouldn't you include Arbeit, and trabajar, and work, and career, and ...
The list of possibilities goes on and on. You either end up with endless work, or a favored audience.

[ Parent ]
thesaurus (none / 0) (#71)
by Holloway on Wed Mar 14, 2001 at 06:18:33 PM EST

Initially I had that opinion too, but then I didn't. Here's why.
  • It doesn't need to be a human process to choose all the synonyms. We have thesauruses (mod_thesaurus? ;) so - upon a 404 - one could provide some near meanings rather than just near spellings (near spellings are popular - so why not this?).
  • I forgot to say in which situation this would happen. Memory drift is a recognised condition (in that people suffer from it). I, at times, have exchanged words in my head for similar ones. I think it's safe to say this would apply to URLs like anything else in people's memory. Though I have no data on this, and I wouldn't know where to begin.
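The mod_thesaurus idea could be sketched as a 404 handler. This is a hedged illustration only - the thesaurus table, URL set, and function names are all invented, not part of any real module:

```python
# Sketch of "near meanings, not just near spellings" on a 404.
# THESAURUS and KNOWN_URLS are toy stand-ins for real data.
THESAURUS = {"job": ["work", "career", "vacancy"],
             "work": ["job", "career"]}
KNOWN_URLS = {"/job/hamilton", "/about/tauranga"}

def suggest(path):
    """On a miss, swap each path segment for its near meanings and
    return any variants that actually exist on the site."""
    if path in KNOWN_URLS:
        return [path]                      # not a 404 at all
    parts = path.strip("/").split("/")
    hits = []
    for i, part in enumerate(parts):
        for synonym in THESAURUS.get(part, []):
            candidate = "/" + "/".join(parts[:i] + [synonym] + parts[i + 1:])
            if candidate in KNOWN_URLS:
                hits.append(candidate)
    return hits
```

So a visitor who remembers /work/hamilton instead of /job/hamilton would be offered the page that actually exists, just as near-spelling modules do today.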

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]
...but related (5.00 / 1) (#52)
by kellan on Wed Mar 14, 2001 at 02:51:57 AM EST

I just got done converting a website from ASP to JSP. The site hadn't been built with an eye to the future (does that even need to be mentioned if it was built with ASP?) and therefore used URLs which explicitly pointed to the ASP files.

Not only this, but all the ASP files had been kept in a single directory in order to "keep the urls simpler", and to make it easier for the developers to forward around connections in these convoluted logic chains.

Converting (rewriting) the site to JSP for security and scalability, and adding an information hierarchy, was an immensely useful and rewarding three-month task. Going back and figuring out how to prevent link rot was a frustrating and ugly task that took almost a month on its own.

Simply creating aliases for all the old ASP files didn't work, because an old ASP file used to do a dozen or so different (sometimes radically different) things based on a series of obscurely named parameters. Now those different functions were broken up over logical names and directories.

Six months later, we're still finding intermittent bugs with that part of the system.

That's why having clean URLs prevents link rot. You're right that the accessibility question is a totally separate issue. It just happens to have the same solution :)
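The alias problem described - one ASP file doing radically different things per parameter - is what mod_rewrite's query-string condition exists for. A hedged sketch, with the file name and parameters invented for illustration:

```apache
RewriteEngine On
# Hypothetical: old.asp behaved differently per ?action=..., so each
# old behaviour must map to its own new location.
RewriteCond %{QUERY_STRING} (^|&)action=view($|&)
RewriteRule ^/old\.asp$ /articles/ [R=301,L]
RewriteCond %{QUERY_STRING} (^|&)action=edit($|&)
RewriteRule ^/old\.asp$ /admin/editor [R=301,L]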


[ Parent ]
Not sure if this is an issue or not... (2.00 / 1) (#12)
by omegadave on Mon Mar 12, 2001 at 10:24:05 PM EST

...but wouldn't that make keeping multiple web pages hard? Personally, I like the ability to specify what exactly I'm going to, whereas what you're proposing seems to eliminate that luxury.

An example would be if I was looking at an archive of comics at, say, Penny Arcade. Now say I see one that I think a friend of mine would like. With the current system, if I wanted I could send the specific location of that comic, i.e. like here. How exactly would I do that with the new system? For some cases, using this system makes sense.

Of course, if that is actually not a problem, and I have just misunderstood, then I would have no problem with something like this.

Cleaner URLs (4.33 / 3) (#15)
by Holloway on Mon Mar 12, 2001 at 10:47:37 PM EST

Internally it could use the same URL, but instead of http://www.penny-arcade.com/view.php3?date=2001-02-26&res=l you would access something like http://www.penny-arcade.com/2001/02/26 or perhaps http://www.penny-arcade.com/2001/02/26/?highres for the high-res version (assuming it defaults to low-res).

There's no need to create many webpages. It's the same thing internally - but this way you're not tied to any software (.php - or worse, a version of PHP, .php3) and it's a more readable URL. You can speak the URL, or publish your URL on paper (some URLs reach 200 characters - it's unnecessary).

When you have an URL with 2001/02/26 you could remove the /26 and expect to get a list of all comic strips from 2001/02. There's no reason it couldn't be done with the bloated URL, but it's just another way URLs could be more flexible.

Perhaps http://www.penny-arcade.com/strips/2001/02/26 would be better suited to the type of content on that site (I'm not too familiar with Penny Arcade or whether it has other sections; I read it occasionally).
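Internally, the clean date URLs above could be mapped onto the existing script with mod_rewrite. A sketch only: view.php3 comes from the URL quoted above, but the month-listing script is invented for illustration:

```apache
RewriteEngine On
# /2001/02/26  ->  the existing script (internal mapping, not a redirect)
RewriteRule ^/(\d{4})/(\d{2})/(\d{2})/?$ /view.php3?date=$1-$2-$3 [PT]
# /2001/02  ->  a hypothetical month-listing script
RewriteRule ^/(\d{4})/(\d{2})/?$ /archive.php3?month=$1-$2 [PT]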

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

Excellent! (4.66 / 3) (#21)
by Luke Francl on Mon Mar 12, 2001 at 11:39:42 PM EST

It's about time this issue was addressed! Whenever I create a new webpage, I try to think about the usability of the URL. Jakob Nielsen has covered just about everything I have to say in URL as UI (amusingly, he does not follow his own advice: the URL of that page is "http://www.useit.com/alertbox/990321.html"). See also the reader responses to that Alertbox, as there is a lot of good commentary.

I have two major pet peeves with URLs:

  • I hate mixed case URLs. Dave Winer does this ALL THE TIME, and it drives me nuts. What is it with OO programmers and mixed case? Instead of "mixedCaseUrl", I prefer "mixed-case-url". It's easy to say and easy to type.
  • Forcing inclusion of "www." in your domain name. It is so annoying to type in a perfectly valid domain name and get an error.
My advice to programmers is to make your web-apps' URLs human-readable and human-hackable. For example, wouldn't it be awesome if this story was at "http://kuro5hin.org/2001/3/12/qualities-of-a-good-url" instead of "http://kuro5hin.org/?op=displaystory&sid=2001/3/12/20643/1807"?

And, as others have suggested, let http://kuro5hin.org/2001/3 be all the stories for March. It's analogous to when you hit a 404 error page and chop off the end of the URL to try to find a valid page.

I also urge you to drop the extensions from your URLs. At work, we had a painful time because we foolishly decided to switch from .html to .phtml. Don't do it! Use content negotiation to pick the appropriate piece of content for you. Apache has mod_negotiation now (this means you don't need to use mod_rewrite simply to strip the extension, by the way, but it is still incredibly powerful for other things). ArsDigita also has an Abstract URL System which does the same thing, and is quite nice. Tim Berners-Lee apparently hates extensions on URLs, so do it for him, if not for your users.
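With mod_negotiation, extension stripping needs no rewrite rules at all. A minimal sketch, assuming a stock Apache with MultiViews available (the directory path is illustrative):

```apache
# With MultiViews on, a request for /foobar/baz is answered by whichever
# of baz.html, baz.php, etc. exists - links never need the extension.
<Directory "/var/www/htdocs">
    Options +MultiViews
</Directory>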

Looks like he's following his advice perfectly (4.00 / 2) (#36)
by kzin on Tue Mar 13, 2001 at 08:42:18 AM EST

The URL consists of a plain section name and date of publication, just like your other examples. It's arguable that it would have been better if the date had slashes to separate its components, but that's nitpicking. I fully agree with the rest of your post, though.

[ Parent ]
meaningless numbers (none / 0) (#88)
by Luke Francl on Tue Mar 27, 2001 at 07:59:24 PM EST

To me, the Alertbox URL looks like meaningless numbers. I think it'd be better if the title of the piece was encoded in the URL.

[ Parent ]
hyphens are also ugly (none / 0) (#49)
by kellan on Wed Mar 14, 2001 at 02:35:55 AM EST

i dislike hyphens, but find mixed case can make things easier to read. must be a matter of taste. (of course i'm an OO programmer)

however, case should be a suggestion, not a requirement. it would be foolish, for usability reasons, to have two files with the same name and different cases and expect a user to remember the difference. therefore why force them to remember the case of the objects in your url at all? (i speak about web apps only - none of this "i don't really remember anything about case, ever" windows crap)


[ Parent ]
Right! (none / 0) (#74)
by Luke Francl on Thu Mar 15, 2001 at 02:15:34 AM EST

it would be foolish from a usability reason to have two files with the same name, and different cases and expect a user to remember the difference.

Exactly! Even if you hate hyphens, I'm glad we agree on this. A number of studies have shown that people can't remember the difference between "File" and "file", especially when they contain different things!

Sorry I can't provide a link right now, but I read a while ago that the Python guys were working on using Python for something they called "CP4E", or "Computer Programming For Everyone". They found that forcing people to remember case-sensitive things was a real learning barrier.

XML has the deficiency of being case-sensitive, which really makes me mad. XML is great, but it seems like a real attempt to take the web back from ordinary people. Ugh, don't get me started on XSLT!

I think the Mac makes a good compromise here -- it preserves the case of the files for the user, but ignores case internally. So if a user meant "thatOneFile" but he types "ThatOneFile", he'll get the right thing.
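The Mac-style compromise is easy to sketch: fold case for lookups, but keep the case the author chose for display. A toy illustration (names invented):

```python
# Case-preserving, case-insensitive storage: the key folds case,
# the stored value keeps the author's original spelling.
files = {}

def save(name, content):
    files[name.lower()] = (name, content)

def load(name):
    """Resolve any casing of the name to the stored file."""
    stored_name, content = files[name.lower()]
    return stored_name, content
```

So a user who meant "thatOneFile" but types "ThatOneFile" still gets the right thing, and listings still show the original capitalisation.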

[ Parent ]

WWW != Internet (none / 0) (#72)
by Bad Harmony on Thu Mar 15, 2001 at 12:04:46 AM EST

Forcing inclusion of "www." in your domain name. It is so annoying to type in a perfectly valid domain name and get an error.

Sometimes people forget that the web is not the same thing as the Internet. It is just another service, like mail or ftp. What is wrong with using www.foobar.com for a web server?

5440' or Fight!
[ Parent ]

bah (none / 0) (#73)
by Luke Francl on Thu Mar 15, 2001 at 02:09:18 AM EST

I've seen this argument before.

Technically, if we followed convention (ftp.cdrom.com, mail.foobar.com, news.umn.edu, etc), then webservers would be accessed by http://http.foobar.com.

They aren't, because that looks even stupider than "www", which as I'm sure you've noticed is an abbreviation that takes longer to say than the words it abbreviates (nine syllables versus three for "world wide web"). The indication that I want a website should be clear from the protocol I'm using to access it -- HTTP.

And in any case, the majority of the time people hit "foobar.com" with a web browser, they meant to go to "www.foobar.com" -- so take them there. It's a good guess, and if it's wrong, well, they can correct it.

[ Parent ]
Changing semantics is always problematic (3.62 / 8) (#23)
by anewc2 on Tue Mar 13, 2001 at 12:24:07 AM EST

According to the W3C, a URL has "explicit instructions on how to access the resource on the internet". You want to change the semantics of a URL from explicit instructions (for a computer) to a user interface (for a person). To get a superficial simplicity, you want to add complexity and more processing under the hood.

Maybe it's a philosophical difference, but this just seems to me like a Bad Thing®.

The world's biggest fool can say the sun is shining, but that doesn't make it dark out. -- Robert Pirsig

Hmm... (3.66 / 3) (#24)
by Holloway on Tue Mar 13, 2001 at 12:43:19 AM EST

Superficial simplicity... righto. These measures make a website more flexible and don't hamper the usability of URLs [as much] as present day URLs do.

These are as explicit as any other url - only more readable and printable == more usable.

More processing? Processing a string... yeah.. muchas overhead!

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

The real world (4.25 / 4) (#35)
by Simon Kinahan on Tue Mar 13, 2001 at 08:06:00 AM EST

URLs were meant to encode the location, and URNs were meant to provide a human usable name. However, URNs haven't happened, because the problem of creating a global namespace is politically hard. A global namespace implies only one person can own each name. While resolution standards exist, they get little use, probably for this reason. The fight over URNs would make domain name lawsuits look like a storm in a teacup.

Since there is no other way to identify a web page, people need to be able to use URLs.


If you disagree, post, don't moderate
[ Parent ]
You *can* have it both ways (5.00 / 1) (#58)
by Dion on Wed Mar 14, 2001 at 10:14:01 AM EST

There is nothing contradictory in "explicit instructions on how to access the resource on the internet" vs "user interface"; there is absolutely no reason to make the computer's language unreadable when you have a choice.

Ugly URLs must die. There is no reason for ever having content on a site with GET parameters in the URL; the *only* time where parameters in the URL are OK is when you are requesting a certain view (NOT performing an operation - that is what POST is for) that wouldn't make any sense to ever link to from the outside (for example, setting the sorting order for the comments on a story: http://k5/2001/3/14/good-urls/comments/?sort=bogo)

Oh, and BTW:
GET is for GETting static content; a GET must *never* change the state of the server.
POST is for performing operations, never for following links or simply getting content.
Web browsers must never re-perform a POST, but they may re-perform a GET all they want to.
This is important when you decide whether to use a GET or a POST action on your forms.
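The discipline above can be sketched in a few lines. This is a hedged illustration, not any real framework's API - the handler names and comment store are invented:

```python
# GET handlers only read state; POST handlers mutate it.
comments = {"good-urls": ["first!"]}

def handle_get(story, params):
    """GET: return a view of existing state, never modify it.
    A ?sort=bogo parameter only changes the view, so GET is fine."""
    items = list(comments[story])
    if params.get("sort") == "bogo":
        items = sorted(items)
    return items

def handle_post(story, params):
    """POST: perform an operation that changes server state."""
    comments[story].append(params["body"])
    return len(comments[story])
```

Under this split, a browser can safely re-perform any GET (reload, back button, prefetch) without side effects, while a POST is only issued on an explicit form submission.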

[ Parent ]
GET vs POST (none / 0) (#76)
by Luke Francl on Thu Mar 15, 2001 at 02:31:56 AM EST

The main problem I have with POST is that it breaks the back button in most browsers. Opera is an exception, but Opera is weird, and does all kinds of client-side caching stuff it probably shouldn't. Nine times out of 10, I love Opera, but that one time...man, I want to kill it.

I think the only reason to use POST is when you are submitting a large amount of data to a database, and GET can't handle it (I forget what the size limit on a GET is, I believe it varies from server to server).

[ Parent ]
Specific == Permanent (4.00 / 4) (#25)
by bjrubble on Tue Mar 13, 2001 at 12:44:29 AM EST

I love to see sites with URLs like K5's. They tell me that I can link to them without worrying that they'll disappear. A URL like "http://www.blah.com/news/cat_on_fire.html", while concise and descriptive, is usually a sure sign that the story will be archived in some other spot. And I'd still have to guess the case and word delimiters when typing it in later. Just as /job/Hamilton might be easy, but try /job/Johnson (Jonson? Johnsen? Johnstone?) or wait until there's another Hamilton. Easy-to-remember names come at the cost of ambiguity and naming conflicts. I almost never type out URLs but I copy and save them quite frequently, so I'll take URLs that look like crap over URLs that disappear, in a heartbeat.

:tcejbuS (4.00 / 3) (#28)
by Holloway on Tue Mar 13, 2001 at 02:46:41 AM EST

In truth I think that has more to do with amateur webmasters vs professionals rather than anything to do with the URL. Amateurs use static files and rarely have a database, whereas professionals wouldn't go without one for a big site. Conversely, directories are often wiped clean but databases are left untouched during an upgrade.

== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]
I agree (4.50 / 2) (#30)
by rusty on Tue Mar 13, 2001 at 03:10:53 AM EST

As of today, a URL means nothing, basically. Say you see something like http://www.foobar.com/home/hello.html. What is that? A static HTML page? Not necessarily. It's easy to write a mod_perl handler that will take that URL and translate it to "find all pictures with topic "me" in database "myhomepage" and format them like so..." and print out the result as HTML. Which means there's no reason that, even if you reload the page immediately, that seemingly innocent HTML page won't have totally different content.

I agree with Holloway, that a URL is an abstract thing. And the flip side of the above example is that if I'm a good webmaster who cares about the net, I can easily make sure that I never have to break a link, ever. *Any* URL can be internally hacked to serve something other than what it originally pointed at, and can be maintained to serve what it originally served, even if your site has changed layout or structure. Keeping links working is a hallmark of a professional admin, vs. someone who doesn't really care (or just doesn't know enough about the subject).

Not the real rusty
[ Parent ]

URLs and Frames (4.00 / 3) (#27)
by Mabb on Tue Mar 13, 2001 at 02:17:51 AM EST

Editorial request: can some kind ed please get rid of the [repost] in the title?

Simple URLs are important for deep linking (ie linking to the actual page with the content you're referring to, rather than the home page or generic topic page). Deep linking is one of the best things about the web and should be supported by the web's back-end (server software, protocols etc).

FRAMES ARE EVIL and subvert deep linking and bookmarking. It's well known in the usability community that Frames Suck, most of the time. To quote Jakob Nielsen from the previous link: "The fundamental design of the Web is based on having the page as the atomic unit of information, and the notion of the page permeates all aspects of the Web. The simplicity of the original Web contributed to its ease of use and its rapid uptake." Frames break that unified model.

They are certainly not any kind of answer to long and/or complex URLs... They can completely bamboozle the person who wants to link to or bookmark a specific piece of content. And should a clever person work out the link to your content frame, they'll find it hard to navigate to other areas of your site unless you've repeated navigation elements within that frame. So you lose the chance to draw them into the rest of your site.

Holloway uses a good example of deep linking to link to the detailed page about mod_rewrite. The details were not necessary for the comprehension of his article, but the link was there for the curious. What a pain it would be if he could only link to the apache home page and you had to search for the term. Would you bother? Would it piss you off?

In addition, URLs are often quoted in printed matter - long and/or complex URLs with lots of =#%& type characters are prone to typos. Another good reason to simplify them.

As for the issue of dynamic content, having the content be changeable shouldn't automatically mean that the URL needs to change as well. A page composed of various elements, a few of which are dynamically loaded based upon context, often uses a template system for layout, and each component in the template points to variables that call content from a database. I worked on a site (as project manager) based on Vignette Story Server, and we had to use the URL (we called it a cURL, c being for context) to load the correct content into the correct layout templates. It would have been a terrific long-term goal of this site (Medweb) to simplify the cURLs in a similar way to what Holloway describes. In terms of usability and maintainability, it would have been a big bonus for this site, which syndicates and co-locates content with other sites.

On a personal note, I HATE having to add .php or .php3 to my URLS! It's ugly and I am forced to create home pages without PHP code so I can at least have a decent home URL. grrr! If there's already a way around that, please let me know.

QuiltBlog: WIP, SEX, WOW, MQ, LQS, HST...

Frames (2.00 / 1) (#39)
by delmoi on Tue Mar 13, 2001 at 11:21:36 AM EST

I get so annoyed by this. Frames do not need to suck. And modern browsers allow you to go 'back' and 'forward' the way you normally would without frames. Jakob Nielsen also thinks that you should break content up into separate 'pages', so you have to wait another 10 seconds for the next 'page' to load while you're trying to read something over a modem.
"'argumentation' is not a word, idiot." -- thelizman
[ Parent ]
Frames are adept at sucking. (none / 0) (#67)
by Minuit on Wed Mar 14, 2001 at 03:49:11 PM EST

I agree with some of what you are saying. Frames do not need to suck. It just so happens that they generally do.

Sure, browsers allow you to go 'back' and 'forward' within framesets, but the way that most framed sites are designed makes it impossible to 'bookmark' them. You know, that little button that lets me mark the exact page that I'm on so that I can go directly back to it later, or send the address to someone else so they can see what I'm seeing.

You get annoyed at ten second waits? Well so do I. I get annoyed at reading a mailing list archive and having to wait that long to load all the extra site navigation and news frames[1]. I get annoyed at having to tell people "Okay, go to http://lamesite.com/, click on 'community', then 'events', then 'next' twice and you'll find what you're looking for" instead of "Go to http://sanesite.com/events/august/".

These are things that I find annoying, and they are all caused by careless use of frames. So yes, frames don't need to suck. But I have yet to see an example of something which requires frames and doesn't suck.

I'm open to enlightenment, but I'm not holding my breath.


[1] Actually, I don't get annoyed at this because I have scripting turned off. Javascript is arguably more evil than frames.

If you were my .sig, you would be home by now.
[ Parent ]

inframe navigation (none / 0) (#83)
by delmoi on Sun Mar 18, 2001 at 01:28:34 PM EST

Sure, browsers allow you to go 'back' and 'forward' within framesets, but the way that most framed sites are designed makes it impossible to 'bookmark' them. You know, that little button that lets me mark the exact page that I'm on so that I can go directly back to it later, or send the address to someone else so they can see what I'm seeing.

I don't know about some of the Linux browsers, but in IE you can right click on a page in a frame and do "open in new browser window" and bookmark that, or right click and select the "properties" menu option, and cut'n'paste the URL from there for online transfer.

"'argumentation' is not a word, idiot." -- thelizman
[ Parent ]
i like aphex twin (none / 0) (#84)
by Holloway on Mon Mar 19, 2001 at 12:15:28 AM EST

I assume they mean that there is no unique resource locator for a combination of framesets. The first one has an URL - but in most cases the URL doesn't necessarily reflect the content you're looking at anymore.

Or, if there is, it's only when the server has gone to some special effort to reload over the top for each link clicked.


== Human's wear pants, if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

[ Parent ]

index.php (3.00 / 1) (#42)
by Jagged on Tue Mar 13, 2001 at 12:45:05 PM EST

On a personal note, I HATE having to add .php or .php3 to my URLS! It's ugly and I am forced to create home pages without PHP code so I can at least have a decent home URL. grrr! If there's already a way around that, please let me know.
I just happen to be teaching myself PHP and figured out how to prevent that last night using Apache.

Look for DirectoryIndex in the httpd.conf (or other conf) file and add any filenames you want such as index.php to the list.

Now just create a file with that name in each directory, and a URL such as http://www.example.com/foo will internally serve http://www.example.com/foo/index.php while the browser only shows the first URL. All references in the PHP script to itself should be written as ./ to preserve the aliasing.
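The change described amounts to one line in httpd.conf (a sketch; adjust the filename list to taste):

```apache
# Serve index.php (then index.php3, then index.html) when a
# directory URL such as /foo/ is requested.
DirectoryIndex index.php index.php3 index.html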

[ Parent ]
Another option... (5.00 / 1) (#62)
by TrentC on Wed Mar 14, 2001 at 11:28:46 AM EST

...would be to add ".html" as a filetype that is read by mod_php. Use this:

<IfModule mod_php4.c>
AddType application/x-httpd-php .php4 .php3 .phtml .php .html
AddType application/x-httpd-php-source .phps
</IfModule>

This way, all HTML files are run through the PHP parser. I dunno what kind of performance hit it entails, though...

Jay (=

[ Parent ]
Use this, or... (none / 0) (#75)
by Luke Francl on Thu Mar 15, 2001 at 02:25:38 AM EST

Disclaimer: I use PHP at work, though I haven't done any scalability studies. I think PHP could handle parsing all .html as PHP; it would have to have a pretty pathetically weak parser if it didn't check whether there was PHP code in the page before doing any massive calculations (on the other hand, if you have an auto-prepend file which is doing a lot of work, this might not be such a hot idea). I would recommend this, because you can then use all kinds of nifty stuff on HTML pages *later* if you so desire. Alternatively, if you've got a recent Apache install, it seems to be compiled with mod_negotiation, which does content negotiation - allowing you to strip the filename extension entirely. So, to hit /foobar/baz.php, your user goes to /foobar/baz, and the server fetches the correct file. It's pretty cool, and I love the extensionless URL look!

[ Parent ]
The single biggest mistake... (4.33 / 6) (#29)
by rusty on Tue Mar 13, 2001 at 02:58:41 AM EST

The ONE major thing (as opposed to the million minor things) I wish I hadn't done when first designing Scoop was that "?op=blah;sid=crap"... It constantly annoys the hell out of me now, especially since there's no reason at all that those arguments have to be passed in like form data. That was, unfortunately, one of the first things I had to decide on, and, not having a good grasp of the problem space yet, I just went with what was familiar (i.e. slashdot). Now I wish I had just done it via a pseudo-directory structure (like Advogato's lovely "/person/kuro5hin" system).

I will eventually write a translator that will convert between arguments and a tree structure, but I have a bunch of rather more pressing stuff to do already. You can use mod_rewrite to massage it into a nicer shape, but it's kind of half a solution.
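The translator rusty describes - converting between form-style arguments and a pseudo-directory tree - could look roughly like this. A hedged sketch only, not Scoop's actual code (Scoop is Perl; Python is used here for brevity):

```python
# Convert "?op=displaystory&sid=2001/3/12/20643/1807" style arguments
# into an Advogato-like path tree, and back again.
from urllib.parse import parse_qsl, urlencode

def args_to_path(query):
    """'op=displaystory&sid=2001/3/12/...' -> '/displaystory/2001/3/12/...'"""
    params = dict(parse_qsl(query))
    op = params.pop("op", "index")
    sid = params.pop("sid", "")
    parts = [op] + [p for p in sid.split("/") if p]
    return "/" + "/".join(parts)

def path_to_args(path):
    """The reverse mapping: first segment is the op, the rest is the sid."""
    parts = [p for p in path.split("/") if p]
    op, sid = parts[0], "/".join(parts[1:])
    args = {"op": op}
    if sid:
        args["sid"] = sid
    return urlencode(args, safe="/")
```

Because the two functions are inverses, the old form-data URLs could keep working indefinitely while the clean tree form becomes the published one.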

PS: If anyone wants to send me a patch to do this, I'd be tickled. :-)

Not the real rusty

But does that work for "hosted" sites? (3.33 / 3) (#44)
by kostya on Tue Mar 13, 2001 at 02:23:35 PM EST

I was reading this article with interest, as I am working on some site code and have used the "?op" thing as well. My problem is that I have my web services hosted for me - on a massively multi-homed webserver. So I don't have the option of mapping URLs to scripts via mod_perl, because I can't touch the Apache config.

Is there something I'm missing? Is there a way to do what you are saying with straight CGI scripts? I suppose I could do something crazy like:




Would that work?

But frankly, even writing that gives me the willies. It would be like a whole new social hack for redirecting people to pages. Icky. But then, it does possess a cleverness :-)

Veritas otium parit. --Terence
[ Parent ]
not so easy if you don't have root (5.00 / 1) (#46)
by rusty on Tue Mar 13, 2001 at 04:57:00 PM EST

If you can't use mod_perl and fool with the server, it's somewhat harder to do tricky path things like that. In a mod_perl environment this works by assigning a Location to a Handler, which will get all requests at or under that URL path. With straight CGI, you'd have to do something like what you suggest, which probably isn't much of a win, overall, considering the extra code you'd need to use to make it work.

Not the real rusty
[ Parent ]
Not quite that limited (5.00 / 1) (#47)
by panner on Tue Mar 13, 2001 at 07:43:52 PM EST

With CGI, you can still do something similar with paths. Look at basically any of the many env dump scripts (Apache comes with one to test with). You may or may not see an env variable called PATH_INFO, but it'll be there. Just add something like "/test" to the end of your URL, and you'll see PATH_INFO="/test", and also a PATH_TRANSLATED which may or may not help you (on mine, test translates to DOCUMENT_ROOT/test.pl, but I think I have mod_speling installed :)
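The PATH_INFO trick panner describes lets a plain CGI script recover the trailing path without mod_perl. A minimal sketch (the function name is invented for illustration):

```python
# For a request to /cgi-bin/story.cgi/2001/3/12 the server sets
# PATH_INFO='/2001/3/12'; split it into arguments for the script.
import os

def route(environ=os.environ):
    path_info = environ.get("PATH_INFO", "")
    return [p for p in path_info.split("/") if p]
```

The script then dispatches on the resulting segments exactly as a mod_perl handler would on its Location path.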

Keith Smiley
Get it right, for God's sake. Pigs can work out how to use a joystick, and people still can't do this!
[ Parent ]
IE (3.50 / 2) (#34)
by gregholmes on Tue Mar 13, 2001 at 06:58:08 AM EST

Last time I checked, IE ignored the content type and looked at the extension. For example, a page that serves a generated Excel file (application/msexcel) caused my browser to open a garbage file in ColdFusion Studio because the script had a .cfm extension!

Yes, that is IE's problem, but filetype is a good fallback (or can be).

more like a server problem, actually (3.00 / 1) (#37)
by delmoi on Tue Mar 13, 2001 at 11:06:32 AM EST

I'm pretty sure IE goes by content type most of the time, unless it can't determine it. Of course, it's just as likely that your server was screwed up. After all, how many users are going to have ColdFusion up and running on their desktops for IE to redirect to?
"'argumentation' is not a word, idiot." -- thelizman
[ Parent ]
I thought I read somewhere... (5.00 / 1) (#38)
by nstenz on Tue Mar 13, 2001 at 11:12:53 AM EST

...that IE's content-type/extension code didn't handle types in the manner recommended by the standards or something... In some instances it uses the extension of the URL rather than the content-type. If anyone has more info on this, please post it.

By the way, I don't think IE for Macintosh had this problem (the whole extensions thing really isn't an issue there, now is it?).

[ Parent ]
maybe in here? (none / 0) (#41)
by gregholmes on Tue Mar 13, 2001 at 12:33:42 PM EST

The answer is probably in here, but I don't have time to read through it now!

[ Parent ]
Exactly what I was looking for... (none / 0) (#87)
by nstenz on Fri Mar 23, 2001 at 04:51:20 PM EST

The document basically says this about MIME types and IE handling them:
  1. If the server returns a content-type, IE will remember that.
  2. However, IE also runs the beginning of the stream through a 'buffer check' to verify whether the data actually looks like the content-type being passed to it. If IE thinks the content-type is invalid, it'll just ignore it and do whatever else it can to parse the data, including falling back on the extension of the URL given.
I'd be inclined to trust the server's content-type first. I think MS screwed this one up.
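As an illustration only (the magic bytes and function name here are a toy, not IE's actual algorithm), the kind of "buffer check" the document describes might look like:

```python
# Toy content sniffer: compare the first few bytes of the stream against
# known magic numbers, and only fall back on the declared content-type
# when the data doesn't match anything. Illustrative, not IE's real logic.
MAGIC = {
    b"\x89PNG": "image/png",
    b"GIF8": "image/gif",
    b"%PDF": "application/pdf",
}

def sniff(first_bytes, declared_type):
    for magic, mime in MAGIC.items():
        if first_bytes.startswith(magic):
            return mime          # data looks like a known type: override the header
    return declared_type         # otherwise trust what the server declared
```

The complaint in the thread is essentially that IE runs this check even when the server's header is perfectly valid.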

[ Parent ]
No It doesn't (none / 0) (#86)
by OmniTurtle on Wed Mar 21, 2001 at 04:56:50 PM EST

No, as far as I can tell IE totally ignores the content type. I've got a portion of a site where users can download a comma-delimited file for easy import into Excel. The file is served by a PHP script, but try getting IE to recognize the .php file as a .csv and pop up its little dialog box to open it in Excel! I fought it for a couple of days with no luck, and finally renamed the PHP script to .csv and then told Apache to treat .csv as a PHP script. Now it works great, but I still can't believe IE is so braindead.

There was a post on Bugtraq a few months ago talking about how IE will even examine the first bits of a file to determine its type if an extension is missing... so it's theoretically possible to make the first few bytes of an image be some nasty ActiveX and get the browser to execute it somehow. (I didn't follow it too closely, sorry.)

[ Parent ]
But . . . but . . . (5.00 / 1) (#89)
by regeya on Wed Dec 04, 2002 at 12:43:30 AM EST

Aw, hell, insert a sarcastic remark about IE's supposed perfect standards compliance here.

[ yokelpunk | kuro5hin diary ]
[ Parent ]

picture-rate.com (3.66 / 3) (#40)
by delmoi on Tue Mar 13, 2001 at 11:31:30 AM EST

This is kind of an aside, but some of my friends were trying to pass around picture-rate.com URLs using the URL in the address bar of the browser, rather than the one underneath the picture. The problem was, if someone tried to load that URL they would get a different random picture. So I changed the URL to this: http://picture-rate.com:8080/hello/ui.jsp?urlmsg=DONT_USE_THIS_URL_USE_THE_ONE_BELOW_THE_PICTURE_THAT_SAYS_DIRECT_LINK_THNX&random=48363&rating=76.92654256839103.
The urlmsg= parameter is totally ignored. Eventually I'll set it up so that the address-bar URL will work for referencing specific pictures. But right now my page doesn't even work in Mozilla; it's getting closer, though.

Of course, this is totally what you're against, but I just thought it was kind of funny :P
"'argumentation' is not a word, idiot." -- thelizman
Zope URLs (4.00 / 3) (#43)
by dragondm on Tue Mar 13, 2001 at 01:02:05 PM EST

This is one of the things I LOVE about the app server Zope. URLs on Zope sites are, in general, quite intelligible and 'clean'.

Go to zope.org (or most any Zope-based site) and look at their URLs.

What have you got against CGI? (2.00 / 1) (#48)
by hotcurry on Tue Mar 13, 2001 at 09:46:08 PM EST

It works, it's a standard, it's supported by browsers, and it's not all that much longer than what you propose.
I have no quarrel with your other points.

What's to like? (none / 0) (#60)
by kellan on Wed Mar 14, 2001 at 11:13:23 AM EST

I assume you mean the CGI standard as opposed to CGI implementations? (Because your standard CGI environment of spawning Perl interpreters per request has some pretty obvious problems.)

I'm also unclear on what you mean by "it's supported by browsers". What does browser support have to do with this?

So I can only assume you are talking about the ?name=value query string convention.

Well, first, it's not a standard, it's a convention. The ? is standard; everything after that is just a convention that some guy decided upon when he was hacking together NCSA httpd. And it's not even a great convention, as we were just discussing over on Scoop, because the & serves double duty as the HTML (and XML) entity indicator. So besides being a kind of crappy convention, there are the problems already laid out in the article: confusing, technology- and implementation-specific, etc., etc.
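To illustrate the double-duty problem with a small sketch (Python standard library only; the example URL echoes the article): a query string built with & has to be re-escaped as &amp; before it can be embedded in an HTML href.

```python
# Build a query string, then escape it for embedding in HTML.
# The "&" separator collides with HTML's entity syntax, so forgetting
# the escape step is a classic source of invalid markup.
from urllib.parse import urlencode
from html import escape

query = urlencode({"op": "section", "section": "tech"})
link = '<a href="/?' + escape(query) + '">Technology</a>'
```

Leaving the ampersand raw means validators flag the page, and a sequence like `&sect=` can even be misread as the `&sect;` entity.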


[ Parent ]
Complex URLS and pages (1.33 / 3) (#70)
by slothman on Wed Mar 14, 2001 at 05:52:26 PM EST

I don't like it when URLs, and web pages too, are complex. Why do we need a million variables to access a web page? It should be just text, hyperlinks, Java/JavaScript (maybe), and a few buttons and drop-down lists. Automatic things like advertisements that change every time are annoying. The URL should be http://www.place.com/dir1/dir2/file.ext. It should be just like a directory, not a function!

It's completely subjective. (4.00 / 2) (#78)
by mindstrm on Thu Mar 15, 2001 at 07:20:23 AM EST

It really is. The qualities of a good URL? A good URL for what? Is the URL intended to be memorized by people, or is it intended to be dynamic and simply clicked on? Extensions? Those are there for convenience, when we use a web server like Apache to map directory structures onto URLs... custom apps can behave completely differently. The point is, there is no 'good' or 'bad' URL, only one that is good or bad for a particular task. A URL most certainly CAN be a command line, if that's the purpose of the resource being accessed.

"Cool URIs don't change" by TBL (4.00 / 1) (#79)
by kellan on Thu Mar 15, 2001 at 02:20:46 PM EST

Don't know if this has been posted yet or not; if it hasn't, it should have been already.

It's a note, Cool URIs don't change, by Tim BL. It shoots down the silly arguments about URL vs. URN, lays out why your URLs should hide their implementation, and covers other useful concepts. Worth reading.


The URL as UI (4.00 / 1) (#80)
by Captain Napalm on Thu Mar 15, 2001 at 04:45:31 PM EST

I did some work on this issue over a year ago (late 1999, to be exact) after reading Jakob Nielsen's URL as UI as well as reading up on Ted Nelson's Xanadu.

The sample document I did was the King James Bible, and my intent was more than just serving up Bible verses or chapters: it was to allow one to pull out arbitrary portions (well, not quite arbitrary---there are some limits) of the Bible using the URL as the parameter, as it were.

So while you can get the beginning and ending of the Bible, you can also request (if you know where it's located) the story of Noah's Ark. If you check the URLs, you will see that to retrieve what you want, it's http://bible.conman.org/kj/book.chapter:verse-chapter:verse (generally). Check out the site for more information on that particular project.

I've also been working on pulling up date-based pages (like an online journal or weblog) using the dates as part of the URL, again allowing the use of ranges to retrieve the requested information. So 2001/3/4 will pull up all entries for March 4th, 2001, while 2001/3/4-15 will pull up all the entries for March 4th through the 15th, 2001/3 all of March, and 2001/3-4/8 all entries between March 1st and April 8th (if they exist).
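A rough sketch of how such date-range URLs might be parsed (this is my own guess at the semantics from the examples above, not the actual conman.org code):

```python
# Parse date-range URL paths like "2001/3/4-15" or "2001/3-4/8" into
# (start_date, end_date) pairs. Guessed semantics: a missing day spans
# the whole month, and a single day after a month range bounds the end.
import calendar
from datetime import date

def _span(seg):
    """'4-15' -> (4, 15); '8' -> (8, 8)."""
    lo, _, hi = seg.partition("-")
    return int(lo), int(hi or lo)

def parse_range(path):
    parts = path.strip("/").split("/")
    year = int(parts[0])
    m1, m2 = _span(parts[1]) if len(parts) > 1 else (1, 12)
    if len(parts) > 2:
        d1, d2 = _span(parts[2])
        if m1 != m2:
            d1 = 1   # "2001/3-4/8": the day only bounds the end month
    else:
        d1, d2 = 1, calendar.monthrange(year, m2)[1]  # whole month(s)
    return date(year, m1, d1), date(year, m2, d2)
```

The nice property is that the URL notation degrades gracefully: dropping trailing segments just widens the range.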

It's been interesting work and I do want to expand it to other works, such as Shakespeare (with a hypothetical link of say http://literature.conman.org/Shakespeare/Hamlet.III.i.56-90). The intent of the work is to use the commonly used notation for the work in question (if possible).

Process to decide an URL structure. (none / 0) (#81)
by Holloway on Thu Mar 15, 2001 at 09:07:17 PM EST

If anyone's interested, this is the process I use to decide an URL structure:

Anything with a finite set of parent/child relationships, such as /shop/books/0-932592-00-7 or, say, a historical temperature site's /temperature/2001/1/23/16:43, benefits from being in a pseudo-directory structure. By finite I mean a fixed, known depth, like /2000/1/1/temperature. When you have a finite number of flat data structures, it suits a pseudo-directory structure.

Anything with multiple (especially unknown numbers of) parent/child relationships suits a ?a=blah/blah/blah&b=blah/blah/blah URL structure. Anything with a completely unknown variable name suits a ?weather=fine structure (as you can't determine the variable name from the order of parent/child). This would be the one situation where a URL is a command line and the type of site requires it. Finally, anything with an unknown number of flat data structures suits ?a=one&b=two&c=three.
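As a sketch of the first case (the route table and return values here are hypothetical, purely to show the shape of it), a finite parent/child URL maps directly onto positional path segments:

```python
# Dispatch pseudo-directory URLs: each top-level segment names a handler,
# and the remaining segments become its positional arguments. Works only
# because the parent/child depth is fixed and known, as described above.
ROUTES = {
    "shop": lambda kind, key: ("lookup", kind, key),
    "temperature": lambda y, m, d, t: ("reading", y, m, d, t),
}

def dispatch(url_path):
    head, *rest = url_path.strip("/").split("/")
    return ROUTES[head](*rest)
```

With an unknown or variable depth, this positional mapping breaks down, which is exactly when the ?name=value form earns its keep.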

== Humans wear pants; if they don't wear pants they stand out in a crowd. But if a monkey didn't wear pants it would be anonymous

What about sessions? (none / 0) (#82)
by reftel on Sun Mar 18, 2001 at 03:51:54 AM EST

So, if we are to stop using the URL as a command line, then how is one supposed to keep track of sessions? Force cookies on users or use ugly javascript hacks to use form POSTs for every link?

Yes... (none / 0) (#85)
by Mad Weezel on Mon Mar 19, 2001 at 02:41:19 PM EST

Seriously, if a site needs or wants to track its users, it should use something along the lines of a CGI POST or a cookie. Even kuro5hin.org uses a cookie to track logged-in users, no? I think what the author is trying to say is that much of what is in a URL, e.g. http://www.kuro5hin.org/?op=section;section=tech, is unnecessary for a person who just wants to see the topics in, say, technology, and so could be shortened to http://www.kuro5hin.org/tech/. If you need to track sessions, do so by covert methods, and don't make it part of the UI.
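As a sketch of that shortening (assuming Apache with mod_rewrite, as the article's head suggests; the exact Scoop query string may differ from this guess), the clean form can be mapped back onto the internal one without users ever seeing it:

```apache
# Hypothetical rewrite: serve /tech/ from the existing Scoop handler
# internally, keeping the query string out of the visible URL.
RewriteEngine On
RewriteRule ^/tech/?$ /?op=section;section=tech [PT,L]
```

The browser's address bar keeps showing /tech/, so the URL survives even if the backend (and its query-string conventions) changes later.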

[ Parent ]
Qualities of a good URL | 89 comments (81 topical, 8 editorial, 0 hidden)


All trademarks and copyrights on this page are owned by their respective companies. The Rest © 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.
Need some help? Email help@kuro5hin.org.
My heart's the long stairs.
