Kuro5hin.org: technology and culture, from the trenches
Apache mod_rewrite Primer

By kpaul in Internet
Sat Aug 02, 2003 at 10:37:33 AM EST
Tags: Software (all tags)

"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."

-- Brian Behlendorf
Apache Group

Yes, mod_rewrite for Apache is a powerful tool that offers a lot of practical solutions for your webserver. Beyond that, the module can also help search engines find and spider your site. With that power comes danger, though. (Throwing your pages into an endless loop, for example.) In this piece I'll go over why mod_rewrite is useful (especially for search engines), how to set it up on your server, and my personal experience with using it for the first time.


Contents:

  • Why Use It?
  • Setting it Up
  • My Experience



    Why Use It:

    From a search engine optimization point of view, mod_rewrite is a great tool to help your sites get more pages indexed in more search engines. For example, although search engine spiders are getting better at it, some still have a hard time with dynamic pages.

    Take this filename:

    file.php?id=12&section=23&template=45&page=3

    Set aside the fact that a variable named id is a poor choice, since some spiders mistake it for a session ID. Many search engines have a hard time crawling pages like this regardless. As I said, they're getting better at it, but what if you could help them crawl your site in the meantime?

    Enter mod_rewrite. With the module you can rewrite the above URL into something like:

    /id/12/section-23/template-45/page-3.html

    Or, even better:

    /keyword/section-name/file.html

    You can see where I'm going with this, I hope, and why it might help your pages rank in the search engines.

    The module is also useful for a variety of other tasks: managing a lot of virtual hosts, dealing with the trailing slash problem on URLs, and even stopping people from using images from your server on their sites (running up your bandwidth).
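
    As a rough illustration of the last item, here is a minimal sketch of hotlink blocking. The hostname example.com and the list of image extensions are placeholders, not something from a real setup, so adjust both before trying it.

```apache
# Sketch only: refuse image requests that come from pages on other sites.
# "example.com" is a placeholder for your own hostname.
RewriteEngine On

# Let requests through when no Referer is sent (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let requests through when the Referer is a page on your own site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Anything left that asks for an image gets a 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

    The two conditions are ANDed, so the rule only fires when a Referer is present and it does not point at your own site.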

    Setting it Up:

    As you may know, you can compile Apache with different modules, adding functionality to your webserver. mod_rewrite can be added either as a static module (at compile time) or as a dynamic module that can be loaded without having to recompile Apache.

    As it is a potentially dangerous tool if you don't know what you're doing, I recommend working with it initially on a test server or sandbox area somewhere if you can. If you plan on using it on your live webserver, you'll need to check if your current host offers the feature.

    A quick email to support should answer the question. Or, if you have command line access (and access to Apache's conf file), you can peek at the file to see if the LoadModule command for the module is commented out or not.

    Once you have an Apache server up and running with mod_rewrite installed, you may have to load the module in your conf file and restart Apache if it's being run as a dynamic module.
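
    For reference, the lines to look for in the conf file usually resemble the sketch below. The paths to the .so file are assumptions; they differ between builds and distributions, so check what your own installation uses.

```apache
# Apache 1.3 layout (assumed path; a leading "#" on these lines means
# the module is commented out and therefore not loaded)
LoadModule rewrite_module libexec/mod_rewrite.so
AddModule mod_rewrite.c

# Apache 2.x layout (assumed path); no AddModule line is needed there
# LoadModule rewrite_module modules/mod_rewrite.so
```

    After editing, running apachectl configtest before the restart is a cheap way to catch a typo without taking the server down.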

    You can either put the Rewrite directives in your httpd.conf file or in a .htaccess file. (If you use the latter, you may have to change the Apache conf file to allow the .htaccess to override any rules in the conf file.)

    This may be one pitfall if you're not familiar with Apache's conf file. You need to understand the order in which different rules and directives are applied by the webserver. One thing I ran into at first was having [AllowOverride None] in my httpd.conf file for the directory where I was using a .htaccess file.

    The above directive tells Apache to ignore any .htaccess file you may have on your server. You can also add the rewrite code directly in your httpd.conf file, but again you need to be aware of what overrides what.
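
    A minimal sketch of the httpd.conf change that lets a .htaccess file carry rewrite rules follows. The directory path is hypothetical; use the one that matches your document root.

```apache
# Hypothetical document root; substitute your own path.
<Directory "/var/www/html">
    # FileInfo is the override class that covers the mod_rewrite directives.
    # Options is listed too so the .htaccess file may set Options +FollowSymLinks.
    AllowOverride FileInfo Options
</Directory>
```

    AllowOverride is only valid inside a <Directory> section, and Apache has to be restarted before the change takes effect.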

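    Also, it's probably a good idea to include the following lines in your httpd.conf file, either at the top level or inside a <VirtualHost> container (RewriteLog and RewriteLogLevel are not allowed inside <Directory> sections or .htaccess files):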

    RewriteEngine On
    RewriteLog /var/log/apache/rewrite.log
    RewriteLogLevel 9

    This will log what's happening behind the scenes when mod_rewrite is in action. It is of great help when you're not getting the expected result and you can't figure out why. Level 9 is extremely verbose and slows the server down, so turn it off once you're done debugging.

    Which method (.htaccess or httpd.conf) is better? Most likely the conf file, because Apache reads it once at startup, whereas .htaccess files are looked up and parsed on every request. The difference may only matter if you're really hunting for bottlenecks, though. You can put rewrite code in both places, but again be careful of what overrides what.

    My Experience:

    Or, what I learned in the last few days about mod_rewrite.

    I'd been meaning to start using mod_rewrite and had done some initial reading on it, but my current server didn't have the capability. So, I put it off, trying some other manual techniques to help search engines spider my dynamic content more easily.

    About a month ago, I started planning a migration of my site (with many sub-sites on it) to a larger server. Traffic was growing at a steady rate and I knew I would have to do it eventually. With a lot of my 'very first code' on the site, though, it took a little bit of work to get the site set up on the new server.

    The move was worth it, though, because on the new server, I had the ability to start and stop Apache (among other things) and I finally had access to mod_rewrite.

    First off was a simple test. I created a .htaccess file with vi:

    # start the engine
    RewriteEngine on

    # mod_rewrite in a .htaccess file only works if following symlinks is allowed
    Options +FollowSymLinks

    # the URL prefix that the rules in this directory apply to
    RewriteBase /

    # mod_rewrite allows you to use Regular Expressions
    # to define the manipulations

    # In a .htaccess file the pattern is matched against the path
    # with the leading slash removed, so it must not start with "/".
    RewriteRule ^foo\.html$ /bar.html [R,L]
    # the above matches requests for foo.html
    # and redirects them (externally, because of [R])
    # to bar.html

    The above is a really simple example to see if mod_rewrite is actually working in a .htaccess file. If you're wanting to use it to help with search engine indexing, you'll have to take it a little further.
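
    If you run the same test from httpd.conf instead, the pattern has to change slightly. In server or <VirtualHost> context the rule sees the full URL path including the leading slash, whereas in a .htaccess file that slash has already been stripped. A sketch of the conf-file version, using the same made-up file names:

```apache
# httpd.conf (top level or inside a <VirtualHost>): the pattern is matched
# against the whole URL path, so it starts with "/".
RewriteEngine On
RewriteRule ^/foo\.html$ /bar.html [R,L]
```

    Mixing up the two forms is a common reason a rule silently never matches.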

    For it to work, you have to change your application so that the links it generates look like static files rather than dynamic URLs. I have an online app, and on one page there's a link to another section.

    In the original, the link in the code had a '?' and variables after the filename. I had to change this to what I wanted the resulting files to look like in the browser's address bar.

    Once I did this, I had to write some mod_rewrite rules to translate any requests for:

    /my-app/my-section/my-page.html

    Into:

    /yourfile.cgi?section=widgets&page=13

    That's what some newbies to the whole concept don't understand - not only do you have to have mod_rewrite doing something, your application also has to change the way it links within the script. This concept confused me a little at first.
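
    To make the translation concrete, here is one possible sketch for a .htaccess file in the document root. It passes the section and page names straight through as query values, so the script (yourfile.cgi here) would have to map names such as my-section to whatever IDs it uses internally; that lookup is an assumption about the application, not something mod_rewrite does for you in this form.

```apache
RewriteEngine On
RewriteBase /

# /my-app/<section>/<page>.html  ->  /yourfile.cgi?section=<section>&page=<page>
# [^/]+ matches exactly one path segment, so extra slashes cannot sneak in.
RewriteRule ^my-app/([^/]+)/([^/]+)\.html$ yourfile.cgi?section=$1&page=$2 [L]
```

    Because there is no [R] flag, the rewrite is internal: the visitor and the search engine spider keep seeing the static-looking URL.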

    Next came the decision on whether to include the rewrite rules in a .htaccess file or in the httpd.conf file. If you think about it, you can see how having it in the .htaccess file might cause more overhead, but some have said the loss is minimal. I'm testing this for myself currently.

    It looks like the server is holding up now, but as the search engines eventually find the new content, I'll be keeping a close eye on how much they grab and what effect they have on the performance of my server.

    Mod_rewrite has a steep learning curve, and it may not be for people who don't have a lot of scripting or unix experience. If you have time to look into it, though, it's a very powerful resource (in many aspects) that's worth the time it takes to figure it out.


    Poll
    Mod_rewrite?
    o You should've used it on this article! 13%
    o Love it! 27%
    o Google loves it. 20%
    o I shot myself in the foot with it (see below) 0%
    o I shot myself in the foot with it but I don't want to share the experience. 0%
    o Hate it. 3%
    o You are not ready for that kind of power, grasshopper. 10%
    o Back away from the SSH client... 24%

    Votes: 29

    Apache mod_rewrite Primer | 52 comments (40 topical, 12 editorial, 0 hidden)
    You are not a beautiful and unique snowflake (1.85 / 27) (#6)
    by egg troll on Fri Aug 01, 2003 at 12:51:34 AM EST

    Like everyone else, I had become a slave to the IKEA nesting instinct. If I saw something like clever coffee tables in the shape of a yin and yang, I had to have it. I would flip through catalogs and wonder, "What kind of dining set defines me as a person?" We used to read pornography. Now it was the Horchow Collection. I had it all. Even the glass dishes with tiny bubbles and imperfections, proof they were crafted by the honest, simple, hard-working indigenous peoples of wherever.

    He's a bondage fan, a gastronome, a sensualist
    Unparalleled for sinister lasciviousness.

    What are you trying to say? (2.83 / 6) (#22)
    by Verbophobe on Sat Aug 02, 2003 at 01:15:40 AM EST

    You sound troubled, young grasshopper.

    Proud member of the Canadian Broadcorping Castration
    [ Parent ]
    URLs versus form variables (4.80 / 5) (#9)
    by swr on Fri Aug 01, 2003 at 03:38:56 AM EST

    Take this filename:
    file.php?id=12&section=23&template=45&page=3
    Enter mod_rewrite. With the module you can rewrite the above URL into something like:
    /id/12/section-23/template-45/page-3.html

    If you just want a URL that doesn't include ? and & characters, you can use a URL like so:

    http://www.example.com/somepage.php/your/arguments/here

    Because somepage.php is a file and not a directory, Apache will know to use it as a .php page instead of trying to descend into it. The "/your/arguments/here" goes into the PATH_INFO environment variable (as per the CGI spec), which you can access in PHP or whatever other scripting language you use.

    Most URLs that contain ? and & are the result of HTTP GET form submission, so I'm not really sure what you're trying to do here, as most spiders don't fill out forms.



    Form submissions (5.00 / 1) (#10)
    by gazbo on Fri Aug 01, 2003 at 06:11:24 AM EST

    Your first point is valid - you can just use the remaining "path" as parameters to your script. Your second point is off, though. Consider a news page, that has a list of headlines by date, hyperlinked to the full story. The hyperlink would likely be of the form http://mysite.com/news.php?storyid=7.

    Now sure, you could rewrite this as http://mysite.com/news.php/7, but it's still a perfectly reasonable use of GET parameters for a non-form purpose.

    -----
    Topless, revealing, nude pics and vids of Zora Suleman! Upskirt and down blouse! Cleavage!
    Hardcore ZORA SULEMAN pics!

    [ Parent ]

    agreed (5.00 / 1) (#12)
    by ph317 on Fri Aug 01, 2003 at 02:03:56 PM EST


    I hate the 90% of cgi writers who end up using urls in the form of "/cgi-bin/myserver.cgi?asdf=1&e345FGH==#U^*RTG:dhkftho5j698hj9dtjohijdgfh... .".  In all my cgi coding, I've always felt it best to strive for simple urls.  You don't need mod_rewrite to accomplish this - just name your stuff correctly for what its functionality to the user, and use POST rather than GET, and make use of PATH_INFO.  For example, the URLs for one system I developed look like:

    https://www.mycompany.com/TheApplication/login
    https://www.mycompany.com/TheApplication/post_message
    https://www.mycompany.com/TheApplication/search_messages
    etc....

    There's no need (and much harm) in putting arguments in the URL itself, or exposing anything about how the request is being processed (.cgi, .asp, .pl, etc...).

    [ Parent ]

    GET is good (5.00 / 2) (#16)
    by Cloaked User on Fri Aug 01, 2003 at 06:02:47 PM EST

    GET and GET-type URLs (or the path-style ones, of course) should be used for any page that you want people to be able to bookmark. That's not possible with POST requests - at best, you'll get a copy from your browser cache, which may well not be what you want.

    Aesthetic arguments aside, GET should be used for requests that don't change anything, POST for ones that do. The resulting URLs may not be as pretty, but at least they're bookmarkable. In your example, login and post message should be POST, but search should probably be a GET.
    --
    "What the fuck do you mean 'Are you inspired to come to work'? Of course I'm not 'inspired'. It's a job for God's sake! The money's enough and the work's not so crap that I leave."
    [ Parent ]

    Of course (5.00 / 1) (#31)
    by ph317 on Sat Aug 02, 2003 at 09:15:02 PM EST


    When I said use POST instead of GET, I was referring to requests that involve actual user input.  Actually in the example I was citing above, all the URLs operate in both POST and GET mode.  For the "login" page, an initial "GET" gets you the login screen, the login button on the screen does a "POST" to the same "login" page, which then decides if you're a valid user or not and redirects you to the next page you need to hit.  Similarly when you GET the search page you get a search form, the submit button on the search form does a POST back to the same search URL.. etc..

    [ Parent ]
    handy, but be careful. (none / 0) (#37)
    by pb on Sun Aug 03, 2003 at 11:06:22 AM EST

    I use that technique as well; just note that (a) in Apache 2 you have to explicitly turn PATH_INFO on (which is annoying), and (b) technically, '/' is not a valid character in the query string (I think it's reserved); therefore, what you're doing is technically legal (because apache understands it), but if you had said http://www.example.com/somepage.php?path=your/arguments/here then that would technically not be legal. Some versions of Mozilla had problems with this; eventually they got more forgiving because IE is too, and a strict interpretation of the spec here just causes more problems for people.

    ---
    "See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
    -- pwhysall
    [ Parent ]
    Re: handy, but be careful. (none / 0) (#49)
    by swr on Tue Aug 05, 2003 at 09:40:09 PM EST

    technically, '/' is not a valid character in the query string (I think it's reserved); therefore, what you're doing is technically legal (because apache understands it), but if you had said http://www.example.com/somepage.php?path=your/arguments/here then that would technically not be legal.

    Really? That seems odd... I would think the first question mark would denote the start of GET form variables. If (in the unlikely event) the actual file path contained a question mark I expect it would need to be URL-encoded.



    [ Parent ]
    ...like I said... (none / 0) (#50)
    by pb on Wed Aug 06, 2003 at 12:23:25 AM EST

    The query string == the GET form variables

    You'd have to URL-encode any question marks in the file path, yes. But my point was, that you'd also want to URL-encode any slashes in the query string.
    ---
    "See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
    -- pwhysall
    [ Parent ]

    Search engines (5.00 / 7) (#13)
    by ucblockhead on Fri Aug 01, 2003 at 02:50:24 PM EST

    Beyond that, the module can also help search engines find and spider your site.
    My favorite use is to prevent certain search engines from spidering your site. For example, there's a spider out there that collects email addresses for spammers. It has a user-agent string of "Microsoft URL Control". (It has nothing to do with Microsoft.) I use a couple lines so that when it attempts to spider my site, it gets a "403 Forbidden":

    RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"
    RewriteRule .* - [F,L]

    In a similar manner, the following lines cause anyone browsing my site from an RIAA or MPAA computer to get an entirely different page:

    RewriteCond %{REMOTE_ADDR} "^12\.150\.191\." [OR]
    RewriteCond %{REMOTE_ADDR} "^63\.199\.57\." [OR]
    RewriteCond %{REMOTE_ADDR} "^64\.166\.187\." [OR]
    RewriteCond %{REMOTE_ADDR} "^64\.241\.31\." [OR]
    RewriteCond %{REMOTE_ADDR} "^65\.244\.101\." [OR]
    RewriteCond %{REMOTE_ADDR} "^66\.252\.128\." [OR]
    RewriteCond %{REMOTE_ADDR} "^67\.112\.252\." [OR]
    RewriteCond %{REMOTE_ADDR} "^67\.125\.49\." [OR]
    RewriteCond %{REMOTE_ADDR} "^81\.4\.78\." [OR]
    RewriteCond %{REMOTE_ADDR} "^146\.82\.174\." [OR]
    RewriteCond %{REMOTE_ADDR} "^198\.70\.114\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.192\.0\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.209\.2\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.225\.90\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.229\.253\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.49\.164\." [OR]
    RewriteCond %{REMOTE_ADDR} "^208\.50\.66\." [OR]
    RewriteCond %{REMOTE_ADDR} "^212\.241\.48\." [OR]
    RewriteCond %{REMOTE_ADDR} "^217\.228\.123\."
    # exclude the target page itself, otherwise the redirect loops forever
    RewriteRule !^/?noriaa/ /noriaa/index.html [R,L]

    -----------------------
    This is k5. We're all tools - duxup

    putting content identifiers in the query string (4.50 / 2) (#15)
    by zzzeek on Fri Aug 01, 2003 at 05:43:46 PM EST

    to all those developers who still write dynamic pages that reference a fixed content database based on data in the query string (i.e. content.pl?contentid=25473) as opposed to constructing a "virtual" directory structure (i.e. /content/25473/ where /content/ is an alias to a mod_perl/servlet/etc), all i can say is: hello! its not 1996 anymore ! please learn to parse path information.

    that is all.

    what difference does it make?? (nt) (4.00 / 1) (#32)
    by speek on Sat Aug 02, 2003 at 09:39:35 PM EST


    --
    al queda is kicking themsleves for not knowing about the levees
    [ Parent ]

    clean code (5.00 / 1) (#34)
    by zzzeek on Sun Aug 03, 2003 at 02:09:14 AM EST

    when you are displaying content from a fixed content database (i.e., the end users arent dynamically changing the content they receive) youd like the outside world to see your site as organized into a clean hierarchical directory/file structure, for aesthetics as well as for search engines, web statistics applications, server side caching applications, and to allow meaningful URL schemes with a minimum of meaningless ID numbers (witness K5 as an example).  browsers act differently with query strings with regards to client side caching, as they assume the content is not static and is based on some kind of user input (as query strings were intended).

    so assuming you dont want to display query strings to the world, having your code be dependent on exotic mod_rewrite rules in apache establishes a dependency on having to use apache in the first place, and also spreads out your application's logic within two different programming contexts.  remember that your dynamic application, when it outputs URLs, has to output URLs in the same style as those that mod_rewrite would be parsing, so the logic for understanding this URL scheme must live in two different places, which impacts maintainability.  also, a good dynamic URL scheme can also indicate things like subcategories, channels, etc., which makes mod_rewrite translation rules even more complex.

    never put the same application logic in two completely different places (and programming languages, no less) when it doesnt really have to be.


    [ Parent ]

    witness K5 (none / 0) (#38)
    by speek on Sun Aug 03, 2003 at 12:08:25 PM EST

    What is K5 an example of? The non-use of query strings (ie good), or the overuse of meaningless ID numbers (ie bad)?

    --
    al queda is kicking themsleves for not knowing about the levees
    [ Parent ]

    what do you think ? [nt] (none / 0) (#39)
    by zzzeek on Sun Aug 03, 2003 at 01:22:46 PM EST



    [ Parent ]
    both (5.00 / 1) (#40)
    by speek on Sun Aug 03, 2003 at 01:31:19 PM EST


    --
    al queda is kicking themsleves for not knowing about the levees
    [ Parent ]

    hello! (none / 0) (#42)
    by aytekin on Sun Aug 03, 2003 at 11:31:14 PM EST

    > hello! its not 1996 anymore !
    > please learn to parse path information.

    Hello! Many 10-dollar-a-month virtual-hosted webmasters still do not have access to their httpd.conf. Not everybody needs a dedicated server. Stop being a prick.

    [ Parent ]
    i think you misunderstood (none / 0) (#43)
    by zzzeek on Sun Aug 03, 2003 at 11:55:44 PM EST

    this is exactly for when you dont have access to httpd.conf, or even mod_rewrite or apache for that matter.  having scripts use path_info is a lot cleaner than mod_rewrite when you want a dynamic page to not use the query string for content identifiers.


    [ Parent ]
    Sorry. how about forms? (none / 0) (#46)
    by aytekin on Mon Aug 04, 2003 at 04:37:22 PM EST

    I thought you meant use rewrite for everything.

    I still don't see how you can use the path_info in the GET based forms.


    [ Parent ]

    im only talking about content... (none / 0) (#47)
    by zzzeek on Tue Aug 05, 2003 at 02:04:52 PM EST

    ...not user submitted data. like, a site full of news articles. like K5. or nytimes.com...or whatever. see the url? its got /comments/2003/7/31/2335/08552, which organizes the content into a file hierarchy, even though it perhaps may come from a database dynamically or something. instead of the old comments.pl?id=08552.

    [ Parent ]
    Hallelujah! (none / 0) (#45)
    by siberian on Mon Aug 04, 2003 at 10:40:11 AM EST

    mod_rewrite is overkill for simple path parsing for a simplified URL scheme. You can do that with stock HTTP environment variables.

    Its the 21st century, lets all use our technology wisely.

    [ Parent ]

    why do you live in the past (1.14 / 28) (#17)
    by tofubar on Fri Aug 01, 2003 at 07:14:18 PM EST

    apaches name comes from the fact that it's a bunch of bloated code patched and built up and up. just like linux, a god damn archaic monolithic os. fuck, you open source retards, why do you live in like the 80s, and would you use something better like a commodore 64 or apple?

    the 80s? (4.33 / 3) (#19)
    by Work on Fri Aug 01, 2003 at 09:22:08 PM EST

    most of that monolithic stuff started going out of style in the late 70s... the 80s represented the rise of the microkernel, at least among serious OS researchers.

    [ Parent ]
    indeed (2.00 / 1) (#20)
    by tofubar on Fri Aug 01, 2003 at 10:20:31 PM EST

    so why is linux monolithic and sucky?

    [ Parent ]
    cause its simple. (5.00 / 3) (#23)
    by Work on Sat Aug 02, 2003 at 01:52:58 AM EST

    as for 'sucky', well thats all relative. works fine for many things.

    Monolithic... well probably because that was easiest for linus to build. After all, he was a grad student (or was he undergrad?) when he first started working on it. Of course the most amusing exchange is the old usenet one between him and andy tanenbaum (an OS professor and researcher) where tanenbaum declared linus wouldn't pass his class for such an outdated design. Linux was also based on tanenbaum's toy OS Minix which he used to teach a beginner's OS class. This being back in the early 90s.

    [ Parent ]

    Zero (1.33 / 3) (#21)
    by regeya on Fri Aug 01, 2003 at 11:12:49 PM EST

    lamest troll attempt ever

    [ yokelpunk | kuro5hin diary ]
    [ Parent ]

    its not really a troll (1.00 / 2) (#24)
    by tofubar on Sat Aug 02, 2003 at 03:54:46 AM EST

    but i mean, that's linux communities rationalization for why people genuinely think linux sucks, they must be trolls.

    [ Parent ]
    Well I gave you a 1 because your comment sucked. (1.75 / 4) (#26)
    by Ta bu shi da yu on Sat Aug 02, 2003 at 04:52:31 AM EST

    Troll or no troll.

    Yours humbly,
    Ta bù shì dà yú

    ---
    AdTIה"the think tank that didn't".
    ה
    [ Parent ]

    Oh, I'm sorry. (2.25 / 4) (#27)
    by regeya on Sat Aug 02, 2003 at 03:14:11 PM EST

    I keep forgetting that idiots do have Internet access, and occasionally figure out how to use it.

    My bad.


    [ yokelpunk | kuro5hin diary ]
    [ Parent ]

    there is no way this comment deserves to be hidden (2.66 / 3) (#28)
    by rmg on Sat Aug 02, 2003 at 04:29:18 PM EST

    it may be stupid. it may be a lame troll. but it is not hidden page material.

    one could imagine this being a serious comment. hell, it may even be one.

    some people on this site need to think very carefully about their zero rating policy.

    _____ intellectual tiddlywinks
    [ Parent ]

    What in your opinion is modern? (1.00 / 1) (#36)
    by Lynoure on Sun Aug 03, 2003 at 07:17:39 AM EST

    Go head, tofubar, tell us.

    [ Parent ]
    +1 because (4.33 / 3) (#25)
    by dzimmerm on Sat Aug 02, 2003 at 04:37:27 AM EST

    It is not about religion and it might be useful to some folks. It also is about "life in the trenches".

    dzimmerm

    Where's the beef? (5.00 / 1) (#29)
    by iso on Sat Aug 02, 2003 at 06:46:09 PM EST

    Great idea for an article, but where is the content? The examples? You say:

    Once I did this, I had to write some mod_rewrite rules to translate any requests for:

    /my-app/my-section/my-page.html

    Into:

    /yourfile.cgi?section=widgets&page=13

    Ok, where is the example of this? How did you do it? Does anybody have some relevant implementation details to share?



    I'm wishing I'd done more with it... (none / 0) (#30)
    by kpaul on Sat Aug 02, 2003 at 08:52:24 PM EST

    too rushed, I think.

    Maybe I'll do another (more in depth one) that isn't a 'primer.'

    Thanks for sharing your comment.


    [ Parent ]

    the (late) example... (5.00 / 1) (#33)
    by kpaul on Sun Aug 03, 2003 at 02:08:39 AM EST

    /my-app/my-section/my-page.html

    Into:

    /yourfile.cgi?section=widgets&page=13

    Options +FollowSymlinks
    RewriteEngine on
    RewriteBase /a-folder/
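    # no leading slash in the pattern: in .htaccess the directory prefix is stripped before matching
    RewriteRule ^([^/]+)/([^/]+)/([^/]+)\.html$ yourfile.cgi?my-app=$1&section=$2&page=$3 [L]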


    Or, if you have a URL you'd like re-written, post it here and I'll have a go at how I would set it up.


    [ Parent ]
    I use it (none / 0) (#41)
    by Freaky on Sun Aug 03, 2003 at 05:04:17 PM EST

    aagh.net is all served from a single index.php file, with URL's mapped from an .ini file "mounting" modules on, e.g. blog and search.  If you look closely, it also uses Accept: headers to determine whether it should send your client XHTML 1.1 as application/xhtml+xml or HTML 4.01 as text/html, although this isn't done with mod_rewrite :)

    newzbin.com also uses a similar technique, giving URL's of the form browse/cat/p/games, which could map to anything you like behind the scenes, and is fairly easy to remember.  It's also used to provide shortcut URL's, e.g. /12345 to jump to a specific post, which is excellent for email notifications and the like.

    Reverse proxy as well (none / 0) (#44)
    by siberian on Mon Aug 04, 2003 at 10:38:08 AM EST

    Many people forget that running a reverse proxy is critical when you are dealing with dynamic content engines. The reasons for this are numerous but the general idea is that you can get an average of 10 to 1 requests served if you let your reverse proxy handle the user connection and static content (images and whatever else you define). Think about it, why do you want your database connected, perl or php embedded, java enabled apache serving 'spacer.gif' and other such non-dynamic content, thus using valuable RAM and CPU to do, essentially, nothing of value? You don't!

    Granted, this does not give you a 10 to 1 when you are using something fairly light like PHP but it does wonders for things like JSP pages or mod_perl enabled systems.

    mod_rewrite is a great tool for that fine grained control over how you reverse proxy. Here is an example ruleset:

    ##
    # If its a gif or other 'static' type handle it
    # with the lightweight apache proxy running
    # mod_rewrite
    ##
    RewriteRule \.(gif|jpg|png|css|txt|cgi)$ - [last]
    RewriteRule ^/cgi-bin - [last]

    ###
    # You too can have fun with debug ; Turn it off for production, its slow.
    ###
    #RewriteLog /var/log/rewrite.log
    #RewriteLogLevel 5
    #####
    # Check for trailing slash issue
    # If its a directory, append a slash
    #####
    RewriteCond     /www/html/%{SCRIPT_FILENAME}    -d
    RewriteCond     %{SCRIPT_FILENAME}      ^.*[^\/]$
    RewriteRule ^(.*)$ $1/ [N]
    # pass off everything we have not matched off to
    # the heavy-weight server via reverse-proxy
    # Now this lightweight apache process is the client
    # to my dynamic content server. It mediates
    # between the user on slow dial-up who needs
    # 100 seconds to load the cool marketing
    # images and flashy content and my backend
    # server that is really CPU and memory intensive.
    # This apache proc retrieves the info in under a second
    # and then doles it out to the user for me.
    #
    # Thanks Reverse Proxy!
    ####
    RewriteRule ^/(.*)$ http://backend.content.server/$1 [proxy]

    Anyhow, reverse proxy is your friend, give it a shot. I do a good amount of HTML::Mason/Mod_perl development on fairly high-traffic systems (~5-10 million unique non-image requests a day) and the reverse proxy approach has saved us a ton of money.

    5 if i wasn't afraid to try to vote using Safari (none / 0) (#48)
    by kpaul on Tue Aug 05, 2003 at 09:01:03 PM EST

    again and too lazy to fire up another browser on the g3 ;)

    thanks for contributing...


    [ Parent ]
    Learn PHP better (none / 0) (#51)
    by bolthole on Thu Aug 07, 2003 at 08:36:43 PM EST

    The irony is that you used a php page as an example of how you need mod_rewrite. But php gives you the power to completely do without mod_rewrite, at least for the purposes you demonstrate.

    I have a page,  http://www.blastwave.org/packages.php

    You can access the server with

    http://www.blastwave.org/packages/somepackagename

    and the packages.php script gets called, and it can determine, "oh.. I need to look up the information for 'somepackagename'", all without ever needing mod_rewrite.

    (you MAY need the "MultiViews" apache option enabled, though. Which it normally is by default)

    The trick is to strip out $_SERVER["SCRIPT_NAME"] from $_SERVER["REQUEST_URI"]

    All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
    See our legalese page for copyright policies. Please also read our Privacy Policy.
    Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.