Kuro5hin.org: technology and culture, from the trenches

Kill the Anonymous Hero

By theboz in Meta
Fri Jul 27, 2001 at 08:54:10 AM EST
Tags: Kuro5hin.org (all tags)

This diary entry has raised a question about how to improve the performance of K5 so that it is usable during the day again. I am making assumptions about what the cause of the problem is based on some of the comments I have read in the past and would like to offer some suggestions to improve the experience for everyone.

We all know that the performance of K5 has been degrading tremendously. While rusty and Inoshiro are dealing with a replacement server that doesn't work, pressure builds up for them as the site is slashdo...I mean k5'ed every day between 9:00am EST and 6:00pm EST. Getting the new system up and running may take a long time, and almost everyone seems to be complaining about the performance problems.

You see, there is one group of people that will not complain when they can't reach the site, and they do not take part in any discussions, moderation, or any other part of the site. They come here to glance over the website, often following links from slashdot or another site, or to check out an article someone told them about, or for countless other reasons. If they are interested in the site, they sign up for an account and start to take part in posting comments and moderating stories.

The problem is that there are too many of these people. As I am writing this, there are 164 anonymous heroes accessing the site, and it is making the entire site nearly inaccessible. They are unknowingly killing Kuro5hin. As more and more of these people come here, it prevents everyone else, including the registered users and even the subscribers, from viewing the site. Without the active users who submit stories and comments, this site is nothing. That's why we have to find a way to allow people to get a glance at the current articles without hurting the system performance.

What I propose is a nightly dump of the current articles and comments, perhaps for the 50 most recent articles (including section), to flat HTML files. These files get sent via FTP to a mirror site, where anonymous users can see what the previous day's stories and comments were all about and still be able to sign up for the site. Once they sign up, they can have access to the real K5 like all of the other registered users.

I think this would help improve the performance of the site without hindering the ability of new users to check out K5 and see what this site is all about. It would get rid of the high number of anonymous heroes accessing the system so that we can actually post interesting things to get new people to sign up. I won't say whether this should be a permanent or temporary measure, but it doesn't sound difficult, and I think it would help eliminate some of the stresses that this website is going through right now.

I know many people won't like this idea and will disagree with me, and I would like to ask you to give a better alternative or explain why it won't work.




Do you think this would work?
o Yes 15%
o No 60%
o Maybe 25%

Votes: 80
Results | Other Polls

Related Links
o Slashdot
o Kuro5hin
o This diary entry
o a replacement server that doesn't work
o Also by theboz

Kill the Anonymous Hero | 66 comments (61 topical, 5 editorial, 0 hidden)
why bother? (4.20 / 5) (#1)
by klamath on Thu Jul 26, 2001 at 10:54:24 AM EST

I don't think that there's much point in making substantial modifications to K5 just to alleviate some short-term performance problems.

Furthermore, the solution you're proposing would make the site substantially less appealing to many potential users; and although they would have the option of signing up for an account, why penalize them in the first place?

I'd say that very soon, the new server will be installed and performance will return to an acceptable level. The time and effort it would take to make these modifications to the site and set up the FTP/mirror system would be better spent fixing the server.

Plus, even if the changes are made, will performance improve significantly? MySQL reads tend to be very fast, but I'd wager it's the number of updates that is hurting database performance -- and those updates are generally made by registered users voting on stories, posting comments and voting in polls. As some have mentioned, moving to PgSQL seems like a better long-term plan to improve site scalability.

Thinking big helps even when you're small. (5.00 / 1) (#32)
by ramses0 on Fri Jul 27, 2001 at 01:31:15 AM EST

Thinking big helps even when you're small (small sites grow big, and scalability helps).

It's not always the case that you can throw more load-balanced hardware at a problem, especially at a free (not-for-pay) site. See one of my earlier comments in this article about places where some duplicated database effort can be saved.

It would be as simple as adding an [input type=hidden value=none] to the voting pages, as well as the rate-all form. (There isn't currently a 'none' option for comment display in scoop, but it would be trivial to implement).

I'm sure that the recent addition of mod_rewrite handling is burning some CPU cycles, and also I can't imagine how expensive an operation it must be to show "new: 10 comments" for all stories. Don't get me wrong- those were both great additions to scoop/k5, but I'm sure they're still expensive operations.

Premature optimization is the root of all evil, but short of implementing caching solutions there are a few simple guaranteed measures which can help reduce the load on the server.

I've tried to get scoop set up on a local box so I could learn perl and help out (PHP is my web language of choice). Maybe once there's a debian package for scoop, I'll finally be able to hack on it and implement site-wide hotlisting, as well as "post to $section section" on story voting. :^)=

[ rate all comments , for great ju
Parent ]

Scalability, Caching, Scoop (none / 0) (#54)
by panner on Fri Jul 27, 2001 at 05:18:21 PM EST

True, throwing more hardware at something isn't always the best way to fix a problem. But there comes a point when that's the problem, and k5 is at that point. Also true is that Scoop doesn't do extremely aggressive caching, and doesn't mind eating up CPU cycles. However, no matter how much optimization there is in the code, there will still be slow-downs until something is done about the database.

As for the none option for comment display, submit a feature request. Someone will code it, and it will probably end up in 0.9 development (an 0.8 release is forthcoming).

As for mod_rewrite, it's not recent at all, and it has nothing to do with the new URLs. mod_rewrite has been used for a while to implement the (somewhat standard) two-Apache setup: a lightweight one in front that proxies back to a heavy mod_perl server. On the other hand, mod_proxy has nothing to do with the new URLs; those are supported natively by Scoop, and really probably don't take too much CPU (not much more than parsing a query string, I'd say).

The new flags aren't really too bad, either. The only thing is that Scoop isn't removing old rows for these, which it should (no one has gotten around to a cron job for it). Besides that, it's just a row with a comment count: if the story's comment count > the stored count, then x comments are new, where x is the story count minus the stored count, and any cid over the stored count is new.
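That new-comment rule is simple enough to sketch. This is an illustrative Python version (Scoop itself is Perl), with the comparison semantics assumed from the description above:

```python
def new_comment_count(story_count, stored_count):
    """How many comments are new for this user: if the story's comment
    count exceeds what the user last saw, the difference is new."""
    if story_count > stored_count:
        return story_count - stored_count
    return 0

def is_new(cid, stored_count):
    """Comment ids are sequential per story, so any cid above the
    user's stored count marks a comment posted since their last visit."""
    return cid > stored_count
```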

As for Scoop, it's really not hard to set up (but then, I reinstall it on a whim, so I don't count). Get a recent nightly tarball (or wait a little while for the 0.8 release), read the INSTALL file (and/or the Installation section of the SAG, which will come with 0.8), and run the installer. It'll do all it can. What do you mean by site-wide hotlisting, anyway? You can hotlist any story on a site, so not that.

Keith Smiley
Get it right, for God's sake. Pigs can work out how to use a joystick, and people still can't do this!
[ Parent ]
Ahhhh @ [new] comments. (none / 0) (#58)
by ramses0 on Fri Jul 27, 2001 at 07:37:18 PM EST

I didn't know you guys already stored total comment counts per story. I was envisioning some large SELECT COUNT(*) FROM comments WHERE sid IN(...) GROUP BY sid; ... I guess it's much easier your way. :^)=

I've browsed the scoop code more than once, and it's very nice-looking... I have to say kudos to whoever implemented the preferences options the way they did. Simple and elegant, even if it does tie your preferences inextricably (not really) to perl.

Oh, and site-wide hotlist... it's really easiest to just show:

Top Hotlisted stories by k5 users:
#1 - Who are You (137 comments)
#2 - Things I learned from the RustyCam (100 comments)
#3 - Why scoop is the best weblog ever (989 comments)

The idea being that k5 shows a list of the stories that users have hotlisted the most. To implement is fairly straightforward (I've made at least 2 stabs into the scoop code to know this much):


add_to_hotlist( $sid )... UPDATE stories SET times_hotlisted = times_hotlisted + 1 WHERE sid=$sid;

remove_from_hotlist( $sid ) ... UPDATE stories SET times_hotlisted = times_hotlisted - 1 WHERE sid=$sid;

get_top_ten()... SELECT * FROM stories ORDER BY times_hotlisted DESC LIMIT 10;

Please forgive my pseudocode.

I don't talk about it much anymore because it seems a bit unwieldy now with the size of K5. It seemed a lot cooler back when the idea of having more than 100 stories was ridiculous. :^)=

[ rate all comments , for great ju
Parent ]

Ah, I see (5.00 / 1) (#60)
by panner on Fri Jul 27, 2001 at 09:09:20 PM EST

Well, your site-wide hotlist could be done quite easily with a box. When comment counts went in, the hotlist stuff was moved to the same table (viewed_stories), so I'm thinking this would work:

SELECT sid, count(sid) AS total FROM viewed_stories WHERE hotlisted = 1 GROUP BY sid ORDER BY total DESC LIMIT 10

I just tried this on my install, and it seemed to work. It gave the results I expected, so after that it's just a matter of displaying it. The biggest thing will be getting the story titles to display (a join does this, as I just tested).
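For what it's worth, the query behaves the same way against a toy table. This Python/sqlite3 sketch (standing in for MySQL, with a much-simplified viewed_stories layout) just exercises the GROUP BY:

```python
import sqlite3

# A toy viewed_stories table; the real Scoop schema has more columns,
# but sid and hotlisted are all the box query needs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE viewed_stories (uid TEXT, sid TEXT, hotlisted INTEGER)")
rows = [("a", "s1", 1), ("b", "s1", 1), ("c", "s2", 1),
        ("d", "s2", 0), ("e", "s3", 1)]
con.executemany("INSERT INTO viewed_stories VALUES (?, ?, ?)", rows)

top = con.execute(
    "SELECT sid, count(sid) AS total FROM viewed_stories "
    "WHERE hotlisted = 1 GROUP BY sid ORDER BY total DESC LIMIT 10"
).fetchall()
# s1 has two hotlistings, s2 and s3 one each (s2's second row is not hotlisted)
```

Displaying titles would then be a join against the stories table, as panner says.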

I'll probably write a box like this, but one thing is the user-prefs for boxes (another devel thing...), so that users can turn off boxes they don't want. As it stands now, k5 is basically out of space for boxes (that's one, maybe the, reason RDF feeds aren't enabled).

Keith Smiley
Get it right, for God's sake. Pigs can work out how to use a joystick, and people still can't do this!
[ Parent ]
An update on the box (5.00 / 1) (#61)
by panner on Fri Jul 27, 2001 at 09:27:27 PM EST

Okay, I just threw together a basic box that does that (17 lines, 9 of those make the select). Actually counting the position added two more (I had %%dot%% instead of 1, 2, etc). Now I realize your example has comment counts, which is a little harder because scoop currently doesn't keep the count (it did before, and it's going to again in the near future). However, I'm about to add another select to do this, and between the two, I'll have exactly what you're looking for (at least, as far as I can tell).

I guess I'll post it to scoop.k5 when I'm done.

Keith Smiley
Get it right, for God's sake. Pigs can work out how to use a joystick, and people still can't do this!
[ Parent ]
You rock! :^)= (none / 0) (#62)
by ramses0 on Fri Jul 27, 2001 at 09:53:51 PM EST

...and can whore for mojo by implementing new scoop features any day of the week as far as I'm concerned. :^)=

One of my original ideas was for something similar to the /. hall of fame, or to give our anonymous friends a hotlist when they didn't have one of their own. Kind of funny that in a story about killing AHs because they cause too much load, I'm advocating generating more load for anonymous users. :^)=

Counting vs. Dots wasn't an issue at all for me, it was just easier to convey the concept of ordering by using numbers. Numbers I think would detract from the 'feel' of the box because some people might mistake it for some sort of competition. It'd be neat to see how it looks on a real scoop site, but I'd hesitate to check it in to a site that is super-busy. ORDERing BY anything that isn't indexed (even ORDER BY RAND();) is a huge performance killer with MySQL.

Wow. Inspiration. I'm gonna go home and code up some cool stuff now so I don't feel left out. :^)=

[ rate all comments , for great ju
Parent ]

caching (4.60 / 10) (#2)
by Defect on Thu Jul 26, 2001 at 11:00:10 AM EST

I think that if pages for anonymous viewers are simply cached, then having a mirror becomes less of an issue. I know there's been talk on scoop.k5 and the mailing list as to how to improve scoop performance, but I have no idea what, if anything, has come of it. A (what I perceive to be) simple, perhaps temporary, solution would be to knock the front page html over to an entry in the "special" table of the scoop db every half hour or so. That way, if an anon hits the site, then scoop would just output the html located in the special table (fyi, by special I mean the "static" html pages like the irc page and the faq). AFAIK, scoop handles the parsing and output of "special" pages fairly well and fairly quickly.
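The half-hourly regeneration described above is a plain time-to-live cache. A minimal sketch in Python rather than Scoop's Perl, assuming a hypothetical `render` function standing in for the expensive front-page build:

```python
import time

class PageCache:
    """Serve a stored copy of the page, rebuilding it at most
    once every `ttl` seconds (1800 = the half hour suggested above)."""
    def __init__(self, render, ttl=1800):
        self.render = render   # expensive page-building function
        self.ttl = ttl
        self.html = None
        self.stamp = 0.0

    def get(self):
        now = time.time()
        if self.html is None or now - self.stamp > self.ttl:
            self.html = self.render()   # rebuild only when stale
            self.stamp = now
        return self.html
```

Anonymous hits would then cost a dictionary lookup instead of a pile of database queries, at the price of pages being up to half an hour old.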

On another, maybe (but probably not) related, note; as I pointed out in one of my diary entries, k5 uses its own graphic for the list item bullets, when html has its own entity for such (&bull; : •). K5's version adds 89 bytes per use of the bullet, and at around 160 bullets on the front page and all sections, that's 14k of extra html per page. I know k5 maxes out at around 90k hits a day (from last I heard), and if you figure that 30k of those hits go to the FP or a section page, that's over 400 megs of wasted transfer daily. I doubt that the issue is bandwidth though, but it really wouldn't hurt at all to optimize the html here. And this is simply changing the "dot" entry in the scoop block table.
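The byte math checks out, give or take; here it is spelled out (the traffic figures are the estimates above, not measurements):

```python
# The per-bullet saving: the IMG tag k5 emits vs. the HTML entity.
img_tag = ('<IMG SRC="http://www.kuro5hin.org/images/lidisc.gif" '
           'BORDER=0 ALT="o" WIDTH="12" HEIGHT="12">')
entity = "&bull;"
per_bullet = len(img_tag) - len(entity)        # bytes saved per bullet (~90)
page_overhead = per_bullet * 160               # ~14 KB extra per page view
daily_mb = 30_000 * page_overhead / 1_000_000  # MB/day at 30k page views
```

Roughly 14k of avoidable markup per page, and a bit over 400 MB a day at 30k front-page or section hits.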

Of course, this may just be my loathing of the 56k modem sitting in my machine, where 14k is a 3 to 4 second longer download.
defect - jso - joseth || a link
Um (3.00 / 1) (#25)
by suick on Thu Jul 26, 2001 at 06:30:09 PM EST

A small point, but most web browsers download the image once per page, and then keep it cached. Assuming that image caching is turned off (as I have set currently), the browser still only downloads the image once per page. So, assuming none of the readers cache images, it's only 8 megs of wasted bandwidth. But then, we all know that most people cache images, so...

order in to with the will I around my effort sentences an i of more be fuck annoying.
[ Parent ]
not the image, the html for the image. (4.66 / 3) (#26)
by Defect on Thu Jul 26, 2001 at 06:59:32 PM EST

Check the diary link out for more info.

html for the image :
<IMG SRC="http://www.kuro5hin.org/images/lidisc.gif" BORDER=0 ALT="o" WIDTH="12" HEIGHT="12">

native html entity for a standard bullet :
&bull; (which renders as •)

That is where the 89 byte difference is.
defect - jso - joseth || a link
[ Parent ]
HTML Entities (4.00 / 3) (#29)
by J'raxis on Thu Jul 26, 2001 at 08:29:59 PM EST

I don't trust HTML entity names except for the very, very old ones like "&aacute;" and "&amp;".

One should use the Unicode-number entities for the rest of the symbols, not the names -- for example, "&ldquo;" is the left double quote entity under XHTML/1.0, but most browsers have never heard of that and display it literally. The Unicode number for "&ldquo;" is "&#8220;" (decimal). Unicode-numbered entities are more widely supported (I think they've been in the standard since the beginning, just mostly ignored) -- if the OS has the symbol, it'll work right.

Oh, and the Unicode for "&bull;" is "&#x2022;" (hex). See the chart I found it in here.

-- The U+00RAXIS

[ J’raxis·Com | Liberty in your lifetime ]
[ Parent ]

Database = death (3.75 / 4) (#3)
by slaytanic killer on Thu Jul 26, 2001 at 11:33:09 AM EST

Well, any caching routine has to deal with making static pages out of the database. For example, if you want to do a search, clicking on the Search link immediately dumps the most recent stories. So there are some gratuitous database hits right there, that could easily be cached every 5 seconds (Scoop architecture willing). I don't think that Anonymous Heroes are the problem, since the pages they visit are the most easily cached.

But a main problem is simultaneously debugging the new server + making modifications to Scoop. Perhaps life is really being messy for the K5 cabal, and they'd have a hard time dealing with both.

The ability to easily introduce caching depends on the architecture. Maybe it is non-trivial for Scoop's design. After all, it is written in Perl, which I am told was selected because they didn't want to purchase a code obfuscator back when Inoshiro was dealing with security. I've had to make obfuscated builds myself to reduce codesize for faster web transmission, and I know that obfuscation can sometimes lead to design problems, especially with dynamic code like Perl closures.


Overly complex solution (3.60 / 5) (#4)
by Betcour on Thu Jul 26, 2001 at 12:13:03 PM EST

Your idea is way too complex. There are two easier solutions :
  • Cache the DB queries (either through Perl or through a custom-made daemon). I haven't looked in scoop but if it uses MySQL (as I suspect it does) then you have to know there's no query caching - the database will always go search whatever query you throw at it, even if it is the billionth time you make the same SELECT. I think it would be rather easy to write a MySQL caching daemon (on UPDATE, DELETE or INSERT in a table, throw the cache out; save and serve all SELECT queries) but for some reason the guys over at MySQL.com seem to think it is not important
  • Cache the pages: save them whenever they are generated and delete them when they are dirtied. This can be done with a few lines of extra code and a bit of mod_rewrite. Additionally you can use it to serve gzipped content at no extra cost and save a lot of bandwidth at the same time!

heheh.. (4.00 / 1) (#23)
by Inoshiro on Thu Jul 26, 2001 at 04:40:30 PM EST

We tried using a perl caching solution. mod_perl will never, ever free memory though -- no matter how much you hint it. Rusty's caching code just made us use swap, then die from OOM. An in-between layer between MySQL and mod_perl would do it, but I'm not sure that'd be a big win in any case. MySQL is pretty damned fast on selections (being just a hash operation), it just needs to get row-level locking.

The pages are cached (sorta) through mod_proxy. As for saving bandwidth, we're not really hurting for it. mod_gzip won't give us the boost we need since we just need a beefier, working db server.

[ イノシロ ]
[ Parent ]
comment caching (5.00 / 1) (#27)
by dr k on Thu Jul 26, 2001 at 07:59:35 PM EST

Just poking through the Scoop code... have you considered caching the non-configurable bits of formatted comments to the DB? Hm, would only save about 10 lines of code in format_comment.
Destroy all trusted users!
[ Parent ]
Great topic, terrible plan (3.87 / 8) (#5)
by DesiredUsername on Thu Jul 26, 2001 at 12:34:04 PM EST

I voted this FP because something needs to happen. But this idea is wrong from beginning to end. 164 "normal usage" users bringing down the system? If that's the case, adding a server won't fix the problem.

The first step is to determine what exactly the bottleneck is. Bandwidth? CPU? Then target a solution to that problem.

My personal feeling is that K5 pages are WAY too complex (both bandwidth and CPU) and fixing the slowdown is a simple matter of coding. For instance, get the hell rid of "who's online". Compare what possible use it is vs. how much CPU is burned computing it. But again, this is all from my ass since I don't have access to any actual facts.

Facts first, then solutions.

Play 囲碁
here's a question (3.00 / 2) (#11)
by core10k on Thu Jul 26, 2001 at 02:12:39 PM EST

Are pages gzipped? If not, they sure as hell should be. There's no reason not to.

[ Parent ]
CPU time (4.00 / 2) (#14)
by ajf on Thu Jul 26, 2001 at 02:32:59 PM EST

There is a very good reason not to compress dynamically generated pages. If the bottleneck is CPU time, compression is only going to make it worse.

"I have no idea if it is true or not, but given what you read on the Web, it seems to be a valid concern." -jjayson
[ Parent ]
compression (3.00 / 2) (#17)
by core10k on Thu Jul 26, 2001 at 03:03:57 PM EST

I have a hard time believing that. Really. Data has to be mangled to pipe through TCP/IP anyways, and that's VERY expensive.

[ Parent ]
Data does not have to be mangled (3.50 / 2) (#34)
by pjc50 on Fri Jul 27, 2001 at 03:27:04 AM EST

Just stick a header on it and send it to the NIC :)

Seriously, a lot of work has gone into reducing the number of memory-to-memory copies of information when sending it out over TCP. TCP is a small proportion of the cost of generating a K5 page, and gzipping would only make it larger and annoy those browsers that don't support content-transfer-encoding.

I suspect (from a position of ignorance, not having read the scoop code) that the problem is MySQL's poor INSERT performance, particularly when there is a large BLOB of text (e.g. a story) included. Every INSERT locks out SELECTs while it is proceeding; therefore, having a constant stream of inserted data (comment ratings?) kills your table performance. This has bitten me on a commercial project; the solution was to turn on delayed inserts and make sure all the selects were indexed.

[ Parent ]
gzipping wouldn't annoy browsers (none / 0) (#43)
by chrisbolt on Fri Jul 27, 2001 at 11:47:54 AM EST

Gzipping doesn't annoy browsers that don't support Content-Encoding because when the browser requests a page it includes a header that tells what encoding it supports (Accept-Encoding). If the client's request doesn't specifically say it supports gzipping, the server just doesn't gzip the page.
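That negotiation takes only a couple of lines. A Python sketch of the rule (on a real site this would live in Apache, via mod_gzip or similar, rather than application code):

```python
import gzip

def encode_response(body, accept_encoding):
    """Compress only when the client's Accept-Encoding header says it
    supports gzip; otherwise send the page as-is."""
    if "gzip" in (accept_encoding or "").lower():
        return gzip.compress(body.encode()), {"Content-Encoding": "gzip"}
    return body.encode(), {}
```

A browser that never sends Accept-Encoding simply gets the uncompressed page, which is why gzipping can't break anything.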

<panner> When making backups, take a lesson from rusty: it doesn't matter if you make them, only that you _think_ you made them.
[ Parent ]
The money you save on bandwidth (none / 0) (#53)
by pin0cchio on Fri Jul 27, 2001 at 03:59:35 PM EST

If the bottleneck is CPU time, compression is only going to make it worse.

But the money you save on bandwidth by sending gzipped pages to browsers that Accept-Encoding: gzip could be put toward a faster server, couldn't it? And this gzip would only affect the web server once you get your dedicated database server up.

[ Parent ]
The problem is... (4.92 / 13) (#6)
by rusty on Thu Jul 26, 2001 at 01:25:43 PM EST

Database too big. Site too busy. Machine too small. Simple as that.

Yes, some static page caching would help, but shoehorning it in badly right now is not a good solution. The solution right now is making the nice big shiny machine that we already have work. I'm as fed up as you are, probably more so in fact. Think about it -- you can't get to K5, which sucks. I can't get to K5, which sucks, but I'm also losing money with every timed-out page and dropped connection. Or, to put it in technical computer terms, this is what we would call "a giant clusterfuck."

This is a problem that's already solved, if only I could get the nice folks at Vhosting to get our hardware online. Meta articles unfortunately can't help with that. :-(

Not the real rusty

To reduce your (and our) pain... (4.00 / 3) (#7)
by slaytanic killer on Thu Jul 26, 2001 at 01:58:43 PM EST

Is the problem with Vhosting one that would be unwise to talk about? People can easily weather these things if they just have a guess as to what is happening, or know that it's not a good idea to know.

[ Parent ]
curious. (3.33 / 3) (#8)
by Defect on Thu Jul 26, 2001 at 02:01:24 PM EST

How much space is taken up by dumped stories, along with the comments and polls attached to them (and ratings attached to the comments)?

If it's a significant amount, what are you keeping them around for? It makes sense in a packrat, nostalgic kind of way, but they are really very useless, especially if there is no easy way for a user to find their old dumped stories.

There really should be some sort of time limit on how long dumped stories are retained, as it serves almost no purpose to keep them around forever. Right now, I've got a story that was dumped almost 10 months ago that's still in the database, doing nothing for nobody. And the only way I found it was searching back to the very beginning of my comments, to get the link from a comment I posted to it.
defect - jso - joseth || a link
[ Parent ]
Not meant to be a really "bitchy" article (5.00 / 1) (#10)
by theboz on Thu Jul 26, 2001 at 02:07:53 PM EST

I originally was wanting to flame the anonymous person at vhosting who can't figure it out, Compaq for sending a possibly defective machine, and MySQL for sucking so badly.

I wanted to try to make an article that doesn't place any blame on anyone, since blaming doesn't solve anything, but I did want to offer an alternative (whether or not it was very good) as a temporary fix. I should know better though, because in the corporate world a temporary fix ends up being permanent, so that wouldn't be a good idea.

[ Parent ]

Comment cache (4.33 / 3) (#13)
by fluffy grue on Thu Jul 26, 2001 at 02:21:21 PM EST

Whatever happened to that pre-template comment cache thing we were talking about on IRC a couple months ago? i.e. where it'd cache (strictly on-demand, like how caches are supposed to be) the most common comment views.
"Is not a quine" is not a quine.
I have a master's degree in science!

[ Hug Your Trikuare ]
[ Parent ]

Database lookups (4.00 / 4) (#18)
by sigwinch on Thu Jul 26, 2001 at 03:32:00 PM EST

Database too big. Site too busy. Machine too small. Simple as that.
Would these help reduce database lookups?
  1. Clicking 'Rate All' on a comments page goes to its own simple page, and doesn't reload the entire frickin comments page. Every time I rate, I cringe at the 50-250 kB of extra traffic.
  2. Ditto for the 'Vote' button on a story.
  3. Clicking a story title in the mod queue opens it in a new window. That way people don't have to reload the mod queue page every time they vote. Sure, people *can* manually open it in a new window, but many take the path of least resistance/skill and use a single window.
  4. Streamline the HTML source. 'http://www.kuro5hin.org/' shows up a lot. Couldn't relative URLs be used? The FACE attribute of FONT tags also 'wastes' a lot of space. HTML traffic probably isn't the problem, though.
  5. Temporarily set everybody's comment rating option to 'No'. Don't hardwire it to 'No', just set it one time. People who really need it can turn it back on to deal with flooding problems. You can turn it back on when the bottleneck is widened.

I don't want the world, I just want your half.
[ Parent ]

Those aren't the problems. (4.80 / 5) (#21)
by Inoshiro on Thu Jul 26, 2001 at 04:30:19 PM EST

1) have the patience/self-discipline to read all the comments, choosing a rating as you go along. Once done, hit the rate all button. Don't hit it for each comment. That's a waste of time.

2) That kind of select is very easy for MySQL to cache. It's a non-issue. And AHs already can't see that part anyways.

3) I've yet to see a person who doesn't middle click most of the links. I think the browsers should default to new window operations since that's what most people use once they become experienced with using the web.

4) haha.. rusty's HTML makes baby jesus cry. The one time I went to streamline it via CSS, all the silly NS4 users who don't want to upgrade to a CSS supporting browser (like Moz, Konq, Lynx, etc) complained. Anything else is a half-measure and not worth the effort, compared to CSS.

5) That's not what people do most often. Go read the laments about people not rating enough.

[ イノシロ ]
[ Parent ]
HTML (4.00 / 3) (#28)
by J'raxis on Thu Jul 26, 2001 at 08:20:45 PM EST

I think it might be a good idea to move all the formatting into a CSS stylesheet -- so the HTML itself, without the stylesheet, produces pages as elegantly plain as, say, the FSF. Then, CSS-compliant browsers get the stylesheet, Netscape users don't -- they get the still-usable plain page. Of course, doing these user-agent checks might be an additional load.

-- The CSS–Compliant Raxis

[ J’raxis·Com | Liberty in your lifetime ]
[ Parent ]

*zooooom* (sound of plane flying over head) (4.50 / 2) (#31)
by ramses0 on Fri Jul 27, 2001 at 01:11:38 AM EST

1) Load up a 25 comment story. Mark all your ratings. Click "rate all" once. Load up *ALL 25 COMMENTS THAT YOU JUST RATED* again. I've read those comments once already. I don't need to reload them. It would be much nicer to listen to what the nice user is saying and go to a "thank you, I'm not going to spam you with 100kb of text again" page. :^)=

2) Agreed.

3) I have yet to see middle clicking open up a new window on the most popular internet browser on the planet. But you still get spammed with 100kb of comments (that you've already read) after you vote on a story. Once again, listen to the nice user and think about a thank-you page which doesn't load the comments.

4) Personal attacks are bad. CSS is good.

5) Agreed.

[ rate all comments , for great ju
Parent ]

Open in New Window in IE (none / 0) (#56)
by pin0cchio on Fri Jul 27, 2001 at 05:35:41 PM EST

listen to what the nice user is saying and go to a "thank you, I'm not going to spam you with 100kb of text again" page.

Or make it a user preference. (This setting would rarely be changed; update locks shouldn't have much effect on selects from this table.)

I have yet to see middle clicking open up a new window on the most popular internet browser on the planet.

In IE, shift-click does that.

[ Parent ]
Usability Suggestion (4.66 / 3) (#35)
by the trinidad kid on Fri Jul 27, 2001 at 04:04:12 AM EST

I have often wondered why the rate button says [Rate All] - I rate comments individually because each rating slot has a button beside it - one button one control implicitly implies that that button works with that control.

An intuitive [Rate All] button would only exist once.

To make rating all explicit you should stick some explanatory text beside it.

PS: anyone wanting to have a pop at me for being a moron better have read Are Users Stupid? by Jakob Nielsen first.

[ Parent ]
Usability Follow-up (5.00 / 1) (#37)
by the trinidad kid on Fri Jul 27, 2001 at 04:17:34 AM EST

Having grasped the concept of [Rate All] I made the mistake of assuming that the scope of the "all" was the totality of the comments on the story instead of the comments on the current page...
Worth mentioning in the explanatory text.

[ Parent ]
Please don't open up new windows! (5.00 / 1) (#44)
by MrEfficient on Fri Jul 27, 2001 at 12:43:45 PM EST

3) I've yet to see a person who doesn't middle click most of the links. I think the browsers should default to new window operations since that's what most people use once they become experienced with using the web.

Please!!! Don't make the links open up new windows. More often than not, that's not what I want, and if it is, a regular link will allow me to right click on it and open it up in a new window. Links that default to opening a new window don't even give me a choice.

[ Parent ]

Residual effect of right-click traps on habits (none / 0) (#57)
by pin0cchio on Fri Jul 27, 2001 at 05:42:38 PM EST

More often than not, that's not what I want, and if it is, a regular link will alow me to right click on it and open it up in a new window.

Or you might be afraid of right-clicking because some site operators not only map right-click to "Sorry, you're not allowed to right-click" but set it on infinite loop just to annoy the user.

[ Parent ]
Nobody's afraid (= (none / 0) (#59)
by ScrO on Fri Jul 27, 2001 at 07:40:56 PM EST

Or you might be afraid of right-clicking because some site operators not only map right-click to "Sorry, you're not allowed to right-click" but set it on infinite loop just to annoy the user.

C'mon, what "real" sites do this? Looking at the link you have of no-right-click offenders, they're all personal pages and such. I doubt that not being able to right click while looking at pictures of Angelina Jolie or looking for N64 cheat codes will cause me to never right click again... (=


[ Parent ]

Please don't do this (5.00 / 1) (#45)
by mauftarkie on Fri Jul 27, 2001 at 12:57:25 PM EST

I've yet to see a person who doesn't middle-click most of the links. I think browsers should default to new-window operation, since that's what most people use once they become experienced with the web.

Please do not do this. It irritates the hell out of me when the page designer decides for me whether or not to open a new window. While it's true that I middle-click the majority of the time, there are many times I do NOT want to open a new window. If you force links to always open in a new window, you've taken that choice away from me. I dislike not having a choice.

Please reconsider. My sanity thanks you.

Without you I'm one step closer to happiness without violence.
Without you I'm one step closer to innocence without consequence.

[ Parent ]
What's interesting is (4.00 / 6) (#9)
by wiredog on Thu Jul 26, 2001 at 02:07:02 PM EST

That so many people continue to come here in spite of the slowness.

If there's a choice between performance and ease of use, Linux will go for performance every time. -- Jerry Pournelle
Yogi sez (5.00 / 2) (#12)
by SlydeRule on Thu Jul 26, 2001 at 02:15:36 PM EST

"Nobody goes there anymore; it's too crowded." -- Yogi Berra

[ Parent ]
Patience (4.66 / 3) (#15)
by ajf on Thu Jul 26, 2001 at 02:36:53 PM EST

Yes, the site is noticeably slower in these hours. So go do something else while the page is loading. I've noticed it's particularly annoying when previewing comments, but it's hardly the end of the world. And it'll get better soon.

"I have no idea if it is true or not, but given what you read on the Web, it seems to be a valid concern." -jjayson
Those are the hours I'm at work. (5.00 / 1) (#22)
by theboz on Thu Jul 26, 2001 at 04:36:20 PM EST

I know what people reading this are thinking, but when I have been told by my boss that my job is to sit and wait for an unspecified amount of time I have to amuse myself somehow.

As far as it being slow, I can handle slowness, but most of the time it doesn't load at all during those hours.

[ Parent ]

Sounds like my job... (none / 0) (#48)
by dgwatson on Fri Jul 27, 2001 at 02:25:09 PM EST

I'm a student employee at Kent State, helping a Physics professor (who happens to be my dad :) with research (actually measuring performance of multi-anode photomultiplier tubes for use in the STAR ENDCAP project). Or at least that's what I was hired to do... but because the collaborators have been very slow at actually getting the MAPMTs here, most of the time I'm sitting around... I'm also supposed to be working on a coupler for optical fibers, but the machine shop we've outsourced it to has also been very slow.

So what do I do? Write cool free software (http://www.mcs.kent.edu/~dwatson/), and while I'm waiting for it to compile, I read K5, /., plastic, and various other sites.

[ Parent ]
new mysql table type (4.57 / 7) (#16)
by f00b4r on Thu Jul 26, 2001 at 02:55:14 PM EST

I don't know if K5 uses a MySQL backend, but putting two and two together it sounds about right.

MySQL definitely does have a lot of problems with mixing updates/inserts with selects. A lot of these problems can be alleviated by using the new Innobase table type. The site I run had much the same slowness problem because the database was constantly receiving updates and inserts. When MySQL 3.23.38 came out I upgraded and converted the tables to Innobase (the largest being 20 GB), and the slowness (caused by locking when using MyISAM tables) has gone away. I am very happy with the increased performance.

Despite Innobase being beta code, it has run very reliably for me (even more so than MyISAM tables), as well as fixing my locking-related performance problems.

Perhaps K5 should look into using Innobase for its tables, or at least for the ones that are updated often.
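As a rough sketch, the conversion step amounts to one ALTER statement per write-heavy table. The table names here (`comments`, `stories`) are hypothetical examples, not Scoop's actual schema, and the `TYPE =` syntax is the MySQL 3.23.x form (later versions renamed the type to InnoDB and use `ENGINE =`):

```python
# Emit the ALTER TABLE statements needed to switch a list of
# write-heavy tables to the Innobase table type. Run the output
# through the mysql command-line client against the live database.

HOT_TABLES = ["comments", "stories"]  # hypothetical hot tables

def innobase_conversion_sql(tables):
    """One MySQL 3.23.x ALTER statement per table name."""
    return ["ALTER TABLE %s TYPE = INNOBASE;" % t for t in tables]

for stmt in innobase_conversion_sql(HOT_TABLES):
    print(stmt)
```

The win comes from Innobase's row-level locking: MyISAM locks the whole table for every insert/update, which is exactly the mixed read/write workload described above.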

Interesting. (4.66 / 3) (#20)
by Inoshiro on Thu Jul 26, 2001 at 04:23:53 PM EST

I think that having some static pages for AHs would be a possible solution. It won't be the great solution you pimp it to be, because all the major load comes from the inserts/updates of logged-in users... but in the case of something like a link from Slashdot, all those users would see the static content.

So for the case of a lot of AHs, it'll be a little bit of a win. But we'd need the code for it. Unless you have a patch I'm not aware of, it means rusty will have to take time to do the work -- which he can't, since he needs to go moving for the next few weeks.

[ イノシロ ]
Slashdot does a lot of static content (none / 0) (#50)
by pin0cchio on Fri Jul 27, 2001 at 03:28:30 PM EST

all the major load is done by the inserts/updates of logged in users

In that case, you may be indexing the wrong things.

but in case of something like a link from Slashdot, all those users will see the static content.

Speaking of Slashdot, Slashcode uses static pages for ACs' homepage and story views, updating them every few minutes with a cron job. It also removes two-week-old stories from the database (BIG speed win) and redirects to the static page.

[ Parent ]
I don't think that'd be a big win. (none / 0) (#63)
by Inoshiro on Sat Jul 28, 2001 at 12:49:12 AM EST

The biggest win is that any influx of non-users would only see static content. But for actual users, I think dynamic is just fine. We don't need to make things permanently static for users since we're all about discussion, not MLP :)

Once we get a huge number of users, maybe archiving stuff will be useful. But I doubt people go back to really old stories often.

[ イノシロ ]
[ Parent ]
Funny you mention that... (none / 0) (#64)
by Sunir on Sun Jul 29, 2001 at 04:53:35 AM EST

I lost nine months of e-mail, so I've been going through old stories now and then to pull up arguments between Rusty, Karsten, Maynard, and myself while I'm preparing a presentation.

I personally find Slashdot's static caching of archived articles to be incredibly annoying.

"Look! You're free! Go, and be free!" and everyone hated it for that. --r
[ Parent ]

You're not alone. (none / 0) (#65)
by Inoshiro on Mon Jul 30, 2001 at 05:40:31 AM EST

The static mode is very awkward to work with, which is why we've yet to "archive" a story after a year and a half of operation.

[ イノシロ ]
[ Parent ]
I feel guilty... (3.66 / 3) (#24)
by MattOly on Thu Jul 26, 2001 at 05:04:17 PM EST

When we upgraded our Scoop server, it went off without a hitch. Sorry, guys.

A final note to...the Republican party. You do not want to get into a fight with David Letterman. ...He's simply more believable than you are.

Just out of curiousity.. (4.00 / 3) (#33)
by Sheepdot on Fri Jul 27, 2001 at 01:39:28 AM EST

How many of you:

a) Only access K5 during the day: 9am-5pm (Banker's Hours)

b) Only access K5 at night: 5pm-2am (Sheepdot's Hours)

c) Combination of both?

d) Access it at such odd times so infrequently no system is adhered to?

Yes, you'll have to hit the "Reply" button to cast your vote, but it'd be interesting to see what hours you access it.

Re: Just out of curiousity.. (4.00 / 1) (#36)
by YesNoCancel on Fri Jul 27, 2001 at 04:04:29 AM EST

I access K5 only during the day and have never had any problems. K5 is as fast as ever.

On the other hand, I live in a different time zone. *G*

[ Parent ]

Re: Just out of curiousity (4.00 / 1) (#38)
by depok on Fri Jul 27, 2001 at 04:29:45 AM EST

I live in a different timezone (Europe). The site is fast in the morning (9am-1pm GMT) and painfully slow in the afternoon (it's hard to even reach the homepage).

Solution? Since I know it's slow in the afternoon, I don't bother visiting Kuro5hin then (and thus overloading the server even more). I come back at night (1am-2am GMT) and read the rest of the posts/comments. That way I at least feel I'm helping the others. Spread the visits out; no 3-hour-long sessions.

I also haven't used the diary yet, because I know it would slow things down for everyone else.

Patience (Rome wasn't built in a day).


death has a thousand faces, they all look familiar to me
[ Parent ]

d) (none / 0) (#66)
by Wolfkin on Tue Aug 07, 2001 at 08:15:38 PM EST

I work at home, and hardly leave the house most weeks. I check K5 whenever I happen to think of it, and the only common factor is that it is rarely between 12am and 7am AST.


[ Parent ]
+1 FP (none / 0) (#39)
by kyrbe on Fri Jul 27, 2001 at 06:57:47 AM EST

I voted +1 FP because I feel the issue needs addressing, but making "random" suggestions isn't going to help.

Firstly I feel that the problem needs to be identified. Where is the bottleneck? Bandwidth, server, Scoop, database...? Only once this has been identified can appropriate solutions be discussed. But there's no point even discussing it if the pending changes will clear it.

I'll wait and see where the problem is identified before I even consider suggesting a potential solution.

Equal Rights, Representation, Education and Welfare
Assuming... (none / 0) (#46)
by dice on Fri Jul 27, 2001 at 01:01:51 PM EST

Assuming one can order the set of suggestions, an algorithm containing a random component may be the way to go.

Consider: any good suggestions will not be uniformly distributed over the set of suggestions.

This makes straight linear searching inefficient.

Given a randomized algorithm, as iterations approach infinity, the success rate will mirror the ratio of good suggestions to bad ones.

A case for random suggestions, indeed.

[ Parent ]
The bottleneck... (5.00 / 1) (#51)
by DJBongHit on Fri Jul 27, 2001 at 03:35:22 PM EST

... is the server, since httpd and mysqld are running on the same box.

Anyway, when I started reading this article, I thought it was just going to suggest that we ban anonymous users altogether, but theboz has an interesting idea - since anonymous users can't post, why should they be reading the site through dynamic scripts? Make a cron job (run every hour or so) which does a bit of perl-fu to recreate a static version of kuro5hin, and serve this up to anonymous users instead (the first run would be horribly slow, but after that it would only have to update the stories which have changed).

One thing I absolutely do NOT want to see in Scoop is Slash-style caching for everybody - I hate not being able to see the exact commentcount of each story, the exact results of the current poll, etc, while on the front page.
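A minimal sketch of that incremental regeneration idea, assuming comment counts keyed by story id and a hypothetical `render` callback that writes the static HTML (none of these names are actual Scoop code):

```python
# Rebuild the static copy of a story only when its comment count has
# changed since the last cron run; unchanged stories are skipped,
# which is what keeps runs after the first one cheap.

def regenerate_static(current_counts, cached_counts, render):
    """current_counts: {story_id: comment_count} from the database.
    cached_counts: the counts recorded at the previous run (updated
    in place). render: callback that writes one story's static page.
    Returns the list of story ids that were rebuilt."""
    rebuilt = []
    for sid, count in current_counts.items():
        if cached_counts.get(sid) != count:
            render(sid)                # e.g. write story_<sid>.html
            cached_counts[sid] = count
            rebuilt.append(sid)
    return rebuilt
```

On the first run `cached_counts` is empty, so every story gets rendered (the horribly slow pass); afterwards only stories with new comments are touched.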


GNU GPL: Free as in herpes.

[ Parent ]
does it work properly? (none / 0) (#40)
by jesterzog on Fri Jul 27, 2001 at 08:14:58 AM EST

As I am writing there are 164 anonymous heroes accessing the site and it is nearly making the entire site inaccessible.

The strange thing is that even though I'm definitely logged in, my username hasn't shown up on that list since the first couple of days that it was implemented.

Has it been altered not to show people their own username, or am I being listed as one of those anonymous heroes?

jesterzog Fight the light

How to increase load (5.00 / 2) (#47)
by skullY on Fri Jul 27, 2001 at 01:53:33 PM EST

Step 1. Take the largest group of users who use the least amount of resources per user and restrict them to getting day-old articles.

Step 2. Watch performance marginally increase for a few days.

Step 3. Watch performance reach an all-time low as the group of anonymous heroes figure out they can simply sign up for accounts.

The anonymous heroes aren't the problem here. In fact, more people should be browsing as anonymous heroes to help reduce the load. An anonymous hero has fewer SQL selects per page, and few (if any) writes. Whereas every time a registered user views a comments page, it has to determine which comments they've read, write to the database which new comments are now being displayed to them, generate all the moderation boxes, etc, etc. It's a nice thought for how to fix the problem, but it's at best a knee-jerk reaction that will end up causing more harm than good.

FWIW, while at work I'm one of those Anonymous Heroes, because I just don't have the time to post, so there's no need to add extra load to an already strained system.

I'm not witty enough for a sig.
[OT] Scoop and viewing comments (4.00 / 1) (#52)
by DJBongHit on Fri Jul 27, 2001 at 03:41:33 PM EST

Whereas every time a registered user views a comments page, it has to determine which comments they've read, write to the database which new comments are now being displayed to them, generate all the moderation boxes, etc, etc.

That's not really the problem. Checking which comments are new is a single-row SELECT fetching the commentcount of the story as of the last time you viewed it. The comment generation code then checks (if $cid > $old_num_of_comments) {print_the_red_new_thing();}, and afterwards updates the row with the new comment count. The problem is simply the sheer number of users accessing a single, very overloaded server (httpd and mysqld are on the same box).
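As a sketch, that check amounts to comparing each comment id against one stored count per (user, story) pair. The names here are illustrative, not Scoop's actual variables, and it assumes comment ids are assigned 1..N in posting order:

```python
# One stored number per (user, story): the comment count at the last
# visit. A comment is "new" when its id exceeds that count. After the
# page is displayed, the stored count is replaced with the new total.

def mark_new_comments(comment_ids, last_seen_count):
    """Return (new_flags, updated_count): one flag per comment, plus
    the count to write back to the database row."""
    new_flags = [cid > last_seen_count for cid in comment_ids]
    return new_flags, max(comment_ids, default=last_seen_count)
```

Cheap per request, which supports the point above: the cost is in the volume of requests, not in this bookkeeping.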


GNU GPL: Free as in herpes.

[ Parent ]
Page Caching (5.00 / 1) (#55)
by panner on Fri Jul 27, 2001 at 05:29:19 PM EST

I'm surprised no one has mentioned the (probably) simplest way to cache pages: let the browser do it. And, along with that, let a proxy do it. Really, it's not too much work, and could be done basically transparently to Scoop by putting the necessary code in the mod_perl handler. It would work in three parts.

First, there's a check after Scoop has generated a page: look at the current UID, and if it's -1 (Anonymous Hero), add a few headers indicating that the page can be cached for so long (say, 5 minutes). This way, browsers and proxies will consider it safe to cache, and therefore will.

Next, at the beginning of the handler, somewhere before the ops are handled, check for an If-Modified-Since header; if present, check whether the content actually has been modified since then. This is accomplished by a simple SELECT to see if there are any new stories since the specified time (there may be another SELECT to see if there are any new comments). If it has been modified, go ahead and generate the page. Otherwise, return 304 Not Modified.

Finally, in order to reduce hits even more, either configure the current light-weight Apache to cache pages with the appropriate headers, or stick a Squid HTTP accelerator in front of it (if the latter, the light-weight Apache will probably be removed). In any case, this results in Anonymous Heroes getting cached data without any major code modifications, delay in content, or goat sacrifices.
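A sketch of those first two parts, with the -1 UID convention and 5-minute lifetime taken from the description above (the helper names and epoch-second timestamps are assumptions for illustration, not mod_perl API):

```python
ANON_UID = -1        # Anonymous Hero, per the UID convention above
CACHE_SECONDS = 300  # "say, 5 minutes"

def cache_headers(uid):
    """Part one: extra response headers after page generation. Only
    anonymous pages are marked cacheable by browsers and proxies."""
    if uid != ANON_UID:
        return {}
    return {"Cache-Control": "public, max-age=%d" % CACHE_SECONDS}

def handle_conditional_get(if_modified_since, last_site_change):
    """Part two, early in the handler: 304 when nothing (no new
    stories or comments) has changed since the client's cached copy;
    200 means generate the page as usual. Times in epoch seconds."""
    if if_modified_since is not None and last_site_change <= if_modified_since:
        return 304  # Not Modified: skip page generation entirely
    return 200
```

Because logged-in pages return no cache headers, Squid in front (part three) would pass them through untouched and serve only the anonymous pages from its cache.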

Keith Smiley
Get it right, for God's sake. Pigs can work out how to use a joystick, and people still can't do this!
Kill the Anonymous Hero | 66 comments (61 topical, 5 editorial, 0 hidden)