Kuro5hin.org: technology and culture, from the trenches

Why is k5 so slow?

By hurstdog in Site News
Fri Sep 21, 2001 at 07:32:00 PM EST
Tags: Kuro5hin.org (all tags)
Kuro5hin.org

Well, it appears that scoop has a small bug that is putting some trash data into a table in the database. So it's not the network, and it's not scoop's code as such; it's the size of the database. It's also the tuning on it, since as far as I know it's a default mysql install :-) But I'm working on getting good at mysql tuning on my box at home, and soon we should be able to set it up a bit better. For a small thread on why it's slow, check out this comment by rusty. Also, this is the first post to the K5 Announcements section, so don't complain about it being a rehash of rusty's comment; this is mostly a test ;-)



Why is k5 so slow? | 10 comments (10 topical, 0 editorial, 0 hidden)
fp (none / 0) (#1)
by qslack on Fri Sep 21, 2001 at 07:39:23 PM EST

fp

What's up with bubba? Are you planning to use the PgSQL Scoop port (that exists, right?)? How can we help speed this repair process to get k5 working normally again?

PostgreSQL scoop... (none / 0) (#2)
by hurstdog on Fri Sep 21, 2001 at 08:05:12 PM EST

It exists, sort of. More than a few people have ported scoop to pgsql, but we haven't received any code yet :-/ We have been emailed a scoop.sql file, though, ported to postgresql. At linuxworld I got a chance to talk to Krow, of slashcode fame. It was his opinion that the bottleneck wasn't due to mysql per se, but mostly due to mysql configuration. I talked with him a bit about tuning mysql, and general ways to speed up scoop. It seems like our looping over $sth->fetchrow_hashref keeps database handles open, which will come into play later (if it hasn't already) as we get more and more hits.

So one of the things I'll be looking for in scoop as I start hacking again will be how long we keep db handles open, and maybe converting those loops to $sth->fetchall_arrayref and $sth->finish()'ing them immediately. Also tuning mysql. I'm going to play with that a bit as well...
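
A minimal sketch of the "fetch everything, then release the handle" pattern described above. It is written in Python against an in-memory SQLite database purely for illustration; it is not scoop's Perl DBI code, and the `comments` table, its columns, and the helper name `fetch_then_release` are assumptions made up for this example.

```python
import sqlite3


def fetch_then_release(conn, query, params=()):
    """Pull every row into memory, then close the cursor before any processing."""
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        rows = cur.fetchall()  # rough analogue of DBI's fetchall_arrayref
    finally:
        cur.close()            # rough analogue of $sth->finish()
    return rows


# Hypothetical table, only here so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO comments (subject) VALUES (?)",
    [("fp",), ("thanks",), ("by the way...",)],
)

rows = fetch_then_release(conn, "SELECT cid, subject FROM comments ORDER BY cid")

# Any slow per-row work (templating, formatting) happens after the cursor is closed.
subjects = [subject for _cid, subject in rows]
conn.close()
```

The trade-off is memory: the whole result set is held at once, so this suits queries with a bounded number of rows.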



[ Parent ]
Do you have measurements? (5.00 / 5) (#3)
by tmoertel on Sat Sep 22, 2001 at 02:11:41 AM EST

I don't want to sound like anybody's mom, but have you taken measurements? Blind tuning is often a fast road to unnecessary work and frustration. I've served as the lead performance engineer on a lot of large-scale systems, and I've learned the lesson the hard way: First figure out where your code is spending its time, then you'll know where you ought to be spending yours.

If you do have measurements, post 'em. I'll be happy to provide any advice I can. If you don't have measurements, try to get them:

  1. Instrument your code to capture performance metrics.
  2. If you can correlate the captured metrics with resource usage (CPU, disk, network, and memory activity), all the better.
  3. Examine your log files to find out what kind of operations your users perform, how often they perform them, and in what orders and groupings.
  4. Determine the costs of individual operations.
  5. If anything seems out of whack, add it to a list of things to investigate.
  6. Weight each cost by the frequency of use.
  7. Start investigating those operations whose weighted costs are highest. Fix any problems you find.
  8. If performance is still bad, start looking for concurrency problems, i.e., look for operations that are speedy alone but unexpectedly slow when performed together with other operations or other instances of themselves. I/O and lock contention are common culprits here. Unfortunately, you'll need to generate realistic, concurrent loads to get good measurements. That requires a stretch of multi-threaded custom coding or using commercial PE tools like Mercury Interactive's LoadRunner. If it comes to this point, you might want to eyeball the usual suspects before investing the time in a serious load test.
For what it's worth, that's the quick course.
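
Steps 1, 4, 6 and 7 of the list above can be sketched roughly as follows. This is an illustrative Python sketch, not anything from scoop: the operation names, the `timed` and `weighted_costs` helpers, and all of the numbers are made up for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # operation name -> list of elapsed seconds


@contextmanager
def timed(operation):
    """Step 1: record how long the wrapped block of code takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[operation].append(time.perf_counter() - start)


def weighted_costs(mean_cost, frequency):
    """Steps 6 and 7: rank operations by mean cost times frequency, highest first."""
    scores = {op: mean_cost[op] * frequency.get(op, 0) for op in mean_cost}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Instrumenting one (hypothetical) operation.
with timed("view_story"):
    sum(range(1000))  # stand-in for real work

# Made-up per-operation costs (seconds) and request counts taken from logs.
mean_cost = {"view_story": 0.25, "post_comment": 1.5, "search": 4.0}
frequency = {"view_story": 1000, "post_comment": 50, "search": 10}
ranking = weighted_costs(mean_cost, frequency)
```

With these invented numbers the cheapest operation per call ends up first in `ranking` because it is called so often, which is the point of weighting by frequency before deciding where to dig.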

Have fun!

--
My blog | LectroTest

[ Disagree? Reply. ]


thanks (none / 0) (#4)
by hurstdog on Sat Sep 22, 2001 at 03:40:34 AM EST

Thanks for the tips. Rusty has done more profiling than I have, so he's a bit more tuned in to where scoop spends most of its time. But from what I understand scoop spends about 80% of its time waiting on the database, so that's why tuning mysql made sense to me.

We've done a bit with adding indexes to a few tables, but short of a large redesign I don't think we're going to get a ton more out of what we've got. Rusty will probably stumble upon this article soon and post more info, and/or next time I see him online I'll get him to post his findings to this article.
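
As a rough illustration of checking whether an added index is actually used, here is a Python sketch against an in-memory SQLite database. SQLite stands in for mysql only so the example is self-contained; the `comments` table, its `sid` column, and the index name are assumptions, not scoop's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, sid TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO comments (sid, body) VALUES (?, ?)",
    [(f"story-{i % 100}", "text") for i in range(5000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT cid FROM comments WHERE sid = 'story-42'"

plan_before = plan(query)  # no index on sid yet, so the table is scanned
conn.execute("CREATE INDEX idx_comments_sid ON comments (sid)")
plan_after = plan(query)   # the planner can now search via the new index

conn.close()
```

Comparing `plan_before` with `plan_after` shows whether the planner picked up the index; mysql offers a comparable `EXPLAIN` statement, though its output format differs.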



[ Parent ]
by the way... (none / 0) (#5)
by hurstdog on Sat Sep 22, 2001 at 03:45:51 AM EST

Good job in finding a section that isn't linked anywhere (Not counting Everything of course ;). I didn't realize that it would show up in the 'Everything' list so this was sort of a test post to test out the auto-posting feature of this section. You'll notice that regular users don't get a choice to post to this section...

Ideally this will be a place where we admins can post info about what's happening with the server, whether it was down or not and why, etc., without clogging the queue or forcing you to search through the diaries for our diaries.

So hopefully soon you'll see a link near the section bar to this section. (I don't deal with site look/feel. That's Driph's territory ;)



[ Parent ]
I'd look into that... (none / 0) (#8)
by Biff Cool on Mon Sep 24, 2001 at 12:20:34 PM EST

You'll notice that regular users don't get a choice to post to this section...

Cause last I looked I'm not even trusted


My ass. It's code, with pictures of fish attached. Get over it. --trhurler


[ Parent ]
post stories... (none / 0) (#9)
by hurstdog on Mon Sep 24, 2001 at 02:47:37 PM EST

not post comments. Anyone should be able to comment on this section, but only editors+ should be able to post stories. If you get a choice to post to "K5 Announcements" in the submit story page, then let me know please :-)

The section permissions (as they stand now) in scoop allow admins to set what groups can post stories, post comments, read stories, or read comments in each section. It also allows admins to set whether stories from particular groups in particular sections get auto-posted to front page or to section. Anyone that is not a normal user, or anonymous, can auto-post stories to this section (which just leaves editors, admins, and superusers. 5 people).
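
One way to picture the permission model described above is a per-section table of actions and groups plus a set of groups whose stories auto-post. The Python sketch below is purely hypothetical: the `SectionPerms` class, the action names, and the group identifiers are assumptions for illustration and do not reflect how scoop stores these settings.

```python
from dataclasses import dataclass, field

# Hypothetical group identifiers; "staff" is the set said to be able to auto-post.
EVERYONE = {"anonymous", "users", "editors", "admins", "superusers"}
STAFF = {"editors", "admins", "superusers"}


@dataclass
class SectionPerms:
    allowed: dict = field(default_factory=dict)  # action -> set of groups
    autopost: set = field(default_factory=set)   # groups whose stories skip the queue

    def can(self, group, action):
        """True if the group may perform the action in this section."""
        return group in self.allowed.get(action, set())

    def autoposts(self, group):
        """True if the group may post stories here and they go up automatically."""
        return self.can(group, "post_stories") and group in self.autopost


announcements = SectionPerms(
    allowed={
        "post_stories": STAFF,
        "post_comments": EVERYONE,
        "read_stories": EVERYONE,
        "read_comments": EVERYONE,
    },
    autopost=STAFF,
)

user_can_comment = announcements.can("users", "post_comments")
user_can_post_story = announcements.can("users", "post_stories")
editor_autoposts = announcements.autoposts("editors")
```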



[ Parent ]
Oh... Good (none / 0) (#10)
by Biff Cool on Mon Sep 24, 2001 at 02:58:55 PM EST

That makes a lot more sense than not letting people post comments to an announcement.  I'm gonna go somewhere and learn about this whole "Reading Comprehension" thing now.

My ass. It's code, with pictures of fish attached. Get over it. --trhurler


[ Parent ]
sound bites make it sound so easy :( (4.00 / 1) (#6)
by core10k on Sat Sep 22, 2001 at 05:26:14 PM EST

But you've really downplayed #8. How do you test for concurrent performance penalties without modifying the timing of those concurrencies?

/me, wishing all optimization was as simple as fine-tuning an oft-used inner loop.



[ Parent ]
It isn't easy, but it is easier than w/o measuring (none / 0) (#7)
by tmoertel on Sat Sep 22, 2001 at 07:23:50 PM EST

How do you test for concurrent performance penalties without modifying the timing of those concurrencies?
You drive the loads from remote hosts and take the measurements from those points. Typically you measure HTTP response time: You send an HTTP request and start the stopwatch. When you get the HTTP response, you record the elapsed time. Distribute thousands of such measurements over hundreds of virtual users, each an independent thread issuing requests according to a usage profile you have developed from analyzing the logs (i.e., generate a real-world load that you can "dial up").[1] Repeat any number of load tests (typically 30 minutes to several hours each), increasing the load in between, and statistically analyze the resulting measurements.

Resource-consumption measurements (CPU, page faults, ...) and implementation-specific timings must be taken on the hosts under test, but their overhead is usually negligible. Remember, you're not looking for rare race conditions as much as hideously mistuned or missing database indexes, disk contention, thrashing, resource bottlenecks, and so forth -- all fairly easy to spot under heavy load (if you're taking measurements).

[1] Analyzing the logs, writing programs to drive the virtual users, and compiling volume data to feed the virtual users with the logins, IDs, and other inputs they need to appear as independent users crawling through your application -- this is nasty, hard work. But you must do it if you want realistic load tests, and that's why I put load testing last on the list. Hope you don't need to go that far to tune your app.
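
A toy version of the virtual-user approach described above, sketched in Python. Everything here is an assumption for illustration: the request paths, the user counts, and the tiny local HTTP server, which exists only so the sketch runs on its own. A real test would drive the actual site from remote hosts, as the comment says, with a usage profile derived from the logs.

```python
import statistics
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Stand-in for the application under test; always answers 200 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Opener that ignores any proxy settings so requests go straight to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def virtual_user(base_url, paths, results, lock):
    """One virtual user: request each path in the profile and time the response."""
    for path in paths:
        start = time.perf_counter()  # start the stopwatch with the request
        with opener.open(base_url + path, timeout=10) as resp:
            resp.read()
        elapsed = time.perf_counter() - start
        with lock:
            results.append((path, elapsed))


def run_load(base_url, profile, users):
    """Run `users` concurrent virtual users and return all timing samples."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=virtual_user, args=(base_url, profile, results, lock))
        for _ in range(users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"

profile = ["/", "/story", "/comments"]  # hypothetical usage profile
summary = {}
for users in (2, 4):  # "dial up" the load between runs
    samples = run_load(base_url, profile, users)
    summary[users] = (len(samples), statistics.median(t for _p, t in samples))

server.shutdown()
server.server_close()
```

`summary` maps each load level to the number of samples and the median response time, which is the kind of per-level statistic one would compare as the load increases.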

--
My blog | LectroTest

[ Disagree? Reply. ]


[ Parent ]
All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux. The Scoop Engine that runs this site is freely available under the terms of the GPL.