Kuro5hin.org: technology and culture, from the trenches
'Who Are You' articles search engine

By Carnage4Life in News
Fri Nov 24, 2000 at 05:55:27 PM EST
Tags: Kuro5hin.org

After constantly being vexed by having to search the 'Who Are You' and 'Who Are You, part 2' articles every time I wondered if I could find out more about a particular K5 user, I decided to create a search engine that could locate a particular user's post in either thread or tell me whether that user did or did not post to either thread.

The search engine is available at my domain here. It is case sensitive, so searching for 'rusty' or 'Inoshiro' works while 'Rusty' and 'inoshiro' do not.


Intended Uses:

  1. Some people have requested an easy way to link to their 'Who Are You' post, and this is a way to provide that. A URL such as

    http://www.25hoursaday.com:8080/servlet/K5users?K5_user_name=Driph

    can be placed in a .sig or used as a signature in email and is easier to remember than

    http://www.kuro5hin.org/?op=comments&sid=2000/3/2/214954/2181&cid=14#14 (for example)


  2. It's a lot easier than retyping all that info in a (User Info) page.


  3. It's an easy way to find out more about a K5 user (if he/she wants you to) especially if you were intrigued by specific posts and wanted to learn more about that person.
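
As a rough illustration of the first use above, a lookup link can be built from a user name. This is only a sketch: the base URL and the K5_user_name parameter are copied from the example URL in this article, while the class and method names are made up for illustration. The name is passed through unchanged apart from URL encoding, since the search is case sensitive.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds lookup links of the form shown in the article (names here are hypothetical). */
final class K5UserLink {
    // Base URL taken from the example URL in the article.
    static final String BASE = "http://www.25hoursaday.com:8080/servlet/K5users";

    /** Returns the search URL for one user name; case is preserved on purpose. */
    static String forUser(String userName) {
        // Encode the name so spaces or punctuation survive inside the query string.
        String encoded = URLEncoder.encode(userName, StandardCharsets.UTF_8);
        return BASE + "?K5_user_name=" + encoded;
    }
}
```

Calling K5UserLink.forUser("Driph") gives the same URL as the example in item 1.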

Technical Info (gory programmer details)


How I obtained the data for the pages [Phase 1]

  1. I looked at the structure of a K5 article and wrote a DTD for it

  2. I obtained a perl script called wget.pl from DJBongHit which retrieves the HTML returned by a given URL.


  3. I then looked at the HTML generated by a K5 article and wrote k5_reader.pl, a Perl script that can transform any K5 article into an XML file, either from a specified URL (using wget.pl behind the scenes) or from a local file. I foresee distinct possibilities in creating smarter scripts that can retrieve a page and display it on my desktop or change the formatting so it is readable on my Palm.

    Disclaimers: k5_reader.pl can generate an XML file with nested nodes if given a K5 page in nested mode; unfortunately, I can't get wget.pl to retrieve pages in anything but threaded mode. Also, XML has problems handling ISO Latin-1 characters, so they are all stripped out, sorry. :)
    Finally, this is the third or fourth Perl script I have ever written, so I am not experienced enough to make it as efficient as possible but considering that it only runs for a few seconds, that doesn't seem to be a major concern.
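
The character stripping mentioned in the disclaimers could look roughly like the sketch below. It is written in Java rather than Perl and is not the code from k5_reader.pl. The element and attribute names (comment, author) are hypothetical, since the DTD is not reproduced here. It drops everything outside 7-bit ASCII, drops control characters that XML 1.0 does not allow, and escapes the markup characters.

```java
/** Cleans comment text before it is written into an XML file (hypothetical names). */
final class K5XmlText {
    /** Drops non-ASCII and illegal control characters, then escapes XML markup. */
    static String clean(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) continue; // strip Latin-1 and anything else above ASCII
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') continue; // not legal in XML 1.0
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps one comment in a <comment> element; the element name is an assumption. */
    static String commentElement(String author, String body) {
        return "<comment author=\"" + clean(author) + "\">" + clean(body) + "</comment>";
    }
}
```

An alternative to stripping would be to declare the file's encoding in the XML declaration, provided the parser in use supports that encoding.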



How I stored the data from the pages [Phase 2]
  1. I created this table in my database of choice, actually in three databases (the reason for this will be made clear).

  2. I then downloaded Jim Clark's excellent XML parser.

  3. I then wrote K5DatabaseLoader.java and attempted to use it on the table I created in Microsoft's SQL Server 2000 beta 2.

    This was a disaster, the table data kept running together and all the entries into the DB were garbled.

  4. Thinking it was a problem with SQL Server being unable to handle long table entries (i.e. the comments) I tried the same with IBM's DB2 7.1 with similar results.

  5. Undaunted and slightly less than sober, I attempted to use it on Oracle 8i database but hosed my previous version by tweaking too many config files and could not get the most recent version to install on my machine (what a waste of an hour and a half waiting for the 584 MB download file).

  6. Deciding to use a little thought, I realized that the only thing both attempts with SQL Server and DB2 had in common was that I was using Sun's JDBC-ODBC bridge. From a current project I know that ODBC is a shitty C API which is crusty, poorly designed and error prone. I then surmised that the long K5 posts were overflowing the underlying ODBC buffers and decided to try a different tack.

  7. So I went to IBM's DB2 Java programming site and found out how to use IBM's native Java drivers for DB2 and then K5DatabaseLoader worked great once the changes were made.
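
The parsing half of the loader might be sketched as follows. This is not K5DatabaseLoader.java. It uses the DOM parser bundled with the JDK rather than the parser linked above, and it assumes a flat file of comment elements with an author attribute, which are hypothetical names because the DTD is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads posts out of an XML string; element and attribute names are assumptions. */
final class K5PostReader {
    /** One post: who wrote it and what it says. */
    static final class Post {
        final String author;
        final String body;
        Post(String author, String body) { this.author = author; this.body = body; }
    }

    /** Parses the document and returns one Post per <comment> element. */
    static List<Post> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("comment");
            List<Post> posts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                posts.add(new Post(e.getAttribute("author"), e.getTextContent().trim()));
            }
            return posts;
        } catch (Exception e) {
            // Parser configuration, I/O and well-formedness errors all end up here.
            throw new IllegalArgumentException("could not parse K5 XML", e);
        }
    }
}
```

The resulting list can then be handed to whatever code writes rows through a native JDBC driver.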


How I display the data on the pages [Phase 3]
  1. I downloaded Caucho's Resin , an Open Source Java servlet engine, and set it up on my box.

  2. I wrote K5users.java and asked people on #kuro5hin to test it and give me feedback.
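
The lookup a servlet like K5users.java performs might be sketched as below. The servlet plumbing is left out, an in-memory map stands in for the database, and all names are hypothetical. Exact-match keys are the reason 'rusty' is found while 'Rusty' is not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory stand-in for the user lookup; a real servlet would query the DB instead. */
final class K5UserIndex {
    private final Map<String, String> postUrlByUser = new HashMap<>();

    void add(String userName, String postUrl) {
        postUrlByUser.put(userName, postUrl);
    }

    /** Exact-match lookup, so "rusty" and "Rusty" are different keys. */
    Optional<String> find(String userName) {
        return Optional.ofNullable(postUrlByUser.get(userName));
    }

    /** Renders the HTML fragment a servlet could write to its response. */
    String render(String userName) {
        String safeName = escape(userName);
        return find(userName)
            .map(url -> "<a href=\"" + escape(url) + "\">" + safeName + "'s post</a>")
            .orElse("<p>No post found for " + safeName + "</p>");
    }

    /** Minimal HTML escaping so user-supplied names cannot inject markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

A case-insensitive search would only need the keys normalised (for example lower-cased) on both add and find.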

Miscellaneous Questions and Answers

Q: How can I get added to the DB?

A: Post a top level comment (not a reply to a post) to either thread and your post will be added the next time the database is updated by me (once a week or so).

Q: How can I get removed from the DB?

A: If your post is no longer on kuro5hin (a public forum) the next time I update the DB, then it won't be stored in the DB from then on.

Q: Will it be a strain on your machine if a lot of K5 users use your search engine?

A: DB2 isn't as resource intensive as Oracle, but it still does slow my machine down when lots of people are accessing DBs at once, so if anyone wants to replicate what I have done, thus taking the strain off my box, they are more than welcome to do it.

Poll
Was this a good idea?
o Yes 50%
o No 14%
o Undecided 19%
o Inoshiro 14%

Votes: 61



'Who Are You' articles search engine | 29 comments (29 topical, editorial, 0 hidden)
Hey...that is extremely neat. ^_^ (3.25 / 4) (#1)
by shirobara on Fri Nov 24, 2000 at 03:25:52 PM EST

Since the diaries have been up I've tried to go back a couple of times and look at a specific person's "who am I" entry, but it always, always crashes my poor browser. This is very cool. ^_^

Miscellaneous (3.66 / 3) (#2)
by Pac on Fri Nov 24, 2000 at 03:34:20 PM EST

I liked it, especially because you took the time to explain how you did it. Some comments:

There are implementations of wget in (a very kosher) C also. It runs in Linux, Windows and various Unix systems.

"Thinking it was a problem with SQL Server being unable to handle long table entries"

Everything you heard before notwithstanding, not everything Microsoft releases is bloated junk. SQL Server is a fine piece of software that would have done the job rather easily. Your database is very small and simple; any of the RDBMSs listed would do. Free RDBMSs like MySQL and PostgreSQL would also serve you well.

On using the bridge, I thought the warning below was in the README, but it is in fact in the very page you link

"The JDBC-ODBC Bridge driver is recommended only for experimental use or when no other alternative is available."

The bridge is very fragile, a proof-of-concept implementation that does not scale much beyond a very limited point.

In an unrelated point, should not this post be classified as Meta?

Evolution doesn't take prisoners


RE: Miscellaneous (3.00 / 1) (#3)
by Carnage4Life on Fri Nov 24, 2000 at 03:57:33 PM EST

There are implementations of wget in (a very kosher) C also. It runs in Linux, Windows and various Unix systems.

I've never used it. I simply asked on #kuro5hin for something that could retrieve the HTML for a URL and DJBongHit gave me that script.

Everything you heard before notwithstanding, not everything Microsoft releases is bloated junk. SQL Server is a fine piece of software that would have done the job rather easily.

I am using a beta version of SQL Server so I expected bugs to crop up.

On using the bridge, I thought the warning below was in the README, but it is in fact in the very page you link

"The JDBC-ODBC Bridge driver is recommended only for experimental use or when no other alternative is available."

I actually never read that site; everything I know about the JDBC-ODBC bridge comes from code snippets on various websites that never mentioned any instability in the code. Also, the fact that it now ships with the JDK implied to me that it was mature.

That said, the problem isn't Sun's Java code but the fact that the underlying ODBC functions (SQLExecute, SQLBindParameter, etc.) can only deal with strings up to a set size, which is some #defined constant value. If this value is exceeded, then buffer overflows and all sorts of weirdness ensue.



[ Parent ]
Wget, SQL Server etc (2.00 / 1) (#4)
by Pac on Fri Nov 24, 2000 at 04:12:38 PM EST

Wget is actually a very useful piece of software, the default utility for mirroring, and it is capable of a lot of nifty things. I don't know if the script you have is a full Perl implementation of the original application, so you may want to take a look at
GNU Wget
Wget for Windows

As for "I am using a beta version of SQL Server so I expected bugs to crop up": alas, you are right, but I would be happier if the bugs that crop up would kindly do so with simple databases like yours, not with million-dollar-company-betting data crunchers as they are more likely to do. :)

As for the bridge, it IS stable, for what it does. It lets you experiment and learn JDBC easily.

Evolution doesn't take prisoners


[ Parent ]
wget.pl (none / 0) (#25)
by DJBongHit on Sat Nov 25, 2000 at 09:58:57 AM EST

I don't know if the script you have is a full Perl implementation of the original application, so you may want to take a look at GNU Wget Wget for Windows
Heh, it's not even close... he wanted a util to get the HTML of a web page, so I fired up vim and threw that together. Here is the actual script. It doesn't do anything more than parse the server and path out of the URL, connect to port 80, do a request for the page, and dump it to stdout (completely ignoring HTTP error messages in the server :)
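
For readers who want to see the shape of such a utility, the steps described above (split the URL into server and path, connect to port 80, send a request, dump the reply) can be sketched as follows. This is a Java illustration with made-up names, not the Perl script itself. Like the description, it always connects to port 80 and does nothing about HTTP errors; it also prints the response headers along with the page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Bare-bones page fetcher over a raw socket (illustrative names only). */
final class TinyFetch {
    /** Splits "http://host/path" into {host, path}; the path defaults to "/". */
    static String[] hostAndPath(String url) {
        String rest = url.startsWith("http://") ? url.substring("http://".length()) : url;
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return new String[] { rest, "/" };
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash) };
    }

    /** Builds a minimal HTTP/1.0 GET request for the given host and path. */
    static String request(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
    }

    /** Connects to port 80, sends the request and copies the raw reply to out. */
    static void fetch(String url, OutputStream out) throws IOException {
        String[] parts = hostAndPath(url);
        try (Socket socket = new Socket(parts[0], 80)) {
            OutputStream toServer = socket.getOutputStream();
            toServer.write(request(parts[0], parts[1]).getBytes(StandardCharsets.US_ASCII));
            toServer.flush();
            socket.getInputStream().transferTo(out); // headers and body, unfiltered
        }
    }
}
```

TinyFetch.fetch("http://www.kuro5hin.org/", System.out) would write the reply to stdout.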

~DJBongHit

--
GNU GPL: Free as in herpes.

[ Parent ]
*NOT* amused (3.58 / 17) (#5)
by sugarman on Fri Nov 24, 2000 at 04:19:11 PM EST

I'm voting this up, because I want to talk about it, but let me make it perfectly clear that I am not amused.

There are some expectations I have for k5. Slurping up my user info for outside use isn't one of them. I contributed to the DB in the spirit of the conversation, not so someone could archive and cross-reference the whole damn thing.

To clarify that. Yes, I am aware that anything on the web is likely to be archived somewhere. I've been posting to USEnet for close to a decade. That isn't the issue.

The issue is that I feel that this is basically a change in the ToS wrt k5. It has gone from a community to a tracking DB, by an outside agency, with *no* recourse. I as a user cannot delete my posts and remove them from this DB. The terms you provided: "if the post is not on k5 when it refreshes, it will be removed" is effectively worthless because the ability to remove a post is not an option to the k5user.

Quite simply, I'm pissed right off.

I'm not sure how rusty or Inoshiro feel about this. I'd be interested to hear their opinions.

Now, from the programming / hacking side, I will say, nice work, neat hack. Cool. But I am still pissed.
--sugarman--

This has nothing to do with K5 (4.00 / 7) (#6)
by Carnage4Life on Fri Nov 24, 2000 at 04:32:39 PM EST

The issue is that I feel that this is basically a change in the ToS wrt k5. It has gone from a community to a tracking DB, by an outside agency, with *no* recourse. I as a user cannot delete my posts and remove them from this DB. The terms you provided: "if the post is not on k5 when it refreshes, it will be removed" is effectively worthless because the ability to remove a post is not an option to the k5user.

How is this a change in K5's TOS? rusty said he'd never use the data for his own purposes, he hasn't.

You posted to a webpage, a webpage that was stored on my machine, the instant it was viewed by my browser. I simply wrote a program to automate the process of viewing stuff via my browser without access to K5's backend, how does this violate K5's terms of service?

The issue is that I feel that this is basically a change in the ToS wrt k5. It has gone from a community to a tracking DB, by an outside agency, with *no* recourse. I as a user cannot delete my posts and remove them from this DB. The terms you provided: "if the post is not on k5 when it refreshes, it will be removed" is effectively worthless because the ability to remove a post is not an option to the k5user.

I already have the webpage in my browser's cache and have also saved it to disk. Removing it from K5 doesn't change this, I only added that because I feel that if you give rusty a good enough reason to delete your post then it's good enough for me.

I'm not sure how rusty or Inoshiro feel about this. I'd be interested to hear their opinions.

I asked them about this before I wrote it and they correctly noted that I simply wrote a script to do something I can do by hand, nothing more. Your privacy is not being violated because it takes me fewer clicks to read your user info on my home machine vs. Kuro5hin, any more than it is violated by the fact that the article is indexed on Google and thus accessible by anyone on the Internet.



[ Parent ]
How does this have nothing to do with k5? (2.66 / 3) (#8)
by sugarman on Fri Nov 24, 2000 at 05:19:46 PM EST

I understand that you can do this manually. There is nothing stopping me from doing that either.

The issue is accessibility and control. If this was posted by rusty, to the extent of "hey, Carnage4Life did a cool little search util that lets the k5 users check out these threads.", I probably wouldn't have objected as much (or at all). Like I said, I do think it's a neat util.

However, the analogy I'm thinking of ties in a little with that "Good Samaritan" thread that was on here last week. It's like getting a cold-call from a telemarketer during Thanksgiving dinner asking you how you like your Butterball turkey. Excuse me? How do you know that, where did you get that from, and please FOAD. (Not you, personally. I mean telemarketers).

As for that bit about accessibility. Sure it may be in a cache on your machine. And while you may be automating it for your own use, which is fine by rusty, apparently, you are in turn making that info publicly accessible, and separate from the original source, which is what has me peeved.

Anyhoo, there's a better analogy floating in the back of my head here trying to get out. Later.
--sugarman--
[ Parent ]

Contradicting Yourself (3.66 / 3) (#9)
by Carnage4Life on Fri Nov 24, 2000 at 05:29:42 PM EST

The issue is accessibility and control. If this was posted by rusty, to the extent of "hey, Carnage4Life did a cool little search util that lets the k5 users check out these threads.", I probably wouldn't have objected as much (or at all). Like I said, I do think it's a neat util.

You are contradicting yourself. Your previous post implies that you think that if rusty did this then it would be a violation of trust, since he claimed that no such thing would be done by him (and I agree).

Your words
The issue is that I feel that this is basically a change in the ToS wrt k5. It has gone from a community to a tracking DB, by an outside agency, with *no* recourse.

I'm not sure what your point is unless you are trying to say that only rusty can do anything with the data posted to K5 which is a mistaken assumption, I am not the only one who has written a script to grab K5 articles as rusty has pointed out, I am merely the first to tell everyone I am doing it.



[ Parent ]
Went back, checked the ToS (3.00 / 4) (#12)
by sugarman on Fri Nov 24, 2000 at 06:54:21 PM EST

your previous post implies that you think that if rusty did this then it would be a violation of trust since he claimed that no such thing would be done by him (and I agree)

I see the distinction, and the following explains my position: If rusty had gone and presented what you had implemented, as a part of k5, or moreover, as an added feature to Scoop, then my perception of the addition would be different.

...and on the subject of perceptions, and my apparently contradictory statements:

I went back and checked the ToS, and yes, you're correct in that it says little beyond "Rusty won't do anything with the info". I think what is clashing here is my perception (not speaking for anyone else here) of what those ToS implied. A large part of that is due to the very nature of the community, which I've been a part of for a while, and how I relate to that community. (This may hold true for you other k5'ers out there as well).

To me this was like a local pub: maybe not the best and the busiest, but a place I could kick back. Then the bartender makes some kind of funky shooter that no-one else has, and all of a sudden you have the glitterati coming in, a velvet rope outside, and Aerosmith is playing the joint. =)

Anyhoo, I hope you get my point. I know you're not technically doing anything wrong, or violating the ToS. That doesn't mean I have to like what you are doing, even though I can admire how it was done. m'kay?
--sugarman--
[ Parent ]

Perceptions (3.00 / 2) (#13)
by Aquarius on Fri Nov 24, 2000 at 07:48:16 PM EST

This post definitely helped to explain your opinion. However, I disagree with it. Now, bear in mind that this isn't intended to be a flame; if it comes across that way, I apologise. However...

This sounds rather, umm, what's the word? Not childish, per se, but something like "I'm going home and I'm taking my ball with me". You're very clear about the fact that the concept of an external referencing of a K5 thread clashes with your ideal of what K5 should be, rather than "what K5 should be"; your opinion, rather than your prescription. This is just to say that I disagree with your opinion. :)
As other people have said, K5 is publicly accessible. If I wanted to know about someone, I could quite happily go and check the threads myself. As far as I can see, this is just a convenient shortcut to save me trawling the threads, and that's a good thing. I don't know what C4L plans to do with the database; if he mentioned and offered it outside K5, then I might be more inclined to your point of view, because then it would cease to be a handy resource for K5 users (as it currently is, I believe) and start becoming a way for non-K5ers to "check up" on K5 people. However, since, afaik, it's only been mentioned within the confines of K5 itself, it retains its status as a convenient shortcut, and therefore I can't see a problem with it. I'd be interested in seeing further development of the ideas you explain above...

Aq.

"The grand plan that is Aquarius proceeds apace" -- Ronin, Frank Miller
[ Parent ]
Then again, I might be wrong (3.00 / 1) (#23)
by Aquarius on Sat Nov 25, 2000 at 02:32:41 AM EST

After a night's sleep, I found that: I still disagreed with you. :-)

In fact, I disagreed with you until five minutes ago, when I saw Music of the Kuro5hin Community in the queue. Normally, I think that these "poll K5 users for their favourite X" threads are rather silly (and skim123 agrees :), but the thought occurs: what stops C4L, or someone else, throwing that thread into the database, too? Again, it's only automating something that you could do by hand, but over time, with a few of these polls, you could build up quite a picture of someone. This kind of data collation I disagree with. For instance, here in the UK, there are various databases owned by the government (the NHS medical records, the DVLA drivers' licence DB, the electoral roll, ad nauseam), and there are specific prohibitions on linking them all together into the One Big Master Database That Knows All About You[1]. That sort of thing would also be possible here; from there, it goes from being a handy concordance for one thread to being a DB of K5 users' likes and interests, and that's not an ideal thing at all to have out there on the web.

Aq.

[1] We might call this, for example, Big Brother. :)

"The grand plan that is Aquarius proceeds apace" -- Ronin, Frank Miller
[ Parent ]
Get over it. (3.42 / 7) (#7)
by cactus on Fri Nov 24, 2000 at 04:34:32 PM EST

Sorry, this might sound harsh, but you've got to get over it. *Assume* anything you post anywhere public *will* be stamped, numbered, tracked, correlated and monitored. Make your peace with that and move on.

This is really no different from the debate over "deep linking." Once the information is publicly available on the net, it's fair game.


--
"Politics are the entertainment branch of Industry"
-- Frank Zappa
[ Parent ]
Miss something? (2.14 / 7) (#10)
by sugarman on Fri Nov 24, 2000 at 05:32:37 PM EST

To clarify that. Yes, I am aware that anything on the web is likely to be archived somewhere. I've been posting to USEnet for close to a decade. That isn't the issue.

But I do have an issue with it. What's more, I'm within my rights to have an issue with it, and I'd like to discuss that issue, and given the subject, now's as good a time as any. Just because something is assumed to be a fait accompli (death, taxes) doesn't mean we can't talk about it.
--sugarman--
[ Parent ]

Setting precedents (3.00 / 1) (#17)
by jesterzog on Fri Nov 24, 2000 at 10:45:35 PM EST

I can see where you're coming from. Specifically I don't mind in this instance because I think it's a really useful idea and the information isn't being abused in this instance. (At least not intentionally and directly.)

I'd hate for it to set a precedent though. The last thing I want is for it to be morally and ethically okay for a marketing company to sift through all the weblogs building profiles of everyone based on keywords in their postings, for example. Other sites might have completely different (or no) terms of service, so as soon as they're hosting the information it could legally end up anywhere no matter what is said on k5.

The only terms of service I can find relating to this are in the faq, basically stating that k5 won't abuse your information. Does this need to be updated to say that other people who use kuro5hin can't abuse it either?

This introduces problems in itself. People who haven't signed in and haven't necessarily agreed to anything still have access to all the information. The only technical way around this that I can think of would be to suppress user information to people who aren't logged in. They wouldn't be able to see who posted a message, or look up information about other users. I wouldn't have a problem with this since IMHO you can get enough of an introductory idea without knowing who posted what, anyway.


jesterzog Fight the light


[ Parent ]
How Long Have You Been On The 'Net? (2.75 / 4) (#26)
by Carnage4Life on Sat Nov 25, 2000 at 10:17:41 AM EST

This introduces problems in itself. People who haven't signed in and haven't necessarily agreed to anything still have access to all the information. The only technical way around this that I can think of would be to suppress user information to people who aren't logged in. They wouldn't be able to see who posted a message, or look up information about other users. I wouldn't have a problem with this since IMHO you can get enough of an introductory idea without knowing who posted what, anyway.

All this talk of precedents is slightly amusing. Bots have been indexing weblogs for years. Why do you think most people spam proof their email addresses (e.g. kpako@DONTSPAMME.yahoo.com) on K5 and Slashdot? It isn't so that people can't email them but so that bots that harvest email addresses of people that post to tech sites are given useless information.

Everyday my site is hit by 3 to 5 bots, yet the only places my URL is displayed are slashdot and kuro5hin, meaning that these sites are indexed several times a day by bots.

My advice is to either get over it or stop posting to the 'web at all. When the web was still new, most people knew that putting anything on the web or even sending it in email was akin to putting it on a bulletin board, due to the inherent lack of a means to shield information from prying eyes. Now some people somehow feel that information published in a forum that is accessible to people all over the world, and that travels through several computers before reaching its destination, is sacred and has restricted access. This is so far from the truth that it would be amusing if not for the gross mistake in that assumption.



[ Parent ]
You're missing the point (3.50 / 2) (#27)
by jesterzog on Sat Nov 25, 2000 at 10:50:49 PM EST

I've been around long enough to know how incredibly easy it is to collate several databases (in whatever form) and build a profile of someone very quickly using information from several sources that they never thought or intended would be used together. I've done it several times against my better judgement.

Like I said I don't really care much about your thing specifically because it's genuinely useful, and (IMHO) it's not misusing the information in itself. I know bots go through everything and I know what it's possible for them to do, but that doesn't mean people shouldn't go paranoid about companies building up some pretty chunky profiles of people based on information that was never intended to be used for that. No matter how hard you try to avoid it, it'll be possible for someone to build up a profile of you sooner or later. There's simply too much information available, and the net makes it very easy to find between completely different sources.

Sure the information's there and it can be used for something like marketing by a third party who happens to "acquire" it. Similarly when people leave their window open, anyone can step through it. At least one of these is an obvious violation of someone's rights, and IMHO the other one is too. The same rights can apply to a lot of information re-use when the information is obviously being used against the intentions that it was provided under. Obviously this can't be stopped, but (my point being) it'd make things a lot easier if organisations like weblogs could put a few measures in place when possible to help prevent it from happening through them.


jesterzog Fight the light


[ Parent ]
A useful explanation, and database issues (3.75 / 4) (#11)
by Aquarius on Fri Nov 24, 2000 at 06:44:32 PM EST

Definitely vote this up, because of the detailed explanation. That sort of thing is always useful, since providing access to data via a browser is what I do for a living. :-)

I might take issue with your choice of DB, though. Now, don't get me wrong. I like DB2. I spent two years coding for it on IBM big iron -- the fact that I was using COBOL was bad, but DB2 is really rather good. However, I'd have said that it was rather severe overkill for a task this small? I'd have been in two minds as to whether to have a DB at all backing up this task; I might have just done it in flatfiles, since we're not talking about a lot of data. In fact, I might even have generated static HTML in a cron job, as jwz did with gronk, the MP3 jukebox. If I were using a DB, I might have been inclined to go for MySQL. Don't get me wrong, this is not an open-source-is-always-better rant. As I said, I like DB2. I just think that it's vastly overpowered for this task. What made you choose it, exactly?

Nice article.

Aq.

"The grand plan that is Aquarius proceeds apace" -- Ronin, Frank Miller
Reasons (none / 0) (#16)
by Carnage4Life on Fri Nov 24, 2000 at 10:32:24 PM EST

However, I'd have said that it was rather severe overkill for a task this small? I'd have been in two minds as to whether to have a DB at all backing up this task; I might have just done it in flatfiles, since we're not talking about a lot of data.

Windows acts very flaky when two programs want to access the same file. I wanted to make sure I avoided this problem (which may be non-existent when doing servlet programming) by using a system that supports multiple accesses to a file such as an RDBMS, preferably one with transactions since I may be updating the DB while it is being viewed by users.

If I were using a DB, I might have been inclined to go for MySQL.

MySQL is dead to me for two reasons. In my opinion mySQL is not a Relational DataBase Management System because it doesn't enforce foreign keys. Also I don't think it can really call itself a professional DataBase Management System without support for transactions. Finally, the fact that they charge for the Windows version when Oracle and IBM give theirs away for free to developers is frankly ridiculous.



[ Parent ]
Corrections, question (none / 0) (#18)
by chrisbolt on Sat Nov 25, 2000 at 12:13:06 AM EST

MySQL for windows has been free ever since MySQL has gone GPL, a few months ago. Second, the latest version of MySQL (3.23.28-gamma) has support for transactions with Berkeley DB.

Also, just wondering, why do you need foreign keys? I've only heard about them from your post and the MySQL documentation, and the MySQL documentation makes it look like they aren't a requirement for most things (though that may be a bit biased considering the source). And are transactions really required for something such as this? You don't need transactions to be able to update the DB while visitors are viewing it. The only time you would need transactions is if you would need to roll back multiple queries if any single one fails.

---
<panner> When making backups, take a lesson from rusty: it doesn't matter if you make them, only that you _think_ you made them.
[ Parent ]
Answers to your questions... (3.50 / 2) (#20)
by Carnage4Life on Sat Nov 25, 2000 at 02:06:42 AM EST

MySQL for windows has been free ever since MySQL has gone GPL, a few months ago.

The last time I looked at mySQL was a few months ago, I guess I should have looked at it again before assuming what the price was.

Also, just wondering, why do you need foreign keys?

In this application, foreign keys are unnecessary, but foreign keys are an important aspect of relational database theory and practice because they are a way to check the validity of data in the DB. Here is an example from a project I am currently working on
    I am currently creating an online weblog for a small business as part of my senior project. Part of the requirements are that the weblog should have different sections, with each section having its own list of members with access only to their specific section. With foreign keys I can, on creation of the table for messages, specify that the poster of a message must be a member of that section, simply by making the poster_id and group_id fields of the message foreign keys referencing the user_id and group_id fields in the table with the list of members for each section. Without foreign keys, checks like this have to be coded into my application logic and would involve performing SELECT queries before doing each INSERT, which most programmers are loath to do.
And are transactions really required for something such as this? You don't need transactions to be able to update the DB while visitors are viewing it.

You have taken my comment out of context. I was referring to preferring an RDBMS to a flat file mechanism because a.) Windows is notoriously bad about having multiple applications open a file at the same time, whereas most RDBMSs have their own threading model that keeps Windows happy with regard to multiple applications trying to access a file, and b.) update operations are ACID. For a definition of ACIDity, read that post I made a while ago to Slashdot. Considering that updating the DB involves storing several posts at a time, it is an operation that needs to be rolled back if anything goes wrong.
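
To make the rollback point concrete, here is a minimal JDBC sketch of an all-or-nothing update. It is not the actual loader: the table name, column names and class name are hypothetical, and each post is assumed to arrive as a three-element array of user name, URL and body. Either every post in the batch is committed or none is.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Stores a batch of posts inside one transaction (table and columns are assumptions). */
final class K5BatchLoader {
    // Hypothetical schema; the real table definition is not shown in the article.
    static final String INSERT_SQL =
        "INSERT INTO k5_posts (user_name, post_url, body) VALUES (?, ?, ?)";

    /** Inserts every post, committing only if all inserts succeed. */
    static void loadAll(Connection conn, List<String[]> posts) {
        try {
            conn.setAutoCommit(false); // start an explicit transaction
            try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
                for (String[] post : posts) {
                    stmt.setString(1, post[0]); // user name
                    stmt.setString(2, post[1]); // URL of the post
                    stmt.setString(3, post[2]); // comment text
                    stmt.addBatch();
                }
                stmt.executeBatch();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback(); // undo any rows written before the failure
                throw e;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("loading posts failed", e);
        }
    }
}
```

Readers who query while the load is running see either the old rows or the new ones, depending on the isolation level the database is configured with.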



[ Parent ]
Databases and database issues (none / 0) (#21)
by Aquarius on Sat Nov 25, 2000 at 02:12:48 AM EST

Windows acts very flaky when two programs want to access the same file. I wanted to make sure I avoided this problem (which may be non-existent when doing servlet programming) by using a system that supports multiple accesses to a file such as an RDBMS,
Yep, OK, I can see that. In theory it shouldn't be a problem, but if it'll destabilise the OS out from under you and an RDBMS will solve that then hell, go for it. :-)
preferably one with transactions since I may be updating the DB while it is being viewed by users.
er! This is not a transactions problem, it's a locks problem. Any database should handle this without a problem; it's a fair part of what databases are for, IMHO. OK, so if you have transactions then you can swaddle everything you do in a transaction and roll it back based on a problem (like deadlocking, which is pretty rare, really), but transactions will also save you if cosmic rays break the process or Don Knuth comes round your house and write-protects the drive while you're not looking, and I wouldn't perceive those as big worries, certainly not big enough to step away from something like MySQL to DB2 just to prevent them.
MySQL is dead to me for two reasons. In my opinion MySQL is not a Relational DataBase Management System because it doesn't enforce foreign keys.
My issue here is not that you're wrong, because I pretty much agree with you. It's that, unless the underlying tables for this app are a whole load more complicated than the way I'd have designed them, you don't need all this cool functionality. Now, it's entirely possible that you plan to develop other and more complex database-backended systems in the future, for which you will need all this cool stuff of which you speak; my point was that you presumably haven't done any up to now (because you went out and chose a DB when you wrote this! :) and, in the interim, you're using a very powerful steam-hammer (DB2) to crack a rather small walnut. :-)

Anyway, it's your box :-) I was just puzzled as to why you felt you needed something as powerful as DB2 for this when there are simpler solutions out there, and you've explained. This seems to come across as a "use MySQL for everything!" rant, and that wasn't my intention at all...

Aq.

"The grand plan that is Aquarius proceeds apace" -- Ronin, Frank Miller
[ Parent ]
This isn't the only DB app on my box (3.50 / 2) (#22)
by Carnage4Life on Sat Nov 25, 2000 at 02:22:40 AM EST

unless the underlying tables for this app are a whole load more complicated than the way I'd have designed them, you don't need all this cool functionality. Now, it's entirely possible that you plan to develop other and more complex database-backended systems in the future, for which you will need all this cool stuff of which you speak; my point was that you presumably haven't done any up to now

  1. There is only one table used for this app and it is linked from my write-up. Here it is in case you didn't click the links.

  2. I also believe I mentioned in my write-up that I am currently doing other DB development. Anyway, the back end for my research project is on my machine; a brief overview might be obtained by reading the javadocs. Secondly, the back end for my senior project is also on my machine.




[ Parent ]
Other development (3.00 / 2) (#24)
by Aquarius on Sat Nov 25, 2000 at 03:16:12 AM EST

Ah, I must have missed the "other development" bit of the write-up. I assumed that, since you went out and got a DB (rather than using the one you were already using, if you see what I mean), you weren't doing any other DB-based development. My fault, sorry about that.
Besides, it's nice to see someone running DB2 on something other than big iron :-)

Aq.

"The grand plan that is Aquarius proceeds apace" -- Ronin, Frank Miller
[ Parent ]
User Info pages (3.25 / 4) (#14)
by Delirium on Fri Nov 24, 2000 at 09:36:44 PM EST

2. It's a lot easier than retyping all that info in a (User Info) page.

What's wrong with cutting and pasting that info into a User Info page? I'd much rather have the info in a standard format like that than in some external search of comments. Cutting and pasting a paragraph is really not very difficult.

Restricted length? (4.00 / 1) (#15)
by jesterzog on Fri Nov 24, 2000 at 10:19:16 PM EST

I don't know if it's a bug or not, but whenever I try to type anything into the user bio page, it gets cut off after about three or four lines. (I don't know how many characters this is, but it's acting like it's a fixed-length field.)

Is it just me or does everyone have a fixed-length restriction on their user bio? I don't have the same problem with other fields like my public key or entering comments.

I hadn't got around to mentioning it because I'd forgotten about it between being a new user and now. If I hadn't had that problem though, there would be a description in the user bio part of my info page.


jesterzog Fight the light


[ Parent ]
I can vouch for that (none / 0) (#28)
by Denor on Mon Nov 27, 2000 at 10:39:03 AM EST

I didn't test exactly the number of characters, but yes, I ran into the same problem when trying to put my entry in. I ended up settling for a link.

Of course, you could always put it in the block reserved for 'public key'.


-Denor


[ Parent ]
Same with me too (none / 0) (#29)
by a humble lich on Wed Nov 29, 2000 at 10:27:54 PM EST

I was able to get about 2.5 lines before it would cut me off.

[ Parent ]
K5 feature request. (3.00 / 2) (#19)
by Nick Ives on Sat Nov 25, 2000 at 01:45:02 AM EST

User bios are the obvious place for 'who am I?' type information, but it's still nice to have it all in a discussion forum so you can see when someone new arrives and contributes to it. This poses a range of problems.

How do you automate the entry of user bios, so that every new user gets presented with the opportunity to write a nice long meaningful biography, and not have it hidden away in their user info? I personally think the best thing to do is to present the user with a link to the "Who are You?" articles immediately after they create an account and maybe add them to their hotlist by default. We could instead attach a discussion forum to each user's bio, but I think it adds to the community feel to have everyone grouped together in the 'who are you' threads.

Either way, the mere existence of this search engine shows a missing feature in K5 (which may require an additional feature in Scoop, depending on how it's done), and I think just for general niceness & completeness something like this should be part of the actual site.

Which reminds me, I should really get round to posting to one of those things and explaining myself. The only thing is that I tend to disagree with anything I say within 5 minutes of saying it, which I can deal with for most things, but when I'm explaining who I am and putting it in a place where people will read it and actually take note, well, it's all a bit much. The entire notion of taking who you are and trying to put your sum total into a short (or bloody long, depending on how long you can sit at your k/b and type into this comment box =P) comment is just, well, something.....

'Who Are You' articles search engine | 29 comments (29 topical, 0 editorial, 0 hidden)
All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.