
A primer on distributed computing

By Bill Godfrey in Technology
Fri Aug 02, 2002 at 12:44:51 PM EST
Tags: Technology (all tags)

Say you've got a big computation task to perform. Perhaps you have found a way to cure cancer, or you want to look for aliens. All you need is a few super computers to work out some calculations, but you've only got the one PC on your desk. What to do?

A popular solution is the "distributed computing" model, where the task is split up into smaller chunks and performed by the many computers owned by the general public. This guide shows you how.


Computers spend a lot of their time doing nothing. If you are reading this on your computer, that expensive CPU is most probably just sitting around waiting for you to press a key. What a waste!

With the rise of the Internet, distributed computing has become popular. The technology behind it is old; applied to a local group of computers, owned by the same person or organisation, with good links between the various nodes, it is usually known as parallel computing.

Distributed computing takes the same principle a step further, bringing in computers owned by the general public around the world. Popular projects include SETI@home, distributed.net, GIMPS and many others.

The key issue here is that you are using computing power that you don't own. These computers are owned and controlled by other people, whom you would not necessarily trust. The world is populated by both angels and demons, and unless you know them personally, you can't tell them apart.

This primer attempts to show the basics of running a distributed computing project and avoiding the pitfalls.

Is my task suitable for distributed computing?

Distributed computing works by splitting the larger task into smaller chunks which can be performed at the same time, independently of each other. If your task cannot be split up in this way, alas, distribution is not for you.

For example, say your task is in the form of a linear sequence of repeated steps, where the result of the previous step is needed to perform the subsequent step. This task is not appropriate to distribution. Only one step can be performed at a time.

Most successful distributed projects are of the form of a search. Here, there is greater scope for splitting up work. Each worker can take a particular area and do their job on that area.

Additionally, if you want to keep the nature of your task secret, perhaps researching something for a patent filing, you will need to keep the general public out of it. The main strength of distributed computing is to harness the tons of idle time of computers in the world. Restrict yourself to people you know and trust, and the scope is greatly reduced.

There are ways to keep the details secret, but from a technical viewpoint, they are all doomed to failure. (Legally is another matter.)

How does distributed computing work?

The two main entities in distributed computing are the server and the many clients. A central computer, the server, generates work packages, which are passed on to worker clients. A client performs the task detailed in the work package data and, when it has finished, passes the completed work package back to the server.

This is the minimal distributed computing model. A popular refinement is the three-tier model, which splits the server task into two layers: one generates work packages, and the other acts as a broker, communicating with clients. This is simply a means to manage load and protect against failure.

So that the server knows a potential client exists, we need another message in addition to work packages and completed work packages, initiated by the client. A work request simply says, "give me a work package".

For example

Consider distributed.net's RC5-64 project. The challenge is to unscramble an encrypted message. To do this, each possible key, from zero to 2^64 − 1, is tried, one by one, until we find English text. In fact, we know that the hidden text begins "The secret word is", which makes it easy to detect once we have the correct key.

Before distributed computing

  1. Start key at zero.
  2. Decrypt message using key as the key.
  3. If decrypted message begins "The secret word is", report key as the answer.
  4. If not, add one to key and start again.
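The four steps above can be sketched in code. The real project uses the RC5 cipher over a 2^64 key space; the toy XOR "cipher" and 16-bit key space below are invented stand-ins, just to keep the sketch small enough to actually run.

```python
# Toy stand-in for the brute-force search above. The XOR "cipher" and
# the 16-bit key space are assumptions; real RC5-64 is far bigger.

def toy_decrypt(ciphertext: bytes, key: int) -> bytes:
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(ciphertext))

def toy_encrypt(plaintext: bytes, key: int) -> bytes:
    return toy_decrypt(plaintext, key)  # XOR is its own inverse

def brute_force(ciphertext: bytes, known_prefix: bytes):
    # Steps 1-4: start at zero, decrypt, test, add one and repeat.
    for key in range(2 ** 16):
        if toy_decrypt(ciphertext, key).startswith(known_prefix):
            return key
    return None

secret = toy_encrypt(b"The secret word is fish", 0xBEEF)
print(hex(brute_force(secret, b"The secret word is")))  # -> 0xbeef
```

The known plaintext prefix is what makes the search feasible: the test for "found it" is a cheap byte comparison rather than any judgement about whether the output is English.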

In this type of search, we could be lucky and find the answer at the beginning. We could just as easily be unlucky and find the answer at the end. With average luck, the answer would be in the middle. This process would take a single Pentium III 200,000 years to complete. Beyond most lifespans.

Instead, we find ourselves 200,000 computers to divide up the task, so it will take just one year to complete. The key range is split up into blocks of 2^28 keys, one block making up a work package.

After distributed computing - Server

  • Start;
    1. Start key at zero.
    2. Wait for a message.
  • Received work-package-request;
    1. Generate a work package, listing keys from key to (key + 2^28).
    2. Send new work package back to client.
    3. Increase key by 2^28.
    4. Wait for a message.
  • Received completed-work-package;
    1. Load possible key value from completed work package and attempt decryption by this key.
    2. If decrypted message is English, report key as the answer.
    3. If not, repeat until all listed keys are exhausted.
    4. When finished, continue waiting for a message.

After distributed computing - Client

  1. Send work-package-request to server and wait for response.
  2. Start key from the starting key in the work package.
  3. Attempt decryption by key.
  4. If the first three characters of result are "The", then record key as a possibility.
  5. Increase key by one.
  6. Repeat until key has hit upper limit inside work package.
  7. Return completed work package to server.
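The server and client loops above can be simulated in-process. Everything here is a hypothetical stand-in: the target key, a 2^8 block size in place of 2^28, and a "cheap test" in place of the "first three characters are The" check. But the message flow (work request, work package, completed work package) matches the steps above.

```python
# In-process sketch of the server/client exchange (all values assumed).

TARGET = 1000        # the key the search is looking for (hypothetical)
BLOCK = 2 ** 8       # work package size, standing in for 2^28

def cheap_test(key: int) -> bool:
    # Client-side quick filter, standing in for 'first three characters
    # are "The"'. It over-reports, so the server re-checks candidates.
    return key % 251 == TARGET % 251

class Server:
    def __init__(self):
        self.next_key = 0       # "Start key at zero"
        self.answer = None

    def work_request(self):
        # Hand out [start, start + BLOCK) and advance the counter.
        start = self.next_key
        self.next_key += BLOCK
        return (start, start + BLOCK)

    def completed(self, candidates):
        # Re-check each candidate key the client reported.
        for key in candidates:
            if key == TARGET:   # stands in for "decrypts to English"
                self.answer = key

def client(server: Server) -> None:
    start, end = server.work_request()
    server.completed([k for k in range(start, end) if cheap_test(k)])

srv = Server()
while srv.answer is None:
    client(srv)
print(srv.answer)  # -> 1000
```

Note the division of labour: the client does the expensive scan over the whole range, while the server only re-verifies the handful of candidates that pass the cheap filter.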

This example assumes all clients are trustworthy. Later on in this document, we will discuss techniques to sort the angels from the demons. But first, an aside on the mechanics.

Talk to me

These messages need to be transported around. The Internet has become ubiquitous, so we may as well use it. Email and HTTP are common transport mechanisms; the choice depends on the resources you have available.

If your computer is only online for (say) an hour a day, then email could be used. The problem here is that it can be difficult for clients to pick up email automatically. (Not everyone uses the same mail reading client.) Also, people usually don't want the hassle of work-package emails clogging up their inbox.

Sending email is less of a problem. If your client code can be written to speak SMTP (the mail sending protocol), then your client only needs configuring with the user's SMTP server.

If you have a web server permanently online, then you could set it up to distribute work packages. A client would make an http request to a URL, and the server would respond with a work package.
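A minimal sketch of that exchange, assuming a /work URL and a JSON work package (both invented for illustration): the server hands out one key range per GET, and the client fetches it with a plain HTTP request.

```python
# Sketch of the web-server approach: each GET returns a work package.
# The URL path and the JSON package format are assumptions.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCK = 2 ** 8
next_key = 0

class WorkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global next_key
        package = {"start": next_key, "end": next_key + BLOCK}
        next_key += BLOCK
        body = json.dumps(package).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), WorkHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: one GET, one work package.
url = f"http://127.0.0.1:{server.server_port}/work"
with urllib.request.urlopen(url) as resp:
    package = json.load(resp)
server.shutdown()
print(package)  # -> {'start': 0, 'end': 256}
```

In 2002 this would more likely have been a CGI script behind Apache, but the shape of the conversation is the same: stateless request in, work package out.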

Other protocols tend to be impractical: even under the most restrictive firewalls in use, HTTP and email usually get through, while anything more exotic may well be blocked.

If you can, the web server approach is usually better for the client. Having said that, if the rewards are high enough or emails infrequent enough, people may put up with the hassle of sorting emails.

How should a client work?

Distributed computing is sold on the idea that the client will do its thing in the background while you continue working, and you will not notice anything different. The two usual ways to achieve this are a task running at "idle" priority, or a screen saver.

About idle

An idle priority task will (if the OS organises it correctly) only run when the computer would otherwise have nothing to do. Idle priority tasks are designed to work in this time. If the user moves the mouse, the scheduler should make the client drop everything and handle what the user wants straight away. Modern operating systems handle this very well.

How to set up a task to run in idle time varies from system to system. Look it up.

Screen savers

A popular route for distributed computing clients is the screen saver. Easier to write (it says here) and the concept is more understandable to the non-techie public. People can understand that screen savers only run while they are at lunch and so won't interfere with normal working. (Rather than some deeply technical explanation of how schedulers work.)

Also, the screen saver gives you the ideal opportunity to show the user what's going on. The SETI project does this well by showing lots of nice graphics showing the computations at work.

So, which?

Modern schedulers do work rather well: an idle priority task will do its thing, and the user need never realise anything is going on. Except in rather freaky circumstances, an idle priority client will get much, much more work done than a screen saver.

The screen saver client is a big concession to the non-techie audience, who fear that something running all the time will make things less reliable.

How do I generate work packages?

It depends on the job in question.

In the case of distributed.net, this is very simple. The key space from zero to 2^64 − 1 is split into blocks of 2^28 keys. When a client asks for work, it is given a set of ten blocks. The process is so simple that when a client runs out of work but cannot ask for more, it takes it upon itself to choose a random block to calculate.

Why 2^28? It is a nice round number. A block of 2^32 keys would take too long, and a block of 2^24 keys would be over too quickly.
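The block-size trade-off can be put in rough numbers. The keys-per-second rate below is an assumption for illustration; real rates varied widely by CPU.

```python
# Back-of-envelope block sizing, assuming a hypothetical client rate.
keys_per_second = 1_000_000

for bits in (24, 28, 32):
    minutes = 2 ** bits / keys_per_second / 60
    print(f"2^{bits} keys: about {minutes:.1f} minutes")
```

At this assumed rate, 2^24 keys is gone in well under a minute (too chatty with the server), 2^32 keys takes over an hour (too long before any feedback), and 2^28 sits comfortably in between.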

Thanks to Moore's law, a modern computer will complete a work package twice as fast as a computer 18 months old. The amount of time taken by different clients will vary a lot. Flexibility is useful, but not required.

Attacks against distributed computing projects

There are three types of attack to consider. Parasites, spoilers and faulty clients.


Parasite attacks

People running distributed computing projects will often have some sort of reward for the people who help out. These are often simple pleasures such as karma and fame. At present, people tend to help distributed projects because they want to help out; most projects offer a chart of the top helpers as motivation. In future, we may start seeing cash payments to helpers, persuading the public to give their spare time to this project rather than that one.

However small, there is often a reward for doing the work, and with it comes the motivation for parasites to claim the reward without actually doing the work. To stop a parasite, you need to make sure the work really was done correctly.

Spoiler attacks

As well as parasites, who want the reward without doing the work, there are people who want to stop your project from succeeding. Their attempts are commonly known as spoiler attacks.

When designing your security checks, consider that there may be someone whose only motivation is bragging rights for having broken your project. ("I'm a great haxor, worship me!")

Again, you need to make sure that when you receive your completed work package, it represents actual work done. In the case of "search" projects, so long as you have adequate parasite protection, a defence against spoilers is that it's statistically unlikely that the attacker will be lucky enough to be given the opportunity to hide a successful search.

Faulty clients

This isn't really an attack, but unintentional faults. Even if your client software is perfect, it has to run on hardware owned by the general public. Faulty hardware isn't unheard of. The "floating point co-processor" is a popular place to find hardware glitches. If your project depends on accurate calculation, you should build in checks to confirm that the client is operating correctly.

How do I stop parasites, spoilers and faulty clients?

When a client helps out with a distributed computing project, there is usually no contractual arrangement, and clients typically select themselves without their trustworthiness or reliability being checked.

Randomly repeat a job.

This is the simplest way. Routinely send the same job to a different client and check that they come back with the same answer.

The problem is that every repeat of a job done by a genuine, accurate client is wasted time. Reduce the number of rechecks and you risk a parasite getting through undetected; increase them and you waste time.

There is the additional problem that you have no way of knowing if the client you give a checking job to isn't also a parasite. If a copy of a parasite client becomes widespread, then the risk of this happening increases. Having said that, this technique is simple, and easily implemented. It can even be used in conjunction with other schemes.

To help this process along, instead of a simple "yes/no" response, have the client include (say) a totalled-up calculation, even when the overall result was a "no". This also helps in pinpointing a faulty client once results start to disagree.

With this in place, parasites would have to collaborate to avoid detection. This makes their job harder (but not impossible). For example, a collaboration protocol amongst parasites might work:
"Did any other parasite do job #xyz?"
"Yes, I did, and my randomly chosen checking total was 58."

To better detect collaboration, always vary the pairings so that the same two people do not always check each other.

Include quickly checkable proof of work done.

If you could repeat every single completed work package yourself, you wouldn't need to enlist help.

But what if you could quickly check that the client completed the work package, by having it respond with proof of the places where it came close? Say you know that in any work package, a client will probably come close (say) eight times. A completed work package showing ten occurrences where it did indeed come close suggests the client most probably did the work.

This method usually works where the project is a "search". Consider a search for large prime numbers.

If a client tells the server that a given number turned out not to be prime at all, a simple check would be for the client to report a factor. The organisers can very quickly check this answer by performing a simple division. If that value is indeed a factor, it's proof that the larger number is indeed not-prime and you can strike it off the list.
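That check really is cheap: a single modulo operation verifies a non-primality claim, however long the client's search took. (The number below is a real historical example: F. N. Cole famously produced this factor of 2^67 − 1 in 1903.)

```python
# A claimed factor makes non-primality instantly verifiable.
def accept_nonprime_claim(n: int, claimed_factor: int) -> bool:
    # A valid proof is any non-trivial divisor of n.
    return 1 < claimed_factor < n and n % claimed_factor == 0

# 2^67 - 1 is not prime; Cole's 1903 factor proves it in one division.
print(accept_nonprime_claim(2**67 - 1, 193_707_721))  # -> True
print(accept_nonprime_claim(2**67 - 1, 7))            # bogus claim -> False
```

The asymmetry is the whole point: finding the factor may take months of machine time, but checking it takes microseconds, so the server can verify every single claim rather than a random sample.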

Don't distribute the source code.

This is a controversial topic. So controversial, it has its own section below.

Only reward success.

If the project is of a type where work packages fall into two groups, unsuccessful ones and successful ones, and the only way you'll find out which group your work package falls into is to actually do the work, how about only recognising those who you can be sure did the work?

Instead of recognising unsuccessful effort, only reward successful effort. Ignore anyone unlucky enough to be given a work package which doesn't contribute to the effort, except perhaps to mark that work package as "done". Take the reward away, and the motivation for parasitical behaviour goes with it.

This strategy alone still leaves you open to spoiling attacks and faulty clients, and unless the actual rewards will be frequent, you may find this strategy will limit the number of helpers.

This would be filed under social engineering rather than technical. The effectiveness of social engineering is debatable.

Reject "impossibly fast" responses.

Even modern computers have their limits. A parasite will want to churn through as many work packages as possible. Keep a secret threshold, and throw out (or at least double check) any response that comes back faster than it.

For good measure, tell the client (if such a message is normal) that the work package was accepted, and move the threshold around to keep attackers guessing.

Under this scheme, a parasite would instead have to pretend to be a large number of computers: request enough work for 100 computers to do in a day, wait a day, then respond with forged completion responses as if they came from 100 different machines. This is much more difficult to detect, especially with firewalls being commonplace (so many computers legitimately appear behind one IP address). Perhaps mark "unusually busy" groups with suspicion and double check a higher proportion of their work packages.
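A sketch of the moving-threshold filter. The floor of two minutes and the jitter range are assumptions; the point is only that the threshold is secret and varies, so an attacker cannot sit just above it.

```python
# Sketch of the "impossibly fast" filter (floor and jitter assumed).
import random

_rng = random.Random(7)

def classify(turnaround_seconds: float, floor: float = 120.0) -> str:
    # Jitter the threshold so attackers can't learn its exact value.
    threshold = floor * _rng.uniform(0.8, 1.2)
    return "recheck" if turnaround_seconds < threshold else "accept"

print(classify(5))      # far below any plausible threshold -> 'recheck'
print(classify(3600))   # a plausible turnaround -> 'accept'
```

Returning "accept" to the client in both cases, while quietly queueing the suspicious package for a recheck, denies the attacker any feedback about where the threshold sits.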

To publish source?

This remains a controversial topic within the distributed computing community. Here are some arguments for and against.

Against publishing source code

Compiled code is very difficult to reverse engineer. You don't want anyone producing hacked clients to launch parasite or spoiling attacks, and so it's best to make their job as difficult as possible.

The "open source" crowd will want to push the benefits of publishing sources, but this is not the objective of your distributed computing project. Don't lose sight of your end objective. You want to find aliens, cure cancer, etc. That's more important.

With a controlled set of clients, you can present a moving target for attackers. If your client is reverse engineered, no problem, release version 2, which you've had waiting in the wings and the attackers will have to start all over again. Any work packages from older versions can be detected.

Additionally, if you control the production of compiled copies, you can be thorough in testing them and be sure that compiler bugs or well meaning fixes or enhancements will not trip up the overall project by putting an incorrect result into the stream.

Finally, if your code is important to you, "closed source" is a well established way to protect your copyrights, by making it difficult for people to re-use your code.

The distributed.net people themselves argue against publishing source code, amongst other issues, in their documentation.

For publishing source code

Without source code, an attacker still has the client as a compiled executable. This is machine code, and machine code is a readable programming language, just as much as C is.

For sure, it's difficult to understand, but with tools such as disassemblers and debuggers the task becomes a whole lot easier. The bad guys have the expertise, and if the reward is high enough, they will use it. By withholding your source code, you give up all the benefits of open source in exchange for only a short period of security.

Even so, it's not just the compiled executable that can be read. The behaviour of your code can be monitored. The files it uses can be read. The conversations the client has can be picked apart for all to see. The attackers have all the aces. All you can do is hope they won't use them.

Publishing your source code allows people to produce clients for obscure platforms. Is it worth going to the trouble of producing a copy for (say) NetBSD on PPC? Probably not, but with the source code, anyone can just compile a copy for themselves. The "closed source" crowd would have you limited to just Windows running on a Pentium series.

History has shown that reliance on security through obscurity doesn't work. Use the techniques mentioned elsewhere in this primer, and embrace the advantages that published source code bring.

Other issues to consider


How quickly do you need an answer to your work package?

In the case of a distributed chess player, or distributed.net's DES challenge (which was broken in a little over a day), the answer is very quickly.

As for SETI, and the RC5-64 project, I can download a whole load of work packages, and then take my laptop up a mountain for a week, returning the completed work packages when I get back to civilisation.

What existing distributed projects are out there already?

The GIMPS project maintains a list of current (and completed) projects.

Is there a simpler way?

If this all seems too much to cope with, there is a simpler way. Take a cue from Tom St. Denis and his search for an S-box. (The source code has since been removed.)

Tom St. Denis is a reader of the sci.crypt newsgroup, and he wanted an S-box which fitted certain requirements. (An S-box is just a matrix of numbers, often used in cryptography.)

Not having the resources to perform a systematic search, he wrote a client which would randomly put an S-box together and then check it. This would continue again and again until it found one, where it would display the results. The user would simply copy and paste the result into an email message.

This is an example of an unashamedly low tech, low scale operation. The helpers were largely other sci.crypt readers who wanted to help out. Work done was not recognised at all, unless a successful match was found. Enthusiasm by the contributors was the only motivator. Even the success of finding a right answer only won you Tom's gratitude.

There are no messages passed between client and server, as there is no server. Instead, randomness took the role of the server in generating work packages.

For sure, randomness isn't perfect. Roll a six-sided die six times. Did any number fail to appear? If so, continue rolling until all six faces have appeared. How many rolls did it take in all?

While writing this, I rolled a die: 3, 2, 4, 5, 2, 6, 3 and 1. Two wasted rolls. Too much wastage? Worth it?
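The dice experiment scales up nicely in simulation: how many rolls, on average, before every face of a six-sided die has appeared at least once? (This is the classic coupon collector problem; the expectation is 6 × (1 + 1/2 + … + 1/6) ≈ 14.7 rolls.)

```python
# Simulating the dice experiment above over many trials.
import random

def rolls_to_see_all(sides: int = 6, rng=None) -> int:
    rng = rng or random.Random()
    seen, rolls = set(), 0
    while len(seen) < sides:
        seen.add(rng.randint(1, sides))
        rolls += 1
    return rolls

trials = [rolls_to_see_all(rng=random.Random(seed)) for seed in range(1000)]
print(sum(trials) / len(trials))  # averages out near the theoretical 14.7
```

So random work assignment covers the whole space eventually, but more than half the rolls are "wasted" repeats: the price of not having a server to hand out each block exactly once.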

Sounds great. Where do I go from here?

Kirk Pearson's nifty web site

Always remember the 8 undoubtable truths of distributed computing. Print them out in big letters.

To discuss things, try the new newsgroup, "comp.distributed", dedicated to distributed computing.

Information on parallel computing as distinct from distributed computing.

Best of luck.

Copyright Bill Godfrey, 2002. Full version





A primer on distributed computing | 41 comments (14 topical, 27 editorial, 0 hidden)
Brilliant! (3.00 / 8) (#22)
by patofiero on Fri Aug 02, 2002 at 11:27:29 AM EST

Well written, and with a narrow enough focus that you don't meander all over the place :)

As it states, it's only a primer....

Maybe you could cover distributed time synchronization in another article? (Lamport clocks!)

Or distributed locking? :)

The only girl I've ever loved
 Was born with roses in her eyes
  -=- Neutral Milk Hotel - Holland 1945

For those interested in playing around... (4.00 / 2) (#29)
by bsletten on Fri Aug 02, 2002 at 10:24:53 PM EST

Parabon Computation offers a Java-based SDK that you can download to take advantage of their Frontier system. You can download it and test it out on a small number of computers for free. Commercial use is reasonable, but they work with a variety of benevolent projects too.


Distributed computing vs. supercomputers (2.00 / 2) (#30)
by wpmegee on Fri Aug 02, 2002 at 10:29:38 PM EST

Distributed computing such as SETI@home (my example) is already more powerful than the most powerful supercomputer in the world, IBM's ascii white.

Ascii white pumps out 11 teraflops, or 11 thousand billion floating point operations per second. But SETI, according to the latest statistics, is far ahead with 28 teraflops. Therefore, SETI is the world's most powerful computer.

It all depends on your application (5.00 / 2) (#32)
by Laz on Sat Aug 03, 2002 at 08:34:37 AM EST

Distributed computing such as SETI@home (my example) is already more powerful than the most powerful supercomputer in the world, IBM's ascii white.


Be careful before you make unqualified statements about "supercomputers". SETI is not one, it is a distributed application.

You're comparing apples to oranges. Supercomputers and distributed computing differ in the applications they run, and as such, comparing one number (raw flops) for both is not a good comparison.

It all depends on how fine grained the amount of work you can dish out to each "worker" node. Distributed computing addresses the class of problems that are very coarsely grained (what is commonly called "embarrassingly parallel"). If your problem is not easily chunked into big pieces of work with very little communication, then you need a lower latency network to get any sort of performance. Most applications that run on supercomputers are much finer grained than SETI@home. Latency between nodes is an issue, and it is where distributed computing cannot keep up (because the internet is high latency and low bandwidth).

Oh, and just to nitpick, the Earth Simulator is the "fastest supercomputer" documented at top500, at 40 TFlop/s peak (ASCII White is 2nd with 11 TFlop/s).

[ Parent ]

ASCI not ASCII (none / 0) (#33)
by FattMattP on Sun Aug 04, 2002 at 04:35:14 AM EST

ASCI (Accelerated Strategic Computing Initiative) not Ascii.

[ Parent ]
Another risk of closed source (2.50 / 2) (#31)
by phliar on Fri Aug 02, 2002 at 10:52:51 PM EST

Not having source, a sane and sensible person (perhaps others would say paranoid) will refuse to run a program unless there is no alternative -- and in this case, there is one, viz. don't run the program.

The risk to the person-on-the-street is not just that a malicious person might put backdoors in the client (because I really don't think they would), but here's a program running on your machine that requires network access, perhaps even listens on sockets. A bug in such a program compromises your machine.

Faster, faster, until the thrill of...

Having the source is not enough... (none / 0) (#41)
by Waste of Time on Sun Aug 11, 2002 at 02:42:55 PM EST

...as you need to read through it, to be able to say you're safe.

Just having the source, compiling the program yourself, doing make install as root will not make you enlightened and will not make backdoors go away. You might as well install a binary package in that case.

Having the source on disk without reading through all of it will not make a difference security-wise.

Of course, when there is a problem, having the source makes it easier to fix, if you know what to do. But source != security. Having just the source without reading it is like having a pen with no ink.

-- Walk freely, undisturbed.
[ Parent ]

The link to the distributed.net document. (none / 0) (#34)
by Bill Godfrey on Mon Aug 05, 2002 at 03:57:26 AM EST

A better link is http://www.distributed.net/source/specs/opcodeauth.html

Similarly, it's better to link to http://www.bacchae.co.uk/docs/dist.html instead, as my copy has all the headings and tables in place, which I had to remove for K5.

Any way I can modify the document now it's been done?

Bill, editing.

I'm a theoretical chemist.... (none / 0) (#35)
by thehappygit on Mon Aug 05, 2002 at 10:44:16 PM EST

...and there are many problems that I find are quite interesting to solve that are relatively well suited to being solved in the SETI@Home manner (basically all of statistical mechanics boils down to solving 3N-dimensional integrals, where N is the number of particles in the system - if solved via Monte Carlo methods it leads to an "embarrassingly" parallel algorithm) ... for example efforts to work on the protein folding problem (I know, I know, they exist. But I don't like the method most have chosen to employ...too ... approximate). But there are two stumbling blocks I see to this.

1. Are there any pre-canned libraries that a researcher could use to make servers/clients easily...that way they could drop in the science and leave the screen-saver code to someone else - like a computer scientist. What I would ideally like would be something that handles the pretty screensaver and the communications code so that the scientist need only call a function.
2. How would one generate the publicity to get thousands of computers involved. ... sure most people have heard about Seti@HOME, but what others have popular awareness? I guess I'm asking, do people think that the market is saturated?

Distributed computing for non-programmers. (none / 0) (#36)
by Bill Godfrey on Tue Aug 06, 2002 at 05:17:40 AM EST

if solved via Monte Carlo methods it leads to an "embarrassingly" parallel algorithm

"monte carlo methods"? Do you mean you can just let the client randomly pick a value and see if it works?

If you don't need any continuous client/server communication, then your best bet may be to follow Tom St.Denis's lead. See the section "Is there a simpler way?".

If you do need the full work package generation... Quick review...

  • Do you have a web site where you can put up your own CGI programs?
  • Is there a way for the client to prove to you that the work was done, without you having to repeat the work package yourself?
  • Are you famous in your field?
  • Do you want to pay cash to helpers?

Distributed computing is a comparatively young technology. (For sure, PVM and such like have been around for aeons, but they're not appropriate for this type of task.) I certainly don't know of any libraries.

Having said that, I'm helping out with developing such a library, nodes. It's the classic sourceforge project, start a sourceforge home page, *mumble*, profit! I'd certainly be interested in getting someone from your side involved.

But that's a long way off, and I'm kinda diverted on my other big project.

In the meantime, I would suggest contacting the people who run the existing projects. This sounds like the sort of thing distributed.net may be into.

[ Parent ]

Well... (none / 0) (#37)
by thehappygit on Tue Aug 06, 2002 at 10:02:20 AM EST

Distributed computing for non-programmers.

First off, I am a programmer. In my time I've written more lines of code than I can count. I've just never written a library. I'm a scientist, so my talents are best suited to coding the science, and most libraries I've needed have been written by experts in their field. Just fyi...

"monte carlo methods"? Do you mean you can just let the client randomly pick a value and see if it works?
In a nutshell, yes. But depending on how you write the code, you still need some interaction between a client and a server. I would definitely want to do it this way for 1000s of computers that I had no direct control over.

To answer some of your questions:
1. Yes I could put up a web(or any other) server. At the worst, I could use the computer on my desk at work, but we've a bit more resources than that :).
2. This is difficult. I couldn't prove that their result was perfect without reproducing the calculation, but I might be able to reject less than subtle errors.
3. If I was famous in my field, I'd have all the computer time I needed at government facilities or the money to pay for my own :)
4. Perhaps, but ideally not :)

(For sure, PVM and such like has been around for aeons, but it's not appropriate for this type of task.) I certainly don't know of any libraries.
PVM is way out. Check out MPI if you do any serious distributed memory parallelism coding. But you're right, neither of these is suited to Seti@HOME-type jobs. And the hardest part for us scientists is this (IMHO)... coding the crappy gui part - a screensaver or an animation. We're not trained for that. Every program I've written is a console program that accepts a plain text file for input. If you've grown up in unix this is the norm, but if you expect all the nice people to donate their computer time (even if it was going to be wasted anyway) you need to be a showman as well. Seti@HOME did it right, I just wish they'd released their work in a generally useful fashion (or at all for that matter).

[ Parent ]
and broke his ankle bone. (none / 0) (#39)
by Bill Godfrey on Tue Aug 06, 2002 at 12:55:00 PM EST

First off, I am a programmer. In my time I've written more lines of code than I can count. I've just never written a library. I'm a scientist, so my talents are best suited to coding the science, and most libraries I've needed have been written by experts in their field. Just fyi...

That's good. You are half way there already. Sorry for the misunderstanding.
But depending on how you write the code, you still need some interaction between a client and a server. I would definitely want to do it this way for thousands of computers that I had no direct control over.

I think I've misunderstood your question. You're after some pre-packaged way of writing screen savers for use as distributed clients?

If so, some google searching is probably the best approach from here. (Try "developing windows screen savers".) Just write your code and have it run within the screen saver executable until it's signalled to stop. Don't worry about working only in idle time; the user has gone to lunch.

If there's only one message from client to server, "I found it!", then you've got rid of the difficult stuff. No need to check that completed work packages represent actual work done.

Screen savers are all very well, but you would get a lot more work done by developing an idle-priority task. For single-user workstations, CPU activity tends to be bursty. For sure, if you have some task running in the background doing lots of stuff, then your CPU will be busy. But if you are just typing a document, there's a lot of idle time between key presses, which a screen saver would never see.

How do you make a process work in idle time only? No idea. Look it up. (google to the rescue)

So what do we do when you've come around to sending this one "I found it!" message? You could perform an HTTP GET with a URL containing all the data.

Or your client could automatically send an email to a pre-programmed address.

Or you could display a dialogue box and ask them to manually send an email.

1. Yes I could put up a web(or any other) server. At the worst, I could use the computer on my desk at work, but we've a bit more resources than that :).

That's okay, you just need some way to pick up messages from the client. An HTTP GET is a good way for the client to talk to the server, even from behind a firewall.
2. This is difficult. I couldn't prove that their result was perfect without reproducing the calculation, but I might be able to reject less than subtle errors.

Right, we're assuming here that there will be continuous interaction between clients and servers, rather than the one big "I found it".

You should decide early on whether you are going to go for the simple way, favoured by Tom St.Denis, or pull out the big guns, with managed server and client transactions. (Or somewhere in between.) Once you've made that decision, all else follows.

(If you like, email me. I'd be interested in discussing this in more depth.)

[ Parent ]

processing in idle time.. (none / 0) (#40)
by joib on Thu Aug 08, 2002 at 10:13:48 AM EST

At least on unix it's dead simple, just

#include <unistd.h>

int main(void) {
    nice(19);  // 19 is the lowest priority

    /* ... the real number crunching goes here ... */
    return 0;
}
Sorry, I don't know how to do it in windows.

Regarding protein folding, there is already the folding@home project, check that out..

[ Parent ]

Randomness. (none / 0) (#38)
by thehappygit on Tue Aug 06, 2002 at 10:10:29 AM EST

For sure, randomness isn't perfect. Roll a six-sided die six times. Did any number fail to appear? If so, continue rolling until all six sides have appeared. How many rolls did it take in all? As I write this, I rolled a die: 3, 2, 4, 5, 2, 6, 3 and 1. Two wasted rolls. Too much wastage? Worth it?
This is not a good measure of randomness. Check out http://world.std.com/~franl/crypto/random-numbers.html for a good explanation.

A primer on distributed computing | 41 comments (14 topical, 27 editorial, 0 hidden)