Kuro5hin.org: technology and culture, from the trenches

Combining the Ideas of Distributed Computing and Peer-to-Peer Networks

By qaz2 in Internet
Thu Feb 21, 2002 at 10:18:57 AM EST
Tags: Internet (all tags)

Within the last few years, the Internet has given rise to two phenomena: distributed computing and peer-to-peer networks. Distributed computing is the donation of unused computer time to a project that requires an immense amount of processing power. Popular examples include Seti@Home and the various projects hosted at distributed.net. Peer-to-peer networking is the linking of computers via the Internet, usually for the purpose of sharing files.

But what happens when the two ideas are combined?

We get a peer-to-peer network where, instead of sharing files, users share unused computer time. If a user, for example, is rendering an animation, part of the work would be done by other people's computers. In return, when the same user is doing nothing, his or her computer time would be available to other people.

How is this done? The user will be running a piece of software which will connect his or her computer to the network of other users of the software via the Internet. When the user needs more processing power, the software will send the data and instructions through the network to an available computer (or computers). This computer (or computers) would process the data and send the results back to the user's computer.
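A minimal sketch of that send-out-and-collect cycle, under loud assumptions: there is no real network here, each worker thread merely stands in for a remote peer, and the names `split_into_chunks` and `run_on_peers` are hypothetical. It only shows the shape of the idea: cut the input into pieces, let each "peer" process one piece, and stitch the results back together in order.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty pieces when there are more peers than items
            chunks.append(items[start:end])
        start = end
    return chunks


def run_on_peers(task, items, n_peers):
    """Apply task to every item, one chunk per simulated peer (a local thread)."""
    chunks = split_into_chunks(items, n_peers)
    with ThreadPoolExecutor(max_workers=n_peers) as pool:
        # pool.map preserves chunk order, so results come back in input order
        partials = list(pool.map(lambda chunk: [task(x) for x in chunk], chunks))
    return [y for part in partials for y in part]
```

For example, `run_on_peers(lambda x: x * x, list(range(10)), 3)` squares ten numbers using three simulated peers. A real client would replace the thread pool with network calls and would have to cope with peers that vanish mid-task.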

Of course, this would not work for all tasks. If the user's computer needs to draw a window, for example, the lag time of the operation would make the process slower. The ideal task would be one which has a small amount of data and a large amount of processing to do with the data.

The network should be structured to reduce lag time. Tasks should be sent to the closest available computer.
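One way to read "closest" is "lowest measured round-trip time". The sketch below assumes exactly that, and assumes each peer somehow advertises whether it is idle. The `Peer` record and `pick_peer` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    latency_ms: float  # assumed: measured round-trip time to this peer
    idle: bool         # assumed: the peer reports whether its owner is using it


def pick_peer(peers):
    """Return the idle peer with the lowest latency, or None if nobody is idle."""
    candidates = [p for p in peers if p.idle]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.latency_ms)
```

Latency is only a proxy for distance. A peer that is nearby but slow could still be a worse choice than a fast one farther away, so a fuller version would probably weigh both.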

Certain restrictions would have to be imposed. A user's processing power would only be available if the user was not using it. There would have to be a maximum amount of processing power a person could use at a time. Otherwise, one user could tie up the network.
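The cap could be enforced in many ways. As one illustration, here is a sketch that assumes the cap is expressed in CPU-seconds a user may have reserved at any one time, and that some coordinator keeps the books. `CpuQuota` and its methods are hypothetical names.

```python
class CpuQuota:
    """Track how much remote CPU time each user has reserved, up to a fixed cap."""

    def __init__(self, cap_seconds):
        self.cap_seconds = cap_seconds
        self.in_use = {}  # user -> CPU-seconds currently reserved

    def request(self, user, seconds):
        """Reserve time for a user; refuse if it would push them over the cap."""
        used = self.in_use.get(user, 0.0)
        if used + seconds > self.cap_seconds:
            return False
        self.in_use[user] = used + seconds
        return True

    def release(self, user, seconds):
        """Give reserved time back once a task finishes."""
        self.in_use[user] = max(0.0, self.in_use.get(user, 0.0) - seconds)
```

Raising `cap_seconds` when the network has spare capacity would be one way to implement a dynamically adjusted cap.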

One interesting issue which arises is what to do with the extra computer time that might be available if the available computing power exceeds the needed computing power. One option would be to dynamically increase the processing power cap. Another would be to do work on the various distributed computing projects out there, recognizing that if this idea caught on they would otherwise be hurt by it. Perhaps scientists would be allowed to apply for computer time for research. These applications would be voted on by the users. Of course, the available computing power could be split up between the above options.

The basic question here is whether or not this would (a) be feasible and (b) be popular enough to make it work.

Other questions/interesting issues:

  • Would it be possible to write a client that could use the network to speed up programs not specifically programmed to use it? (Or would the speedup be limited to programs written to use it?)
  • Might there be privacy/security issues? A user might try to steal data being processed via the network.
  • Might there be legal issues? What if a user was using the network to speed up a process doing something illegal? Would the other users doing the actual processing be legally liable?

    Would this idea work? Would you participate?
    o I would participate, and I think it would work. 45%
    o I would participate, but I think it would not work. 8%
    o I would not participate, but I think it would work. 14%
    o I would not participate, and I think it would not work. 10%
    o I would participate, and I am unsure/don't care about whether it would work. 8%
    o I would not participate, and I am unsure/don't care about whether it would work. 4%
    o I really don't care. 8%

    Votes: 48

    Combining the Ideas of Distributed Computing and Peer-to-Peer Networks | 59 comments (57 topical, 2 editorial, 0 hidden)
    Been there, done that.. (3.75 / 8) (#1)
    by jabber on Wed Feb 20, 2002 at 11:55:14 PM EST

    I solved this problem with a friend, in our capstone thesis paper nearly 5 years ago.. The solution is really quite elegant, but there's not enough space in this margin to explain it.

    [TINK5C] |"Is K5 my kapusta intellectual teddy bear?"| "Yes"

    Is it available on the web? nt. (none / 0) (#15)
    by scanman on Thu Feb 21, 2002 at 02:10:51 AM EST

    "[You are] a narrow-minded moron [and] a complete loser." - David Quartz
    "scanman: The moron." - ucblockhead
    "I prefer the term 'lifeskills impaired'" - Inoshiro

    [ Parent ]

    Hint: Fermat. (none / 0) (#21)
    by Tezcatlipoca on Thu Feb 21, 2002 at 03:50:11 AM EST

    The poster you are asking is obviously joking.
    "At eighteen our convictions are hills from which we look;
    at forty-five they are caves in which we hide." F. Scott Fitzgerald.
    [ Parent ]
    Sort of... (none / 0) (#27)
    by jabber on Thu Feb 21, 2002 at 10:07:13 AM EST

    I don't have the more detailed paper available, but I do have its precursor, where most of the later ideas were introduced, here.

    Hmm.. I wonder how K5 would receive that paper..

    [TINK5C] |"Is K5 my kapusta intellectual teddy bear?"| "Yes"
    [ Parent ]

    Some implementation considerations (5.00 / 4) (#2)
    by fluffy grue on Wed Feb 20, 2002 at 11:57:58 PM EST

    Consider using a platform-independent and easily-sandboxable mechanism for distributing the code. Tcl and various other scripting languages are good for this, as is Java bytecode. Existing programs (such as POVray) could be adapted to Java bytecode using the GCC JVM cross-compiler. Also, some way of performing process migration (for example, taking a snapshot and sending it to another node) would be vital. Another big problem is redundancy; when you have a single project like rc5des, it's pretty easy to figure out which parts of the keyspace need to be restarted, but when you have a huge amorphous blob like this, things get incredibly complicated.
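The bookkeeping problem mentioned above (knowing which pieces of work must be handed out again) is often handled with leases: a unit that is not reported back before its lease expires returns to the pool. The sketch below assumes that approach; it is not something the comment specifies. `WorkTracker` is a hypothetical name, and time is passed in explicitly to keep the example deterministic.

```python
class WorkTracker:
    """Hand out work units under a time-limited lease and re-issue expired ones."""

    def __init__(self, unit_ids, lease_s):
        self.lease_s = lease_s
        self.pending = set(unit_ids)  # not yet handed out (or handed back)
        self.leased = {}              # unit id -> time at which the lease expires
        self.done = set()

    def checkout(self, now):
        """Return the next unit to work on, or None if nothing is available."""
        # Reclaim units whose node went quiet past the lease deadline.
        for unit, expiry in list(self.leased.items()):
            if expiry <= now:
                del self.leased[unit]
                self.pending.add(unit)
        if not self.pending:
            return None
        unit = min(self.pending)
        self.pending.remove(unit)
        self.leased[unit] = now + self.lease_s
        return unit

    def complete(self, unit):
        """Record a finished unit, even if its result arrives after re-queueing."""
        if self.leased.pop(unit, None) is not None or unit in self.pending:
            self.pending.discard(unit)
            self.done.add(unit)
```

This covers only the easy case of independent, restartable units. Migrating a half-finished process, as suggested above, is a much harder problem that this sketch does not attempt.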

    Also, you need to have some mechanism for the data to make its way back to the requestor. Email would be the most obvious implementation, but perhaps something Freenet-ish (or even Freenet itself) would be better.

    Agent-based computing isn't a new idea, by the way, and truly-distributed computing was one of the eventual goals of Multics (a predecessor of UNIX), as well as of distributed operating systems such as Amoeba and Plan 9.
    "Is not a quine" is not a quine.
    I have a master's degree in science!

    [ Hug Your Trikuare ]

    Platform independence (5.00 / 1) (#3)
    by qaz2 on Thu Feb 21, 2002 at 12:02:40 AM EST

    Platform independence makes it easier to program, maybe, but there is a performance hit. Java is slower than C(++). As this is a performance-oriented application, speed is more important than portability.

    [ Parent ]
    at risk of java/c++ flame war... (4.00 / 4) (#5)
    by jeffy124 on Thu Feb 21, 2002 at 12:32:23 AM EST

    Java has comparable speed to C++ apps once they get going, especially for text-based apps and those that are computation-heavy (such as scientific apps). Java's HotSpot compiler will (at runtime) convert bytecodes into native code to speed up execution significantly. The only major drawback to Java (which is what many people refer to when a short app takes 10 seconds) is the load time and footprint of the JVM. Hence, the performance hit happens once and only once, not throughout the entire execution as with older versions of Java.

    A big plus to Java over C++ would be code safety. Java Applets already run in a sandbox environment. By extending the sandbox to your proposal, the machine receiving code to be executed can easily sandbox the code and prevent the program from setting up backdoors, wrecking the machine, etc.
    You're the straw that broke the camel's back!
    [ Parent ]
    "comparable" speed (2.00 / 1) (#33)
    by pb on Thu Feb 21, 2002 at 03:57:43 PM EST

    Yeah, the speed is "comparable"; it's slower--there's your comparison. :)

    And yes, Java doesn't HAVE to be slower, because THEORETICALLY a JVM could be written that optimizes its code to a ridiculous amount. But even so, then you've probably lost the sandbox paradigm altogether in favor of speed.

    The fact of the matter is that you have three different things that you're building a system around:
    (1) performing at the raw speed of the underlying hardware
    (2) ensuring the protection provided by a sandbox
    (3) coping with the bloat inherent in the base Java classes

    I'm positive nothing exists that does all of these, and I know that at the least it would be a very tricky proposition to code this, if it is even possible. (maybe the same big(O); not the same performance :)

    On the other side of the fence, it isn't like C code runs in a sandbox either, although there are tools for restricting access to regions of memory and the filesystem. But still, maybe we're asking too much here.

    I doubt such a beast of a JVM exists, but feel free to prove me wrong by posting links to benchmarks between that JVM and, say, a C++ or C program performing a computational task. I'd be fascinated to see it. :)
    "See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
    -- pwhysall
    [ Parent ]
    "compare" argument (4.00 / 2) (#38)
    by stfrn on Thu Feb 21, 2002 at 05:20:34 PM EST

    Y'know, a lot of responses would be saved if people would look for things themselves before saying, "find this for me!". There have been many comparisons between Java and C++ or other languages here and on Slashdot, if my memory hasn't faded yet, and I also believe their findings were exactly what Jeffy mentioned.

    IDNWAFW, but I had to dispel your old myths. You actually didn't say much in response to Jeffy, you just flamed about Java in general.

    btw, what would optimisation have to do with the sandbox? The code would still have the same restrictions, i.e. no local I/O, no connections except through the p2p network, etc.

    [ Parent ]

    flaming vs informed posting (none / 0) (#40)
    by pb on Thu Feb 21, 2002 at 05:30:07 PM EST

    Well, I wouldn't have to flame if I had ever SEEN any of these benchmarks, especially if they meet my criteria. Could YOU point me in the right direction?

    Also, WTF is IDNWAFW? "I Do Not Want A Flame War"? Well, don't be cryptic, then. I said quite a bit about Java, and if you take issue with any of it, then I will back it up.

    Many low-level optimizations work by breaking things like memory protection, or fudging specifications a bit. That's why there's an -ansi flag on gcc and g++, and that's also why there are fast STL implementations for C++, sometimes faster than the equivalent C programs (I'm sure I could find a link to a paper on it if pressed); these are tricks that couldn't be used in a sandboxed implementation, or at least not as easily.

    If you wanted to sandbox a C program, you could do it at the OS level, but it's a kludge; I suppose the same techniques could be applied to a JVM, (likely a C program :) but that also would not be The Right Way to do it. :)

    Also, I don't believe in trust models, but if you insist on it, I suppose you could implement one into your sandbox. But heck, if you're going to do that, build in a firewall while you're at it, and rewrite the filesystem in Java in your copious free time... ;)
    "See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
    -- pwhysall
    [ Parent ]
    benchmarks... (4.00 / 1) (#46)
    by jeffy124 on Thu Feb 21, 2002 at 11:12:48 PM EST

    quick search on google for "java c++ benchmarks" yielded these two (among many other) results:

    http://web.informatik.uni-bonn.de/II/ag-klein/people/zach/benchmarks/java-vs-c++.html -- Outdated stats using JVM 1.0.1

    http://verify.stanford.edu/uli/java_cpp.html -- This is the most comprehensive I found and uses 1.3 beta. 1.4 was released last week.

    The big thing you should take note of when looking at those, look at how much Java improved in execution speed over the past 5 years. Many people stopped using Java back in the 1.1 days because of lousy execution speed back then, and have refused to touch it since then.

    Another thing to keep in mind when reviewing these: yes, Java does take longer than C++. But C++ does not have the loading time that Java does. I have seen articles that show that both Java and C++ take a similar amount of time to do similar tasks when using g++ 2.95 and Java 1.4beta. The major difference was that Java took L millisecs longer, which was consistent across the various benchmarks performed. It turns out a null Java program (eg, main() { ; } ) took L ms to execute. L was the amount of time it takes for a JVM to load. I can't find one of those articles right now; if I do, I'll post a link.

    lastly, you say: Java doesn't HAVE to be slower, because THEORETICALLY a JVM could be written that optimizes its code to a ridiculous amount

    It's no longer theoretical. HotSpot (as of 1.3) does exactly that.
    You're the straw that broke the camel's back!
    [ Parent ]

    good point, but still slower :) (5.00 / 1) (#52)
    by pb on Fri Feb 22, 2002 at 04:45:38 AM EST

    That second link is quite interesting in places, although the methodology behind the first test seems somewhat flawed; the goal in benchmarking shouldn't be to (1) write a Java program and then (2) write a C++ program *exactly* like the Java program, when trying to compare the two languages (unless your goal is to give Java an artificial advantage). Rather, one should (1) write a spec, and then (2) implement programs to the spec on both languages. Still, their results are interesting. But, obviously, the C++ programs are massively (a bare minimum of 5x, after much optimization, testing, and swapping of JVM's on the Java side) faster, which confirms my first point, that JVM's simply aren't as fast as running the same algorithms natively with a standard optimizing compiler in, say, C or C++. (at least, I hope they used -O instead of -g, for their sake. ;) This is not to say that the JVM's haven't improved; they have improved quite a lot since the bad old days.

    However, the most interesting point in the article for me was the "Comparison using the library sort algorithms" at the bottom; this is exactly what I was talking about before, comparing C / C++. Of course, if there IS a built-in function to do your task, especially with a language like Java, it SHOULD be much faster than writing code yourself in Java to do it, since the JVM can just do its thing and you can get out of its way for a while and stop slowing it down by making it interpret your code all the time. This is not to say that Java is faster at this task, and the author notes that he suspects the run-times to be memory bound; obviously Java is doing much better at this than it was on the previously CPU-bound tasks of executing Java code, but it likely still uses more memory storing information about its data, even with a base type like int. Also, in Java, it can be hard to find the right library function for the task, since there are so many of them; even the author of this paper tested the "wrong" (i.e. more generic, likely more frequently used, but slower) library function first, before finding a simpler, more optimized one that suited his task.

    I agree that it would be more useful to compare Java to C++ without the load time; one way to do this would be to run the benchmark twice (or multiple times) and time it between each iteration. Then (in a simple benchmark) all the Java code should get loaded, interpreted, compiled, optimized, and whatnot. You'd have to do the same thing comparing, say, code executed on a Crusoe and an Intel chip, and note the load time and the iterative improvement in execution times. Note, also, that there is load time for a C / C++ program, (usually in the form of dynamic linking, loading the executable into memory, and allocating memory) and similar techniques could be used to ensure a fair comparison; also, for raw speed tests, static linking is an option as well (both for a JVM and a C / C++ program).
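The "run it several times and time each iteration" approach can be sketched as follows. This is Python rather than Java or C++, purely to show the measurement pattern: the first run absorbs one-off start-up and warm-up costs, and the later runs approximate steady-state speed. `time_iterations` and `summarize` are hypothetical helper names.

```python
import time


def time_iterations(fn, repeats=5):
    """Run fn several times and record how long each individual run takes."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Separate the first (cold) run from the best of the later (warm) runs."""
    first, rest = timings[0], timings[1:]
    return {"first_run": first, "steady_state": min(rest) if rest else first}
```

Comparing `first_run` against `steady_state` gives a rough picture of how much of a short benchmark is start-up overhead. Process launch time itself (the JVM load discussed above) would have to be measured from outside the process.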

    But, I'm sorry, even in a biased comparison such as this one, they couldn't make Java faster. Which was my original point. I am impressed, however, at the great length they went to, to test and compare all the JVM's, and if I ever do my own tests on this matter, that will be quite helpful as a starting point. Thank you very much for the links. :)
    "See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
    -- pwhysall
    [ Parent ]
    Java performance (5.00 / 2) (#49)
    by Pink Daisy on Fri Feb 22, 2002 at 02:35:39 AM EST

    I've read a lot about Java performance. It's hard to compare directly, since you can't just take SPEC and run it in Java. I used to be a big believer that you could make a JVM that would run faster on equivalent programs than any statically compiled code. Now I'm not so sure. Anyway, the pinnacle of what's available today would be a JVM that runs at the same speed as equivalent statically compiled native code, plus a large constant factor. The worst case would be much worse; normally a smaller constant factor but a very large slowdown on execution. Most of today's commercial JVM's (particularly IBM's jdk, although Sun Hotspot and some others are also very good) come close to our pinnacle, but none actually reach it.

    There is a lot of research dedicated to making Java run faster than native code. Currently, that is the largest single focus of compiler research. The stage may have to be shared with IA64 compiler research if that becomes very popular, but definitely the improvement of VM's is an interesting topic. There is hope for making Java the preferred language for scientific computation, instead of a second class citizen.

    [ Parent ]
    Security (5.00 / 1) (#12)
    by fluffy grue on Thu Feb 21, 2002 at 01:28:26 AM EST

    I'd rather not allow just anyone to run just any piece of arbitrary code on my machine, thanks. I'd at least like something which is limited in terms of which system calls it makes. I don't want people to run a DDOS client on my machine, for example.

    Also, not all of my computers are x86.
    "Is not a quine" is not a quine.
    I have a master's degree in science!

    [ Hug Your Trikuare ]
    [ Parent ]

    GCC Java (none / 0) (#13)
    by Pink Daisy on Thu Feb 21, 2002 at 01:37:35 AM EST

    I believe the GCC Java backend is compatible only with the Java frontend. That is to say, although you can compile Java (or any other language gcc supports) to native code, the only language that you can compile to bytecode is Java.

    It is possible to target other languages to Java bytecode, although it is not trivial (in the sense that targetting any assembly language is trivial, or targetting Java to Java bytecode is trivial). There is a bigger problem with the current gcc, though. The IR it uses is too hard to target to a stack based machine. Until that changes you won't be able to target C or C++ to a JVM using gcc.

    [ Parent ]
    Not much point... (5.00 / 1) (#4)
    by ucblockhead on Thu Feb 21, 2002 at 12:22:03 AM EST

    There's just not much point to do this in a peer-to-peer manner. Almost no one uses their CPU to anything near its capacity.

    The average home user likely never has. Most "home user" applications are disk bound anyway.

    The only people who really need lots and lots of CPU time are science types, and they do better with non peer-to-peer apps like Seti@home.
    This is k5. We're all tools - duxup

    food for thought... (none / 0) (#6)
    by jeffy124 on Thu Feb 21, 2002 at 12:40:53 AM EST

    i think one possible idea is a person donates their unused cycles to a random purpose as opposed to running only Seti@Home or Distributed.net.

    Consider: I leave my machine set to allow incoming processing requests while I'm at work. A scientist needing massive computing power for a short task can use my donated resources (and those of others) in getting his task completed.

    Hence, depending on what the world of science is up to, my machine would be doing different types of processing at different times. I also cut the tie to exclusively using Seti or Distributed day in day out.
    You're the straw that broke the camel's back!
    [ Parent ]
    Already exists (5.00 / 1) (#8)
    by ucblockhead on Thu Feb 21, 2002 at 12:56:06 AM EST

    That sort of thing already exists in a non peer-to-peer fashion. (Or did, I can't recall if the projects are still around. One was going to pay people for CPU time. That flopped.)

    Since most of the clients like seti are so easy to set up and then forget, I'm not sure what the advantage to the user of a peer-to-peer solution would be.
    This is k5. We're all tools - duxup
    [ Parent ]

    advantage/disadvantage... (none / 0) (#10)
    by jeffy124 on Thu Feb 21, 2002 at 01:16:49 AM EST

    the primary advantage is there is a warm fuzzy feeling inside that my machine made a contribution. with Seti, that would be helping find E.T's signal. but in this case (i think this is what you're getting at), one does not know what their contribution to someone's project was (or the project itself), only that they helped make a contribution.

    For all a user knows, they could be helping someone find an ideal algorithm for generating large primes, or they could be contributing to a terrorist group calculating how to bomb a bridge to maximize the damage to it.
    You're the straw that broke the camel's back!
    [ Parent ]
    User friendly feedback (none / 0) (#16)
    by juahonen on Thu Feb 21, 2002 at 02:47:04 AM EST

    What if the distributed program had a user friendly feedback system (e.g. a screensaver) which described what is being done. Another option would be the possibility to disallow certain types of tasks.

    [ Parent ]
    advantage (none / 0) (#36)
    by scruffyMark on Thu Feb 21, 2002 at 04:31:37 PM EST

    Since most of the clients like seti are so easy to set up and then forget, I'm not sure what the advantage to the user of a peer-to-peer solution would be.

    If I can't benefit by this same network whenever I need some compute muscle myself, then it's hardly a network of peers. I guess that would depend on developers coming up with programs "real people" would want to use, that use this peer network - likely, "I can look for similarities in millions of genes really fast" won't attract your piano teacher's kid sister. Faster animation rendering might.

    Basically anything commonly enough used, that runs bound by CPU or memory, and is easily parallelized, could be a candidate. How large a category that is, I don't know.

    [ Parent ]

    Not for everybody, but not for nobody (none / 0) (#31)
    by scruffyMark on Thu Feb 21, 2002 at 03:12:23 PM EST

    As you say, the average home user does mostly disk bound stuff, the only CPU bound time they spend is waiting for bits of MS officebloat to grind to life.

    You cannot, however, extend that to all home users. For example, plenty of people run big photoshop / GIMP plugins. Lots of these likely use easily parallelized algorithms - the change to one pixel depends on the values of N pixels to each side of it, for some smallish N.
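To illustrate why a filter like that splits up easily, here is a one-dimensional sketch (a single row of pixel values instead of a whole image, and a plain box blur standing in for a real plugin). Each chunk is sent along with a "halo" of N extra values on each side, so every piece can be computed independently and still match the result of processing the whole row at once. Function names are hypothetical.

```python
def box_blur(values, radius):
    """Average each element with up to `radius` neighbours on each side."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def blur_in_chunks(values, radius, chunk_size):
    """Blur chunk by chunk; each chunk carries a halo of `radius` extra values."""
    result = []
    for start in range(0, len(values), chunk_size):
        end = min(start + chunk_size, len(values))
        # Extend the slice by the halo so edge elements see their true neighbours.
        lo, hi = max(0, start - radius), min(len(values), end + radius)
        blurred = box_blur(values[lo:hi], radius)
        # Keep only the results for this chunk, dropping the halo positions.
        result.extend(blurred[start - lo:end - lo])
    return result
```

Each loop iteration in `blur_in_chunks` depends only on its own slice, so the iterations could be handed to different machines. The cost is that the halo values are transmitted twice, which matters more as N grows relative to the chunk size.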

    So, maybe your audience is not as wide as Morpheus. But say all the computers in a small graphics shop run it, and all the employees set it up on their home machines as well. Now you have a peer-to-peer network of maybe 30 computers, and the workers at this graphics shop all of a sudden spend way less time waiting for their heavy-duty plugins to finish running. They can work from home more effectively as well, since they have effective access to all the presumably snazzy work computers from their possibly futzy old home machines.

    I also suspect that the number of science types who could benefit from extra compute power is significantly greater than the number who

    • have time, energy and money for the publicity to attract users to their one, particular, project, and
    • need that compute power all the time, so that the slow startup time of the project will pay off months or years down the road.

    Many probably need lots of compute power now, so they can go back to the lab for a couple of months and get more results to analyze. If they can just run their data on an already established network of general-purpose compute servers, and leave their computers doing analysis of other scientists' work while they're in the lab, again everyone benefits.

    [ Parent ]
    Bloatware (none / 0) (#32)
    by ucblockhead on Thu Feb 21, 2002 at 03:17:37 PM EST

    Most bloatware is actually heavily disk-bound. Big processes mean lots of swapping, which means lots of time spent waiting for the disk.

    Perhaps Photoshop or Gimp, but even there, I'm not convinced that the CPU is maxed for more than a short period, too short for it to make sense to throw packets down that slow internet line. Any application that doesn't take on the order of minutes to complete is likely not going to make sense for this sort of thing.

    These days, even the damn compilers are disk bound!
    This is k5. We're all tools - duxup
    [ Parent ]

    Hmm (none / 0) (#34)
    by scruffyMark on Thu Feb 21, 2002 at 04:18:40 PM EST

    Well, the famous Apple "MHz is meaningless, a G4 is way faster than the fastest P4" demos always seem to use Photoshop plugins, and IIRC to take a couple of minutes even on the ever-so-fast G4. Bigger Photoshop jobs can take half an hour or more, to say nothing of rendering an entire animation.

    Now, presumably if these jobs were disk-bound, the processor you're running on wouldn't matter all that much compared to the speed of your hard drive, so Apple wouldn't be able to do a very impressive demo - "Sure, it's only four percent faster on the Mac, but when you compare just the times the task spent CPU-bound, it's really more like forty percent," just isn't going to impress anyone.

    [ Parent ]

    Photoshop plugins (none / 0) (#39)
    by ucblockhead on Thu Feb 21, 2002 at 05:21:38 PM EST

    The thing is, the sort of thing they are demoing is not the sort of thing people do often. Most people will never do it.

    Anyway, you can answer the question for yourself. Check your CPU graph. How often is it at 100%? How many of those times are you actually waiting for something? If you are like most people, your answers will be "rarely" and "never".
    This is k5. We're all tools - duxup
    [ Parent ]

    True (none / 0) (#42)
    by scruffyMark on Thu Feb 21, 2002 at 06:50:26 PM EST

    But then I never said it would be for most people, just that it would be for some number of people nontrivially greater than zero. Unfortunately I can't easily check myself - Photoshop costs more money than I care to spend, and the GIMP is such a wretch to use, especially for a spoilt Mac user like me.

    Incidentally, my CPU graph is constantly at 100 percent. Granted, most of the time, it's about 70-80 percent 'nice', the niced process being dnetc...

    [ Parent ]

    That's the point!!! (none / 0) (#59)
    by junkgui on Mon Mar 04, 2002 at 11:56:31 AM EST

    People who do these types of tasks will be able to steal processor time from "average joes" who will never know the difference! I think that sounds like a great idea.

    [ Parent ]
    Security is a big issue (5.00 / 3) (#7)
    by MugginsM on Thu Feb 21, 2002 at 12:47:32 AM EST

    I think this is a really good, and maybe practical, idea.

    The two biggest issues I can see are a) performance and b) security.

    a) performance
    The problem is that while you may be able to grab a few minutes of CPU time here and there, the time needed to send the data to and from each node might be a killer. Unless you can grab hours and days of CPU time in one go, it might not be worth it to send the code+data to and from every client. Faster networks make this better, but for the type of people who turn their computers off at night, or people on modem dialup, it might not be worth the effort.
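A back-of-the-envelope model makes the trade-off concrete. It rests on simplifying assumptions that are not in the comment itself: the job splits perfectly across peers, all data passes through the requester's own link, and every peer is as fast as the local machine. `distributed_time` is a hypothetical helper.

```python
def distributed_time(work_s, data_bytes, n_peers, bandwidth_bytes_per_s, round_trip_s):
    """Estimate wall-clock seconds for a job farmed out to n_peers.

    work_s: CPU-seconds the job would take on the local machine alone
    data_bytes: total bytes sent out plus bytes returned, all over the local link
    """
    transfer_s = data_bytes / bandwidth_bytes_per_s
    return round_trip_s + transfer_s + work_s / n_peers


# Roughly a dial-up modem (about 7000 bytes/s), 10 MB of data, ten peers.
short_job = distributed_time(600, 10_000_000, 10, 7000, 0.3)      # 10 CPU-minutes
long_job = distributed_time(36_000, 10_000_000, 10, 7000, 0.3)    # 10 CPU-hours
```

Under these assumptions the ten-minute job takes longer distributed than it would locally, because the transfer alone exceeds the local compute time, while the ten-hour job comes out well ahead. That matches the point above: the work has to be large relative to the data.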

    b) security
    I think this is the number one problem. You pretty much need to have something that is good at sandboxing. eg. Java, Python, TCL, or the like. You also have other issues to deal with - will the application be allowed to store stuff on the local drive, will you be rationing RAM, etc.

    This is a topic I find quite fascinating, actually. I've spent a little time looking at a similar system - ways of sharing resources on distant devices. In my case, loading little programs into shared robots that may be out of communication range for times.

    - MugginsM

    It needs to be restricted (none / 0) (#18)
    by juahonen on Thu Feb 21, 2002 at 03:02:52 AM EST

    This kind of peer-to-peer system needs to have strict restrictions on it. If any participant could start executing any code s/he wishes, there would certainly be issues with security and privacy. No matter how many sandboxes you put on them, they are, after all, just sandboxes.

    What I suggest is a system which allows any user to execute any kind of mathematical formulae on any data of limited size. The distributed engine working on an end user machine would then limit the available resources for the problem solving. This would eliminate the possibility of running out of memory. As far as I know, if the system is limited to mathematics only, the only security issue would be the memory usage of the problem solving process.
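A rough sketch of what "mathematics only" could mean in practice: parse the submitted expression and evaluate it only if every node is on a small whitelist (numbers, named inputs, and basic arithmetic). The whitelist, the node limit, and the name `evaluate` are assumptions made for illustration. Exponentiation is left out on purpose, since it can produce enormous numbers from a tiny expression. This is not a vetted security boundary.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression, variables=None, max_nodes=200):
    """Evaluate an arithmetic expression, rejecting anything off the whitelist."""
    variables = variables or {}
    tree = ast.parse(expression, mode="eval")
    # Crude resource limit: refuse expressions with too many syntax nodes.
    if sum(1 for _ in ast.walk(tree)) > max_nodes:
        raise ValueError("expression too large")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Calls, attribute access, comprehensions, etc. all end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(tree)
```

For instance, `evaluate("2 * (x + 3) - 1", {"x": 4})` is accepted, while an expression containing a function call is rejected with `ValueError`. Anything useful for real workloads would need loops or vector operations, and every feature added widens the surface that has to be checked.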

    [ Parent ]

    mathematical formula (none / 0) (#24)
    by kubalaa on Thu Feb 21, 2002 at 06:40:53 AM EST

    So the program would have to be written in a functional language without side-effects?

    [ Parent ]
    Capabilities (none / 0) (#19)
    by mlinksva on Thu Feb 21, 2002 at 03:08:38 AM EST

    Anyone thinking of implementing something like this, read up on capability security.
    imagoodbitizen adobe unisys badcitizens
    [ Parent ]
    Check out idel. (none / 0) (#9)
    by Eloquence on Thu Feb 21, 2002 at 12:57:51 AM EST

    idel helps sandbox unsafe languages like C for distributed computing. BTW, this is a subject that would be best discussed on iA or the IRC channel, you will find plenty of like-minded hackers there.
    Copyright law is bad: infoAnarchy Pleasure is good: Origins of Violence
    spread the word!
    Tying it into the Grid (4.75 / 4) (#11)
    by skim123 on Thu Feb 21, 2002 at 01:20:35 AM EST

    If you didn't know, there are a gaggle of supercomputing centers around the world (just in Europe, Japan, and US, actually, IIRC) that are all interconnected to form a supercomputing grid. Scientists use this grid to run distributed processor-intensive applications, like protein folding simulators, etc. Maybe the grid could be expanded into home user computers via such a mechanism.

    Of course, part of the inherent problem with any suggestion of this nature is data sensitivity. If I have some chunk of data transmitted to another computer to have some work performed on it, I have to worry about:

    1. The data being privy to my eyes only - if so, it shouldn't get transferred!
    2. Someone intercepting my data on the way there or back and reporting false results, for whatever reason.
    Also, I would encourage you to check out some of the distributed operating systems out there. It's been a while since my Dist OS class, but I remember we looked at a number of academic OSes that already had such process sharing features built into them. Granted, this was over a private network, but it could likely be extended to the Internet without too much hubbub.

    Money is in some respects like fire; it is a very excellent servant but a terrible master.
    PT Barnum

    Re: Tying it into the Grid (none / 0) (#20)
    by juahonen on Thu Feb 21, 2002 at 03:26:54 AM EST

    It would be trivial to protect the system against man-in-the-middle attacks. Protecting against intentional falsification of computation results is quite impossible. To actually verify that the data sent back from the client has not been tampered with, you'd have to calculate the results yourself. Of course, the distributed system could send the same problem to multiple machines. This would drop the distributed system's efficiency by 50 percent or more. Likely more.

    If you can stand the efficiency loss and absolutely must be certain that no falsification is taking place, then there would be no problem in that. I have to say, however, that the chances are small that someone will actually bother to falsify results.

    I could say deceit is an inherent flaw in trusting others. They are capable of betrayal and deception. If you sent the data to multiple clients, there would be no way of telling if any or all of them tampered with the results. It could well be you've sent your problem to a bunch of script kiddies, each with modified client software. Since you cannot be absolutely sure about the others, you might just as well trust them.
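
    To make the cost of the "same problem to multiple machines" idea concrete, here is a rough sketch of majority voting over replicated results. The function names and the quorum parameter are hypothetical, results are assumed to be directly comparable values, and as noted above this does nothing against workers that collude on the same wrong answer.

```python
from collections import Counter


def accept_result(results, quorum):
    """Return the answer reported by at least `quorum` workers, else None."""
    if not results:
        return None
    answer, votes = Counter(results).most_common(1)[0]
    return answer if votes >= quorum else None


def efficiency(replicas):
    """Fraction of donated work that is useful when each unit runs `replicas` times."""
    return 1.0 / replicas
```

    Running every unit twice gives efficiency(2) == 0.5, which is the 50 percent figure; three replicas with a quorum of two can outvote a single bad worker but drops the figure to a third.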

    [ Parent ]
    Globus (none / 0) (#51)
    by zavyman on Fri Feb 22, 2002 at 04:17:10 AM EST

    Globus might be the kind of supercomputing grid you are talking about.

    This is quoted from the FAQ page:

    What is Globus?

    The Globus Project is a research and development project focused on enabling the application of Grid concepts to scientific and engineering computing. (See below for an explanation of the Grid.)

    • Groups around the world are using the Globus Toolkit to build Grids and to develop Grid applications.
    • Globus Project research targets technical challenges that arise from these activities. Typical research areas include resource management, data management and access, application development environments, information services, and security.
    • Globus Project software development has resulted in the Globus Toolkit, a set of services and software libraries to support Grids and Grid applications. The Toolkit includes software for security, information infrastructure, resource management, data management, communication, fault detection, and portability.

    What is the Grid?

    The Grid refers to an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing and often require secure resource sharing across organizational boundaries, and are thus not easily handled by today's Internet and Web infrastructures.

    Two papers that provide overviews of Grid computing are Anatomy of the Grid, which defines Grid computing, proposes a Grid architecture, and discusses relationships between Grid technologies and other contemporary technologies; and Physiology of the Grid, which describes how Grid mechanisms can implement a service-oriented architecture, explains how Grid functionality can be incorporated into a Web Services framework, and illustrates how this architecture can be applied within commercial computing as a basis for distributed system integration.

    I believe source code is available so that one could hack their own grid. It's a very interesting project.

    [ Parent ]
    communications and partitioning (5.00 / 2) (#14)
    by Pink Daisy on Thu Feb 21, 2002 at 02:06:23 AM EST

    There are good systems for using multiprocessor computers and clusters for performing computations. The two big problems are that it is complicated to write parallel programs (and particularly difficult to do it efficiently so that there is a good speedup) and that most problems don't parallelize easily. Even for problems that are embarrassingly parallel, there is usually significant communication for job startup and finish.

    For this to work, consider that you would have to download the computational client for each new job. It then runs in a sandboxed environment. I'll assume that the person running it doesn't mind if their program is collected and analyzed; presumably their results and data are spread over a wide collection of clients, so the attacker may get the program and a small amount of data, but the vast majority of the problem data and calculations are only in the hands of the originator of the problem.

    As for data distribution, it is a hard problem. Beyond noting that the optimal answer requires future information (i.e. you can't tell when Joe Blow's 2880 MHz overclocked Athlon with the most spare cycles is going to catch fire and stop sending back data), you have to think about partitioning so that variations in computer speed and network latency and bandwidth don't cause too much trouble. Failures are another problem; you must be able to tolerate failures in both the gone-offline-completely sense, and the sense of having the task run in the background, next to the game of Quake 3 that the person fired up when returning from lunch.
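
    One common way to cope with both problems is to cut the job into many small units and hand them out on request, re-queuing any unit whose worker goes quiet. The sketch below assumes exactly that; the WorkQueue class, its method names, and the single fixed timeout are all hypothetical simplifications.

```python
import collections


class WorkQueue:
    """Hands out work units and re-queues any that miss their deadline."""

    def __init__(self, units, timeout):
        self.pending = collections.deque(units)
        self.timeout = timeout
        self.in_flight = {}   # unit -> deadline
        self.done = {}        # unit -> result

    def checkout(self, now):
        """Give the next unit to a worker, reclaiming expired ones first."""
        for unit, deadline in list(self.in_flight.items()):
            if deadline <= now:
                del self.in_flight[unit]
                self.pending.append(unit)
        while self.pending:
            unit = self.pending.popleft()
            if unit not in self.done:   # skip units finished by a late worker
                self.in_flight[unit] = now + self.timeout
                return unit
        return None

    def submit(self, unit, result):
        """Record a result; late duplicates are ignored."""
        if unit in self.done:
            return
        self.in_flight.pop(unit, None)
        self.done[unit] = result
```

    Because workers ask for more whenever they finish, a fast machine simply checks out more units than a slow one, and a machine that catches fire only costs one timeout.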

    As for security, the person running the sandboxed client is pretty safe. The person putting the job in the cloud has to worry that their code and data will be intercepted, and that 31337 H4Xm0NK3y is going to send in the wrong data in order to convince everyone that his penis is longer, I mean computer is faster, because he processed a work unit in three seconds instead of four.

    I think this could work, but only for a very limited class of jobs.

    Process Migration (4.50 / 2) (#17)
    by mlinksva on Thu Feb 21, 2002 at 02:51:40 AM EST

    There's a ton of literature on just this problem. Try googling for process migration. It's really hard.
    imagoodbitizen adobe unisys badcitizens
    QNX Rtp (4.00 / 1) (#23)
    by infinitewaitstate on Thu Feb 21, 2002 at 04:53:06 AM EST

    You might want to look at QNX RTP and its process sharing.

    ... but then again, what do I know?

    Sci Am and "Internet Scale Operating Systems" (4.00 / 1) (#26)
    by porkchop_d_clown on Thu Feb 21, 2002 at 09:20:26 AM EST

    This month's Scientific American has a big feature article on the idea of "internet scale" operating systems that pretty much discusses these ideas. It's sitting on my desk at home, though, haven't had a chance to read it.

    When using a nigh-omniscient computer to run your evil empire, do not install Windows. Also, be sure to disable the AppleTalk protocol - woul

    Can't wait (none / 0) (#50)
    by scruffyMark on Fri Feb 22, 2002 at 02:36:55 AM EST

    Last one I have, the only computer article is about home powerline-based networking. That's what you get for having a subscription in Canada - the newsstands even get it a week or two before us.

    [ Parent ]
    Unintended consequences (2.50 / 2) (#28)
    by bobpence on Thu Feb 21, 2002 at 12:08:58 PM EST

    The argument posed against Napster was that, if people could get the benefits of buying a CD without buying a CD, they wouldn't buy as many CD's. To counter this argument, we point out that MP3 is not quite the same quality as a CD, and that sampling has long been recognized as an effective marketing method (think cheese-on-toothpicks at the grocery store and automobile test-drives).

    Processor time, however, is a commodity. Suppose I have access to enough computing power, and am performing a CPU-intensive task that has essentially zero I/O interrupt time; the only delay is moving it to the other computers to start with, and moving it back when done. I have no incentive to buy more computers or more computing power for my desktop, since the difference experienced does not overcome the cost (in terms of the analogy: there is no sense in which the CD is better than the MP3, and no impetus to buy once I try).

    I don't buy a new computer. Neither does anyone else. Computer production and advancement decline, prices skyrocket. The work eventually exceeds the ability of the network to perform it, and rationing occurs because adding more computers is now so expensive.

    "Interesting. No wait, the other thing: tedious." - Bender

    eer, no. (none / 0) (#41)
    by stfrn on Thu Feb 21, 2002 at 05:38:06 PM EST

    I'm not entirely sure what you are trying to say here, but one of the goals of distributed computing is that you would get paid or otherwise compensated for the use of your CPU, which would require that the other person pay. In fact, if it were profitable, people would buy more computers.

    [ Parent ]
    eer, yes. Or maybe. (none / 0) (#45)
    by bobpence on Thu Feb 21, 2002 at 10:46:15 PM EST

    From paragraph 2 of the post: We get a peer-to-peer network where, instead of sharing files, users share unused computer time. If a user, for example, is rendering an animation, part of the work would be done by other people's computers. In return, when the same user is doing nothing, his or her computer time would be available to other people. (emphasis mine)

    Distributed computing can be useful, but this doesn't seem like an application whose time has come.

    "Interesting. No wait, the other thing: tedious." - Bender
    [ Parent ]

    Works for some things, not for others (5.00 / 2) (#29)
    by hardburn on Thu Feb 21, 2002 at 12:50:54 PM EST

    Such clustering ideas only work if the problem can be easily parallelized. Just taking any problem and trying to put it on a cluster is extremely naive.

    The types of problems that work best over an Internet-based cluster are ones that do not need lots of bandwidth or low latency, and are more dependent on CPU time. Examples of these types of problems are breaking encryption, generating the digits of pi, and (I think) chess. In some cases, these problems can even use a Sneaker-Net.

    There are plenty of other problems where you need to have a fast and/or low latency network. Other problems can't be parallelized very well at all, in which case a Beowulf of 2 GHz machines on fiber connections may actually be slower than a single Pentium 100 at the same task.

    The network should be structured to reduce lag time.

    In many cases, by reducing lag time, you trade off bandwidth. Again, it all depends on the specific job being done. Some problems need lots of bandwidth, but it won't matter much if the lag time is two hours. Other jobs will need their lag time under a millisecond, but they can do it if you're down to 8 bits/second of bandwidth.
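
    A back-of-the-envelope model shows why. The sketch below is Amdahl's law with a one-time transfer cost added; the function and parameter names are made up for illustration, and it assumes transfers are not overlapped with computation.

```python
def distributed_time(compute_s, parallel_fraction, workers,
                     data_bytes, bandwidth_bps, latency_s):
    """Estimated wall-clock seconds for a job farmed out to `workers` machines."""
    serial = compute_s * (1.0 - parallel_fraction)      # cannot be split up
    parallel = compute_s * parallel_fraction / workers  # shared among workers
    transfer = latency_s + 8.0 * data_bytes / bandwidth_bps
    return serial + parallel + transfer


def speedup(compute_s, parallel_fraction, workers,
            data_bytes, bandwidth_bps, latency_s):
    """Ratio of single-machine time to estimated distributed time."""
    return compute_s / distributed_time(compute_s, parallel_fraction, workers,
                                        data_bytes, bandwidth_bps, latency_s)
```

    A perfectly parallel 100-second job with no data to ship gets a speedup of 10 on ten machines. A 1-second job that is only half parallel and needs a megabyte pushed over a 56 kbit/s modem comes out far below 1, no matter how many machines join in.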

    while($story = K5::Story->new()) { $story->vote(-1) if($story->section() == $POLITICS); }

    Forgot to add (none / 0) (#30)
    by hardburn on Thu Feb 21, 2002 at 01:00:25 PM EST

    I should really provide a link to back up my claims.

    You might want to read Robert G. Brown's book on Beowulf clusters. You're sure to run into him if you sit on the Beowulf mailing list long enough. You can get a PDF of his book here, and a PostScript here. Although it focuses on Beowulfs, there is good information there for anyone doing computing clusters.

    while($story = K5::Story->new()) { $story->vote(-1) if($story->section() == $POLITICS); }

    [ Parent ]
    Missing poll option (5.00 / 3) (#35)
    by epepke on Thu Feb 21, 2002 at 04:23:54 PM EST

    I did work on that, and it worked just fine.

    It was at the Supercomputer Computations Research Institute. By about 1992 we had objects essentially living on the network. By about 1993, using a somewhat different system, we had a cluster working in parallel on several systems with one doing the visualization (it won the 0th annual Supercomputing award for "Best Application"). By about 1995, using yet another somewhat different system, we had a nice cluster of 128 machines connected with fiber, with a queuing and load-balancing system that made it all almost seem like a single machine.

    Then around 1998, I emerged from academia into what I laughingly call "the real world," and I found out that nobody cared. Essentially all "Enterprise" O-O books teach exactly the wrong things to do to make your code scale like this. People are gaga over COM or CORBA or .NET or some other recrudescence of 20-year-old ideas. People yawn at sustained throughput and act as if the Big Number on the CPU means everything. (It goes to 11!)

    SCRI is no longer. I'm doing boring development and getting paid a lot more. The guy most responsible for the Supercomputing award did boring work in Japan for a while, making a lot of money. The guy who worked with me on the network object stuff is still in Japan as a system administrator, making a lot of money.

    The truth may be out there, but lies are inside your head.--Terry Pratchett

    Cool (5.00 / 1) (#37)
    by greenrd on Thu Feb 21, 2002 at 05:16:37 PM EST

    Essentially all "Enterprise" O-O books teach exactly the wrong things to do to make your code scale like this. People are gaga over COM or CORBA or .NET or some other recrudescence of 20-year-old ideas.

    I suspected as much. Could you recommend any good books/papers on writing scalable code? With an OO focus? Or any books/papers which talk about why COM/CORBA/.NET etc. are crap?

    "Capitalism is the absurd belief that the worst of men, for the worst of reasons, will somehow work for the benefit of us all." -- John Maynard Keynes
    [ Parent ]

    I wish I knew (5.00 / 3) (#53)
    by epepke on Fri Feb 22, 2002 at 02:44:27 PM EST

    If I had been able to find some, I'd tell you about them. There was a 1-page article in Communications of the ACM a couple of months ago ("Hello World Considered Harmful") that was right on. There is an exceedingly brief section in the Cocoa Objective C description, and some of the Java API documentation hints about it. Most books on Smalltalk have it pretty good, but all of these tend to be about the language, which is the least important thing. The philosophy is the most important thing, and although it is easier to apply to some languages, it can be applied to any language, even vanilla procedural and functional languages, as long as it's powerful enough. At least the first two projects I mentioned were in plain, vanilla C but were O-O to the very bone, including constant storage mark-and-sweep garbage collection. Also, each was about 250,000 lines of code but would probably have been a million or more if we had not designed it properly.

    I can suggest some of the books to avoid. Any book that uses the word "classes" when "objects" would do as well is to be avoided. Any book that uses the phrase "business classes" is bad. That is because the scalable philosophy causes one to focus on objects, and a class is just some syntactic sugar to get objects made. People who have the right philosophy, therefore, tend to think "object." Any book that starts off with an example of a bank account as an object with a "withdrawal" method is to be avoided. Any book that builds around UML or a similar notation is to be avoided.

    I can also describe as well as I can the philosophy that works, but obviously I can't write a whole book here. Objects should be thought of as "smart data," not just a trick for "encapsulation." The program is primarily "in" the interaction between objects, not "in" the implementation of the methods. The decisions about how to set up inheritance and which object does what are extremely important, and at least at first, you will have to refactor almost on a daily basis.

    I can also try to give you an example of how much fun it is when it works right, based on the first one of these that I worked on, a scientific visualization package called SciAn (now, sadly, history unless I win the lottery and can sit down and rewrite it as Open Source). So, imagine this. There's a thunderstorm on the screen. It's generated dynamically using GL. It's based on several visualization objects, maybe a deformed sheet for the terrain, some isosurfaces, volume visualization, whatever. The image depends on lighting, surface properties, transparency, and also on the data. The data is cooked umpteen different ways, and in this case may even be under control of a different machine (as can anything). There are also some color tables that are fed in. Each of these has umpteen different panels with zillions of little controls. Change the transparency, exaggerate the terrain, put in grid lines and shadows, etc. If any of these changes, the system has to change the image, doing as little work as possible. Question: if you want to add another control to some fiddly bit in the system to change something, how much extra work do you have to do to make sure it comes out in the wash? Answer: essentially none. That's because essentially all of the design is in the current spacetime linkages between objects. It is far too complicated for me to understand at all, let alone sit down and write some dumb UML drawing, but fortunately I don't have to, because it's there and it always works right.
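
    For readers who want something concrete, the "change the image, doing as little work as possible" behaviour can be illustrated with a generic dependency graph in which each object marks its dependents stale and recomputes only when asked. This is a sketch of that general technique, not a description of how SciAn did it; the Node class and everything in it is hypothetical.

```python
class Node:
    """A value that depends on other nodes and recomputes only when stale."""

    def __init__(self, compute=None, inputs=(), value=None):
        self.compute = compute            # None marks a source (e.g. a control)
        self.inputs = list(inputs)
        self.dependents = []
        self.value = value
        self.stale = compute is not None  # derived nodes start out uncomputed
        self.evaluations = 0              # counts recomputations, for inspection
        for node in self.inputs:
            node.dependents.append(self)

    def set(self, value):
        """Change a source value and mark everything downstream stale."""
        self.value = value
        for node in self.dependents:
            node.invalidate()

    def invalidate(self):
        if not self.stale:
            self.stale = True
            for node in self.dependents:
                node.invalidate()

    def get(self):
        if self.stale:
            self.value = self.compute(*(n.get() for n in self.inputs))
            self.evaluations += 1
            self.stale = False
        return self.value


terrain = Node(value=2.0)
transparency = Node(value=0.5)
surface = Node(lambda t: "surface@%s" % t, [terrain])
image = Node(lambda s, a: (s, a), [surface, transparency])

image.get()
transparency.set(0.8)   # only the image is redone; the surface is reused
image.get()
```

    Adding another control is one more Node wired into whatever it affects; nothing else in the graph needs to know about it.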

    When we decided to do some network distribution of the objects, most of this had already been written. It wasn't "we're going to make a distributed visualization," but rather, "hey, cool, we could make this distributed. Let's do it and get a demo." The same things that made it easy to add a new control made it easy to put an object somewhere else. Most of the work involved doing the IP layer, and we had to kludge around some things so I learned how to do it better next time, and we had to come up with a clever way of synchronizing the garbage collectors (which turned out already to have been thought of for incremental collectors), but the damn thing worked.

    One day, when those stupid "this is your brain on drugs" commercials were very popular, John and I sat down to make a little geeky joke. We took some EEG data that we had and put it in the middle of a thunderstorm, deforming it by some of the thunderstorm data and coloring it by others. Then we put "This is Your Brain on SciAn" as a title. It took us about 10 minutes. Try that with your Adler-inspired UML Real Enterprise Professional Design Process.

    Part of the trick of using objects properly is to do the opposite of what you are supposed to do: trust. It's a bit like learning how to do recursion. Remember the first time you learned the recursive solution to the Towers of Hanoi and grokked that it really would work even if you didn't trace out what was happening to the stack every millisecond? It's like that. You have to have trust that as long as you make the pairwise or tripletwise interactions between objects even more solid than the rock of Gibraltar, the thing is going to scale. If you aren't sure about the complexity of a certain topology, you can sit down and whip out the graph theory book and prove it for sure, but you also have to feel it in your bones. It seems mystical, but it's true. The magic is in the connections between the objects, and the magic really pays off when there are far too many to draw on a piece of paper. It's also a bit like the game of life; you can write the rules down on a 3 by 5 card, but a glider gun feels magical.

    Everything else people talk about in objects seems to be a way to pretend that O-O design is going to make incompetent developers competent. Some of this does have some importance, such as encapsulation and inheritance. Some of it is a complete joke, such as get and set methods and arguments about what language is better. But all of it pales in comparison to the true magic.

    One more thing: Don't assume that following the object syntax of a language will get you there. If you're using C++, it definitely won't. You can use C++ (or even C, for that matter), but you have to think around the syntax. The map is not the terrain. You're better off developing the skills with Java, Objective C, or Smalltalk and only then applying them to C++ or C or Visual Bloody Basic or whatever else you need to use. Also, a scalable piece of code doesn't look like a piece of school assignment C or C++. It may use C or C++, but only as a means of expressing high-level abstractions. You will be thinking at a level of abstraction much higher than the normal program; you will just encode it. At least 90% of your code is going to be almost the same no matter what language it is in; this is good, no matter what the books tell you.

    Thank you for giving me the opportunity to engage in some useless but pleasant nostalgia.

    The truth may be out there, but lies are inside your head.--Terry Pratchett

    [ Parent ]
    what about people taking advantage of it? (none / 0) (#43)
    by jdtux on Thu Feb 21, 2002 at 09:08:48 PM EST

    so are you proposing anyone is allowed to use the processing power, or people have to apply to use it?

    if it were the former, people could take advantage of it, for say, rendering an animation, using free services to make money. that doesn't sound like such a good idea to me

    RE:People taking advantage of it (none / 0) (#44)
    by qaz2 on Thu Feb 21, 2002 at 09:26:44 PM EST

    Anyone would be allowed to use it. Yes, people could use it to make money. But they are paying for the service, in a way, by allowing others to use their idle computer time. There will be a cap on processing power per person to prevent someone from tying up the network, as mentioned in the article. As also mentioned in the article, perhaps people would be allowed to apply for any processing power not used by the users of the network at any given time.
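
    As a rough illustration of how "paying" with idle time and a per-person cap could be tracked, here is a sketch of a simple ledger. It assumes strict accounting in CPU-seconds and a cap applied per request, neither of which the article specifies; the Ledger class and its methods are hypothetical.

```python
class Ledger:
    """Tracks CPU-seconds donated and consumed per user, with a usage cap."""

    def __init__(self, cap_seconds):
        self.cap = cap_seconds   # assumed cap on any single request
        self.donated = {}
        self.used = {}

    def donate(self, user, seconds):
        """Credit a user for idle time made available to others."""
        self.donated[user] = self.donated.get(user, 0.0) + seconds

    def request(self, user, seconds):
        """Grant as much as the user's credit and the cap allow."""
        credit = self.donated.get(user, 0.0) - self.used.get(user, 0.0)
        granted = max(0.0, min(seconds, credit, self.cap))
        self.used[user] = self.used.get(user, 0.0) + granted
        return granted
```

    Under this scheme someone rendering for profit still has to have put in at least as much time as they take out, and nobody can tie up the network with one enormous request.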

    [ Parent ]
    Gasp (none / 0) (#48)
    by scruffyMark on Fri Feb 22, 2002 at 02:33:47 AM EST

    Why, if the source code for Linux were made freely available, people could run commercial servers on Linux for free. They would be using the free services of Linux programmers to make money. Can't have that!

    [ Parent ]
    Passwords, anyone? (none / 0) (#47)
    by jzawodn on Fri Feb 22, 2002 at 02:10:21 AM EST

    The ideal task would be one which has a small amount of data and a large amount of processing to do with the data.

    Sounds a lot like password cracking, doesn't it? :-)

    Grid computing... (5.00 / 1) (#54)
    by dreamquick on Sat Feb 23, 2002 at 06:36:14 AM EST

    Isn't the idea being suggested also known as "grid computing"? That is, where you have a large number of small systems each donating their processing time to form one large system.

    By its nature grid computing leads to the formation of a super-computer, but contrary to one of the comments so far there is no need for the component systems themselves to be super-computers - they can just simply be lots of regular systems all contributing CPU cycles.

    While trying to find a definition of grid computing I did find this link which points to a number of commercial and non-commercial grid-computing resources.


    /* #include <comedy_sig.h> */
    No Reason to.. at least not yet. (none / 0) (#55)
    by Quixato on Tue Feb 26, 2002 at 03:24:08 AM EST

    I'm going to echo the statements of an earlier post to reiterate the point: everyday people, in fact most people, have absolutely no need for distributed computing. There are few applications that can be elegantly solved with a low-bandwidth, high-CPU distributed network. I mean, how many people are rendering graphics anyway? I could see that in the future some killer app will be developed that will require more CPU cycles than are available in our desktop machines, but until that day comes, no one needs the spare cycles of another's computer to listen to mp3s and chat with friends while idly surfing the net.

    "People are like smarties - all different colours on the outside, but exactly the same on the inside." - Me
    "Learn to question, question to learn." - Sl8r

    You totally missed the point... (none / 0) (#57)
    by SkullOne on Fri Mar 01, 2002 at 06:12:54 PM EST

    The idea is not to speed up someone's MP3s. The idea is that if someone had an actual CPU-intensive task that could be run in parallel on multiple systems, the software would take advantage of that. You really missed the point by asking why anyone would need MP3s to play better.

    [ Parent ]
    I'm fairly certain you missed my point... (none / 0) (#58)
    by Quixato on Sun Mar 03, 2002 at 01:58:56 AM EST

    We don't need massive p2p parallel systems because there are no real problems that can be solved elegantly with them, at least not right now. Perhaps in the future, but not right now.

    I didn't say that we need mp3's to play better. That's pretty much what I DIDN'T say, thank you.

    "People are like smarties - all different colours on the outside, but exactly the same on the inside." - Me
    "Learn to question, question to learn." - Sl8r
    [ Parent ]

    The Consumer Grid (5.00 / 1) (#56)
    by scmmss on Tue Feb 26, 2002 at 08:13:49 AM EST

    Here at Cardiff University a group of researchers are looking at this very concept.

    Grid computing has been around in academic circles for a number of years but has recently started to take off as IBM and Sun, amongst others, have started to tout the idea to corporate customers. One of the best sources of information is the Global Grid Forum. The Global Grid Forum (GGF) is a community-initiated forum of individual researchers and practitioners working on distributed computing, or "grid" technologies. From the GGF website, Wide-area distributed computing, or "grid" technologies, provide the foundation to a number of large-scale efforts utilizing the global Internet to build distributed computing and communications infrastructures. Most grid applications focus on connecting large machines into so called "Virtual Organisations" where compute cycles and data can be shared easily and securely.

    At the recent GGF meeting in Toronto we presented a working paper document to the JINI working group, entitled The Consumer Grid, which can be read here in word format. Our aim was as the OP proposed, to combine the idea of distributed or grid computing with the lightweight nature of a peer-to-peer network. The goal is to have a general purpose application, grid-enabled using a peer-to-peer network, easily installed and run, that would enable us to use spare computing cycles to perform tasks similar in nature to the SETI project. We already have a visual programming tool called Triana, developed in the Department of Physics and Astronomy at Cardiff, that we are rewriting for this purpose. We also have a suitable problem that we can use as a demonstration, gravitational wave data analysis, overview here. This is a signal processing problem that is "embarrassingly parallel" and hence very suitable to this type of distribution.

    As one poster noted, why would anyone want to donate their cycles to this kind of problem? We have two views. The first is that people may take a purely altruistic approach: they like the concept and run the software, as in SETI. The other is that in large organisations, like a university, there are often large labs of PCs or workstations sitting idle. Software that is easily installed and run could use these machines quickly; large-scale grid infrastructures, on the other hand, are notoriously hard to install and administer.

    We hope to have a demonstration of this at the next GGF meeting in June.

    Combining the Ideas of Distributed Computing and Peer-to-Peer Networks | 59 comments (57 topical, 2 editorial, 0 hidden)