Why distributed computing?
Once networked computers began to take hold, it soon became necessary to share resources and data among them in a cohesive manner. Today numerous computer applications use distributed computing in one form or another: from large-scale ERP applications that allow monitoring of all the diverse aspects of the average corporation, to file servers, web application servers, groupware servers, database servers and even print servers that enable several machines to share a single printer.
A more recent application of distributed computing, becoming prevalent now that the processing power of computers is sufficiently advanced and their use sufficiently widespread, is harnessing the power of many personal computers in tandem and using them to process data in much the same way that mainframes were used in days of yore.
In the beginning there was RPC
The first distributed computing technology to gain widespread use was the Remote Procedure Call (RFC 1831), commonly known as RPC. RPC is designed to be as similar to making local procedure calls as possible. The idea behind RPC is to call a procedure in another process and address space, either on the same processor or across the network on another processor, without having to deal with the concrete details of how this is done beyond making the procedure call.
Before an RPC call can be made, both the client and the server have to have stubs for the remote function, which are usually generated by an interface definition language (IDL).
When a client makes an RPC call, the arguments to the remote function are marshalled and sent across the network, and the client waits until the server sends a response. Certain arguments, such as pointers, are difficult to marshal, since a memory address on the client is completely useless to the server. Various strategies for passing pointers are therefore used, the two most popular being (a) disallowing pointer arguments and (b) copying what the pointer points at and sending that to the remote function.
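The stub-and-marshalling flow can be sketched in plain Java (all names here are hypothetical, and the "network" is just a byte array; a real RPC runtime would send the marshalled request over a socket and wait for the reply):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// A toy illustration of RPC-style marshalling, not a real RPC runtime.
public class RpcSketch {

    // The actual procedure, living on the "server".
    static int add(int a, int b) {
        return a + b;
    }

    // Client-side stub: marshals the arguments instead of computing anything.
    static byte[] marshalAddRequest(int a, int b) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("add");   // which remote procedure to invoke
        out.writeInt(a);       // arguments are copied, never passed by pointer
        out.writeInt(b);
        return buf.toByteArray();
    }

    // Server-side stub: unmarshals the request and dispatches to the procedure.
    static int dispatch(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        String proc = in.readUTF();
        if (proc.equals("add")) {
            return add(in.readInt(), in.readInt());
        }
        throw new IOException("unknown procedure: " + proc);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = marshalAddRequest(2, 3); // client side
        int result = dispatch(wire);           // server side
        System.out.println(result);            // prints 5
    }
}
```

Note how the "copy what the pointer points at" strategy falls out naturally here: only values ever cross the wire.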
An RPC client locates the server in one of two ways:
- Hard-coding the address of the remote server, which is extremely inflexible and may require a recompile if that server goes down.
- Using dynamic binding, where servers export whatever interfaces/services they support and clients pick the server they want to use from those that support the needed service.

RPC programs, like other distributed systems, face a number of problems that are unique to their situation, such as:
- Network packets containing client requests being lost.
- Network packets containing server responses being lost.
- A client being unable to locate its server.

There are a variety of solutions to these problems, which can be gleaned from the various links provided in this article.
The need for distributed object and component systems
As distributed computing became more widespread, more flexibility and functionality was required than RPC could provide. RPC proved suitable for two-tier client/server architectures, where the application logic is either in the user application or within the actual database or file server. Unfortunately this was not enough; more and more people wanted three-tier client/server architectures, where the application is split into a client application (usually a GUI or browser), application logic, and a data store (usually a database server). Soon people wanted to move to N-tier applications, where there are several separate layers of application logic in between the client application and the data store.
The advantage of N-tier applications is that the application logic can be divided into reusable, modular components instead of one monolithic codebase. Distributed object systems solved many of the problems in RPC that made large-scale system building difficult, in much the same way that Object-Oriented paradigms swept aside procedural programming and design paradigms. Distributed object systems make it possible to design and implement a distributed system as a group of reusable, modular and easily deployable components where complexity can be easily managed and hidden behind layers of abstraction.
Common Object Request Broker Architecture (CORBA)
A CORBA application usually consists of an Object Request Broker (ORB), a client and a server. The ORB is responsible for matching a requesting client to the server that will perform the request, using an object reference to locate the target object. There are many optional features that ORBs can implement besides merely sending and receiving remote method invocations, including looking up objects by name, maintaining persistent objects, and supporting transaction processing. A primary feature of CORBA is its interoperability across platforms and programming languages.
The first step in creating a CORBA application is to define the interface for the remote object using the OMG's interface definition language (IDL). Compiling the IDL file yields two sets of stub files: one that implements the client side of the application and another that implements the server side. Stubs and skeletons serve as proxies for clients and servers, respectively. Because IDL defines interfaces so strictly, the stub on the client side has no trouble interacting with the skeleton on the server side, even if the two are compiled to different programming languages, use different ORBs and run on different operating systems.
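As an illustration, a minimal IDL file might look like this (the module, interface and operation names are hypothetical); running it through an IDL compiler such as Java's idlj produces the client stub and server skeleton described above:

```idl
// Hello.idl — a hypothetical minimal CORBA interface definition
module HelloApp {
  interface Hello {
    string sayHello();
  };
};
```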
Then, in order to invoke the remote object instance, the client first obtains its object reference via the ORB. To make the remote invocation, the client uses the same code that it would use in a local invocation, but with an object reference to the remote object instead of an instance of a local object. When the ORB examines the object reference and discovers that the target object is remote, it marshals the arguments and routes the invocation out over the network to the remote object's ORB instead of to another process on the same computer. The remote ORB then invokes the method locally and sends the results back to the client.
CORBA also supports dynamically discovering information about remote objects at runtime. The IDL compiler generates type information for each method in an interface and stores it in the Interface Repository (IR). A client can thus query the IR to get run-time information about a particular interface and then use that information to create and invoke a method on a remote CORBA server object dynamically through the Dynamic Invocation Interface (DII). Similarly, on the server side, the Dynamic Skeleton Interface (DSI) allows a server to handle invocations on a CORBA object even when it has no compile-time knowledge of the object's type.
CORBA is often considered a superficial specification because it concerns itself more with syntax than with semantics. CORBA specifies a large number of services that can be provided, but only to the extent of describing what interfaces application developers should use. Unfortunately, the bare minimum that CORBA requires from service providers makes no mention of security, high availability, failure recovery, or guaranteed behavior of objects beyond the basic functionality provided; CORBA deems these features optional. The end result of this lowest-common-denominator approach is that ORBs vary so wildly from vendor to vendor that it is extremely difficult to write portable CORBA code, because important features such as transactional support and error recovery are inconsistent across ORBs. Fortunately a lot of this has changed with the development of the CORBA Component Model, which is a superset of Enterprise Java Beans.
Distributed Component Object Model (DCOM)
DCOM is the distributed version of Microsoft's COM technology, which allows the creation and use of binary objects/components from languages other than the one they were originally written in; it currently supports Java (J++), C++, Visual Basic, JScript, and VBScript. DCOM works over the network by using proxies and stubs. When the client instantiates a component whose registry entry suggests that it resides outside the process space, DCOM creates a wrapper for the component and hands the client a pointer to the wrapper. This wrapper, called a proxy, simply marshals method calls and routes them across the network. On the other end, DCOM creates another wrapper, called a stub, which unmarshals method calls and routes them to an instance of the component.
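The proxy idea itself is not DCOM-specific. As a rough sketch (in Java rather than DCOM's native C++, with hypothetical names and the network omitted), a java.lang.reflect.Proxy can stand in for the client-side proxy, intercepting method calls at the point where a real DCOM proxy would marshal and route them:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Not real DCOM: a plain-Java illustration of the proxy/stub split.
public class ProxySketch {

    // Hypothetical component interface.
    interface Calculator {
        int square(int x);
    }

    // The implementation that, in DCOM, would live behind the stub.
    static class CalculatorImpl implements Calculator {
        public int square(int x) { return x * x; }
    }

    // Wrap the target in a proxy that records each call before forwarding,
    // standing in for marshalling plus network transport.
    static Calculator makeProxy(Calculator target, List<String> log) {
        InvocationHandler handler = (Object p, Method m, Object[] args) -> {
            log.add(m.getName());          // where marshalling would happen
            return m.invoke(target, args); // where the stub would dispatch
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] { Calculator.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Calculator calc = makeProxy(new CalculatorImpl(), log);
        System.out.println(calc.square(6)); // prints 36
        System.out.println(log);            // prints [square]
    }
}
```

The client codes against the interface alone, which is exactly what lets DCOM substitute a network proxy for a local object without the client noticing.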
DCOM server objects can support multiple interfaces, each representing a different behavior of the object. A DCOM client calls into the exposed methods of a DCOM server by acquiring a pointer to one of the server object's interfaces. The client can then invoke the server object's exposed methods through the acquired interface pointer as if the server object resided in the client's address space. All DCOM components and interfaces must inherit from IUnknown, the base DCOM interface, which consists of the methods AddRef(), Release() and QueryInterface(). AddRef() and Release() are used for reference counting and memory management: when an object's reference count becomes zero, it must self-destruct.
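The reference-counting contract can be sketched in plain Java (hypothetical names; real COM code would also implement QueryInterface and free native resources when the object self-destructs):

```java
import java.util.concurrent.atomic.AtomicInteger;

// A plain-Java sketch of COM-style AddRef/Release lifetime management.
public class RefCountSketch {

    static class Component {
        private final AtomicInteger refs = new AtomicInteger(1); // creator holds one ref
        private boolean destroyed = false;

        // Analogous to IUnknown::AddRef — a new client claims the object.
        int addRef() {
            return refs.incrementAndGet();
        }

        // Analogous to IUnknown::Release — when the count hits zero,
        // a COM object frees itself.
        int release() {
            int remaining = refs.decrementAndGet();
            if (remaining == 0) {
                destroyed = true; // real COM code would delete itself here
            }
            return remaining;
        }

        boolean isDestroyed() { return destroyed; }
    }

    public static void main(String[] args) {
        Component c = new Component();
        c.addRef();                          // a second client grabs the object
        c.release();                         // first client done
        System.out.println(c.isDestroyed()); // prints false — one ref left
        c.release();                         // last client done
        System.out.println(c.isDestroyed()); // prints true — self-destructed
    }
}
```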
Remote Method Invocation (RMI)
RMI is a technology that allows the sharing of Java objects between Java Virtual Machines (JVMs) across a network. An RMI application consists of a server that creates remote objects conforming to a specified interface, which are available for method invocation to client applications that obtain a remote reference to the object. RMI treats a remote object differently from a local object when the object is passed from one virtual machine to another. Rather than making a copy of the implementation object in the receiving virtual machine, RMI passes a remote stub for the remote object. The stub acts as the local representative, or proxy, for the remote object and is, to the caller, the remote reference. The caller invokes a method on the local stub, which is responsible for carrying out the method call on the remote object. A stub implements the same set of remote interfaces that the remote object implements, which allows the stub to be cast to any of those interfaces. However, this also means that only those methods defined in a remote interface are available to be called in the receiving virtual machine.
RMI provides the unique ability to dynamically load classes via their bytecodes from one JVM to another, even if the class is not defined in the receiver's JVM. This means that new object types can be added to an application simply by upgrading the classes on the server, with no other work being done on the part of the receiver. This transparent loading of new classes via their bytecodes is a unique feature of RMI that greatly simplifies modifying and updating a program.
The first step in creating an RMI application is creating a remote interface. A remote interface extends java.rmi.Remote, which indicates that its methods can be invoked across virtual machines. Any object that implements this interface becomes a remote object.
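A minimal remote interface might look like this (the Greeter name is hypothetical; extending java.rmi.Remote and declaring RemoteException on every method is what RMI requires):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

public class RemoteIfaceSketch {

    // A hypothetical remote interface: extending java.rmi.Remote marks its
    // methods as remotely invocable, and each must declare RemoteException
    // so callers are forced to handle network failure.
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    public static void main(String[] args) {
        // Any class implementing Greeter is now a remote object.
        System.out.println(Remote.class.isAssignableFrom(Greeter.class)); // prints true
    }
}
```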
To show dynamic class loading at work, we also create an interface describing an object that can be serialized and passed from JVM to JVM. This interface extends the java.io.Serializable interface. RMI uses the object serialization mechanism to transport objects by value between Java virtual machines. Implementing Serializable marks the class as capable of conversion into a self-describing byte stream that can be used to reconstruct an exact copy of the serialized object when it is read back from the stream. Any entity of any type can be passed to or from a remote method as long as it is an instance of a primitive data type, a remote object, or an object that implements java.io.Serializable. Remote objects are essentially passed by reference: a remote object reference is a stub, a client-side proxy that implements the complete set of remote interfaces that the remote object implements. Local objects are passed by copy, using object serialization. By default all fields are copied, except those marked static or transient. Default serialization behavior can be overridden on a class-by-class basis.
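A short sketch of the pass-by-value mechanics (hypothetical class, default serialization assumed): the object survives a round trip through a byte stream, but transient fields are not copied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrates what RMI does under the hood when passing a local object.
public class SerializationSketch {

    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        String text;              // copied by serialization
        transient String scratch; // NOT copied: null after deserialization

        Message(String text, String scratch) {
            this.text = text;
            this.scratch = scratch;
        }
    }

    // Serialize then deserialize, yielding the "copy" the receiver sees.
    static Message roundTrip(Message m) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Message) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Message copy = roundTrip(new Message("hello", "temp"));
        System.out.println(copy.text);    // prints hello
        System.out.println(copy.scratch); // prints null — transient field skipped
    }
}
```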
Thus clients of the distributed application can dynamically load objects that implement the remote interface even if they are not defined in the local virtual machine. The next step is to implement the remote interface; the implementation must define a constructor for the remote object as well as all the methods declared in the interface. Once the class is created, the server must be able to create and install remote objects. The process for initializing the server includes creating and installing a security manager, creating one or more instances of a remote object, and registering at least one of the remote objects with the RMI remote object registry (or another naming service, such as one that uses JNDI) for bootstrapping purposes. An RMI client behaves similarly to a server: after installing a security manager, the client constructs a name used to look up a remote object. The client uses the Naming.lookup method to look up the remote object by name in the remote host's registry. When doing the name lookup, the code creates a URL that specifies the host where the server is running.
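The bootstrap steps above can be sketched in a single JVM (names and port are hypothetical, and the security-manager step is omitted for brevity): the "server" half exports a remote object and binds its stub in a registry, and the "client" half looks the stub up by name and invokes it.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// A minimal, single-process sketch of RMI server and client bootstrap.
public class RmiBootstrapSketch {

    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    static class GreeterImpl implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    static String demo(int port) throws Exception {
        // Server side: start a registry, export the object, bind its stub.
        Registry registry = LocateRegistry.createRegistry(port);
        GreeterImpl impl = new GreeterImpl();
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("greeter", stub);

        // Client side: look the stub up by name and call through it.
        Registry clientView = LocateRegistry.getRegistry("localhost", port);
        Greeter greeter = (Greeter) clientView.lookup("greeter");
        String reply = greeter.greet("world");

        // Sketch-only cleanup so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
        return reply;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(2099)); // prints Hello, world
    }
}
```

A real deployment would of course run the two halves in separate JVMs, with the client naming the server's host in the lookup URL.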
Pick your poison
I'm a big fan of Java so I'm partial to both RMI and CORBA. Anyone who has had experience with any of the three aforementioned technologies is welcome to post below.
References and further reading
Component Engineering Cornucopia
Java RMI Tutorial
Introduction To CORBA (uses Java)
Dr. GUI's Gentle Guide To COM
Using CORBA and Java IDL