The open source development
model has brought significant benefits to software creators. Opening the
source code of a project attracts developers, results in more stable software,
and introduces unexpected creativity. Every programmer reading the source
code is another programmer who can find subtle bugs. Every new perspective
can result in unexpected breakthroughs, like the extensions to Zope and
the modules of Apache. The benefits of the open source model have become
clear over the past few years and more and more companies have begun to
open source their work.
Few doubt the advantages that open source brings to software development,
but most believe the benefits stop when the code is shipped. A lack of
observable consequences prevents users from seeing any differences between
the finished products of open and closed source development. Presented
with the binaries of two different libraries, a software developer cannot
determine, simply by using them, which one is open source and which one
is closed. Presented with two spreadsheets, an end user cannot tell from
using them which was developed openly.
Open reuse makes open source
a technology and not just a development model. It eliminates the traditional
problems of entropy death and frozen APIs. An open reuse library has advantages
over a closed source counterpart even after it is compiled and
shipped. The license under which a project is distributed becomes part
of its technology.
Reuse has always been tied to the openness of code. The most primitive
reuse model is "Copy and Paste." With copy and paste, a programmer literally
copies the source code from one place and pastes it into another. Rediscovered
every year by novice programmers, copy and paste is the worst reuse model.
Its most serious flaw is that the multiple copies of code it creates make
it extremely difficult to guarantee that all of the code has been updated
throughout a project.
APIs are a significant
improvement over copy and paste. In
API reuse, software is released in unreadable bundles that are made to
perform actions through a small set of public names (i.e., APIs). After
the source code is compiled, these names are the only parts of a software
project that are open. Clients that want to make the software perform
actions use APIs to tell the software what to do. In this way, the
same code can be reused in different parts of a program simply by calling
a method or instantiating an object.
But there is a cost. In the
process of using APIs, clients create intrinsic dependencies
on the names of the library. As a result, every year programmers all
over the world update their code to the new APIs of the libraries they
depend upon. When heavily used libraries like core
Windows APIs change, the entire industry rewrites its programs.
As software projects mature and programmers incorporate more and more
libraries, the number of intrinsic dependencies in code increases sharply
until they reach a critical point. At that point, the cost of moving the
algorithms in the code to new technologies is greater than the time it
would take to rewrite them.
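
An intrinsic dependency on a name can be shown in a few lines. The following is only a sketch: `ToolkitV1`, `ToolkitV2`, `draw_text`, and `render_text` are hypothetical names invented for illustration, not part of any real library.

```python
# Hypothetical library, version 1: clients call draw_text() by name.
class ToolkitV1:
    def draw_text(self, text):
        return f"[v1] {text}"


# Hypothetical version 2: the same feature, but the public name has changed.
class ToolkitV2:
    def render_text(self, text):
        return f"[v2] {text}"


def client(toolkit):
    # The client hard-codes the library's name: an intrinsic dependency.
    return toolkit.draw_text("hello")


print(client(ToolkitV1()))  # works against the old API
try:
    client(ToolkitV2())  # same client, new library
except AttributeError as err:
    print("client must be rewritten:", err)
```

Nothing about the client's intent changed between the two versions; only the name it depended on did. Multiply this by every call site in every client and the cost described above follows.
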
The programmers of commonly
used APIs also suffer as a result of API reuse. Everyone who has written
a heavily used library is aware of the paradox of success. As the number
of users of that library increases, the ability of the library's programmers to change
APIs decreases proportionately. Every change the library writer introduces
forces users of the library to rewrite code. The more revolutionary the
improvements to a library, the less likely that clients will adopt them
due to the additional work that is required. APIs become victims of their
own success.
The problems that both the creators
and users of libraries experience are some of the largest in the industry.
These problems are going to become exponentially worse as software moves
to the Internet. A single SOAP component for a credit card service could
have a thousand dependencies, none of them known to the provider. Those
dependencies, in turn, could be providing services themselves and have
millions of dependencies. In this way, a change in one component could
cause millions of services to stop working and cause problems for tens
of millions of people.
There is a direct relationship
between how software can be reused and how open it is. Copy and paste,
for example, requires that all of the code be open while the traditional
API model only requires that method signatures be open. If none of the
code is open, neither API reuse nor copy and paste is
possible. If only some of it is open, the only reuse that is possible
is API reuse. In general, as source code is opened, the
number and types of reuse models that can be used by software developers
grow.
"Open reuse," like
copy and paste, requires all of the source code to be open. It solves
the problems of the API model by making software more flexible than is
traditionally possible with closed source libraries. In contrast to API
reuse, which puts direct dependencies on the syntax of library names and
violates the encapsulation discussed in "Word
Oriented Programming," open reuse couples syntactical and semantic
dependencies together. This coupling brings a flexibility to reuse that
eliminates entropy death and API freezing.
The reuse model is simple:
writers of APIs manage API calls for their clients. Clients reuse software
by writing parsable descriptions of their needs. These descriptions are
then parsed to generate the source code that calls them or mapped to libraries
by generators through internal API calls. From a software developer's
perspective, it is as if software rewrites itself to adapt to changes
in technology over time.
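
A minimal sketch of this model follows, under stated assumptions: the line-based description format, the `parse` function, and the two generators with their `a.` and `b.` toolkit calls are all hypothetical and stand in for whatever syntax and libraries a real project would use.

```python
# A parsable description: the client states what it needs,
# not which API to call. The format is invented for this sketch.
DESCRIPTION = """
window title=Files
button label=Connect
button label=Quit
"""


def parse(description):
    """Turn each non-empty line into a (word, attributes) pair."""
    items = []
    for line in description.strip().splitlines():
        word, _, rest = line.partition(" ")
        key, _, value = rest.partition("=")
        items.append((word, {key: value}))
    return items


# Two hypothetical generators, each maintained by a library writer.
# All knowledge of the target API lives here, not in the client.
def generate_toolkit_a(items):
    templates = {
        "window": "a.open_window({title!r})",
        "button": "a.add_button({label!r})",
    }
    return [templates[word].format(**attrs) for word, attrs in items]


def generate_toolkit_b(items):
    templates = {
        "window": "b.Frame(caption={title!r})",
        "button": "b.PushButton(text={label!r})",
    }
    return [templates[word].format(**attrs) for word, attrs in items]


items = parse(DESCRIPTION)
print("\n".join(generate_toolkit_a(items)))
print("\n".join(generate_toolkit_b(items)))
```

The description never mentions either toolkit. Moving the client from one to the other is a matter of choosing a different generator, which is the sense in which the software appears to rewrite itself.
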
Open reuse is already beginning
to appear in the open source world. The most common form occurs when a
CGI script reads an XML description of a web page and generates versions
for Internet Explorer, Netscape, or small wireless devices.
In contrast to cross-browser libraries, generators and parsable descriptions
give web authors the ability to control their dependencies on different
browsers. In addition, because all of the dependencies of a platform are
in the generator rather than scattered throughout hundreds of files, the
generator is the proof that a browser is being supported correctly.
There are two reasons that
open reuse has not become more widespread. The first is that it requires
source code - i.e., descriptions - to be open. The second and historically
more difficult is that it requires the syntaxes of different domains to
be integrated. Even simple programs like FTP clients integrate networking,
output to a user, and the file system. All of these tasks can be done
in different ways and would benefit from open reuse. But since
it is very difficult to integrate different syntaxes in traditional programming
models, FTP clients are not written with parsable descriptions.
Enter word-oriented programming.
Word-oriented programming naturally uses open reuse through its coupling
of syntactical and semantic relationships. In word-oriented programming, it is possible to inherit
the data, behavior, and rule/relationships of words. The result
is scalable, parsable languages that can be richly integrated with each
other. In this way, a client can use integrated network, GUI, and file
syntaxes to describe a complex solution like an FTP client. These complex
descriptions can then be mapped through inheritance to Java, Windows,
GTK toolkits for display, HTTP, SMTP for network transport, and any of
a number of file storage technologies.
Consider a word-oriented HTML
syntax. Any of the words in the syntax - HTML, Body, Table, Tr, Img, DIV,
etc. - can be overridden and given a different meaning. A GTKBody word
will match 'Body,' and create
a GTK window. A QTBody will match 'Body' and
create a QT window. In this way, the same HTML page can have two different
meanings depending on which word it is instructed to use at runtime. Open
reuse is not the purpose of word-oriented programming any more than the
API model is the purpose of structured and object-oriented programming.
It is simply the standard method for reusing software.
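
The idea of overriding a word can be approximated with ordinary class inheritance. This is only an analogy in Python, not the word model itself: the `Word` base class, the `run` function, and the strings returned in place of real GTK or QT calls are assumptions made for the sketch.

```python
class Word:
    """Base class: a word has a symbol it matches and a meaning."""
    symbol = None

    def interpret(self, content):
        raise NotImplementedError


class Body(Word):
    symbol = "Body"

    def interpret(self, content):
        return f"<body>{content}</body>"


class GTKBody(Body):
    # Inherits the symbol "Body", so it matches the same word in a page.
    def interpret(self, content):
        # Stand-in for real toolkit calls.
        return f"GTK window showing {content!r}"


class QTBody(Body):
    def interpret(self, content):
        return f"QT window showing {content!r}"


def run(page, words):
    """Interpret a page using whichever words were chosen at runtime."""
    vocabulary = {word.symbol: word() for word in words}
    return [vocabulary[symbol].interpret(content) for symbol, content in page]


PAGE = [("Body", "Hello")]

print(run(PAGE, [Body]))
print(run(PAGE, [GTKBody]))
print(run(PAGE, [QTBody]))
```

The page is the same in all three runs; only the word it is instructed to use differs.
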
The only types of programs
that word-oriented languages and open reuse can produce are open source.
A developer who releases a word-oriented program has released
the source to the program, even if it is compiled and even if the libraries
it uses are compiled. This is because the words in the libraries that
the program uses can be inherited and instructed to call their parents
and print their symbol. As the program runs under the new interpretations,
it will print itself out. It is possible to release a single static binary,
but then all of the benefits of word-oriented programming are lost.
Open source is a fundamental part of the programming model.
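
The claim that such a program "will print itself out" can be sketched as follows. The example assumes that a shipped program only names its words and looks them up at runtime; `Table`, `Img`, `program`, and `tracing` are hypothetical names, and Python classes stand in for compiled word libraries.

```python
# Stand-ins for words in a compiled library.
class Table:
    symbol = "Table"

    def interpret(self, content):
        return f"<table>{content}</table>"


class Img:
    symbol = "Img"

    def interpret(self, content):
        return f"<img src='{content}'>"


def program(vocabulary):
    # Stands in for a shipped program: it names words, never their code.
    return [
        vocabulary["Table"]().interpret("prices"),
        vocabulary["Img"]().interpret("logo.png"),
    ]


def tracing(word_cls, recovered):
    """Derive a word that records its symbol, then calls its parent."""

    class Traced(word_cls):
        def interpret(self, content):
            recovered.append((self.symbol, content))
            return super().interpret(content)

    return Traced


library = {"Table": Table, "Img": Img}
recovered = []
traced = {name: tracing(cls, recovered) for name, cls in library.items()}

output = program(traced)  # behaves exactly as before...
print(output)
print(recovered)  # ...and the description has been recovered
```

Because every word can be inherited this way, running the program under the derived words yields both the original behavior and a parsable description of what the program said.
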
The code produced by word-oriented languages, even if distributed
in binary form, always has the following characteristics:
The source code
is retrievable in the form of a parsable description.
The meaning of
that description can be changed through simple inheritance of data,
methods, and rule-relationships.
These two characteristics are
very close to the characteristics of code that is produced under the GPL
and other open source licenses: open access to source code and the freedom
to distribute modifications of it. What is different
is that the code of word-oriented programs is open, not because the licenses
legally require it, but because the code cannot be closed without eliminating
all of the benefits of the programming model. The licenses transition
from being legal documents to part of the technology of the project.
A Look at the Benefits to the Open Source World
When we read books, we often
discover that a phrase or a word has a different meaning than we originally
thought. The phrase "What do you mean by that?" turns out not
to have been asked by a student, but as a prelude to a bar fight. Chris
turns out to be the girl and Pat the boy in a relationship. When this
occurs, we simply remap definitions to create a new interpretation of
what we are reading. This comes to us so naturally that we are unaware
of how remarkable it is. In the software world, when APIs change, programs break.
Today, there are well over a hundred
different open source platforms for developing applications and web sites.
Some popular projects with strong communities are Python, Perl, PHP, wxWindows,
GTK, QT, and the Apache XML tools. Every year, these projects release
new versions of their libraries with changes to their APIs. Every year
their clients have to update their source code to benefit from the changes
in the new libraries.
This cycle of release and update limits the speed at which projects can evolve.
The full benefits of a new version of the KDE desktop, for example, are
only realized after all of the KDE application developers adopt the new
KDE APIs. The process of adoption
is frequently longer than the time it takes to update the platform.
In addition, because the platform needs clients to verify that edge cases
have been correctly accounted for, libraries do not reach full stability
until widespread adoption.
Open reuse offers an alternative
model. With open reuse, every time developers release a new version of
their libraries, they release a generator that maps the old descriptions
to new APIs. In this way, all of the clients are instantly updated to
the new APIs, greatly accelerating the development cycle. The edge cases
of the platform can be instantly tested because all of the clients are
test cases for the new APIs. The traditional cycle of releasing a platform
and updating clients to that platform is eliminated.
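
As a rough illustration, assume the library ships a new generator with each release while client descriptions stay untouched. The client names, the description format, and both generators below are hypothetical; the second generator deliberately forgets one word to show how existing clients expose an edge case.

```python
# Client descriptions, written once and never edited.
CLIENTS = {
    "mail": [("window", "Inbox"), ("button", "Send")],
    "ftp": [("window", "Files"), ("button", "Connect")],
    "editor": [("window", "Notes"), ("menu", "File")],
}


def generator_v1(description):
    calls = {
        "window": "create_window",
        "button": "create_button",
        "menu": "create_menu",
    }
    return [f"{calls[word]}({arg!r})" for word, arg in description]


def generator_v2(description):
    # Release 2 renamed its entry points and, by mistake, dropped "menu".
    calls = {"window": "Window.open", "button": "Button.attach"}
    return [f"{calls[word]}(label={arg!r})" for word, arg in description]


def failing_clients(generator, clients):
    """Use every existing client as a test case for a generator."""
    failures = []
    for name, description in clients.items():
        try:
            generator(description)
        except KeyError:
            failures.append(name)
    return failures


print(generator_v2(CLIENTS["ftp"]))
print(failing_clients(generator_v1, CLIENTS))
print(failing_clients(generator_v2, CLIENTS))
```

The missing word is found before the release ships, by the library's own authors, and the fix is made once in the generator rather than in each client.
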
The stability of the platform
is also increased. Rather than depending on all of the clients to correctly
use libraries and scattering dependencies through thousands of client
files, the generator is proof that the clients are correctly using the
APIs. Today, when clients incorrectly use APIs,
programmers have to painstakingly search through thousands of files to
eliminate bugs and potential security flaws. With open reuse, the security
flaws that arise from incorrect use of APIs are localized in a single
program and can be instantly fixed by changing the way the generator maps
descriptions to API calls.
Open reuse also benefits the
clients of libraries. Parsable descriptions
have a natural tendency to attract more and more generators to themselves
over time. HTML can now be read by dozens of different browsers.
The proliferation of browsers for HTML occurs because a new browser instantly
benefits from all of the content that has already been created in HTML.
By creating a generator that internally maps HTML dependencies, KDE's
Konqueror automatically benefited from the content of the Web.
In practice, this means that
clients that use parsable descriptions can expect more and more generators
to be written for their code. As support increases, the code of a project
will be able to be remapped instantly to a different language, a different platform,
a different technology. A library written with the intention of using
CORBA as its networking layer and Java SWING as its interface will be able to
instantly switch to GNOME and SOAP by using their generators.
Open reuse means that algorithms
are no longer locked into the particular platforms or technologies they
were originally designed for. Words in software projects that were originally
designed to target one set of specifications regarding graphical toolkits,
network technologies, and mathematical frameworks can be used to target
a different set. Apache can benefit from words from GNOME, GNOME can benefit
from words from Perl, and Perl can benefit from words from Python.
Humans adjust automatically
when the meaning of a word or phrase changes. Today's software does not
- it just crashes. The API model is a dead albatross from the world of
closed source software and should be abandoned. In a web of software only
a minuscule fraction of the current Web, a change in a single API could
render entire sections inoperable. As software moves to the Web, as the
service model grows in importance, as the number of dependencies increases,
closed source solutions will no longer be practical. Scalable network
models moving on Internet time demand open parsable descriptions.
Open parsable descriptions
have a long history of prevailing over closed standards. Although
there are many closed source alternatives to HTML, from Microsoft Word
documents to Adobe PDF files, none of these standards has experienced
the success of HTML. HTML can be read by dozens of different browsers.
Every year, these HTML browsers undergo
major changes and operating systems change their
APIs. Yet the Web, those billions of pages, requires no rewrites.
These facts change our view
of software licenses. In the past, it has generally been assumed that
although there are substantial differences between closed source and open
source development models, there are no technology differences between
the finished products. With word-oriented programming and open reuse,
though, the license is part of the code. Closed source is not just
an inferior way to develop software, it is inferior technology.
A heavily revised version of The Word Model,
based on feedback from dozens of Kuro5hin readers,
has been posted. A FAQ is also
up with answers to the most common questions concerning word-oriented programming and open reuse. Many
thanks to Eric Raymond for his terminology suggestions. Any additional clarity is a result of
his gracious help. All mistakes are the author's.
1. See Eric Raymond's "The Cathedral
and the Bazaar" for the benefits open source licenses can bring
to the development process if managed correctly.
2. The widespread
belief is that when programmers have finished fixing all of a program's
critical bugs, when the project has been converted from source code to
a binary package, and the customer is looking at the finished software
product on the shelf, the fact that software was developed openly no longer
matters. As the customer weighs one program against another, they choose
the program with the best support, the largest feature set, and the greatest
stability. The fact that a library or program was developed openly plays
only an incidental role in the decision making process.
As software moves
to the Internet, the importance of open source in the decision making
process is expected to decrease even further. Clients of a SOAP service
do not care if the service is implemented with closed or open source software:
all that matters is that the service works. Nor are the providers of the
service under any obligations, under the terms of current open source
licenses, to release their changes to the software back to the community.
There are widespread worries as a result that the trend of software moving
to the Internet presents a grave challenge to the open source movement.
This paper explains why those worries are unfounded.
3. Today, the industry
continually refactors code as technology changes to avoid entropy death.
Programs are still fragile, however, whenever there is a substantial technology
shift. A few years ago most popular programs depended on libraries from
Microsoft. When the Internet, Linux, and other disruptive technologies
became important, these dependencies became liabilities and prevented
companies from moving quickly to new markets. In many cases, companies
rewrote their applications or critical libraries from scratch to make
them usable with new Internet-driven technologies. These rewrites were
very expensive and prevented companies from moving to new markets quickly.
4. The problems with
having numerous dependencies on closed source libraries are well documented
in many books. For example, a popular book from Microsoft Press - Maguire,
Debugging the Development Process, Microsoft Press, (c) 1998, p. 15 - says:
"One of the easiest ways for your project to spin out of control
is to have it be too dependent on groups you have no control over."
5. This analysis
has been heavily influenced by the work of Simon Phipps from IBM. Readers
are strongly encouraged to read his work: "Parallel
worlds: Why Java and XML will succeed" and "Escaping
Entropy Death: Where XML Fits in and Why."
6. For the open source
world, the consequences of the Traditional API Model are even more severe.
Both GNOME and KDE are great desktops created by talented programmers.
A rivalry has arisen between the two because of the consequences should one
of these desktops become the category leader.
7. The Apache
Project is using parsable descriptions to generate different content
for a variety of platforms. Cocoon,
one of the most interesting projects on the site, uses XSL stylesheets
to render HTML, PDF, XML, WML, and XHTML from XML documents.
8. See http://www.opensource.org
for a discussion on the different software licenses currently in use.
9. The reuse model
described in this paper allows for better disambiguation algorithms than
in traditional natural language theory. A very simple algorithm, for example,
is to have a disambiguation algorithm assume an interpretation until
the interpretation "breaks." Then, by remapping the closest
dependency with an alternative path, try again. Although inelegant, such
an algorithm provably disambiguates content as new information is processed
and also appears to model human behavior. For an introduction to the disambiguation
problem, see Russell and Norvig, Artificial
Intelligence: A Modern Approach, (c) 1995, ch 22.
10. Thanks to Eric
Raymond for terminology suggestions and his observation on generators
as proofs of correct library use.
11. "Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of making
pots or of building arches. Just as genes propagate themselves in the
gene pool by leaping from body to body via sperm or eggs, so memes propagate
themselves in the meme pool by leaping from brain to brain ..." Dawkins, The Selfish Gene.
12. There are a number
of popular browsers, the two most popular being Mozilla
and Internet Explorer.
13. Mozilla was a
complete rewrite of the original Netscape code. See http://www.mozilla.org
for a complete history of the open source project.
14. Java, in contrast,
is a different story. While HTML pages are open for all to read, Java
programs are distributed as binaries that are run by the Java Virtual
Machine (JVM) on a user's platform. Because Java code is always released
in binary form, Java clients are forced to make intrinsic dependencies
on specific Java virtual machines. When Java was a young language and
there was only one Java virtual machine, this was not a problem. But with
the advent of Java SWING and new libraries in Java 1.2, Java 1.3, and
now, Java 2.0, Java programs written for an older version of the language
do not work with new JVMs. If the Web were built around applets rather
than HTML, the entire Web would have to be rewritten every time a new
JVM was released.
This is not a criticism
of Java, just its reuse model, which the majority of the software world
uses today. Cross-platform libraries promised to solve the problem of
entropy death by providing a common API for incompatible platforms. Java,
for example, promised to leverage code across multiple operating systems,
eliminating the need to rewrite code for each new platform. Its wide adoption
has resulted from this promise and Sun's aggressive marketing.
Cross-platform solutions do not live up to this hope. They do not solve the problems
of entropy death or API freezing. Clients are still required to create
intrinsic dependencies on those platforms. When the market shifts rapidly,
the clients have no recourse but to wait for the platform to adapt to
the market or rewrite their code to new APIs.
The developers of the cross-platform solution,
in turn, experience all of the standard problems of frozen APIs
as they acquire more and more clients.
Many users of Sun's
Java solution discovered the extreme limitations of the cross-platform
approach when KDE and GNOME arrived on the scene. In the first two years
of their existence, neither of these platforms was supported by Java.
As a result, open source Java programmers who wanted their programs to
work in these environments either had to content themselves with a long
wait or rewrite their code in another language.
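
The simple algorithm described in note 9 (assume an interpretation until it breaks, then remap the closest dependency and try again) can be read as depth-first backtracking. The sketch below is one such reading, not the author's implementation: the function names, the candidate meanings, and the two consistency rules are assumptions built around the Chris and Pat example from the main text.

```python
def disambiguate(words, candidates, consistent, assumed=None):
    """Assume meanings in order; when one breaks, remap the nearest choice."""
    assumed = assumed or {}
    if len(assumed) == len(words):
        return assumed
    word = words[len(assumed)]
    for meaning in candidates[word]:
        trial = {**assumed, word: meaning}
        if consistent(trial):
            result = disambiguate(words, candidates, consistent, trial)
            if result is not None:
                return result
    # Every alternative broke: the caller remaps its own choice next.
    return None


def consistent(meanings):
    """Information that arrives as the reader gets further into the text."""
    chris, pat = meanings.get("Chris"), meanings.get("Pat")
    # Assumed fact: later text refers to Pat as "he".
    if pat is not None and pat != "boy":
        return False
    # Assumed fact: one of the two is the girl and the other the boy.
    if chris is not None and pat is not None and chris == pat:
        return False
    return True


WORDS = ["Chris", "Pat"]
# The first candidate in each list is the reader's initial assumption.
CANDIDATES = {"Chris": ["boy", "girl"], "Pat": ["girl", "boy"]}

print(disambiguate(WORDS, CANDIDATES, consistent))
```

The reader's first assumptions (Chris the boy, Pat the girl) both break, and the search settles on the reading the text supports. When no consistent reading exists the function returns `None`.
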