I'm going to assume you're a working programmer who comes from a C, C++, Java, or similar background, so I'm not going to go into the details of how Scheme differs from other more-or-less exotic languages like Common Lisp, Standard ML, or Haskell, as interesting and worthy of discussion as those languages are. Suffice it to say that many of the ideas I present here as features of Scheme are also present in some other languages.
Furthermore, there are a large variety of Scheme systems out there, and they tend to do some important things in different ways. In this article I'll be talking about a particular distribution, PLT Scheme; the code in this article will not run in other Schemes without some modification. Most serious Scheme systems have equivalents or near-equivalents to everything I use here, but the details differ.
Those disclaimers out of the way, let's get on to the good stuff.
I figured the best way to explain why I like Scheme would be to show something I've actually had occasion to use in real work: a multithreaded port scanner. Here it is, written basically the way I'd really write it if I needed to:
; scan : string[hostname] (listof int) -> listof (list int string)
; gives the number and well-known service name of each port in the given
; list that is open on the given host
(define (scan host ports)
(lambda (p) (list p (port->name p)))
(open-ports host ports)))
So what does this do? I've defined a function called scan that takes two arguments: a string called host and a list of numbers called ports. It gets every open port using the function open-ports we'll define later and returns each one paired with its service name. To understand how it does it, we're going to need to make a brief diversion into looping.
A brief diversion into looping
Scheme doesn't do looping the way mainstream languages do. That leads a lot of people to think that Scheme doesn't do looping at all, which isn't true — Scheme actually has a very sophisticated and flexible way of writing loops that just takes a while to get used to (no pun intended).
The "assembly language" of loops in Scheme is recursion. Recursion in Scheme is not at all inefficient and does not waste memory, and frequently leads to clear and elegant solutions to problems. For instance, here's an implementation of a range function (similar to features of Python and Perl) that takes two numbers and builds a list of all the numbers between them in order, written recursively:
(define (range low high)
[(> low high) null]
[else (cons low (range (+ low 1) high))]))
This might look alien at first, but there's a very compelling logic to what the function says: a range where the low number is higher than the high number is just an empty list (represented by null) and any other range is just a list consisting of the low number followed by another range with the same high point but a low point that's 1 bigger (cons is a function that takes an element and a list and returns a new list consisting of the given element followed by the given list). It's a logically elegant way of thinking of the problem that also happens to be an executable program. (Of course the fact that it's elegant doesn't mean it's easy to write or read. It turns out, though, that with a little practice you can actually write recursive functions more easily than their iterative counterparts. See How to Design Programs for an online textbook that teaches that skill.)
Of course recursion isn't the end of the loop story. I say recursion is the assembly language of loops because while it's the fundamental construct that allows you to get looping to happen in Scheme, real Scheme programmers frequently use constructs built on it rather than using it directly. Since in Scheme declaring a nameless function is a simple affair that doesn't even need to use a single line of text, a common idiom is to write loops as functions that express the essence of a particular style of loop and take another function to use as the body.
The function map I used in the scan function is one example of such a loop. It accepts a function and a list as input, applies the function to each individual element of the list, and returns a new list containing the results. So for instance (map add1 (list 0 1 2 3)) would yield (list 1 2 3 4). It's built in, but if we wanted to implement it, we might write something like:
; map : (X -> Y) * (listof X) -> (listof Y)
(define (map func a-list)
[(null? a-list) null]
[else (cons (func (car a-list)) (map func (cdr a-list)))]))
where car extracts the first element of a list and cdr returns the rest of the list without the first element. (Sorry if I just did somebody's homework, by the way.)
With this in mind, we can get back to our port scanner.
On with the show
So what's going on with the scan function? It's mapping a function that takes a port number and produces a list containing a port number and a service name over the list of all open ports (the result of calling open-ports). What we'll get back is a list consisting of port-number/service-name pairs, which is exactly what we want.
Next we need to implement open-ports. Here it is:
(require (lib "list.ss")) ; for filter
; open-ports : string[hostname] (listof int) -> (listof int)
; returns the sublist of numbers that represent open ports on the
; given host, performing all checks concurrently
(define (open-ports host ports)
(filter (lambda (x) (not (eq? 'closed x)))
(lambda (port) (if (can-connect? host port) port 'closed))
The open-ports function checks every port in its list in parallel to see if they are open, and, when it's done, uses a library function called filter to take the closed ports out of the list. It gets all the open ports concurrently with the help of two new functions: threaded-map and can-connect?. Since can-connect? is easier, I'll explain it first.
can-connect?: exceptions and sockets
The can-connect? function isn't bad at all:
; can-connect? : string[hostname] number -> bool
; determines if the host is listening on the given port
(define (can-connect? host port)
(with-handlers ([exn:i/o:tcp? (lambda (e) #f)])
(let-values ([(ip op) (tcp-connect host port)])
(close-input-port ip) (close-output-port op) #t)))
It's just a small wrapper around the tcp-connect library function, which returns an input port and an output port if it succeeds at connecting and raises an exception if it doesn't. PLT Scheme has a built-in exception mechanism very similar to Java's or C++'s in that any code can raise an exception, and that exception propagates up the stack until it reaches an appropriate handler (specified by try/catch in C++ or Java, with-handlers in PLT Scheme). The can-connect? function calls tcp-connect and returns #t (true) if it succeeds. If tcp-connect raises an i/o:tcp exception (which it will do if it can't establish a connection), with-handlers catches that and returns #f (false).
threaded-map: cocurrency, finally
We want threaded-map to behave just like map, which we've seen before, except that while map applies a function to each element sequentially, we want threaded-map to apply the function to each element in parallel. Here's code to do that:
; threaded-map : (X -> Y) * (listof X) -> (listof Y)
; maps the given function over the given list with each computation
; done in parallel
(define (threaded-map f l)
(let ((cs (map (lambda (x) (make-channel)) l)))
(for-each (lambda (x c) (thread (lambda () (channel-put c (f x))))) l cs)
(map channel-get cs)))
PLT Scheme uses concurrency mechanisms taken from Concurrent ML to create and synchronize threads. Rather than locks on shared memory, the CML primitives allow threads to send values to each other over channels and block until those values are transmitted. In this case, threaded-map creates a new channel for each item in the input list, spawns a thread for each item that runs the input function on it and then sends the result over the appropriate channel. Once all threads are spawned, the function builds a new list by getting values out of each channel in order.
With that done, we've got everything we need to detect open ports, and all that's left is to look up service names.
port->name: libraries, HTTP communication and regular expressions
There's a little complication in implementing port->name. If I only wanted my program to run on Unixish systems I might just call the getservbyport C function through PLT Scheme's foreign function interface (it only takes a few lines, actually), but I'd like it to run on Windows and Macs and various other platforms that don't have access to getservbyport (or the /etc/services file it uses). Fortunately, the Internet Assigned Numbers Authority has an online list of ports in the same format as /etc/services, so we'll make our program fetch that if a local file isn't available. Here's the code:
(require (lib "url.ss" "net")) ; for get-pure-port and string->url
(let ([ip (if (file-exists? "/etc/services")
(get-pure-port (string->url "http://www.iana.org/assignments/port-numbers")))]
(while m (regexp-match "([^ \t\r\n])[ \t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1)))
(define (port->name p) (hash-table-get NAMES p (lambda () "unknown")))
After loading in a library that defines some useful HTTP-related functions, we define the global variable NAMES to be a hash table that maps numbers to strings. To do that, we get a new input port that's connected to either /etc/services (if it exists) or to the IANA's web page, and we also make an empty hash table. Then we use a while loop to iterate over the input port getting each port-number/name combination (as determined by a regular expression) and add each to the hash table, and finally we return the hash table which becomes the value of NAMES. Unfortunately Scheme has no while loop, but since it's such a nice fit for this problem we'll just define it ourselves.
while: the power of macros
We've implemented lots of loops so far, but while is especially tricky because of the way we want to use it. In our previous loops we required users to pass in an anonymous function, and furthermore none of our loops introduced their own variable names. We could probably make a variant of the while loop that conformed to those expectations, but I'm stubborn and I'd rather have Scheme conform to me than to have me conform to it. If I were programming in Java or C, I'd just be out of luck, but Scheme lets me have my way. Scheme has a very unusual facility called macros (a distant cousin of C macros, the well-known black sheep of the family) that take a piece of a parse tree as input and produce another one as output, allowing programmers to automatically rewrite code as they see fit. They've got two key features that make this source-code rewriting tractable and useful: First, they operate on syntax trees, not on characters. Second, they know enough about the language to automatically avoid unintentional name collisions, a property we'll see more about in a minute.
We can use macros to make a while loop that will behave just like I want it to. Here we go:
(define-syntax (while stx)
(syntax-case stx ()
[(_ var test body)
#'(let loop ((var test))
(when var body (loop test)))]))
It's scary-looking, but don't fear, it's really pretty simple once you get used to the way macros are written. There's some junk at the top that says that this is a macro and sets us up to do some work, but the important stuff begins on the third line, which specifies a pattern for what a use of while should look like. In particular, each use should look like
(while [some-variable] [some-test] [some-body])
where the variable has to literally be a legal variable name but the test and the body can be any old Scheme expression. It also gives each of these pieces a name (var, test, and body, respectively). Those names are important for the rest of the macro: starting with the #'(let ...) line, the whole rest of the code isn't actually code, but rather a template for what the output syntax should be, and it's returned literally except that every time var, test, or body appears in the template, whatever actually appeared in that blank is substituted in instead.
Once you get the hang of it, macros like these are pretty simple. For example, our use of while in defining the NAMES table turns into the following after Scheme expands it (code from the original scan in bold, code introduced by the macro in normal font weight):
(let loop ((m (regexp-match "([^ \t\r\n])[ t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)))
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1))
(loop (regexp-match "([^ \t\r\n])[ \t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip))))
Of course we won't actually ever see this code or type it in — Scheme generates it for us behind the scenes when it runs our program. What the expanded code actually does is use one of Scheme's built-in loops, the so-called "named let," to run the test expression and bind it to the variable m. The named let syntax lets you type in an identifier (when I program, I almost always use the identifier loop, but you could use charlie or frobnosticate or whatever) and a body, where in the body invoking the identifier as a function makes the program run another iteration of the loop. In the case of while, if m isn't false (in Scheme anything that isn't #f is true), it evaluates the body with m still bound to the test's result and then loops, checking the test again and again until eventually m is #f.
One of the neatest things about Scheme macros is that variable names actually know whether they're "printed in bold" (in code written by the user) or not (in macro-generated code), and boldface variables don't interfere with normal variables and vice versa. So, taking the macro above for example: since the identifier loop is introduced without boldface, if the name loop happened to appear in boldface in the expanded code (for instance because the while loop was nested inside some other loop that used the let loop convention) the two names wouldn't interfere with each other. Because of that property, it's relatively simple and easy to make new syntax that you can think of as being built-in, and allows Schemers to write huge complex macros like object-oriented programming syntax and embedded that make Scheme into a programmatic PowerPoint-like slideshow definition language.
Anyway, since the while loop was the last thing we needed to write, we're done and we've got ourselves a complete working port scanner. Hooray!
Wait a minute, where's the I/O?
I don't need any, though of course if we wanted to we could easily add it in. Scheme, like Python, is designed around an interactive read-eval-print loop. I can open DrScheme, type in all the definitions, and then just use my function however I want. Here's an example run:
> (scan "localhost" (range 1 100))
((22 "ssh") (25 "smtp"))
Now that we're finished with the scanner, we should put the finishing touches on and bundle it up as a library. That's pretty easy: in a file called scanner.ss, we'd write
(module scanner mzscheme
(require (lib "contract.ss"))
(scan (string? (listof natural-number?)
. -> .
(listof (list/p natural-number? string?)))))
[... all the above code ...])
This declares a library module that provides only the scan function. Furthermore, it attaches a contract to scan, sort of an optional type-check that occurs at run-time rather than compile-time and can be much more flexible for that reason. Every time the function gets invoked, the contract verifies that it's being given the proper arguments (in this case a string and a list of natural numbers) and that when it returns it's giving back the right sort of thing (in this case a list of lists where each sublist consists of a natural number and a string). These contracts don't force static typing on programmers but let them get a lot of the benefit of static typing anyway. In my experience judicious use of contracts is essential for large software projects.
There's lots more reading you can do that elaborates greatly on issues I've barely scratched the surface of in this article. Two great books that introduce programming using Scheme are available free online: Structure and Interpretation of Computer Programs by Abelson, Sussman, and Sussman, and How to Design Programs by Felleisen, Findler, Flatt, and Krishnamurthi. Both books cover the foundations of Scheme well and are very accessible books.
For more advanced topics, I'd recommend you go to (read scheme), an online bibliography of Scheme-related research. There you'll find much deeper treatments of many of the topics I've talked about here as well as comprehensive discussions of important topics I've omitted entirely (continuations, web programming, object oriented programming, efficient language implementation, behavioral contracts, and many more).
Of course you can also read the de facto Scheme standard, called the R5RS, online. Another good starting point is www.schemers.org.
Executing the code
If you actually want to execute this code, install DrScheme (when it asks what language level to use, set it to "(module ...)"), open it, and copy all the code from this article into the definitions window (the one on top). Hit the "execute" button and you're off. You can use any of the functions from the program directly in the interactions window.
So, we've written a multithreaded port scanner, nicely bundled up as a library for use in our other programs. In the process we've touched on a lot of my favorite features of Scheme (and PLT Scheme in particular): its clever loop constructs, its higher-order functions, concurrency, macros, modules, and contracts. The interesting thing about all these features is that at nearly every stage of the process Scheme gave me the ability to say what I meant, or failing that gave me the tools to let me build a way to say what I meant. Using that feature I could make the language do things my way rather than it making me do things its way. Schemers take full advantage of that ability, and it's hard to find a group of programmers more stubborn in their pursuit of writing down exactly the concept they are trying to convey rather than having to clutter their programs with extraneous concepts.
Those of you who've seen Scheme before, especially as part of a college class, might be surprised at how different this code looks from what you saw there. That's another interesting point: people frequently learn Scheme as though it were a functional language. It's true that Scheme lets you program in a functional style, but I think it's really more properly thought of as the quintessential multiparadigm language than a functional language. You can write declarative, imperative, object-oriented, aspect-oriented, logical-reasoning, lazy, strict, functional-reactive, or any other kind of programs you want, all using exactly the syntax and semantics you want, all in Scheme, all working together. To my way of thinking, that's a great reason to like Scheme. Give it a shot.