The Cathedral and The Bazaar

Eric S Raymond's famous 1997 essay on the Linux style of open source is here: cathedral-bazaar.pdf. In it Raymond compares the Linux approach (the bazaar) to the GNU approach (the cathedral), and finds the former to be much more responsive to user needs. The essay eventually became a full-length book.

Raymond had been a longtime contributor to open source, in both the Berkeley and GNU worlds.

At the start of the essay, Raymond writes

But I also believed there was a certain critical complexity above which a more centralized, a priori approach was required. I believed that the most important software (operating systems and really large tools like the Emacs programming editor) needed to be built like cathedrals, carefully crafted by individual wizards or small bands of mages working in splendid isolation, with no beta to be released before its time.

Linus Torvalds’s style of development—release early and often, delegate everything you can, be open to the point of promiscuity—came as a surprise. No quiet, reverent cathedral-building here—rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches ... out of which a coherent and stable system could seemingly emerge only by a succession of miracles.

The cathedral idea is that very large projects cannot be distributed over an arbitrarily large team. This concept was also present in Fred Brooks' classic text on software engineering, The Mythical Man-Month, as Brooks' Law: "adding human resources to a late software project makes it later".

Much of the essay relates to Raymond's work on fetchmail. This simple utility (relatively speaking; nothing is ever "simple" with SMTP, which ironically is an acronym for Simple Mail Transfer Protocol) automatically reads mail from a remote system (perhaps with POP, though IMAP and other mail-reading protocols are also supported), and merges it into a local mail folder. I used to use it to retrieve email to pdordal@luc.edu and add it to my pld@cs.luc.edu mailbox, during a period when ITS was not able to support proper forwarding (this was fixed in 2011).

Raymond's problem was similar to mine. He wanted to retrieve mail from a system mailbox and put it into a local mailbox that he could then read locally. But he needed reply-to addresses to survive the process intact (e.g., not end up pointing to the machine that hosted the system mailbox).

Raymond took over popclient, started by Carl Harris.

In the rest of Raymond's essay, he defends the Bazaar approach.

After adopting popclient, Raymond arrives at a series of Rules. The first, for our purposes, is

6. Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.

This is related to the Bazaar style: consider users to be full-fledged participants in the process.

Raymond then acknowledges that the Bazaar style had in fact been used on some earlier GNU projects, in particular the Lisp library used within the Emacs editor:

In contrast to the cathedral-building style of the Emacs C core and most other GNU tools, the evolution of the Lisp code pool was fluid and very user-driven.

Raymond's next distinction between cathedral and bazaar is

7. Release early. Release often. And listen to your customers.

Linux releases averaged a few weeks apart, sometimes only a few hours (those were probably due to "oops" moments, though). Typical release intervals on other projects ran to several months, if not years. (How long between Windows 8 and Windows 10?)

Raymond's thesis is that by releasing often, Linux kept developers and users engaged.

Raymond's next rule is his most famous:

8. Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.

Raymond then rephrased this as Linus's Law:

Given enough eyeballs, all bugs are shallow.

This formulation is particularly well-known in security contexts; the idea is that if enough people look at the code, the vulnerabilities will all be noticed. Life hasn't always worked out that way, but it's a start. In terms of cathedral vs. bazaar, the cathedral idea is that debugging is hard, that experts need to do it, and that the long intervals between releases are necessary for thorough debugging. The bazaar theory, by contrast, amounts to "ok, here's one fix; anybody see anything else?"

Raymond then tries to reconcile Linus's Law with Brooks' Law. Bugs often arise from misunderstandings about interfaces. Brooks assumed that N programmers would have to expend O(N^2) effort for everyone to communicate with everyone else. Linux teams didn't do that; devs only followed the email threads they were interested in.
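Brooks' quadratic estimate comes from counting pairs: N programmers have N(N-1)/2 possible two-way conversations. Here is a minimal Python sketch of how fast that grows. The function name is just an illustration, not anything from Brooks or Raymond.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct two-person communication paths among n people."""
    return n * (n - 1) // 2


# Team size versus the number of conversations needed if everyone
# must stay in sync with everyone else.
for n in (2, 5, 10, 50, 100):
    print(f"{n:>4} developers -> {pairwise_channels(n):>5} pairwise channels")
```

Raymond's point is that a bazaar project never pays this full cost, because most developers only talk to the few people working on the same piece.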

Fetchmail

Raymond then gets to how he applied the bazaar theory to his own project. Here are a few of his ideas:

On lots of projects, people are reluctant to send comments because they believe they will be ignored. This is a crucial point about engagement, and it ended up as another Rule:

10. If you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource.

At this point Raymond had his deepest technical insight. One of the developers proposed a mechanism for taking the fetched emails and delivering them to the local system the same way that big mail servers deliver to one another: via SMTP. The local machine, in other words, becomes the local user's mail server. Once this was done, there was no need to consider any other mail delivery mechanism. Raymond describes this as

11. The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.
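To make the SMTP-forwarding idea concrete, here is a rough Python sketch using the standard poplib and smtplib modules. It is not fetchmail's code (fetchmail is written in C); the function names, the "localhost" port-25 listener, and the choice to deliver everything to one fixed local user are all assumptions made for illustration.

```python
import poplib
import smtplib


def forward_maildrop(pop, smtp, envelope_from, local_user):
    """Hand every message in a POP3 maildrop to an SMTP listener.

    `pop` needs stat/retr/dele/quit as in poplib.POP3; `smtp` needs
    sendmail as in smtplib.SMTP. Returns the number of messages forwarded.
    """
    count, _size = pop.stat()
    for i in range(1, count + 1):
        _resp, lines, _octets = pop.retr(i)
        raw = b"\r\n".join(lines) + b"\r\n"
        # From here on the local mail server makes every delivery decision
        # (aliases, forwarding, mailbox format, locking).
        smtp.sendmail(envelope_from, [local_user], raw)
        # Delete from the remote server only after SMTP accepted the message;
        # sendmail raises an exception if the recipient is refused.
        pop.dele(i)
    pop.quit()
    return count


def fetch_and_deliver(pop_host, username, password, envelope_from, local_user):
    """Hypothetical wiring against real servers; not called here."""
    pop = poplib.POP3_SSL(pop_host)
    pop.user(username)
    pop.pass_(password)
    with smtplib.SMTP("localhost", 25) as smtp:
        return forward_maildrop(pop, smtp, envelope_from, local_user)
```

Notice that the sketch contains no code for writing mailboxes at all; that is the simplification Raymond was excited about. Notice also that it has to be told who the local recipient is, because the POP3 maildrop does not say.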

One last Rule, relevant to security. Users had asked for encryption of the email password, which had to be kept in a plaintext file. Raymond refused, on the theory that, as fetchmail itself would have to have access to the decryption key, this wasn't really secure.

17. A security system is only as secure as its secret. Beware of pseudo-secrets.
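Raymond's argument can be illustrated with a toy example. The sketch below is not fetchmail code; KEY, obscure and reveal are hypothetical names, and XOR stands in for any reversible cipher. The point is only that the mail client itself must be able to undo the "encryption", so the key has to live somewhere the client (and therefore an intruder with the same file access) can read it.

```python
import base64
from itertools import cycle

# Hypothetical key; it must be stored where the mail client can read it.
KEY = b"key-shipped-with-the-program"


def obscure(password: str) -> str:
    """'Encrypt' a password for storage in the config file."""
    data = bytes(b ^ k for b, k in zip(password.encode(), cycle(KEY)))
    return base64.b64encode(data).decode()


def reveal(stored: str) -> str:
    """What the client must do before it can log in to the mail server."""
    data = base64.b64decode(stored)
    return bytes(b ^ k for b, k in zip(data, cycle(KEY))).decode()


stored = obscure("hunter2")
# Anyone who can read both the config file and KEY can make the same call.
assert reveal(stored) == "hunter2"
```

The stored value looks scrambled, but it protects nothing beyond what file permissions on the plaintext password would already provide.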

One of Raymond's final comments is that the bazaar style only works for incremental improvement, not for initial development and probably not for major redesign. Another is to acknowledge the fundamental role played by the Internet. Without Internet communication (and the early GNU projects did not have Internet access), arguably the bazaar style cannot exist. While Raymond attributes the success of the bazaar style to Torvalds himself, perhaps the real credit should go to the Internet.

Responses

One common response among Linux developers and users was "Yay Linux! We're doing it Right!". Another was that Wikipedia is fundamentally a bazaar approach to documentation. People at Netscape found the Bazaar approach very persuasive, and claimed this was why they decided to open-source their code as the foundation of the Mozilla project.

One negative response came from Nikolai Bezroukov; the revised version of his essay is now at firstmonday.org/ojs/index.php/fm/article/view/708/618. Bezroukov took issue with these points of Raymond's:

For the first, Bezroukov argues that Linux isn't really escaping Brooks' Law; perhaps OS kernels really are a collection of parallel small problems. Major Linux changes, such as a new process scheduler, did indeed take a lot of communication and architecture.

Bezroukov is most incensed with the second point above. Debugging is hard. The many security bugs out there can be taken as evidence of that; lots of eyes don't necessarily find them.

In Bezroukov's first version, he wrote that Raymond "promoted an overoptimistic and simplistic view of open source, as a variant of socialist (or, to be more exact, vulgar Marxist) interpretation of software development". That seems a bit much. The second and final version of Bezroukov's essay leaves out these political remarks; it is at firstmonday.org/ojs/index.php/fm/article/view/708/618.

More criticism comes from the current fetchmail team. Raymond was out of the picture by 2004, and they wrote in fetchmail.info/design-notes.html:

This document is supposed to complement Eric S. Raymond's (ESR's) design notes. The new maintainers don't agree with some of the decisions ESR made previously, and the differences and new directions will be laid out in this document. It is therefore a sort of a TODO document, until the necessary code revisions have been made.

Here is a comment from the getmail FAQ (http://pyropus.ca/software/getmail/faq.html#faq-about-why). Getmail is an alternative to fetchmail, written by Charles Cazabon (who also wrote the FAQ).

Why did you write getmail? Why not just use fetchmail?

Short answer: … well, the short answer is mostly unprintable. The long answer is … well, long:

I do not like some of the design choices which were made with fetchmail. getmail does things a little differently, and for my purposes, better. In addition, most people find getmail easier to configure and use than fetchmail. Perhaps most importantly, getmail goes to great lengths to ensure that mail is never lost, while fetchmail (in its default configuration) frequently loses mail, causes mail loops, bounces legitimate messages, and causes many other problems.

When people have pointed out problems in fetchmail's design and implementation, it's maintainer has frequently ignored them, or (worse yet) gone in the completely wrong direction in the name of "fixing" the problems. For instance, fetchmail's configuration file syntax has been criticized as being needlessly difficult to write; instead of cleaning up the syntax, the maintainer instead included a GUI configuration-file-writing program, leading to comments like:

The punchline is that fetchmail sucks, even if it does have giddily-engineered whizbang configurator apps.

As an example, Dan Bernstein, author of qmail and other software packages, once noted to the qmail list:

Last night, root@xxxxxxxxxxxxxxxxx reinjected thirty old messages from various authors to qmail@xxxxxxxxxxxxxx

This sort of idiocy happens much more often than most subscribers know, thanks to a broken piece of software by Eric Raymond called fetchmail. Fortunately, qmail and ezmlm have loop-prevention mechanisms that stop these messages before they are distributed to subscribers. The messages end up bouncing to the wrong place, thanks to another fetchmail bug, but at least the mailing list is protected.

--D. J. Bernstein

The maintainer also ignored dozens of complaints about fetchmail's behaviour, stating (by fiat) that fetchmail was bug-free and had entered "maintenance mode", allowing him to ignore further bug reports.

....

But this is not the strongest language. Perhaps the most snarky response to fetchmail comes from Terry Lambert (docs.freebsd.org/cgi/getmsg.cgi?fetch=585008+0+archive/2001/freebsd-arch/20010218.freebsd-arch):

As to fetchmail: it is an abomination before God. If someone in the press ever paid for an audit of the source code, the result would refute the paper "The Cathedral and the Bazaar" to such an extent that it could damage the Open Source movement, which has pinned so much on the paper, in ill-considered haste.

ESR has constantly maintained that fetchmail is "not an MTA", and he is right: it could be, but it's not.

When mail is delivered to a POP3 maildrop, envelope information is destroyed. To combat this, you would need to tunnel the envelope information in headers. Generally, sendmail does not support "X-Envelope-To:" because it exposes "Bcc:" recipients, since fetchmail-like programs generally _stupidly_ do not strip such headers before local re-injection of the email. Without this information, it can not recover the intended recipient of the email. The fetchmail program delivers this mail to "root".

The program has another bug, even if you elect single message delivery (in order to ensure a "for <user@domain>" in the "Received:" timestamp line. The bug is that it assumes the machine from which the download is occurring is a valid MX for your domain.... [pld: the end result is misaddressed replies]

...

Unfortunately, ESR would not accept patches for the mistaken MX problem, nor for the preference order problem, nor for the tunneled envelope information stripping problem. He seemed to be too busy with speaking engagements, and has since declared fetchmail to be in "maintenance mode", in order to demonstrate a recognizable commercial software lifecycle for an Open Source project, to give business the warm fuzzies.

Ouch.


Postscript

In 2019 there appeared an interesting postscript to Raymond's essay, by Mark Tarver: The Cathedral and the Bizarre (marktarver.com/thecathedralandthebizarre.html).

Now here is an unpalatable truth, twenty years on: most open source code is poor or unusable. A search through open source repositories like Sourceforge or Github will convince you of that.  If you haven't (as I have done) tried to piece together code from a repository armed only with few pages of code comments and virtually no documentation, you have not lived the Github experience.  In fact an article puts the abandonment rate of open source projects on Github at about 98% - meaning that there is no activity on 98% of projects after a year.  This has coined a phrase - abandonware.

Tarver does cite the OpenSSL Heartbleed bug, and there certainly are major open-source projects that are not getting much support. However, the vast majority of "abandonware" projects are projects that simply never took off. Still, Open Source has some structural issues here.