Open Source Notes

Open Source Computing Notes

Peter Dordal, Loyola University Chicago Dept of Computer Science.

In the beginning, all software was free. Or, to be more precise, IBM dominated the computer industry in the 1960's, and the software you needed was bundled in with your hardware (and, of course, inefficient software resulted in the sale of faster hardware!). IBM only began selling application software separately in 1969, following the filing of the US v IBM antitrust lawsuit. One of the government's primary claims was that bundling hardware with software was anticompetitive, in that the policy pretty much made third-party software an unsustainable business model.

Throughout the 1960's, IBM almost always distributed software in source form. "Source" here usually meant Assembly language, though. After the unbundling decision in 1969, IBM continued to bundle "system" software with their hardware; only application software was separate. Hardware costs, however, were usually much higher than software costs.

Even earlier, in 1953, Univac released their A2 linker/loader system, distributed with source code, and invited users to send any improvements back to Univac. (IBM, on the other hand, was not always very interested in customer improvements.)

Up through the 1970's, most mainframe application software (eg customer billing) was locally written. It was usually not written to be portable, though sometimes developers tried to write in a more "universal" style.

Before 1974, it was far from clear that software was even copyrightable. In that year, Congress created the Commission on New Technological Uses of Copyrighted Works (CONTU); its 1978 report concluded that source code was in effect a creative work, and in 1980 the copyright law was amended to cover software explicitly. In the 1983 case Apple v Franklin, in which Apple sued clone-maker Franklin for copying its system software, the courts recognized that copyright applied to object code as well as to source. (The Apple II had been introduced in 1977.)

Unix was developed at AT&T's Bell Labs starting in 1969. In the early 1970's AT&T began licensing the operating system to non-profit institutions (and eventually some for-profit ones), distributing the operating system on magnetic tape. The licensing requirement was a significant stumbling block at some sites, although the cost of a license (typically ~$1,000) was vastly less than the cost of the hardware needed to run it (minimum $100,000).

Oracle was founded in 1977 (under another name), the same year that IBM released the first RDBMS, System R (which also introduced the SQL language). System R was considered "experimental", though used commercially; IBM's first commercial mainframe RDBMS package was DB2, released around 1983 but based on the 1981 SQL/DS. The Oracle database was first released in 1979, running on DEC PDP-11 minicomputers rather than IBM hardware. Oracle is thus among the earliest software-only database companies. By 1983, Oracle DB was available for the DEC VAX computer, and Oracle has been associated with Unix and VMS systems ever since, although it never gave up on IBM.

The first "personal" computer was the 8080-based Altair in 1974. The Apple I was introduced in 1976, followed in 1977 by the Apple II. The Tandy / Radio Shack TRS-80 was also introduced in 1977. The IBM PC was introduced in 1981, and the Apple Macintosh in 1984.

The same era saw the development of the Berkeley Software Distribution (BSD) flavor of AT&T Unix. Until 1991, however, BSD users needed an AT&T Unix license, and by then AT&T System V was well established (in effect as a competitor to BSD). The 1991 release (Net/2) was intended as a full open-source release, but it prompted a lawsuit from AT&T's Unix System Laboratories. The forks NetBSD and FreeBSD were released in 1993, the lawsuit was settled in 1994, and OpenBSD followed in 1995.

A group at MIT began working on the X-Windows system (a windowing system, on top of which separate window managers run) in 1984. This was another famous open-source project, and the origin of the MIT license.

The Apache http server was started in 1995; the Apache Software Foundation was incorporated in 1999. Apache projects are all open-source, often with commercial support available. Most Apache projects are server-side projects.

Netscape was founded in 1994, and its browser was closed-source. The browser competed with Internet Explorer through the 1990's. In 1998, while losing that competition, Netscape made its browser code open-source and created the Mozilla Organization to host it (AOL acquired Netscape later that year). The Mozilla Organization went on to develop Firefox.

Richard Stallman

Richard M Stallman (sometimes "rms") began working at MIT in 1971, while a freshman at Harvard. A bit after graduating from Harvard in 1974 he was hired by the MIT Artificial Intelligence Lab; at that time, AI largely meant Lisp hacking. Stallman became a vocal advocate for open computing, and even launched an initiative in 1977, when passwords were introduced in the Lab, to encourage everyone to disable their password. In 1979, Stallman protested early DRM applied to the Scribe word-processing package; Stallman's open Texinfo package was part of his response. (Donald Knuth's open TeX package was first released in 1978, though it wasn't out of beta until 1989.) In 1980, the MIT lab's new Xerox 9700 laser printer arrived with no source code, making it impossible for Stallman to hack it to enable user notifications of printing issues.

Richard Stallman, 2007

In the early 1980's, the MIT AI lab saw two spinoffs: Lisp Machines Inc and Symbolics. Both of these companies offered closed-source software, and both hired away several members of the MIT AI lab.

In September 1983, Stallman announced on Usenet his plan for the GNU project: a free and open operating system. GNU is a recursive acronym for "GNU's Not Unix", and was named with full awareness of the 1957 Flanders and Swann song "The Gnu" (in which all initial "silent" letters are voiced, and several extraneous initial G's also appear). Stallman's Emacs editor was one of the first GNU software contributions.

In 1984, Stallman left MIT to manage the GNU project fulltime. In 1985 he founded the Free Software Foundation, and published his GNU Manifesto, in which he declared his ideas about free software and ethics.

For Stallman, sharing software was part of the ethical duty of helping others, and if copyrights prevented that, then copyrighted software must be avoided. Stallman did not suggest that copyrights simply be infringed. Implicitly, however, his position is that the Intellectual Property model, in which you can "own" content but are not allowed to share it with others even though sharing costs nothing, is fundamentally unethical. (Keep in mind that Stallman absolutely loved tinkering with software.)

Industry justifications for adopting open-source software are usually pragmatic: you avoid vendor lock-in (below), the possibility that your vendor might refuse to implement the features you need, and the possibility that your vendor might go out of business. But this is not Stallman's justification: he sees open source as a moral imperative, because proprietary software cannot be shared with others. This is reflected in the GNU software licenses, below.

The first release of gcc, the GNU C compiler, was in 1987. This had a major impact on software development, as most compilers at the time were not free. In particular, gcc enabled the distribution of Unix software in source form.

Later gcc became the GNU Compiler Collection, reflecting the 1999 merge of the EGCS fork back into GCC and the fact that GCC by then supported several languages besides C, including C++, Objective-C and Fortran.

The Free Software Foundation was funded in part by donations. However, the FSF also sold gcc on tape (later CD). Many sites paid for gcc even though they were able to download it for free, as a way of supporting the GNU project. Often this was done by technical IT personnel, without telling management that the $500 bill for compiler software was optional.

The GNOME graphical desktop environment, associated with GNU, was started in 1997.

In addition to software, the Free Software Foundation has also published licenses for open source. These are collectively known as the General Public License, or GPL; GPL v2 remains popular (and is used by Linux), but the current version is GPL v3. An early version of the GPL first appeared in 1989. GPL licenses have the so-called "copyleft" feature, which obligates anyone distributing code based on a GPL-licensed project to release their own code under the GPL as well.

The goal of the GNU project had always been to create an operating system and complete suite of utilities. The project did very well with the utilities, including gcc; this part was largely complete by 1990. However, work on an OS kernel, to be called the Hurd, ran into technical difficulties. Those difficulties have continued; we still don't have a true production version of the Hurd. There is apparently now a working 64-bit release, though; see lists.gnu.org/archive/html/bug-hurd/2023-06/msg00038.html.

Proprietary Unix in the 1990s

Back in 1986, the CS department got three AT&T 3B2 computers; a couple years later ITS got a 3B5. Because they were from AT&T, they ran System V Unix, with no extraneous bells and whistles.

But AT&T had its own proprietary 3BNET for these machines: it was just like Ethernet, but it allowed double the packet size! 3 KB instead of 1500 bytes! Twice the, um, well, not really throughput, but twice as many bytes in any one packet!

Why AT&T really did this: to lock users into their hardware, and also their system.

There were a lot of diverse Unix implementations in that era: Solaris (Sun), HP-UX (HP), AIX (IBM), IRIX (Silicon Graphics), DEC Ultrix, DG/UX (Data General) .... All of them were motivated by at least some degree of vendor lock-in. Commercial users hated this. That's why so many commercial users were (and are) so invested in Open Source: it's to avoid vendor lock-in.

Posix was another approach to avoiding vendor lock-in: it was an attempt to define a "core" operating system interface, mostly Unix-like, for application development. (By the way, it was Richard Stallman who suggested the name "Posix".) Ultimately, however, Posix faltered; the abstract Posix interface was unable to keep up with real-world hardware. See also this "Posix has become outdated" article: www.usenix.org/system/files/login/articles/login_fall16_02_atlidakis.pdf.

Linux

Linus Torvalds released his Linux operating system kernel in 1991. The following year an update was released under the GNU General Public License. From the beginning the Linux system was bundled with all the GNU utilities; Torvalds' contribution was, in essence, just the kernel.

The Linux kernel is monolithic, versus the "microkernel" architecture of the Hurd and MINIX. The microkernel idea is probably superior technically, though Linux achieves much of a microkernel's flexibility through loadable (and unloadable) kernel modules such as device drivers.

Starting in 1996, Linux began including proprietary drivers (later, for example, for Wi-Fi hardware) and sometimes codecs. There are fully free Linux versions out there, however.

Richard Stallman has suggested that Linux should properly be referred to as GNU/Linux. It often is. Stallman has also said:

There is no system but GNU, and Linux is one of its kernels.

Torvalds has never been a fan of Stallman's "free software" approach, though he did adopt the GNU license (GPL v2) for Linux, and has repeatedly affirmed that this was a very fortunate choice.

Linux eventually attracted considerable "commercial" support. IBM, in particular, became a major contributor starting in 1999. IBM's contributions greatly improved the performance and reliability of filesystem drivers and I/O generally.

In 2003, SCO sued IBM, claiming that IBM had taken AT&T Unix features and incorporated them into Linux. SCO claimed to own AT&T Unix at that point (a later court decision determined that Novell was the true owner), and also claimed that a licensing agreement with IBM forbade IBM from contributing to Linux. This lawsuit cast a pall over early adoption of Linux. It later turned out that Microsoft had helped SCO raise money for their anti-Linux lawsuits.

Free vs Open Source

In the Windows world, there is lots of "free" software, sometimes called freeware. (Once upon a time there was "shareware", where happy users were supposed to send a contribution to the developer.) There are many free apps for Android and iPhone, too.

Most "free" software in this sense is in fact either "adware", supported by intrusive advertising, or outright spyware or malware.

Richard Stallman uses the term "free software" to mean software that is unencumbered by restrictions. You can share it, and you can read the source. He developed the GNU license to encourage continued sharing: if you modify software you obtained under the GNU license, and distribute your modifications, these must also be covered by the GNU license. The first GNU license appeared in 1989; it was followed in 1991 by Version 2.

The word "free" in English can mean either "no charge" or "having liberty". Stallman often uses the phrase "free as in speech, not as in beer" to reflect this distinction. More recently, he has been suggesting the term "Free/Libre" to discuss software that is free in his sense. The purpose of the GNU license, in a sense, is to ensure "transitivity" of freeness.

The MIT and Berkeley licenses differ from the GNU license: they permit someone to copy the software, introduce changes, and sell the result. This was almost essential for the X-windows project, where end-users were almost never the ones installing the software: X-windows was distributed to workstation vendors, who distributed it in turn to their customers. Similarly, Wind River incorporated BSD code, notably the networking stack, into its proprietary real-time operating system, VxWorks.

The term "open source" was coined by Christine Peterson at a 1998 meeting. It caught on with almost everyone except rms, who felt that some open-source licenses didn't perpetuate the software's freedom. That said, Stallman's core concerns about software have always been that it should be sharable and should come with source; all open-source licenses meet these criteria.

Licenses

Open-source software is usually covered by a license granting specific permissions to the code user. The GNU license, above, specifically requires that any changes to the code also be distributed under the GNU license, if they are distributed at all. This means that you cannot take a GNU-licensed project, improve it, and sell the result as proprietary software. (You can sell it, but you must also give your customers the source under the GPL, and they are then free to give it away, which sort of undermines the sales plan.)

Other licenses have very different terms.

GNU General Public License: This allows you to distribute the software however you like. You can even sell it. However, you must include the source code (or a written offer to supply it) with any distribution. Any changes, if distributed, must also be distributed under the GPL. In other words, if some code is licensed under the GPL, then any distributed extensions must also be licensed under the GPL. (It is technically legal to make changes to GPL software and not distribute them, in which case you need not release the source.)

Rights and obligations under the GPL, especially the rule that any distributed changes must be released under the GPL, are often referred to as copyleft, in contrast to copyright. The idea is that you do not have to accept the license; you can instead do whatever copyright law alone allows with the source code. But copyright doesn't grant you many rights, and no distribution rights, so if you want to distribute your modifications you have to accept the license.

This caused some (probably unnecessary) worry about GPL libraries: if these are linked to proprietary code, does the latter become covered by the GPL? The so-called Lesser GPL (or LGPL) settles this point for libraries released under it: linking to an LGPL library does not bring the program that uses it under the scope of the GPL.

The GPL comes in two versions: 2 and 3. Version 3 was introduced in 2007, and contains some technical improvements. The new terms are more international, cover patents, require that a hardware device not prevent changing the software, address DRM, and address mixing the GPL with other licenses. The patent rules mean that if you own a patent, add a feature to the software that uses the patent, and release the code under GPLv3, then you automatically give everyone using the code a license to the patent. If you then sue anyone for patent infringement over the software, you lose your rights under the license. Not everybody is happy with all the changes.

BSD License: this allows any form of redistribution, including within proprietary software. However, the license text and its disclaimer of warranty must accompany the source code, and must also be reproduced in the documentation of any binary distribution.

MIT License: This allows any form of redistribution, as long as the license accompanies the code.

Apache License: in its simplest form (eg, ignoring some optional language for coverage of patents, which is similar to what the GPLv3 provides), the Apache license guarantees that rights are perpetual, worldwide, irrevocable, non-exclusive, and free of charges or royalties. You can distribute an executable without distributing the source. If you do distribute the source, you must document what changes you made (in a general sort of way); the BSD and MIT licenses do not require this.

Affero License: this is a variant of the GPL, often known as the AGPL. The idea here is that if you even so much as let others use a modified version over a network (eg you're a software-as-a-service provider, or a cloud provider), then you have to make the source available to those users. The Affero license would affect companies like Amazon that have (a) modified a software package, and (b) made the modified version widely available (for Amazon, on AWS). Amazon does exactly this for MySQL, for example, but MySQL is not Affero-licensed.

Creative Commons: This license is mostly used for non-software, but is sometimes used for software. It grants an irrevocable right of free use. Options restrict this to noncommercial use (not all that well defined, actually), limit the creation of derivative works, or require attribution when using the content elsewhere.

The Mozilla Foundation also has its own license, though it is seldom used outside Mozilla.

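Among the licenses above, only the GNU licenses are fully self-perpetuating, requiring that if you modify the source code and distribute it then you must release it under the same GNU terms. (The Mozilla license has a weaker, file-level version of this requirement, and the Creative Commons ShareAlike option plays a similar role for non-software content.)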
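The differences among these licenses can be condensed into a rough decision rule. The Python sketch below is a deliberately coarse model, not legal advice: it assumes just three hypothetical categories (permissive, copyleft, and network copyleft) and ignores patents, notices, attribution and every other real clause. The SPDX-style identifiers are used only as labels, and "Proprietary" is a placeholder name. The second function illustrates the "transitivity of freeness" discussed earlier: a copyleft license is inherited by each generation of distributed derivatives, while a permissive one can be replaced.

```python
# A coarse, hypothetical model of the license rules described above.
# Assumption: every license falls into one of three categories; real
# licenses have many more conditions (patents, notices, attribution).
PERMISSIVE = "permissive"
COPYLEFT = "copyleft"
NETWORK_COPYLEFT = "network-copyleft"

LICENSE_KIND = {
    "MIT": PERMISSIVE,
    "BSD-3-Clause": PERMISSIVE,
    "Apache-2.0": PERMISSIVE,
    "GPL-2.0-only": COPYLEFT,
    "GPL-3.0-only": COPYLEFT,
    "LGPL-3.0-only": COPYLEFT,  # here: changes to the library itself only
    "AGPL-3.0-only": NETWORK_COPYLEFT,
}


def must_release_changes(license_id: str, *, distributed: bool,
                         offered_as_service: bool = False) -> bool:
    """Return True if, in this model, your modifications must be released
    under the original license."""
    kind = LICENSE_KIND[license_id]
    if kind == PERMISSIVE:
        return False  # changes may stay proprietary, even if you sell them
    if distributed:
        return True   # GPL family: distributing changes triggers copyleft
    # Changes never distributed: only the Affero rule reaches service users
    return kind == NETWORK_COPYLEFT and offered_as_service


def license_of_derivative(parent: str, preferred: str) -> str:
    """License a distributed derivative ends up under, in this model."""
    if LICENSE_KIND[parent] == PERMISSIVE:
        return preferred  # permissive: the modifier may pick any license
    return parent         # copyleft: the parent's license is inherited


if __name__ == "__main__":
    # Modified code run as a hosted service, never distributed
    for lic in ("MIT", "GPL-3.0-only", "AGPL-3.0-only"):
        print(lic, must_release_changes(lic, distributed=False,
                                        offered_as_service=True))
    # prints: MIT False / GPL-3.0-only False / AGPL-3.0-only True

    # Three generations of derivatives, each author preferring "Proprietary"
    lic = "GPL-2.0-only"
    for _ in range(3):
        lic = license_of_derivative(lic, "Proprietary")
    print(lic)                                          # prints: GPL-2.0-only
    print(license_of_derivative("MIT", "Proprietary"))  # prints: Proprietary
```

In this model the GPL case returns False for a hosted service, which matches the MySQL-on-AWS point above; only the AGPL entry returns True there.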

Why Open Source

There are a number of reasons why a person or company might get involved in developing open source. Here is a brief summary:

It's the Right Thing To Do.
You're more interested in credit than marketing.
You recognize that the dollar value of your software in the market is negligible. (This is surprisingly often the case.)
You want to build your resume.
You plan to sell support, or documentation (there are many non-free books on famous open-source projects).
Your company wants to push the product as a standard.
You are hoping that the user community writes the many extensions and add-ons your project will need to be a success.
Your company has produced the software, but their primary business is quite different, and they don't want to bother with trying to sell it.
Your company hopes eventually to move users from the free version to the proprietary version.

Where does MySQL fall? It is owned by Oracle.

As for using open source, companies do this not just because it is free, but because they fear vendor lock-in on the one hand, and vendor bankruptcy on the other.