Open Source Computing Notes

Peter Dordal, Loyola University Chicago Dept of Computer Science.

In the beginning, all software was free. Or, to be more precise, IBM dominated the computer industry in the 1960's, and the software you needed was bundled in with your hardware. IBM only began selling application software separately in 1969, following the filing of the US v IBM antitrust lawsuit. One of the government's primary claims was that bundling hardware with software was anticompetitive, in that the policy pretty much made third-party software an unsustainable model.

Throughout the 1960's, IBM almost always distributed software in source form. "Source" here usually meant assembly language, though. After the unbundling decision in 1969, IBM continued to bundle "system" software with their hardware; only application software was separate. Hardware costs were usually much higher than software, however.

Even earlier, in 1953, Univac released their A2 linker/loader system, distributed with source code, and invited users to send any improvements back to Univac. (IBM, on the other hand, was not always very interested in customer improvements.)

Up through the 1970's, most mainframe application software (eg customer billing) was locally written. It was not written to be portable, though sometimes developers tried to write in a more "universal" style.

Before 1974, software was not even copyrightable. In that year, the Commission on New Technological Uses of Copyrighted Works decided that source code was in effect a creative work, and software became copyrightable. In the 1983 case Apple v Franklin, in which Apple sued clone-maker Franklin for copying its system software, the courts recognized that copyright applied to object code as well as to source. The Apple II had been introduced in 1977.

Unix was developed at AT&T's Bell Labs starting in 1969. In the early 1970's AT&T began licensing the operating system to non-profit institutions (and eventually some for-profit ones), distributing the operating system on magnetic tape. The licensing requirement was a significant stumbling block at some sites, although the cost of a license (typically ~$1,000) was vastly less than the cost of the hardware needed to run it (minimum $100,000).

Oracle was founded in 1977 (under another name), the same year that IBM released the first RDBMS, System R (which also introduced the SQL language). System R was considered "experimental", though used commercially; IBM's first commercial mainframe RDBMS package was DB2, released around 1983 but based on the 1981 SQL/DS. The Oracle database was first released in 1979, presumably running on IBM hardware. Oracle is among the earliest "mainframe" software-only companies. By 1983, Oracle DB was introduced for the DEC VAX computer, and Oracle has been associated with Unix/VMS systems ever since, although they never gave up on IBM.

The first "personal" computer was the 8080-based Altair in 1974. The Apple I was introduced in 1976, followed in 1977 by the Apple II. The Tandy / Radio Shack TRS-80 was also introduced in 1977. The IBM PC was introduced in 1981, and the Apple Macintosh in 1984.

The same era saw the development of the Berkeley Software Distribution (BSD) flavor of AT&T Unix. Until 1991, however, BSD users needed an AT&T Unix license, and by then AT&T System V was well established (in effect as a competitor to BSD). The 1991 release was a full open-source release. The forks FreeBSD and NetBSD were first released in 1993; a lawsuit over the BSD code was settled in early 1994, and OpenBSD followed in 1995.

A group at MIT began working on the X Window System (a windowing system, often called X-Windows) in 1984. This was another famous open-source project, and the origin of the MIT license.

The Apache http server was started in 1995; the Apache Software Foundation was incorporated in 1999. Apache projects are all open-source, often with commercial support available. Most Apache projects are server-side projects.

Netscape was founded in 1994; it was a closed-source project. Its browser competed with Internet Explorer through the 1990's. In 1998, with a sale to AOL looming, Netscape made its browser code open-source, and created the Mozilla Organization to host it. The Mozilla Organization went on to develop Firefox.

Richard Stallman

Richard M Stallman (sometimes "rms") began working at MIT in 1971. He moved to the MIT Artificial Intelligence Lab in 1975; at that time, AI largely meant Lisp hacking. Stallman became a vocal advocate for open computing, and even launched an initiative in 1977, when passwords were introduced in the Lab, to encourage everyone to disable their password. In 1979, Stallman protested early DRM applied to the Scribe word-processing package; Stallman's open Texinfo package was part of his response. (Donald Knuth's open TeX package was first released in 1978, though it wasn't out of beta until 1989.) In 1980, the MIT lab's new Xerox 9700 laser printer arrived with no source code, making it impossible for Stallman to hack it to enable user notifications of printing issues.

In the early 1980's, the MIT AI lab saw two spinoffs: Lisp Machines Inc and Symbolics. Both of these companies offered closed-source software, and both hired away several members of the MIT AI lab.

In September 1983, Stallman announced on Usenet his plan for the GNU project: a free and open operating system. GNU is nominally an acronym for GNU's Not Unix, and was named with full awareness of the 1957 Flanders and Swann song (in which all initial "silent" letters are voiced, and several extraneous initial G's also appear). Stallman's emacs editor was one of the first GNU software contributions.

In 1984, Stallman left MIT to manage the GNU project full time. In 1985 he founded the Free Software Foundation, and published his GNU Manifesto, in which he declared his ideas about free software and ethics. For Stallman, sharing software was part of the ethical duty of helping others, and if copyrights prevented that, then copyrighted software must be avoided. Stallman did not suggest that copyrights simply be infringed.

The first release of gcc, the GNU C compiler, was in 1987. This had a major impact on software development, as most compilers at the time were not free.

Later gcc became the GNU Compiler Collection, reflecting the merge of the EGCS fork back into GCC, and the fact that GCC had, from the beginning, supported other languages: C, C++, Pascal and Fortran.

The Free Software Foundation was funded in part by donations. However, the FSF also sold gcc on tape (later CD). Many sites paid for gcc even though they were able to download it for free, as a way of supporting the GNU project. Often this was done by technical IT personnel, without telling management that the $300 bill for software was optional.

The GNOME graphical desktop environment, associated with GNU, was started in 1997.

The goal of the GNU project had always been to create an operating system and complete suite of utilities. The project did very well with the utilities, including gcc; this part was largely complete by 1990. However, work on an OS kernel, to be called hurd, ran into technical difficulties. Those difficulties have continued; a production version of hurd has still [2018] not been released.

Linux

Linus Torvalds released his linux operating system kernel in 1991. The following year an update was released under the GNU General Public License. From the beginning the linux system was bundled with all the gnu utilities; Torvalds' contribution was, in essence, just the kernel.

The linux kernel is monolithic, versus the "microkernel" architecture of hurd and minix. The microkernel idea is probably superior, though linux achieves most microkernel functionality through loadable (and unloadable) device drivers.

Starting in 1996 linux began including proprietary drivers (such as for Wi-Fi, though not in 1996), and sometimes codecs. There are fully free linux versions out there, however.

Richard Stallman has suggested that linux should properly be referred to as GNU/Linux. It often is. Stallman has also said, "there is no system but GNU, and Linux is one of its kernels."

Torvalds has never been a fan of Stallman's "free software" approach.

Linux eventually attracted considerable "commercial" support. IBM, in particular, became a major contributor starting in 1999. IBM's contributions greatly improved the performance and reliability of filesystem drivers and I/O generally.

In 2003, SCO sued IBM, claiming that IBM had taken AT&T Unix features and incorporated them into Linux. SCO claimed to own AT&T Unix at that point (a later court decision determined that Novell was the true owner), and also claimed that a licensing agreement with IBM forbade IBM from contributing to Linux. This lawsuit cast a pall over early adoption of Linux. It later turned out that Microsoft had helped SCO raise money for their anti-linux lawsuits.

Free vs Open Source

In the Windows world, there is lots of "free" software, sometimes called freeware. (Once upon a time there was "shareware", where happy users were supposed to send a contribution to the developer.) There are many free apps for Android and iPhone, too.

Most "free" software in this sense is in fact either "adware", supported by intrusive advertising, or outright spyware or malware.

Richard Stallman uses the term "free software" to mean software that is unencumbered by restrictions. You can share it, and you can read the source. He developed the GNU license to encourage continued sharing: if you modify software you obtained under the GNU license, and distribute your modifications, these must also be covered by the GNU license. The first GNU license appeared in 1989; it was followed in 1991 by Version 2.

The word "free" in English can mean either "no charge" or "having liberty". Stallman often uses the phrase "free as in speech, not as in beer" to reflect this distinction. More recently, he has been suggesting the term "Free/Libre" to discuss software that is free in his sense. The purpose of the GNU license, in a sense, is to ensure "transitivity" of freeness.

The MIT and Berkeley licenses are different from the GNU license: someone is permitted to copy the software, introduce changes, and sell the result as a closed-source product, without releasing the modified source. This was almost essential for the X-windows project, where end-users were almost never the ones installing the software. X-windows was distributed to workstation vendors, who distributed it in turn to their customers.

The term "open source" was coined by Christine Peterson at a 1998 meeting. It caught on with almost everyone except rms, who felt that some open-source licenses didn't perpetuate the software's freedom. That said, Stallman's core concerns about software have always been that it should be sharable and should come with source; all open-source licenses meet these criteria.

Licenses

Open-source software is usually covered by a license, granting specific permissions to the code user. The GNU license, above, specifically requires that any changes to the code also be distributed under the GNU license, if they are distributed at all. This means that you cannot take a GNU-licensed project, improve it, and sell the result (you can do this, but you also have to make the project available for free, which sort of undermines the sales plan).

Other licenses have very different terms.

GNU General Public License: This allows you to distribute the software however you like. You can even sell it. However, you must include the license with any distribution. Any changes, if distributed, must also be distributed under the GPL.

Rights under the GPL, especially the rule that any distributed changes must be released under the GPL, are often referred to as copyleft, in contrast to copyright. The idea is that you do not have to accept the license; you can do what you want with the source code under copyright. But copyright doesn't grant you many rights, and no distribution rights, so you're better off with the license.

This caused some (probably unnecessary) worry about GPL libraries. If these are linked to proprietary code, does the latter become covered by the GPL? The so-called Lesser GPL (or LGPL) clarifies this point: a library released under the LGPL does not bring the work that links to it under the scope of the GPL.

Two versions of the GPL are in common use: 2 and 3. Version 3 was introduced in 2007, and contains some technical improvements. The new terms are more international, cover patents, require that a hardware device not prevent changing the software, address DRM, and address mixing the GPL with other licenses. The patent rules mean that if you own a patent and add a feature to the software that uses the patent, and release the code under GPLv3, then you automatically give everyone using the code a license to the patent. If you sue anyone for infringement, you lose the right to use the software. Not everybody is happy with all the changes.

BSD License: this allows any form of redistribution, including within proprietary software. The license must be distributed with the source code, however, and also the disclaimer of warranty.

MIT License: This allows any form of redistribution, as long as the license accompanies the code.

Apache License: in its simplest form (eg, ignoring some optional language for coverage of patents, which is similar to what the GPLv3 provides), the Apache license guarantees that rights are perpetual, worldwide, irrevocable, non-exclusive, and free of charges or royalties. You can distribute an executable without distributing the source. If you do distribute the source, you must document what changes you made (in a general sort of way); the BSD and MIT licenses do not require this.

Creative Commons: This license is mostly used for non-software, but is sometimes used for software. It grants an irrevocable right of free use. Options can restrict this to noncommercial use (not all that well defined, actually), limit the creation of derivative works, or require attribution when using the content elsewhere.

The Mozilla Foundation also has its own license, though it is seldom used outside Mozilla.

Why Open Source

There are a number of reasons why a person or company might get involved in open source. Here is a brief summary:

Where does MySQL fall? It is owned by Oracle.

GitHub

The most popular site for hosting open-source projects appears to be github.com. (Perhaps ironically, the software at github.com is not open-source.)

GitHub is built around git, a version-control system developed in 2005 by Linus Torvalds. Many GitHub users do not make routine use of git's ability to maintain multiple independent "branches" simultaneously.

Here's a rather spare intro to git: maryrosecook.com/blog/post/git-in-six-hundred-words. Another overview is eagain.net/articles/git-for-computer-scientists.

A popular alternative to git is subversion, started in 2000 and now an Apache project.

Git demo

Command-line git needs to be preconfigured with your identity:

git config --global user.name "Peter Dordal"
git config --global user.email pld@cs.luc.edu

Now suppose this is done. Let's start a project:

mkdir project1
cd project1
git init
    # this sets up git for this folder and subfolders

Now let's edit two files:

vi one.py two.py
git add *.py

Here's what I put into the files:

one.py:
    import sys
    sys.path.append('.')
    import two
    two.foo()

two.py:
    def foo():
        print('here I am in two.foo()')

git status

git commit -m master

In the last command, -m specifies a message identifying the commit. Don't forget these. Now let's create a new branch:

git branch changes1
git checkout changes1    # or replace both lines with: git checkout -b changes1

Now we'll edit the files by adding a print() statement in one.py, print('about to call two.foo()'), and in two.py, print('at end of two.foo()'). We have to stage the modified files again with git add (or use git commit with the -a option, which stages every modified file that git already tracks).

git add *.py
git commit -m changes1

or: git commit -a -m changes1

Now for the fun part:

git checkout master

And all our changes have disappeared! Even if we access from another window!

But we can get them back with

git checkout changes1
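
What git checkout does here is rewrite the files in the working directory to match the branch being checked out, which is why the change shows up in every window. Here is a minimal sketch of that behavior, assuming a throwaway directory from mktemp and a hypothetical one-file project (demo.txt, the commit messages and the "Demo User" identity are made up for illustration):

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
# Repository-local identity, so the sketch does not depend on global config.
git config user.name "Demo User"
git config user.email demo@example.com

echo "line from the first commit" > demo.txt
git add demo.txt
git commit -q -m "initial commit"
# Remember the initial branch name (master or main, depending on configuration).
base=$(git symbolic-ref --short HEAD)

# Create a branch and commit a change there.
git checkout -q -b changes1
echo "line added on changes1" >> demo.txt
git commit -q -a -m "add a line on changes1"

# Switching back restores the file as it was on the original branch...
git checkout -q "$base"
on_base=$(cat demo.txt)
# ...and switching forward again brings the added line back.
git checkout -q changes1
on_changes1=$(cat demo.txt)

echo "--- on $base:"
echo "$on_base"
echo "--- on changes1:"
echo "$on_changes1"
```

Nothing is lost when the line "disappears": both versions are stored as commits in the .git folder, and checkout just copies the selected one into place.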

Now we combine them:

git checkout master
git merge changes1

At this point we could delete the changes1 branch with git branch -d changes1.

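Auto-merging works here because master had no new commits of its own since changes1 was created, so git can simply move master forward to the changes1 commit (a "fast-forward"). Had we also committed changes to master, git would still merge automatically as long as the two branches changed different parts of the files; only if both had changed the same lines would we be forced to merge manually.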
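
The manual-merge case can be reproduced with a minimal sketch. It assumes a scratch directory and a hypothetical one-line file, greeting.txt (the file name, contents and identity are made up for illustration). Both branches change that same line, so git cannot choose between them:

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "initial commit"
# Remember the initial branch name (master or main, depending on configuration).
base=$(git symbolic-ref --short HEAD)

# Change the line on a side branch...
git checkout -q -b changes1
echo "hello from changes1" > greeting.txt
git commit -q -a -m "reword greeting on changes1"

# ...and change the same line differently on the original branch.
git checkout -q "$base"
echo "hello from $base" > greeting.txt
git commit -q -a -m "reword greeting on $base"

# git merge exits with a nonzero status when it cannot finish on its own.
if git merge changes1; then
    merged=yes
else
    merged=no
fi
echo "merge completed automatically: $merged"
# The file now holds both versions, separated by conflict markers.
cat greeting.txt
```

To finish such a merge, edit the file down to the contents you want, then run git add on it and git commit. Running git merge --abort instead backs out of the merge entirely.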

Finally we'll try cloning:

cd ..
mkdir project2
cd project2
git clone ../project1
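
Note that git clone creates a new subdirectory named after the source, so the copy above ends up in project2/project1. Here is a minimal sketch of what such a clone contains, assuming a scratch directory and a hypothetical one-file stand-in for project1 (the file contents and identity are made up for illustration):

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

# Build a small stand-in for project1 with a single commit.
mkdir project1
cd project1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
echo "print('hello')" > one.py
git add one.py
git commit -q -m "initial commit"

# Clone it from a sibling directory, as in the demo.
cd ..
mkdir project2
cd project2
git clone -q ../project1
cd project1

# The clone carries the commit history, not just the current files.
commits=$(git rev-list --count HEAD)
# It also records where it came from, under the remote name "origin".
origin=$(git config --get remote.origin.url)
echo "commits: $commits"
echo "origin: $origin"
```

Because the clone remembers its origin, later commits made in the original project1 can be fetched into the copy with git pull.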

Here's a more interesting example:

git clone https://github.com/OrgName/ProjectName.git