Open Source Computing Notes

Peter Dordal, Loyola University Chicago Dept of Computer Science.

In the beginning, all software was free. Or, to be more precise, IBM dominated the computer industry in the 1960's, and the software you needed was bundled in with your hardware. IBM only began selling application software separately in 1969, following the filing of the US v IBM antitrust lawsuit. One of the government's primary claims was that bundling hardware with software was anticompetitive, in that the policy pretty much made third-party software an unsustainable model.

Throughout the 1960's, IBM almost always distributed software in source form. "Source" here usually meant Assembly language, though. After the unbundling decision in 1969, IBM continued to bundle "system" software with their hardware; only application software was separate. Hardware costs were usually much higher than software, however.

Even earlier, in 1953, Univac released their A2 linker/loader system, distributed with source code, and invited users to send any improvements back to Univac. (IBM, on the other hand, was not always very interested in customer improvements.)

Up through the 1970's, most mainframe application software (eg customer billing) was locally written. It was usually not written to be portable, though sometimes developers tried to write in a more "universal" style.

Before 1974, software was not even copyrightable. In that year, the Commission on New Technological Uses of Copyrighted Works decided that source code was in effect a creative work, and software became copyrightable. In the 1983 case Apple v Franklin, in which Apple sued clone-maker Franklin for copying its system software, the courts recognized that copyright applied to object code as well as to source. The Apple II had been introduced in 1977.

Unix was developed at AT&T's Bell Labs starting in 1970. In the early 1970's AT&T began licensing the operating system to non-profit institutions (and eventually some for-profit ones), distributing the operating system on magnetic tape. The licensing requirement was a significant stumbling block at some sites, although the cost of a license (typically ~$1,000) was vastly less than the cost of the hardware needed to run it (minimum $100,000).

Oracle was founded in 1977 (under another name), the same year that IBM released the first RDBMS, System R (which also introduced the SQL language). System R was considered "experimental", though used commercially; IBM's first commercial mainframe RDBMS package was DB2, released around 1983 but based on the 1981 SQL/DS. The Oracle database was first released in 1979, presumably running on IBM hardware. Oracle is among the earliest "mainframe" software-only companies. By 1983, Oracle DB was introduced for the DEC VAX computer, and Oracle has been associated with Unix/VMS systems ever since, although they never gave up on IBM.

The first "personal" computer was the 8080-based Altair in 1974. The Apple I was introduced in 1976, followed in 1977 by the Apple II. The Tandy / Radio Shack TRS-80 was also introduced in 1977. The IBM PC was introduced in 1981, and the Apple Macintosh in 1984.

The same era saw the development of the Berkeley Software Distribution (BSD) flavor of AT&T Unix. Until 1991, however, BSD users needed an AT&T Unix license, by which point AT&T System V was well established (in effect as a competitor to BSD). The 1991 release was a full open-source release. In 1993, following a legal settlement, the forks FreeBSD and NetBSD were released, followed by OpenBSD in 1995.

A group at MIT began working on the X Window System (a windowing system, often called X-Windows) in 1984. This was another famous open-source project, and the origin of the MIT license.

The Apache http server was started in 1993; the Apache Software Foundation was incorporated in 1999. Apache projects are all open-source, often with commercial support available. Most Apache projects are server-side projects.

Netscape was founded in 1994; it was a closed-source project. Its browser competed with Internet Explorer through the 1990's. In 1998, with a sale to AOL looming, Netscape made its browser code open-source, and created the Mozilla Organization to host it. The Mozilla Organization went on to develop Firefox.

Richard Stallman

Richard M Stallman (sometimes "rms") began working at MIT in 1971. He moved to the MIT Artificial Intelligence Lab in 1975; at that time, AI largely meant Lisp hacking. Stallman became a vocal advocate for open computing, and even launched an initiative in 1977, when passwords were introduced in the Lab, to encourage everyone to disable their password. In 1979, Stallman protested early DRM applied to the Scribe word-processing package; Stallman's open Texinfo package was part of his response. (Donald Knuth's open TeX package was first released in 1978, though it wasn't out of beta until 1989.) In 1980, the MIT lab's new Xerox 9700 laser printer arrived with no source code, making it impossible for Stallman to hack it to enable user notifications of printing issues.

In the early 1980's, the MIT AI lab saw two spinoffs: Lisp Machines Inc and Symbolics. Both of these companies offered closed-source software, and both hired away several members of the MIT AI lab.

In September 1983, Stallman announced on Usenet his plan for the GNU project: a free and open operating system. GNU is nominally an acronym for GNU's Not Unix, and was named with full awareness of the 1957 Flanders and Swann song (in which all initial "silent" letters are voiced, and several extraneous initial G's also appear). Stallman's emacs editor was one of the first GNU software contributions.

In 1984, Stallman left MIT to manage the GNU project fulltime. In 1985 he founded the Free Software Foundation, and published his GNU Manifesto, in which he declared his ideas about free software and ethics. For Stallman, sharing software was part of the ethical duty of helping others, and if copyrights prevented that, then copyrighted software must be avoided. Stallman did not suggest that copyrights simply be infringed.

The first release of gcc, the GNU C compiler, was in 1987. This had a major impact on software development, as most compilers at the time were not free.

Later gcc became the GNU Compiler Collection, reflecting the merge of the EGCS fork back into GCC, and the fact that GCC had, from the beginning, supported other languages: C, C++, Pascal and Fortran.

The Free Software Foundation was funded in part by donations. However, the FSF also sold gcc on tape (later CD). Many sites paid for gcc even though they were able to download it for free, as a way of supporting the GNU project. Often this was done by technical IT personnel, without telling management that the $300 bill for software was optional.

The GNOME graphical desktop environment, associated with GNU, was started in 1997.

The goal of the GNU project had always been to create an operating system and complete suite of utilities. The project did very well with the utilities, including gcc; this part was largely complete by 1990. However, work on an OS kernel, to be called hurd, ran into technical difficulties. Those difficulties have continued; a production version of hurd has still [2018] not been released.

Linux

Linus Torvalds released his Linux operating system kernel in 1991. The following year an update was released under the GNU General Public License. From the beginning the Linux system was bundled with all the GNU utilities; Torvalds' contribution was, in essence, just the kernel.

The Linux kernel is monolithic, versus the "microkernel" architecture of hurd and minix. The microkernel idea is probably superior technically, though Linux achieves most microkernel functionality through loadable (and unloadable) device drivers.

Starting in 1996 Linux began including proprietary drivers (such as for Wi-Fi, though not in 1996), and sometimes codecs. There are fully free Linux versions out there, however.

Richard Stallman has suggested that Linux should properly be referred to as GNU/Linux. It often is. Stallman has also said:

There is no system but GNU, and Linux is one of its kernels.

Torvalds has never been a fan of Stallman's "free software" approach, though he did adopt the GNU license (v2) for Linux, and has repeatedly affirmed that this was a very fortuitous choice.

Linux eventually attracted considerable "commercial" support. IBM, in particular, became a major contributor starting in 1999. IBM's contributions greatly improved the performance and reliability of filesystem drivers and I/O generally.

In 2003, SCO sued IBM, claiming that IBM had taken AT&T Unix features and incorporated them into Linux. SCO claimed to own AT&T Unix at that point (a later court decision determined that Novell was the true owner), and also claimed that a licensing agreement with IBM forbade IBM from contributing to Linux. This lawsuit cast a pall over early adoption of Linux. It later turned out that Microsoft had helped SCO raise money for their anti-Linux lawsuits.

Free vs Open Source

In the Windows world, there is lots of "free" software, sometimes called freeware. (Once upon a time there was "shareware", where happy users were supposed to send a contribution to the developer.) There are many free apps for Android and iPhone, too.

Most "free" software in this sense is in fact either "adware", supported by intrusive advertising, or outright spyware or malware.

Richard Stallman uses the term "free software" to mean software that is unencumbered by restrictions. You can share it, and you can read the source. He developed the GNU license to encourage continued sharing: if you modify software you obtained under the GNU license, and distribute your modifications, these must also be covered by the GNU license. The first GNU license appeared in 1989; it was followed in 1991 by Version 2.

The word "free" in English can mean either "no charge" or "having liberty". Stallman often uses the phrase "free as in speech, not as in beer" to reflect this distinction. More recently, he has been suggesting the term "Free/Libre" to discuss software that is free in his sense. The purpose of the GNU license, in a sense, is to ensure "transitivity" of freeness.

The MIT and Berkeley licenses are different from the GNU license: it is permitted for someone to copy the software, introduce changes, and sell the result. This was almost essential for the X-windows project, where end-users were almost never the ones installing the software. X-windows was distributed to workstation vendors, who distributed it in turn to their customers.

The term "open source" was coined by Christine Peterson at a 1998 meeting. It caught on with almost everyone except rms, who felt that some open-source licenses didn't perpetuate the software's freedom. That said, Stallman's core concerns about software have always been that it should be sharable and should come with source; all open-source licenses meet these criteria.

Licenses

Open-source software is usually covered by a license, granting specific permissions to the code user. The GNU license, above, specifically requires that any changes to the code also be distributed under the GNU license, if they are distributed at all. This means that you cannot take a GNU-licensed project, improve it, and sell the result (you can do this, but you also have to make the project available for free, which sort of undermines the sales plan).

Other licenses have very different terms.

GNU General Public License: This allows you to distribute the software however you like. You can even sell it. However, you must include the source code with any distribution. Any changes, if distributed, must also be distributed under the GPL. In other words, if some code is licensed under the GPL, then any distributed extensions must also be licensed under the GPL. (It is technically legal to make changes to GPL software, not distribute them, and therefore not distribute the source.)

Rights under the GPL, especially the rule that any distributed changes must be released under the GPL, are often referred to as copyleft, in contrast to copyright. The idea is that you do not have to accept the license; you can do what you want with the source code under copyright. But copyright doesn't grant you many rights, and no distribution rights, so you're better off with the license.

This caused some (probably unnecessary) worry about GPL libraries. If these are linked to proprietary code, does the latter become covered by the GPL? The so-called Lesser GPL (or LGPL) clarifies this point: libraries do not bring the original work under the scope of the GPL.

The GPL comes in two versions: 2 and 3. Version 3 was introduced in 2007, and contains some technical improvements. The new terms are now more international, now cover patents, require that a hardware device not prevent changing the software, address DRM, and address mixing the GPL with other licenses. The patent rules mean that if you own a patent and add a feature to the software that uses the patent, and release the code under GPLv3, then you automatically give everyone using the code a license to the patent. If you sue anyone for infringement, you lose the right to use the software. Not everybody is happy with all the changes.

BSD License: this allows any form of redistribution, including within proprietary software. The license must be distributed with the source code, however, and also the disclaimer of warranty.

MIT License: This allows any form of redistribution, as long as the license accompanies the code.

Apache License: in its simplest form (eg, ignoring some optional language for coverage of patents, which is similar to what the GPLv3 provides), the Apache license guarantees that rights are perpetual, worldwide, irrevocable, non-exclusive, and free of charges or royalties. You can distribute an executable without distributing the source. If you do distribute the source, you must document what changes you made (in a general sort of way); the BSD and MIT licenses do not require this.

Affero License: this is a variant of the GPL, often known as the AGPL. The idea here is that if you even so much as allow others to use the binaries (eg you're a software-as-a-service provider, or a cloud provider), then you have to make the source available to those users. The Affero license affects companies like Amazon that have (a) modified a software package, and (b) made the modified version widely available (for Amazon, on AWS). Amazon does exactly this for MySQL, for example, but MySQL is not Affero-licensed.

Creative Commons: This license is mostly used for non-software, but is sometimes used for software. It grants an irrevocable right of free use. Options restrict this to noncommercial use (not all that well defined, actually), limit the creation of derivative works, or require attribution when using the content elsewhere.

The Mozilla Foundation also has its own license, though it is seldom used outside Mozilla.

Why Open Source

There are a number of reasons why a person or company might get involved in open source. Here is a brief summary:

Where does MySQL fall? It is owned by Oracle.

Git and GitHub

The most popular site for hosting open-source projects appears to be github.com. (Perhaps ironically, the software at github.com is not open-source.)

GitHub derives from git, a version-control system developed in 2005 by Linus Torvalds, who later claimed

I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.

The Monty Python Mr and Mrs Git video appears to have been scrubbed from the Internet by the copyright holders, but here's a text version: montypython.net/scripts/mrgit.php. The idea of the sketch was that Mr and Mrs Git appear resigned to the fact that their last name is Git, but are oblivious to the social implications of their first names: "A Snivelling Little Rat-Faced" and "Dreary Fat Boring Old".

GitHub users don't necessarily make routine use of git's ability to maintain multiple independent "branches" simultaneously.

Here's a rather spare intro to git, starting with cloning: maryrosecook.com/blog/post/git-in-six-hundred-words. Another overview, with more focus on branching, is eagain.net/articles/git-for-computer-scientists.

A popular alternative to git is Subversion, an Apache project started in 2000. An earlier alternative is the Concurrent Versions System, or CVS.

Git was perhaps the first source-code manager to adopt the file as the basic unit, rather than maintaining the original plus a series of "patch" files. That said, git can merge files like the following, which presumes at least some awareness of line structure:

first version:
    here is the file
    second line
    new line 2.5 here
    line three
    line four
    that's all, folks!

second version:
    here is the file
    second line
    line three
    line four
    new line 4.5 here
    that's all, folks!
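The merge of the two versions above can be sketched in a few lines of Python. This is only a toy illustration of a line-based three-way merge, not git's actual algorithm; it assumes a common ancestor (the file without either new line, which is not shown in the notes), and the function names changes() and merge3() are made up for this sketch.

```python
import difflib

def changes(base, other):
    """Map each changed range (i1, i2) of base to its replacement lines."""
    sm = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return {(i1, i2): other[j1:j2]
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"}

def merge3(base, ours, theirs):
    """Toy three-way merge: combine non-overlapping edits, else fail."""
    edits = changes(base, ours)
    for rng, lines in changes(base, theirs).items():
        for (i1, i2), our_lines in list(edits.items()):
            # Ranges that overlap or touch are treated as a conflict,
            # unless both sides made exactly the same edit.
            touching = rng[0] <= i2 and i1 <= rng[1]
            if touching and not (rng == (i1, i2) and lines == our_lines):
                raise ValueError("merge conflict near base line %d" % (i1 + 1))
        edits[rng] = lines
    merged, pos = [], 0
    for (i1, i2) in sorted(edits):
        merged.extend(base[pos:i1])      # unchanged lines before the edit
        merged.extend(edits[(i1, i2)])   # the edited lines
        pos = i2
    merged.extend(base[pos:])
    return merged

# Assumed common ancestor of the two versions shown above
base = ["here is the file", "second line", "line three",
        "line four", "that's all, folks!"]
first = base[:2] + ["new line 2.5 here"] + base[2:]
second = base[:4] + ["new line 4.5 here"] + base[4:]

for line in merge3(base, first, second):
    print(line)
```

Because the two insertions fall in different places relative to the ancestor, both survive in the result. If both versions had inserted different lines at the same spot, this sketch would raise an error, which corresponds roughly to a merge conflict.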

The file-based orientation means that git can be much faster than its predecessors. Except for merging, no operations on file contents are necessary. Earlier systems had to apply each patch in turn to get the current end-result. CVS, in particular, was very slow at merging in large projects.
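One way to picture the file-based approach: git stores each version of a file whole, as a "blob" named by a hash of its contents, so an unchanged file costs nothing extra and no patches need to be replayed. The sketch below computes a blob id the way git does (SHA-1 over a short header plus the contents, in the default object format); the ToyObjectStore class and the file contents are invented purely for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Blob id: SHA-1 of the header 'blob <size>' + NUL, then the contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

class ToyObjectStore:
    """Hypothetical stand-in for git's object database."""
    def __init__(self):
        self.objects = {}          # blob id -> file contents

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects[oid] = content   # identical contents are stored once
        return oid

store = ToyObjectStore()
# Two snapshots of a project; only one.py differs between them.
snapshot1 = {"one.py": store.put(b"import two\n"),
             "two.py": store.put(b"def foo(): pass\n")}
snapshot2 = {"one.py": store.put(b"import two\ntwo.foo()\n"),
             "two.py": store.put(b"def foo(): pass\n")}

print(snapshot1["two.py"] == snapshot2["two.py"])   # same blob is reused
print(len(store.objects))                           # number of blobs stored
```

Retrieving any version of a file is then a single lookup by id, rather than the original plus a chain of patches.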

Git was also one of the first fully decentralized version-control systems. If Bob clones Alice's repository, then Alice can decide later that Bob's repo is the master, and merge it into her own. For Torvalds, this went a long way to smooth over unpleasant political debates about whose repo was the master. It also meant that arbitrarily complex work could be done offline. (An earlier model for versioning was that a programmer would "check out" part of the code to work on; the programmer would make changes and then "check back in". During the interval, that part of the code might be held as read-only. Git fixed this.)

In 2002, the Linux team adopted the commercial product BitKeeper, by BitMover, for source-control management. They got a deal to use it for free, but it was not free software in general. Andrew Tridgell attempted to reverse-engineer the network BitKeeper protocol, following his earlier efforts to reverse-engineer Microsoft SMB which led to Samba. This led to BitMover's withdrawal of the free-use offer in April 2005, claiming license violations, though Tridgell had never actually used BitKeeper and so never agreed to the "click-wrap" license. This, in turn, led to the (rapid) development of Git as a replacement, with Torvalds writing at least the first draft.

Having your vendor withdraw your license due to some dispute about usage is, to some, exactly why we should all use open source in the first place.

Here's one quote from Torvalds about git versus CVS:

I credit CVS in a very very negative way. Because I, in many ways, when I designed git, it's "what would Jesus do" except that it's "what would CVS never ever do"-kind of approach to source control management. I've never actually used CVS for the kernel. For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is, but I did end up using CVS for 7 years at a commercial company, and I hate it with a passion.

Here's another quote attributed to Torvalds on slide 9 of slideshare.net/odimulescu/git-presentatio (and other places):

Take CVS as an example of what not to do; if in doubt, make the exact opposite decision.
Aka WWCVSND (What would CVS never do)

But the slides claim this was said in Torvalds' Google talk on git, at youtube.com/watch?v=4XpnKHJAok8, around the 2:30 mark. He doesn't quite say this. He probably said it at another talk; it does seem that he did say it somewhere.

In 2016, BitMover made BitKeeper open-source (in this case, this was a recognition that it wasn't selling).


Git demo

Command-line git needs to be preconfigured with your identity:

git config --global user.name "Peter Dordal"
git config --global user.email pld@cs.luc.edu

Now suppose this is done. Let's start a project (pld's example is in ~/412/projects)

mkdir project1
cd project1
git init
    # this sets up git for this folder and subfolders

Now let's edit two files:

vi one.py two.py
git add *.py

Here's what I put into the files:

one:
    import sys
    sys.path.append('.')
    import two
    two.foo()

two:
    def foo():
        print('here I am in two.foo()')

git status

git commit -m master

In the last command, -m specifies a message identifying the commit. Don't forget these.

We can modify a file, and repeat the git add and git commit, as necessary.

Branching

Now let's create a new branch:

git branch changes1
git checkout changes1    # or git checkout -b changes1

Branches allow separate "versions" of the project to coexist. You can be working on a complicated new feature, in branch feature37, and be called back to fix a major bug in branch master. But the branch feature37 work is still there, suspended. You may or may not use branches in your project, because you and your teammates will presumably be working off separate clones (below).

Now we'll edit the files by adding a print() statement in one.py, print('about to call two.foo()'), and in two.py, print('at end of two.foo()'). We have to add the modified files to the repository (or use git commit with the -a option).

git add *.py
git commit -m changes1

or: git commit -a -m changes1

Now for the fun part:

git checkout master

And all our changes have disappeared! Even if we access from another window!

But we can get them back with

git checkout changes1
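Why do the changes vanish and reappear? A rough mental model is that each commit is a snapshot of the files, and a branch is just a name pointing at one commit; checking out a branch swaps in that commit's snapshot. Here is a toy model of that idea. The ToyRepo class and its methods are hypothetical, and real git is considerably more involved.

```python
class ToyRepo:
    """Toy model: commits are snapshots, branches are names for commits."""
    def __init__(self):
        self.commits = {}                 # commit id -> (parent id, files)
        self.branches = {"master": None}  # branch name -> commit id
        self.head = "master"              # currently checked-out branch
        self.next_id = 0

    def commit(self, files):
        cid = self.next_id
        self.next_id += 1
        self.commits[cid] = (self.branches[self.head], dict(files))
        self.branches[self.head] = cid    # only the current branch moves
        return cid

    def branch(self, name):
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name
        return dict(self.commits[self.branches[name]][1])

repo = ToyRepo()
repo.commit({"one.py": "original"})
repo.branch("changes1")
repo.checkout("changes1")
repo.commit({"one.py": "original + print()"})

print(repo.checkout("master"))     # the new print() is gone
print(repo.checkout("changes1"))   # and here it is again
```

Nothing is lost when switching branches; the working files are simply replaced by whichever snapshot the branch name points to.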

Now we combine them:

git checkout master
git merge changes1

At this point we could delete the changes1 branch with git branch -d changes1.

Auto-merging works here because we simply added to the changed files, and didn't commit corresponding changes to the master branch. Had we done that, we'd be forced to merge manually.

For example, suppose we do the above, but commit this to changes1:

one:
    import sys
    sys.path.append('.')
    import two
    print('about to call two.foo()')
    two.foo()

And this to master:

one:
    import sys
    sys.path.append('.')
    import two
    print('getting ready to call two.foo()')
    two.foo()

Now when we try git merge changes1 (from within master), we get

Auto-merging one.py
CONFLICT (content): Merge conflict in one.py
Automatic merge failed; fix conflicts and then commit the result.

If we try git diff, we get this (note that git is keeping track of the fact that we're attempting a merge):

diff --cc one.py
index 8be88c0,040e69a..0000000
--- a/one.py
+++ b/one.py
@@@ -1,5 -1,5 +1,9 @@@
  import sys
  sys.path.append('.')
  import two
++<<<<<<< HEAD
 +print('getting ready to call two.foo()')
++=======
+ print('about to call two.foo()')
++>>>>>>> changes1
  two.foo()

This is a form of the diff command; it attempts a line-by-line file comparison. It is successful, in that it does show which lines are different. One way to fix this is to modify one.py in changes1 until it's consistent with one.py in master; you may need git merge --abort to be able to go back to the changes1 branch.
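Another way to fix a conflict is to edit the conflicted file directly: git leaves the <<<<<<<, ======= and >>>>>>> markers in one.py, and we keep whichever side we want and delete the markers. The sketch below does this mechanically for one side or the other. The resolve() function is a made-up helper for illustration, not part of git, and it assumes the simple two-section marker layout shown in the diff output above.

```python
def resolve(conflicted_text, keep="ours"):
    """Keep one side of every conflict region and drop the markers.

    keep="ours" keeps the lines between <<<<<<< and =======;
    keep="theirs" keeps the lines between ======= and >>>>>>>.
    """
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    out = []
    state = "normal"
    for line in conflicted_text.splitlines():
        if line.startswith("<<<<<<< "):
            state = "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            state = "normal"
        elif state == "normal" or state == keep:
            out.append(line)
    return "\n".join(out) + "\n"

conflicted = """import sys
sys.path.append('.')
import two
<<<<<<< HEAD
print('getting ready to call two.foo()')
=======
print('about to call two.foo()')
>>>>>>> changes1
two.foo()
"""

print(resolve(conflicted, keep="theirs"))
```

After fixing the file this way (by hand or otherwise), the usual git add and git commit complete the merge.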

Cloning

Now let's try cloning (sometimes called forking, though forking is really a social construct meaning that you are starting a new project; cloning typically means that you intend to merge your changes back into the original, but need to work on a copy for the time being):

cd ..
mkdir project2
cd project2
git clone ../project1

Now we have an entirely separate directory (the actual files are in project2/project1). The project2 team can, if allowed, push their changes back into project1, or ask the project1 team to pull the changes from project2 into project1.

Let's have project2 make some changes. We'll add this line to the end of one.py, and commit it in project2 (git pull transfers only committed changes):

print('done with two.foo()')

Now, let's go back to project1, and enter

git remote add p2 ../project2/project1

And then for the pull:

git pull p2 master

Now our project2 changes are merged into project1!

If we wanted the clone to be at the top level of project2, we could have done this, from within project1:

git clone . ../project2

Here's an example of a remote clone:

git clone https://github.com/OrgName/ProjectName.git

This clones the remote repository. GitHub is set up to make this easy.