Open-Source Security

Is open source more secure because more people look at the code? Or less secure, because bad guys can look at the code and find bugs?

This question has been debated for quite some time. For a while, so-called "fuzzing" techniques (generating more-or-less random input in hopes of triggering a failure) did work better when the source was available: the tester would examine the source and generate inputs until all the edge cases were triggered. The goal was to execute every part of the code.

But then that changed: tools were developed that could do the same thing given only the object code. At that point, having the source available was no longer a special liability, and a large number of Windows exploits started to appear.

On the other hand, the truth is that not everybody looks at the code. In most cases, not very many do.

TLS, for Transport Layer Security, is an encryption layer used above TCP (or, in its DTLS variant, above UDP). It is what puts the 's' in https, for secure http.

Debian OpenSSL bug

At some point in 2006, Debian removed a couple of lines from the Debian copy of OpenSSL because the lines generated complaints from code-analysis software. Here is the code as of today, from boinc.berkeley.edu/android-boinc/libssl/crypto/rand/md_rand.c; the function is static void ssleay_rand_add(const void *buf, int num, double add). The call to MD_Update() is actually a macro:

#define MD_Update(a,b,c)	EVP_DigestUpdate(a,b,c)

The variable m is a message-digest context (an EVP_MD_CTX). Here is part of the code:

	for (i=0; i<num; i+=MD_DIGEST_LENGTH)
		{
		j=(num-i);
		j=(j > MD_DIGEST_LENGTH)?MD_DIGEST_LENGTH:j;

		MD_Init(&m);
		MD_Update(&m,local_md,MD_DIGEST_LENGTH);
		k=(st_idx+j)-STATE_SIZE;
		if (k > 0)
			{
			MD_Update(&m,&(state[st_idx]),j-k);
			MD_Update(&m,&(state[0]),k);
			}
		else
			MD_Update(&m,&(state[st_idx]),j);

		/* DO NOT REMOVE THE FOLLOWING CALL TO MD_Update()! */
		MD_Update(&m,buf,j);
		/* We know that line may cause programs such as
		   purify and valgrind to complain about use of
		   uninitialized data.  The problem is not, it's
		   with the caller.  Removing that line will make
		   sure you get really bad randomness and thereby
		   other problems such as very insecure keys. */

		MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c));
		MD_Final(&m,local_md);
		md_c[1]++;

		buf=(const char *)buf + j;

		for (k=0; k<j; k++)
			{
			/* Parallel threads may interfere with this,
			 * but always each byte of the new state is
			 * the XOR of some previous value of its
			 * and local_md (itermediate values may be lost).
			 * Alway using locking could hurt performance more
			 * than necessary given that conflicts occur only
			 * when the total seeding is longer than the random
			 * state. */
			state[st_idx++]^=local_md[k];
			if (st_idx >= STATE_SIZE)
				st_idx=0;
			}
		}

The code-analysis software thought the calls to MD_Update(&m,buf,j) (note that the code is inside a loop) made use of uninitialized data. That may actually have been true in places: some entropy was, in effect, allowed to come from uninitialized data. The code does look odd from a deterministic point of view, but the whole point of these calls to MD_Update() was to mix additional randomness into the pool.

Commenting out the call to MD_Update(&m,buf,j) greatly reduced the total available entropy: what remained came essentially from the process ID alone, a value with at most about 32,768 (2^15) possibilities. This completely breaks the random-number generator; every key it could ever produce can simply be enumerated.
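
To see why such a small seed space is fatal, here is a minimal brute-force sketch. It is purely illustrative: generate_key_from_seed() is a made-up stand-in for "seed the broken PRNG with this PID and derive a key", not OpenSSL code. An attacker can simply regenerate every key the weakened generator could ever have produced:

/* Illustrative sketch only: enumerate a ~15-bit seed space.
 * generate_key_from_seed() is a hypothetical stand-in for the
 * broken key-generation path. */
#include <stdio.h>

static void generate_key_from_seed(unsigned seed, unsigned char key[16])
{
    /* toy deterministic expansion of the seed; NOT a real key-derivation function */
    for (int i = 0; i < 16; i++)
        key[i] = (unsigned char)(seed * 2654435761u >> (i % 24));
}

int main(void)
{
    unsigned char key[16];
    for (unsigned pid = 1; pid <= 32768; pid++) {   /* the entire seed space */
        generate_key_from_seed(pid, key);
        /* here one would compare 'key' (or the public key derived from it)
         * against the key actually observed on the target system */
    }
    printf("enumerated all 32768 candidate keys\n");
    return 0;
}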

Debian discovered and fixed the error in 2008.

Apple TLS bug

Here is the source code (Apple has open-sourced a lot of OS X, but not the GUI parts of it):

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
	OSStatus        err;
	...

	if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
		goto fail;
	if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
		goto fail;
		goto fail;
	if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
		goto fail;
	...

err = sslRawVerify();
...

fail:
	SSLFreeBuffer(&signedHashes);
	SSLFreeBuffer(&hashCtx);
	return err;
}

Note the duplicated "goto fail", and note the lack of enclosing {}. Despite what the indentation suggests, the second goto fail is not governed by the if; it is always executed. The function therefore always jumps to fail at this point, skipping the final hash step and, more importantly, the call to sslRawVerify() that actually checks the signature. Since none of the earlier err != 0 tests had failed, err is still 0, so the function returns success: the connection is treated as verified even when the server's signature does not check out.

At times like this it is really handy to have a compiler that does dead-code (unreachable-code) detection. Fail.
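
As a minimal stand-alone illustration (this is not Apple's code), the same pattern can be reproduced and, at least with clang's -Wunreachable-code option, flagged at compile time:

/* goto_fail.c -- minimal illustration of the duplicated-goto bug.
 * Try: clang -Wunreachable-code goto_fail.c
 * (it should warn that the second check is never executed) */
#include <stdio.h>

static int verify(int hash_ok, int sig_ok)
{
    int err = 0;

    if ((err = (hash_ok ? 0 : 1)) != 0)
        goto fail;
        goto fail;                      /* duplicated, unbraced: always executed */
    if ((err = (sig_ok ? 0 : 2)) != 0)  /* the real signature check: never reached */
        goto fail;

fail:
    return err;                         /* still 0 even when sig_ok is false */
}

int main(void)
{
    /* hash OK but signature bad: should fail, yet reports 0 (success) */
    printf("verify(1,0) = %d\n", verify(1, 0));
    return 0;
}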

OK, evidently nobody outside Apple was looking at this particular piece of Apple source.

OpenSSL Heartbleed bug

Most servers use OpenSSL; the bug described here, disclosed in April 2014, became known as Heartbleed. TLS contains a heartbeat provision (RFC 6520): the client sends occasional "heartbeat request" packets and the server is supposed to echo them back, exactly as is. This keeps the connection from timing out. That is, the client sends a (len,data) pair, and the server is supposed to echo back that much data. Part of the reason for echoing back the data is so the client can figure out which request triggered a given response; it might make the most sense for the request data to be consecutive integers: "0", "1", ..., "10", "11", "12", ....

The problem is that the client can lie: it can send a request in which len ≠ data.length. If len < data.length, this is harmless; just the first len bytes of data get sent back. But what happens if len > data.length and the server sends back len bytes? In a sensible language, this would trigger an array-bounds exception on data. In C, however, the server simply reads a chunk of memory beyond data. Suppose a sneaky client sends, say, a 3-byte payload but declares the payload length (the value of len) to be, say, 1000 bytes. The server then sends back the 3-byte payload followed by 997 bytes of whatever happens to lie in the adjacent heap memory, which may contain interesting content. Unpredictable, but interesting.
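
Here is a small sketch of what such a lying request looks like on the wire. It is illustrative only: it just builds the bytes, using the RFC 6520 layout (1 byte type, 2 bytes length, then the payload); it does not actually speak TLS.

/* Build a malicious heartbeat request: 3 bytes of payload,
 * but a declared payload length of 1000. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char msg[1 + 2 + 3];
    unsigned int declared_len = 1000;           /* the lie */

    msg[0] = 1;                                 /* heartbeat_request */
    msg[1] = (declared_len >> 8) & 0xff;        /* length, high byte (what n2s() will read) */
    msg[2] = declared_len & 0xff;               /* length, low byte */
    memcpy(msg + 3, "hi!", 3);                  /* only 3 bytes of payload actually sent */

    /* A vulnerable server memcpy()s 1000 bytes starting at the 3-byte
     * payload and echoes them back: 997 bytes of its own heap. */
    printf("request: %zu bytes on the wire, declared payload %u bytes\n",
           sizeof(msg), declared_len);
    return 0;
}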

[xkcd's diagram of the Heartbleed exploit]


Here is the original (vulnerable) code, from tls1_process_heartbeat(SSL *s). The variable payload holds the declared length of the heartbeat payload, and pl points to the payload data itself.

n2s(p, payload);	// extract value of payload (the length)
pl = p;
...
if (hbtype == TLS1_HB_REQUEST)
	{
	unsigned char *buffer, *bp;
	int r;

	/* Allocate memory for the response, size is 1 bytes
	 * message type, plus 2 bytes payload length, plus
	 * payload, plus padding */
	buffer = OPENSSL_malloc(1 + 2 + payload + padding);
	bp = buffer;

	/* Enter response type, length and copy payload */
	*bp++ = TLS1_HB_RESPONSE;
	s2n(payload, bp);
	memcpy(bp, pl, payload);	// pld: copy payload bytes from pointer pl to pointer bp (= buffer, above)
	bp += payload;

	/* Random padding */
	RAND_pseudo_bytes(bp, padding);

	r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

	if (r >= 0 && s->msg_callback)
		s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
			buffer, 3 + payload + padding,
			s, s->msg_callback_arg);

	OPENSSL_free(buffer);

	if (r < 0)
		return r;
	}

There is no check here that the value of payload, the declared length of the payload, matches the actual length of the payload.

Here is the fix (OpenSSL 1.0.1g). The two new length checks come before the payload is parsed and copied; note also that the msg_callback call for the incoming record was moved ahead of the checks, so it still executes even when a malformed request is silently discarded:

if (s->msg_callback)			// moved up: runs even if the checks below silently discard the request
	s->msg_callback(0, s->version, TLS1_RT_HEARTBEAT,
		&s->s3->rrec.data[0], s->s3->rrec.length,
		s, s->msg_callback_arg);

/* Read type and payload length first */
if (1 + 2 + 16 > s->s3->rrec.length)
	return 0;			// record too short to hold even a zero-length payload: silently discard
hbtype = *p++;
n2s(p, payload);			// extract the declared payload length
if (1 + 2 + payload + 16 > s->s3->rrec.length)
	return 0;			// declared length exceeds what the client actually sent; RFC 6520 section 4 says to silently discard
pl = p;

if (hbtype == TLS1_HB_REQUEST)
	{
	unsigned char *buffer, *bp;
	int r;

	/* Allocate memory for the response, size is 1 bytes
	 * message type, plus 2 bytes payload length, plus
	 * payload, plus padding */
	buffer = OPENSSL_malloc(1 + 2 + payload + padding);
	bp = buffer;

	/* Enter response type, length and copy payload */
	*bp++ = TLS1_HB_RESPONSE;
	s2n(payload, bp);
	memcpy(bp, pl, payload);	// now safe: payload cannot exceed what the client actually sent
	bp += payload;

	/* Random padding */
	RAND_pseudo_bytes(bp, padding);

	r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

	if (r >= 0 && s->msg_callback)
		s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
			buffer, 3 + payload + padding,
			s, s->msg_callback_arg);

	OPENSSL_free(buffer);

	if (r < 0)
		return r;
	}

Google reported the bug to the OpenSSL Foundation on April 1, 2014. It is estimated that somewhere between 15 and 60% of sites using SSL were affected.

The bug is now fixed. Nobody knows how much it was exploited.

The more interesting question might be why OpenSSL didn't get more attention earlier. There were people outside the OpenSSL Foundation looking at the code, but none of them noticed Heartbleed.

Ultimately, the problem was that OpenSSL was severely underfunded. The president of the OpenSSL Software Foundation was, as of 2014, Steve Marquess (he has since left). In a blog post after Heartbleed he described himself as the fundraiser. The foundation received about $2,000/year in donations; it also did some support consulting, which earned considerably more.

In the week after Heartbleed, the OpenSSL foundation received $9,000, mostly in small donations from individuals. Not from the big corporate users of OpenSSL.

The foundation had one paid employee, Stephen Henson, who has a PhD in graph theory. He was not paid much: before Steve Marquess created the OpenSSL foundation, Steve Henson's income from the work was estimated at around $20K/year. (The Heartbleed error was not his.)

Despite the low level of funding, though, in the eight (or more) years before Heartbleed the OpenSSL Foundation was actively seeking certification under the NIST Cryptographic Module Validation Program. They understood completely that cryptography needs outsider audits.

As of 2014, at least, the two Steves had never met face-to-face. Like Steve M, Steve Henson moved on to other projects in 2017.

A month after the bug's announcement, the Linux Foundation announced that it was setting up the Core Infrastructure Initiative. They lined up backing from Google, IBM, Facebook, Intel, Amazon and Microsoft. The first project on the agenda was OpenSSL, and its struggle to gain certification from the US government.

In 2015 a formal audit of OpenSSL was funded. It is still ongoing; see isopensslauditedyet.com.

As of 2016, Black Duck reported that 10% of the applications they tested were still vulnerable to Heartbleed, 1.5 years after the revelation (info.blackducksoftware.com/rs/872-OLS-526/images/OSSAReportFINAL.pdf).

See also the Buzzfeed article "The internet is being protected by two guys named Steve".


More Open-Source Security Issues

Magento

Magento is an e-commerce platform owned by Adobe. It is described as having an "open-source ecosystem". It is available on github at github.com/magento. I assume that is the "community edition"; there is also an "enterprise edition".

A vulnerability was announced in April 2017 by DefenseCode LLC, six months after it had been reported privately to Magento; Magento never responded directly to DefenseCode. The vulnerability "could" lead to remote code execution, though some additional steps are needed to make that work, and it is not clear whether it was ever exploited in the wild. See www.defensecode.com/advisories/DC-2017-04-003_Magento_Arbitrary_File_Upload.pdf.

If a site administrator adds a link to a product video hosted remotely on Vimeo, the software automatically requests a preview image. If the requested file is not in fact an image, Magento logs a warning but downloads the file anyway. The idea is to trick Magento into downloading an executable file, eg a .php file. An updated .htaccess file also needs to be downloaded, but once the basic file-download trick is working this is not difficult. Because uploaded files are stored in directories derived from the first characters of the filename, the .php file's name should begin with ".h" so that it ends up in the same directory as the .htaccess file.

Parts of the strategy also involve cross-site request forgery (CSRF) against someone with Magento management privileges at the target site; even low-level privileges are sufficient. There are numerous strategies for trying to get such a person to click on a suitable link, though none of them are sure bets.

Magento's failure to respond more aggressively to DefenseCode is puzzling. This seems less likely to happen in a "true" open-source project.

Equifax

In August 2017 the credit-reporting company Equifax discovered that personal records on more than 140 million US residents had been exfiltrated. The actual vulnerability was in Apache Struts 2, a framework for developing Java web applications that extends the Java Servlet API. Struts has a history of serious vulnerabilities. The vulnerability in question was CVE-2017-5638, which allows remote code execution.

Equifax was notified of the issue in March 2017. They looked, but could not find a place on their site where Struts was being used! The vulnerability went unpatched, even as it was being routinely exploited in the wild.

Making sure tools and frameworks are updated is tedious work. Something like Struts can be buried in the archives of a project that was largely completed years ago. Still, Equifax's negligence is hard to understand.

Shellshock

This is the bash bug that allowed very easy remote code execution; it was discovered and fixed in 2014. The problem was that the normally benign act of passing an environment variable to a new bash process could be tricked into executing a command as well, if the variable's value began with a shell function definition.

The shell maintains the environment as a list of (name,value) pairs. For example, we can assign a variable and read it back as follows:

FOO=here_is_my_string

echo $FOO

We can also define shell functions, eg with

FN() { echo hello;}

(the space before "echo" is significant). If the function is then exported (export -f FN), pre-Shellshock versions of bash would pass it to child processes in the environment as the pair (FN, "() { echo hello;}").

The problem was that one could sneak a command in after the function definition in such an environment value, and vulnerable versions of bash would execute that command at the moment they imported the variable from the environment:

FN() { echo hello;}; echo "hello world!"

(The trailing command is not actually part of the function.) The bug was a mistake in recognizing the end of the function body when bash imported such a definition from the environment: parsing did not stop at the closing brace, so whatever followed was executed immediately. In other words, whenever the environment pair

FN='() { echo hello;}; echo "hello world!"'

was passed to a new bash process, the echo "hello world!" command would go along for the ride. The trigger is specifically values that begin with "()"; ordinary string-valued environment variables cause no problem. A variable assigned as FN2='echo "hello world"' would not result in command execution when a child shell inherited it; nor would FN3="foo"; echo "hello world" (though typing that line echoes "hello world" once, simply because the echo is an ordinary command on the same line).

This is most of the problem. The other part is that web servers, and quite a range of other software, routinely accept strings from clients that they then turn into environment variables. For example, if the client sends the following GET request to the server (from blog.cloudflare.com/inside-shellshock):

GET / HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
Host: cloudflare.com

then the server turns it into the following environment settings (before executing the GET request):

HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8,fr;q=0.6
HTTP_CACHE_CONTROL=no-cache
HTTP_PRAGMA=no-cache
HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
HTTP_HOST=cloudflare.com

The server does this so that a CGI script on the server will have access to these values, through its environment. But if the CGI script is run by bash (or itself invokes bash), then, because of the bug, merely importing these environment variables can lead to code execution!

Now suppose the client instead sends the following as the User-Agent value (which is not a genuine user-agent string at all):

User-Agent: () { :; }; /bin/eject

Then the command /bin/eject would be executed on the server. (Within the shell-function part, ":" is the bash null command; the legal function definition is just () { :; };.)
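
Here is a hedged sketch of the same mechanism in C, imitating what the web server does: set an attacker-supplied environment variable, then run bash. It is illustrative test code and assumes an unpatched /bin/bash.

/* shellshock_demo.c -- set an environment variable whose value begins
 * with "() {", then start bash.  A vulnerable bash executes the
 * trailing command the moment it imports the environment; a patched
 * bash prints nothing extra. */
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* to C and to the kernel, this is just an ordinary string value */
    setenv("HTTP_USER_AGENT", "() { :; }; echo SHELLSHOCKED", 1);

    /* run a harmless command; the damage (if any) happens at bash startup */
    execl("/bin/bash", "bash", "-c", "echo normal output", (char *)NULL);
    return 1;   /* reached only if exec fails */
}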

A consequence of the patches is that exported bash functions are no longer stored in the environment under ordinary variable names; they now travel under specially marked names (of the form BASH_FUNC_name%%), so a string supplied as a normal environment variable can never be interpreted as a function definition. (A narrower fix might have sufficed, but this was probably a good idea.)

Meltdown and Spectre

The Meltdown and Spectre vulnerabilities affect Intel CPUs (Spectre affects AMD and ARM processors as well); Intel open-sources nothing about its processors. But open-source operating systems still had to create patches.

The problem is that open-source patches had to be created while there was still a news embargo on the details of the vulnerability, lest miscreants get a leg up on exploiting them.

Meltdown and Spectre were discovered in the summer of 2017 by Jann Horn of Google Project Zero, by Werner Haas and Thomas Prescher of Cyberus Technology, and by Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz of Graz University of Technology. The discoverers informed Intel, and all agreed to announce the flaws only once fixes were in place.

In November 2017, Alex Ionescu noticed an upgrade to Windows 10 that made the CPU run slower, with no obvious benefit. He suspected this was a vulnerability patch.

On Wednesday, January 3, 2018, new commits appeared in the Linux kernel. Observers quickly noticed that the commits didn't seem to make sense from a performance perspective. Rampant speculation that they were related to a hardware vulnerability led to the announcement of Spectre and Meltdown that same day; the scheduled announcement date had been January 9, Microsoft's "Patch Tuesday".

Still, the Linux community by and large did abide by the embargo rules. This is awkward, because it means not using the public mailing lists and review process that kernel development normally relies on.

The head of OpenBSD, Theo de Raadt, was extremely vexed, as OpenBSD was not given advance warning:

Only Tier-1 companies received advance information, and that is not responsible disclosure – it is selective disclosure.... Everyone below Tier-1 has just gotten screwed.

De Raadt also argued that, while Spectre might be considered an unforeseen bug, the issues behind Meltdown were long understood to at least some degree. Intel decided to go ahead with their speculative-execution design anyway, in order to beat the competition.

OpenBSD had earlier announced a fix for the Wi-Fi KRACK vulnerability of October 2017 before that bug's embargo ended. The theory at the time was that OpenBSD would therefore not be given advance warning of the next vulnerability. But several other smaller OS vendors (e.g. Joyent, with SmartOS) were also not told about Meltdown/Spectre in advance.

Still, there is a real problem: to abide by embargo rules means sitting on a fix, knowing your users might be being attacked. It means your source is, until the embargo ends, no longer "open".


Windows Security

Windows systems are quite hard to secure, partly because there are so many files that are not well understood within the security community, and partly because of the registry.

Licensing rules don't help. If you want to verify a checksum (say SHA-3) of every file, for example, you pretty much have to boot from a separate, known-clean copy of the operating system; otherwise, malware that has made it into the running kernel can make every file appear unchanged. However, in Windows, a separate boot copy technically requires a separate license, and in practice making a bootable Windows USB drive is not easy without paying for one.
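
As a sketch of the mechanics of per-file verification (the obstacle described above is the licensing, not the hashing), here is a minimal checksum program using OpenSSL's EVP digest interface, the same interface behind the MD_Update() macro quoted earlier. The choice of SHA3-256 and the command-line form are illustrative; SHA-3 support assumes OpenSSL 1.1.1 or later.

/* hashfile.c -- print the SHA3-256 digest of a file.
 * Compile (assuming OpenSSL 1.1.1+ is installed): cc hashfile.c -lcrypto */
#include <stdio.h>
#include <openssl/evp.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }

    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha3_256(), NULL);

    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);      /* the same call as MD_Update() above */

    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int mdlen;
    EVP_DigestFinal_ex(ctx, md, &mdlen);
    EVP_MD_CTX_free(ctx);
    fclose(f);

    for (unsigned int i = 0; i < mdlen; i++)
        printf("%02x", md[i]);
    printf("  %s\n", argv[1]);
    return 0;
}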

Cryptography Software

It is certainly reasonable to think of commercial software as reliable. After all, one can read the reviews, and a company couldn't stay in business if it didn't keep its users happy. If the software doesn't work properly, won't users know?

If the software is a word processor, and it keeps crashing, or failing to format text properly, the failure is obvious. If the software does complex engineering calculations, that's harder to detect, but usually users do at least a little testing of their own.

But if closed-source software does encryption, it is almost impossible to verify that the encryption was done correctly. Just what random-number algorithm was used? Was the AES encryption implemented properly? (If it was not, the problem becomes very clear as soon as the software has to send encrypted content to an independent decryption program; but often the same program both encrypts files and later decrypts them, in which case a flawed implementation is almost undetectable.) Was the encryption done first, followed by the authentication hash (HMAC), or the other way around? The first way, encrypt-then-MAC, is much safer, as the HMAC then gives an attacker no information about whether a brute-force decryption attempt is on the right track.
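
For reference, here is a hedged sketch of the safer ordering (encrypt-then-MAC) using OpenSSL's EVP and HMAC interfaces. The key handling, the fixed message, and the AES-256-CBC / HMAC-SHA256 choices are illustrative assumptions, not a description of any particular product.

/* etm.c -- encrypt-then-MAC sketch.  Compile: cc etm.c -lcrypto */
#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/rand.h>

int main(void)
{
    unsigned char enc_key[32], mac_key[32], iv[16];
    RAND_bytes(enc_key, sizeof(enc_key));       /* separate keys for encryption and MAC */
    RAND_bytes(mac_key, sizeof(mac_key));
    RAND_bytes(iv, sizeof(iv));

    const unsigned char msg[] = "attack at dawn";
    unsigned char ct[sizeof(msg) + 16];         /* room for CBC padding */
    int len, ctlen;

    /* 1. Encrypt with AES-256-CBC */
    EVP_CIPHER_CTX *c = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(c, EVP_aes_256_cbc(), NULL, enc_key, iv);
    EVP_EncryptUpdate(c, ct, &len, msg, sizeof(msg));
    ctlen = len;
    EVP_EncryptFinal_ex(c, ct + len, &len);
    ctlen += len;
    EVP_CIPHER_CTX_free(c);

    /* 2. Then MAC the ciphertext: the tag reveals nothing about the
     *    plaintext, so it cannot help an attacker check candidate
     *    brute-force decryptions. */
    unsigned char tag[EVP_MAX_MD_SIZE];
    unsigned int taglen;
    HMAC(EVP_sha256(), mac_key, sizeof(mac_key), ct, ctlen, tag, &taglen);

    printf("ciphertext: %d bytes, HMAC-SHA256 tag: %u bytes\n", ctlen, taglen);
    return 0;
}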

For reasons like this, some commercial encryption software is audited. But usually it is not. The bottom line is that commercial crypto is hard to trust.

Sometimes open-source crypto isn't audited either, and it can be hard to find out whether it has been. Still, even with OpenSSL, people outside the OpenSSL Foundation were looking at the basics of the encryption, to make sure it looked OK.

Software Trust

Whatever one can say about open-source reliability (which, in general, is comparable to commercial-software reliability), open source wins the trust issue hands down. Once upon a time, most software was trustworthy, in that you could be reasonably sure the software was not spying on you. Those days are gone. Most Android apps spy, to some degree or another, on the user. Microsoft Office appears not to, but Windows 10 sends quite a bit of information back to Microsoft (some of this can be disabled through careful configuration).

Spyware is almost unknown, however, in the open-source world. Ubuntu has had a search feature that returns some information to Canonical, but that's reasonably up-front. Firefox often asks to send crash reports, but, again, that's done in the open. The reason open-source spyware is so rare is that it has nowhere to hide: anyone can inspect the source, so surreptitious "phoning home" would quickly be spotted.

Ironically, "free" software that is not open-source is usually spyware: "if you're not paying for the product, you are the product". Most browser add-ons, in particular, are spyware. Many Android apps are spyware; flashlight apps are notorious.

Open-source trust is not always entirely straightforward, though. Firefox, for example, is the browser with the most privacy-protection features, hands down; a version of Firefox is the basis for the Tor browser. That said, many Firefox privacy features are disabled by default, because some of Firefox's commercial funding sources have had concerns.