July 11
Shotts:
Coming up, probably during the week of July 21, there will be a series of quizzes on Sakai, each involving the writing of one bash script. You can do them all together, or one at a time. These will count like a midterm exam.
A few nice, significant examples: www.macs.hw.ac.uk/~hwloidl/Courses/LinuxIntro/x864.html
Let's start with this script that reads a file (the filename is provided as the single argument), converts the entire file to lowercase, and writes to stdout. It has no options itself, but we'll use it in something else that does. The script works by reading in the input file one line at a time, and using the tr command to convert that line to lowercase, and then writing to stdout.
#!/bin/bash
set -eu
set -o pipefail
if [ $# -ne 1 ]
then
    echo "usage: lcasecopy filename"
    exit 1
fi
FILENAME=$1
while read THELINE
do
    # convert THELINE to lowercase, and write it to stdout
    echo $THELINE | tr A-Z a-z
done < $FILENAME
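To try it, assuming the script is saved as lcasecopy and some file notes.text exists:
chmod +x lcasecopy
./lcasecopy notes.text                  # lowercased contents appear on stdout
./lcasecopy notes.text > notes.lower    # or get redirected into a new file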
Now we want to embed this in another script, lcbunch, that converts all the regular files listed on the command line as file arguments. lcbunch also has the following options, supplied as command-line "option arguments":
-v              print the name of each file converted on stdout
-e extension    add the supplied extension to each new file created
-d dirname      put the converted files in the given directory (which must already exist)
The user must supply either -e or -d. Both can be supplied. All options must come before any files. As an example,
lcbunch -v -d mydir *.text
converts all the .text files and puts the converted files into mydir.
We will do this two ways, first using bare-bones bash, and the shift command, and then with getopts, which automates this kind of option processing.
Here is the bare-bash version:
#!/bin/bash
# -v              print the name of each file converted on stdout
# -e extension    add the supplied extension to each new file created
# -d dirname      put the converted files in the given directory (which must already exist)
set -eu
set -o pipefail
OPTDONE=0
VERBOSE=0
OUTDIR="."
EXTENSION=""
USAGE="usage: lcbunch [-v] [-e extension] [-d directory] files"
while test $OPTDONE -eq 0 -a $# -ge 1    # options processing is done in this loop
do
    case $1 in
    -v)
        # echo "setting VERBOSE"
        VERBOSE=1
        shift    # for the -v
        ;;
    -e)
        EXTENSION=$2
        # echo "setting EXTENSION to $EXTENSION"
        shift    # for the -e
        shift    # for the option value following -e
        ;;
    -d)
        OUTDIR=$2
        shift    # for the -d
        shift    # for the option value following -d
        ;;
    *)
        OPTDONE=1
        ;;
    esac
done
if test -n "$EXTENSION"
then
    EXTENSION='.'$EXTENSION    # add the dot
fi
# check if EXTENSION or OUTDIR was supplied
if test -z "$EXTENSION" -a "$OUTDIR" = "."    # neither changed
then
    echo "Must supply -e or -d"
    echo $USAGE
    exit 2
fi
# At this point, $* just consists of files to be converted
while test $# -gt 0
do
    INFILE=$1
    if test -f $INFILE    # make sure it's a regular file
    then
        OUTFILE=$OUTDIR/${INFILE}${EXTENSION}    # curly braces to avoid running into one another
        if test $VERBOSE -eq 1; then echo "converting $INFILE to $OUTFILE"; fi
        lcasecopy $INFILE > $OUTFILE    # copy this file
    fi
    shift    # next file
done
The first while loop does the options processing. We work through the $* string of all arguments (with shift), after setting shell variables for all the option defaults (VERBOSE, EXTENSION and OUTDIR). If the first argument is -v, we set VERBOSE, and shift. Now $1 is the next argument, and we go around the while loop again.
Likewise, if the first argument is -e, we set EXTENSION to $2. We then shift twice, to pop both those values off of $*. Similarly for -d.
As long as we have options, either -v alone or an -e or a -d with something following, we keep going. Each time we start the while loop, if there are any more options then $1 must be -v, -e or -d. When we are done with the options, and get to the files, the case *) matches, and we set OPTDONE=1. This leads to our exiting from the loop, and moving on to the file-processing section.
After the while loop, we add a "." to EXTENSION, if necessary, and check that either EXTENSION or OUTDIR is different (otherwise we'll be writing files in place, which is bad).
At this point we've popped all the options from $*, and all that's left is the files to convert. We handle them one at a time, shifting after each one. We're done when the argument count, $#, reaches zero.
Note that we redirect the stdout of lcasecopy into the filename $OUTFILE.
Now for the getopts version. The only thing that changes is the first loop. It is now:
while getopts "ve:d:" opt
do
    case $opt in
    v)
        echo "setting VERBOSE"
        VERBOSE=1
        ;;
    e)
        EXTENSION=$OPTARG    # OPTARG is set by getopts
        echo "setting EXTENSION to $EXTENSION"
        ;;
    d)
        OUTDIR=$OPTARG
        ;;
    *)
        echo $USAGE
        exit 1
        ;;
    esac
done
shift $(($OPTIND-1))
getopts is built into bash. The first string is the list of option letters, and whether something must follow them. "ve:d:" means that the options are -v, -e and -d, and the latter two have an argument following them (as indicated by the colon). The opt is a shell variable name of our choosing.
getopts also sets the two special variables OPTARG and OPTIND. OPTARG is the option argument, if one is expected; it plays the role of $2 in the first version. OPTIND is the index in $* of the next argument to be processed, that is, one more than the number of places we have advanced.
We do not do a shift after each argument. Instead we do it all at the end, with shift $(($OPTIND-1)).
We also omit the hyphen before the option letters: they are v, e and d, not -v, -e and -d.
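As a sanity check, here is what one run of the getopts version should print, assuming a file hello.text exists in the current directory:
./lcbunch -v -e lc hello.text
setting VERBOSE
setting EXTENSION to lc
converting hello.text to ./hello.text.lc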
createfile.py
prefix*badfile
ls vs badls
Newlines in file names (that are listed as such by ls) make it impossible to write reliable shell scripts that use the output of ls. Even something simple like
for i in $(ls)
would fail: if a directory contains files "a", "b\nc" and "d" (where \n is a newline) then a newline-printing ls would produce
a
b
c
d
which is the wrong list!
So ls today will, as far as I can tell, never output a literal newline in a filename.
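The robust alternative is to skip ls entirely and let the shell glob, quoting the loop variable; a minimal sketch:
for i in *
do
    echo "$i"    # quoting keeps each name, spaces and newlines included, as one word
done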
strings
Try echo '\n'. It does not print a newline! If you want newlines, use echo $'\n'. What about $"\n"? No.
This is a pure bashism.
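A quick demonstration at the prompt:
echo '\n'     # prints the two characters \ and n
echo $'\n'    # prints an actual newline (you see a blank line)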
IFS and ifsdemo
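The ifsdemo script isn't reproduced here, but here is a minimal stand-in showing what IFS controls (the variable names are made up):
DATA="one:two:three"
IFS=:                 # bash now splits unquoted expansions on colons
for FIELD in $DATA    # unquoted on purpose, so splitting happens
do
    echo $FIELD       # prints one, two, three on separate lines
done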
hello.c:
#include <stdio.h>
#include "hello.h"
int main() {
    printf("hello, world!\n");
    return 0;
}
gcc hello.c
gcc -o hello hello.c
make and Makefile
hello: hello.c hello.h
gcc -o hello hello.c
What make actually does: compare timestamps. The classic demo is touch hello.h, then make again.
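Concretely, assuming hello was just built from the Makefile above:
touch hello.h    # hello.h is now newer than the hello binary
make             # so make re-runs the rule: gcc -o hello hello.c
make             # run it again: nothing to do, hello is up to date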
The existence of .h files makes dependencies complicated in C. Make solves this problem.
The IDE vs make debate:
    pretty much all platforms support make
    it's straightforward to add complicated testing or other alternative builds to Makefiles
    the command line gives greater execution flexibility than GUIs
    on the other hand, Makefiles can be confusing
Makefile and leading tab
diction
download diction-1.11.tar.gz from ftp.gnu.org/gnu/diction/
tar xzf diction-1.11.tar.gz
cd diction-1.11
./configure
make
now touch one of the source files and make again
make install
signatures
gpg --import gnu-keyring.gpg    # the keyring file is from ftp.gnu.org/gnu
gpg --verify diction-1.11.tar.gz.sig
The trust rabbit hole: WARNING: This key is not certified with a trusted signature!
configure options: Search for "Installation directory options"
ImageMagick
git clone https://github.com/ImageMagick/ImageMagick.git
cd ImageMagick
./configure
make
Make and Java: yes, you can do it, but Java files pretty much all compile independently. Still, make is helpful in that you can use it to recompile only the files you changed.
Regular expressions are a way to describe a set of strings. We've already seen a form of them: file-name matching is a kind of regular expression: * matches any number of characters, ? matches one character, and [m-p] matches the letters m, n, o, or p. But note that this file-name matching uses a simplified form of regular expressions.
In 1997, Jamie Zawinski (then a dev at Netscape) posted the following on Usenet:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
A big part of the trouble with regular expressions is that they are so hard to read. There is no place for extraneous whitespace; blanks are, of course, significant characters. There is no way to embed comments. One does get somewhat used to the syntax, though.
The grep command is also all about matching regular expressions, although so far we've just used it to match strings. Grep has a bunch of useful options:
-i    ignore case
-v    list lines not matching the pattern (invert)
-n    include line numbers
-q    don't print anything, just return 0 if found or 1 if not found
Here are the "metacharacters", that have special meaning in matches:
.      matches any one character
^      anchor for the start of the string or line; negation if the first character within [ ]
$      anchor for the end of the string or line
[ ]    for creating character ranges, just like with filename matching
( )    for regular grouping
{ }    match a regular subexpression a specific number of times
-      for character ranges
?      match a regular subexpression 0 or 1 times
*      match a regular subexpression 0 or more times
+      match a regular subexpression 1 or more times
|      between two alternative regular expressions, such as in January|Jan in the case example
\      for quoting one of these to use it as a literal character rather than a metacharacter
Basic regular expressions (BRE) use only ^ $ . [ ] *. We will here be using extended regular expressions (ERE). These are not recognized by plain grep; you must either use egrep or grep -E.
The file-regular-expression symbol "?" corresponds to the BRE ".": both match one character. As we will see below, the file-regular-expression symbol "*" corresponds to the BRE ".*": the "." means any single character, and the "*" means "the previous regular expression repeated 0 or more times". The file-regular-expression and BRE notations for character ranges are pretty similar.
There are also "Perl-compatible regular expressions", the library is known as pcre. This is an entirely different regular-expression notation.
As an example of alternation, in its simplest form, suppose we want to search the output of ps -ef for either "chrome" or "firefox". Then we can use
ps -ef | egrep 'chrome|firefox'
We need the quotes because '|' has special meaning to bash (it is the pipe symbol). Also, note that I used egrep instead of plain grep; egrep enables a much larger set of regular expressions. Plain grep doesn't work here at all (although grep -E is the same as egrep, and officially is preferred).
As an example of character ranges, if we wanted to search for "loyola" or "Loyola" with grep, the pattern "[Ll]oyola" would work.
The ^ character marks the start of the string, in most cases. But as the first character of a range, it means match the characters not in the list. So "[^aeiou]a[^a-s]" matches "cat", but not "eat" or "cab".
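Checking with egrep:
echo cat | egrep '[^aeiou]a[^a-s]'    # matches: t comes after s
echo eat | egrep '[^aeiou]a[^a-s]'    # no match: e is a vowel
echo cab | egrep '[^aeiou]a[^a-s]'    # no match: b is in a-s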
There are also some built-in ranges:
[:alnum:]    The alphanumeric characters. In ASCII, equivalent to: [A-Za-z0-9]
[:word:]     The same as [:alnum:], with the addition of the underscore (_) character.
[:alpha:]    The alphabetic characters. In ASCII, equivalent to: [A-Za-z]
[:blank:]    Includes the space and tab characters.
[:cntrl:]    The ASCII control codes. Includes the ASCII characters 0 through 31 and 127.
[:digit:]    The numerals 0 through 9.
[:graph:]    The visible characters. In ASCII, it includes characters 33 through 126.
[:lower:]    The lowercase letters.
[:punct:]    The punctuation characters. In ASCII, equivalent to: [-!"#$%&'()*+,./:;<=>?@[\\\]_`{|}~]
[:print:]    The printable characters. All the characters in [:graph:] plus the space character.
[:space:]    The whitespace characters including space, tab, carriage return, newline, vertical tab, and form feed. In ASCII, equivalent to: [ \t\r\n\v\f]
[:upper:]    The uppercase characters.
[:xdigit:]   Characters used to express hexadecimal numbers. In ASCII, equivalent to: [0-9A-Fa-f]
Warning: IBM mainframes use EBCDIC encoding, not ASCII, and in EBCDIC the alphabetic letters are not contiguous, so [a-z] fails. If you find yourself in an EBCDIC environment, stop. (You can also use the classes above, or [a-ij-rs-z].)
A more serious character-set issue is ascii vs utf-8. grep actually does fairly well with that. Try grep ": ." grepdemo2.text.
As with file-matching regular expressions, the square brackets above still need another set of square brackets to make them ranges: it is [[:digit:]], not [:digit:].
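For example:
echo Hello | grep '[[:upper:]]'    # matches (the H)
echo hello | grep '[[:upper:]]'    # no match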
Grouping is often useful. If we wanted to search for "received" or "receiving", we can use the basic alternation form "received|receiving", or the equivalent but (maybe) simpler form with grouping, "receiv(ed|ing)"
In filename matching, '*' stands for "zero or more characters". In ERE, * stands for "repeat the previous sub-expression zero or more times". So ".*" matches any character zero or more times. "a(b|c)*" matches a, ab, ac, abb, abbcbbcccb, etc.
There is also ?, meaning "match zero or one times (but not more)", and +, which means "match one or more times". So the following matches signed integers:
(\+|-)?[[:digit:]]+
That is, the + or - at the start is optional (the + is backslashed so it means a literal plus sign, not the one-or-more operator), and there has to be at least one digit.
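Trying it out:
echo 42  | grep -E '(\+|-)?[[:digit:]]+'    # matches
echo +42 | grep -E '(\+|-)?[[:digit:]]+'    # matches
echo abc | grep -E '(\+|-)?[[:digit:]]+'    # no match: no digits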
Shotts has this regular expression for matching phone numbers, either in the (nnn) nnn-nnnn form or the nnn nnn-nnnn form:
^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$
I only use nnn-nnn-nnnn. You could fix this by changing that first space to ( |-), but that would also accept (nnn)-nnn-nnnn. Realistically you would need grouping (note that I am using the {n} quantifier from below):
^((\([0-9]{3}\) )|([0-9]{3}( |-)))[0-9]{3}-[0-9]{4}$
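Checking the forms against it:
echo '(312) 555-1234' | grep -E '^((\([0-9]{3}\) )|([0-9]{3}( |-)))[0-9]{3}-[0-9]{4}$'    # matches
echo '312-555-1234'   | grep -E '^((\([0-9]{3}\) )|([0-9]{3}( |-)))[0-9]{3}-[0-9]{4}$'    # matches
echo '(312)-555-1234' | grep -E '^((\([0-9]{3}\) )|([0-9]{3}( |-)))[0-9]{3}-[0-9]{4}$'    # no match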
There is also { }, which lets you match a specific number of times:
{3} match the previous subexpression 3 times
{3,} match the preceding subexpression 3 or more times
{3,6} match the preceding subexpression between 3 and 6 times
{,3} match the preceding subexpression between 0 and 3 times
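For example:
echo aaaa | grep -E '^a{3,6}$'    # matches: four a's is between 3 and 6
echo aa   | grep -E '^a{3,6}$'    # no match: too few a's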
There is also multiline mode. In that mode, "^" and "$" match the beginning and end of each line, versus the beginning and end of the entire string. We won't cover this.
Regular expression: * means repeat 0 or more times, ? means either 0 or 1 times
What strings match these?
What does a finite-state recognizer for these look like?
How about b (aa)* a? There's a difference here.
("machine" == "automaton", by the way)
Inputs, by the way, can be:
From last week:
Regular expression: * means repeat 0 or more times, ? means either 0 or 1 times
What strings match these?
What does a finite-state recognizer for these look like?
More examples of regular expressions:
These use slightly extended regexes (the Google example does not support + or *):
\d       matches any digit 0-9, same as [0-9]
\W       matches anything other than a letter, digit or underscore, same as [^a-zA-Z0-9_]
\s       matches a space
^        matches the start of the line; $ matches the end of the line
{3,6}    means that whatever single-character thing precedes this can match between 3 and 6 times
What does varname\W*=[^=] match?
Warning: there are quite a few different standards for regular expressions. Always read the documentation.
Let's call the finite-state recognizers finite automata, which is the usual term used. So far the finite-state recognizers have all been deterministic: we never have a state with two outgoing edges, going two different directions, that are labeled with the same input. A deterministic finite automaton is abbreviated DFA.
How about the regular expression b (ab)* a? There's a difference here. Now we do have a state with two different edges labeled 'a'. Such an automaton is known as nondeterministic, that is, as an NFA. We can still use an NFA to match inputs, but now what do we do if we're at a vertex and there are multiple edges that match the current input?
There are two primary approaches. The first is to try one of the edges first, and see if that works. If it does not, we backtrack to the vertex in question and at that point try the next edge. This approach does work, but with a poorly chosen regular expression it may be extremely slow. Consider the regular expression (a?)^n a^n. This means up to n optional a's, followed by n a's. Let us match against a^n, meaning all the optional a's must not be used. The usual strategy when matching "a?" is to try the "a" branch first, and only if that fails do we try the empty branch. But that now means that we will have 2^n - 1 false branches before we finally succeed.
Example: (a?)^3 a^3, or (a?)(a?)(a?)aaa. Suppose the input is aaa. Here are the steps: the matcher first takes the "a" branch of every (a?), consuming all three a's, and the trailing aaa then fails. It backtracks through every combination of the three optional a's; all 2^3 - 1 = 7 combinations that use at least one optional "a" fail, and only the final try, with every (a?) empty, succeeds.
A much faster approach is to use the NFA with state sets, rather than individual states. That is, when we are in state S1 and the next input can lead either to state S2 or state S3, we record the new state as {S2,S3}. If, for the next input, S2 can go to S4 and S3 can go to either S5 or S6, the next state set is {S4,S5,S6}. This approach might look exponential, but it is not: a state set can never contain more states than the NFA has, so the work per input symbol is bounded.
Example: (a?)^3 a^3.
 /--empty--\ /--empty--\ /--empty--\
(0)---a---(1)---a---(2)---a---(3)---a---(4)---a---(5)---a---(end)
The steps: the initial state set is {0,1,2,3} (state 0 plus everything reachable by empty edges). After the first a it is {1,2,3,4}; after the second, {2,3,4,5}; after the third, {3,4,5,end}. The set contains the end state, so the input aaa is accepted.
See also https://swtch.com/~rsc/regexp/regexp1.html, "Regular expression search algorithms", the paragraph beginning "A more efficient ....".
By the way, a much better regular expression for between n and 2n a's in a row is a^n (a?)^n. We parse n a's at the beginning, and the optional a's are all following.
The implementation of an NFA/DFA recognizer does literally use the graph approach: for each current state, and each next-input symbol, we look up what next states are possible with that input symbol. The code to drive the NFA/DFA does not need to be changed for different NFA/DFAs. This is a big win from a software-engineering perspective.
One more example of NFA state-set recognizer: aaa|aab|aac|aad
             +--a->(3)--a->(7)
             |
             +--a->(4)--b->(8)
(1)--a->(2)--+
             +--a->(5)--c->(9)
             |
             +--a->(6)--d->(10)
We can "factor out" the initial "aa" to get aa(a|b|c|d).
How about aaa|aab|aac|add? We can now only factor one 'a' out: a(aa|ab|ac|dd). But from (aa|ab|ac) we can factor another a: a(a(a|b|c)|dd). There are limits to this technique, but sometimes it is useful.
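A quick check that the factored form accepts the same strings:
echo aab | grep -E '^(aaa|aab|aac|add)$'    # matches
echo aab | grep -E '^a(a(a|b|c)|dd)$'       # the factored form matches too
echo add | grep -E '^a(a(a|b|c)|dd)$'       # matches
echo abd | grep -E '^a(a(a|b|c)|dd)$'       # no match, as it should be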
So far, everything you've done on your VM has been done as user "comp141". We can become the superuser, or root, if we know the root password. But we don't. However, we can also use the sudo command:
sudo bash
The password to type is your regular user password.
Warning: the usual advice these days is to use sudo to run individual commands, not to create a root shell. With the latter, one mistake and it's all over.
Being root lets you look at the log files in /var/log, for example. It also lets you add new users, reconfigure the network and install packages.
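For example, rather than sudo bash:
sudo less /var/log/syslog    # root privileges for this one command only
sudo adduser newstudent      # add a user (newstudent is a made-up name)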
If you have a root password, the su command might be a better bet.
It is common to need root privileges for system administration, and in particular to install software.
[Shotts chapter 14] Yes, you can compile packages. But it's a chore, and when there are compilation problems it's a real chore.
There are many different distributions of Linux: Ubuntu, Debian, Red Hat, Mint, CentOS, Fedora, Gentoo .... One of the biggest differences between them is the style of package management, and to some extent the back-end maintenance of packages.
The high-level package tool on Debian-like distributions, which includes Ubuntu, is apt (or apt-get, an older version).
It's always a good idea to start with apt-get update, which updates the known list of package repositories. Some installations:
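For example (these particular packages are just illustrations):
sudo apt-get update
sudo apt-get install gnuplot
sudo apt-get install wireshark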
dpkg -l lists all installed packages. Note, however, that some packages will likely have been auto-installed in the process of installing some other package.
[Shotts Chapter 17] The unix find command is pretty handy, though it doesn't let you search "inside" complicated filetypes like .docx or .pdf.
The syntax is find directory search-options
There may be no search options. For example, find ~ lists every file within your home directory. find ~ |wc -l counts them. find ~ -type d |wc -l counts your directories and subdirectories.
What if you want to find a file in or below the current directory by name, say "foo.text"? find . -iname '*foo*' works, where -iname is case-insensitive search (there is also -name for case-sensitive search). Note that you need the quote marks, if there is a match for *foo* in the current directory. We don't want shell filename expansion to get in the way here; we want the asterisks to be "expanded" by find.
Other useful options are -empty, -mtime, -newer, -perm mode, -type [d|f|...], and -maxdepth levels.
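A couple of these in action:
find ~ -type f -mtime -1      # regular files under your home modified within the last day
find . -maxdepth 1 -type d    # directories at the top level only, no recursion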
On page 229, Shotts describes an interesting technique for dealing with files with spaces in them.
Then there is the -exec option. This runs a command on every file found. For example,
find . -type f -exec file {} ';'
The {} represents the name of the file in question, and the ';' marks the end of the file command and the return to find. It almost always has to be quoted so bash doesn't interpret it as an end-of-command indication. Next, let's search for the string 'key' in any files in files10/project:
find . -exec grep -i key {} \;
We can't tell where the files are, and we have a lot of messages about directories there. How about this:
find . -type f -and -exec grep -i -H key {} \;
The -H makes grep always print the file name, and the -type f check prevents checking directories.
There are optimizations to do the inner command just once on the whole lot of files found.
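One such optimization is to end the -exec with + instead of ';', which runs grep once on large batches of filenames; xargs accomplishes much the same thing:
find . -type f -exec grep -i -H key {} +
find . -type f | xargs grep -i -H key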
[Shotts chapter 15] If I plug in my usb drive, where does it appear in the filesystem? On Windows, it would get a new drive letter, like G:. On Ubuntu, it's usually in /media/username.
This is achieved through the mount command (automatically in this case). A disk device has a filesystem on it, which is a tree of directories and files. There is one "root" directory. The mount command attaches such a device to a specific directory on the existing filesystem.
The file /etc/fstab lists what gets mounted where, by default.
A typical physical disk usually has multiple partitions. Each partition has a filesystem. You can view these with fdisk, but you can also destroy your disk so be careful. Fdisk takes a parameter representing the device for the entire disk, eg /dev/nvme0n1.
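The read-only listing form is the safe one:
sudo fdisk -l /dev/nvme0n1    # -l just prints the partition table and exits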
Mount shows a lot of "virtual" mounts. But we can focus on the disk mounts with mount | grep nvme0n1.
As for filesystems, they can be a variety of types. NTFS is the most common windows filesystem. Linux filesystems include ext2, ext3, ext4, btrfs, xfs, and more. Part of the disk is set aside as an array of inodes, which contain the file's permissions. Directories are tables of pairs <filename, inode>. Linux shows you the inode numbers with ls -i.
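Two quick ways to poke at this:
ls -i /etc/hostname    # the number on the left is the file's inode number
df -T                  # each mounted filesystem and its type (ext4, etc.)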
[Shotts chapter 16] Systems have network devices. Most acquire their IP address on startup through the Dynamic Host Configuration Protocol, or dhcp.
Once we're connected, we have tools like ping, traceroute, netstat, wget and ssh:
ssh (Shotts p. 210). This is the secure shell, a remote-login (and remote-command-execution) utility. Per-user files (like keys) are kept in ~/.ssh.
This is based on public-key authentication. If you, on host A, want to log in to host B, you connect and ask for some authentication data from B. Unless this is your first connection, you already have B's public key. B can send a message signed with B's private key, and A can validate it.
Initial public/private key pairs are created by ssh-keygen.
Previously received public keys are kept in the file known_hosts.
Typically, new hosts are assumed to be trusted. This is called Trust On First Use, or TOFU.
Once B has validated its identity to A (to prevent A from falling prey to sending its passwords to the wrong host), the user on A might log in the usual way, by providing a username and password. The problem here is that random password guessing is still possible. (This is why the Loyola CS dept no longer allows password-based ssh logins, except through the VPN.)
A better approach is to authenticate by pre-arrangement. A will place A's public key in B's file authorized_keys. This means that A is allowed to log in to B without providing a password. A must prove it has the matching private key by encrypting some message selected by B using its private key. A sends it back to B; if B can decrypt the message with A's public key, all is good.
Getting one's public key (typically kept in ~/.ssh) onto the other host's authorized_keys file is the setup step; the ssh-copy-id command automates it.
RSA was the original algorithm. Elliptic-curve algorithms are now much more popular, and have shorter keys.
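The usual setup commands (user and hostB are placeholders):
ssh-keygen -t ed25519     # create a key pair under ~/.ssh
ssh-copy-id user@hostB    # append your public key to hostB's authorized_keys
ssh user@hostB            # now logins use the key rather than a password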
If any permissions on the .ssh directory or any of its files are not what they should be (for example, if the private key is readable by anyone other than the owning user), then the connection fails.
[Shotts chapter 35] Arrays in bash are arrays of strings. This is quite practical. In a bash script, $* is the unquoted string of all arguments, and "$*" is the quoted string of all arguments. Neither is quite right. Here is an updated version of echoeach.sh (in files7):
#!/bin/bash
echo '$*'
for i in $*
do
echo $i
done
echo '"$*"'
for i in "$*"
do
echo $i
done
echo '"$@"'
for i in "$@"
do
echo $i
done
If we invoke ./echoeach.sh foo 'bar baz' quux, only the "$@" loop reproduces the three arguments as given.
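The output should be:
$*
foo
bar
baz
quux
"$*"
foo bar baz quux
"$@"
foo
bar baz
quux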
I used to use $* all the time, but it fails miserably for arguments with spaces. Here is my current script word, which starts OpenOffice on a file:
PROG=/usr/bin/soffice
$PROG "$@" 2>/dev/null &
(the 2>/dev/null just makes the stderr messages go away. They are usually useless.)
[Shotts chapter 25] The apache webserver can be installed with "apt install apache2". We can then open a browser on the same machine, and go to "http://localhost". We get the default page.
How about adding another html page (eg foo.html)? It goes in /var/www/html.
How about adding a shell script? Like this:
#!/bin/bash
echo "<html>
<head><title>System Information Page</title></head>
<body>
<h1>System Information Page</h1>
<p>
hostname: $HOSTNAME
</body>
</html>"
What we want is for the script to run, and have the output show up as the web page. Note that we've got a multiline quote here, and we've got a system bash variable (HOSTNAME) in the script.
We have to enable this in /etc/apache2/sites-enabled/default.conf, with some weird magic. We also have to put the script itself in /usr/lib/cgi-bin, not /var/www/html. Having websites run scripts, with possibly untrusted input, can be risky; we don't want anything unexpected to be runnable.
It still doesn't work. But there's a fix, that turns out to have to do with HTTP itself, not apache2.
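Presumably the fix is the standard CGI requirement: the script's output must begin with a Content-Type header line and then a blank line, before any HTML. A sketch of the fixed opening:
#!/bin/bash
echo "Content-type: text/html"
echo ""                           # the blank line ends the HTTP headers
echo "<html>
... (the rest of the page as before) ...
</html>"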