Comp 141 Virtual Class 7 notes

Fourth of July, part 2

Shotts: 

This is the second part of the July 4 Alternative Lecture, and we'll cover a variety of things:

Many of the bash scripts used in this lecture are here.

Video at luc.zoom.us/rec/share/k0sz6k5ACj7mwa8hcV1O6bQCxx4Lf3FqSrxkc02f6MhM0hjcqooLmR-87HdNQ1s.aFCg9sFeUzGYsBDZ.

Options Processing for bash scripts

Let's start with this script that reads a file (the filename is provided as the single argument), converts the entire file to lowercase, and writes to stdout. It has no options itself, but we'll use it in something else that does. The script works by reading in the input file one line at a time, and using the tr command to convert that line to lowercase, and then writing to stdout.

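#!/bin/bash
set -eu
set -o pipefail

if [ $# -ne 1 ]                  # exactly one argument, the filename, is required
then
    echo "usage: lcasecopy filename" >&2
    exit 1
fi
FILENAME=$1

while IFS= read -r THELINE       # IFS= and -r keep leading whitespace and backslashes intact
do
    echo "$THELINE" | tr A-Z a-z        # convert THELINE to lowercase, and write it to stdout
done    < "$FILENAME"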

Now we want to embed this in another script, lcbunch, that converts all the regular files listed on the command line as file arguments. lcbunch also has the following options, supplied as command-line "option arguments":

    -v                    print the name of each file converted on stdout
    -e extension    add the supplied extension to each new file created
    -d dirname     put the converted files in the given directory (which must already exist)

The user must supply either -e or -d. Both can be supplied. All options must come before any files. As an example,

    lcbunch -v -d mydir *.text

converts all the .text files and puts the converted files into mydir.

We will do this two ways, first using bare-bones bash, and the shift command, and then with getopts, which automates this kind of option processing.

Here is the bare-bash version:

#!/bin/bash
#    -v                    print the name of each file converted on stdout
#    -e extension    add the supplied extension to each new file created
#    -d dirname     put the converted files in the given directory (which must already exist)

set -eu
set -o pipefail

OPTDONE=0
VERBOSE=0
OUTDIR="."
EXTENSION=""
USAGE="usage: lcbunch [-v] [-e extension] [-d directory] files"

while test $OPTDONE -eq 0 -a $# -ge 1            # options processing is done in this loop
do
    case $1 in
       -v)
           # echo "setting VERBOSE"
           VERBOSE=1
           shift    # for the -v
           ;;
       -e)
           EXTENSION=$2
           # echo "setting EXTENSION to $EXTENSION"
           shift    # for the -e
           shift    # for the option value following -e
           ;;
       -d)
           OUTDIR=$2
           shift    # for the -d
           shift    # for the option value following -d
           ;;
       *)
           OPTDONE=1
           ;;
    esac
done

if test -n "$EXTENSION"
then
    EXTENSION='.'$EXTENSION        # add the dot
fi

# check if EXTENSION or OUTDIR was supplied
if test -z "$EXTENSION" -a "$OUTDIR" == "."    # neither changed
then
    echo "Must supply -e or -d"
    echo "$USAGE"        # quoted, so that [-v] is not treated as a filename pattern
    exit 2
fi

# At this point, $* just consists of files to be converted
while test $# -gt 0
do
    INFILE=$1
    if test -f "$INFILE"              # make sure it's a regular file
    then
        OUTFILE=$OUTDIR/${INFILE}${EXTENSION}    # curly braces to avoid running into one another
        if test $VERBOSE -eq 1; then echo "converting $INFILE to $OUTFILE"; fi
        lcasecopy "$INFILE" > "$OUTFILE"        # copy this file
    fi
    shift                    # next file
done

The first while loop does the options processing. We work through the $* string of all arguments (with shift), after setting shell variables for all the option defaults (VERBOSE, EXTENSION and OUTDIR). If the first argument is -v, we set VERBOSE, and shift. Now $1 is the next argument, and we go around the while loop again.

Likewise, if the first argument is -e, we set EXTENSION to $2. We then shift twice, to pop both those values off of $*. Similarly for -d.

As long as we have options, either -v alone or an -e or a -d with something following, we keep going. Each time we start the while loop, if there are any more options then $1 must be -v, -e or -d. When we are done with the options, and get to the files, the case *) matches, and we set OPTDONE=1. This leads to our exiting from the loop, and moving on to the file-processing section.

After the while loop, we add a "." to EXTENSION, if necessary, and check that at least one of EXTENSION and OUTDIR has been changed from its default (otherwise we'd be writing each output file on top of its own input file, which is bad).

At this point we've popped all the options from $*, and all that's left is the files to convert. We handle them one at a time, shifting after each one. We're done when the argument count, $#, reaches zero.
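To see how shift consumes the argument list, here is a minimal sketch. The function show_shift is a made-up name for illustration only; inside a function, $1 and $# refer to the function's own arguments, which lets us experiment without writing a separate script.

```bash
#!/bin/bash
# show_shift is a hypothetical helper: it prints $# and $1 before each
# shift, to show how shift consumes the positional parameters from the left.
show_shift() {
    while test $# -gt 0
    do
        echo "count=$# first=$1"
        shift            # drop $1; the old $2 becomes the new $1
    done
    echo "count=$#"      # nothing left
}

show_shift -v -d mydir a.text
```

Each pass through the loop reports one fewer argument, and the loop ends when $# reaches zero, just as in the file-processing loop of lcbunch.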

Note that we redirect the stdout of lcasecopy into the file named by $OUTFILE.

The getopts version

The only thing that changes is the first loop. It is now

while getopts "ve:d:" opt
do
    case $opt in
       v)
           echo "setting VERBOSE"
           VERBOSE=1
           ;;
       e)
           EXTENSION=$OPTARG            # OPTARG is set by getopts
           echo "setting EXTENSION to $EXTENSION"
           ;;
       d)
           OUTDIR=$OPTARG
           ;;
       *)
           echo "$USAGE"
           exit 1
           ;;
    esac
done

shift $(($OPTIND-1))

getopts is built into bash. The first string is the list of option letters, and whether something must follow them. "ve:d:" means that the options are -v, -e and -d, and the latter two have an argument following them (as indicated by the colon). The opt is a shell variable name of our choosing.

getopts also sets the two special variables OPTARG and OPTIND. OPTARG is the option argument, if one is expected; it plays the role of $2 in the first version. OPTIND is the index in $* of the next argument to be examined. It starts at 1, so when the loop ends, OPTIND-1 arguments have been consumed as options.

We do not do a shift after each argument. Instead we do it all at the end, with shift $(($OPTIND-1)).

We also omit the hyphen before the option letters: they are v, e and d, not -v, -e and -d.
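Here is a small sketch for experimenting with getopts and OPTIND. The function parse_opts and its lowercase variable names are assumptions made for this illustration and are not part of lcbunch. Because getopts keeps its position in OPTIND, the function makes OPTIND local and resets it to 1 so that it can be called more than once.

```bash
#!/bin/bash
# parse_opts is a hypothetical function used only for this illustration.
parse_opts() {
    local OPTIND=1 opt           # reset OPTIND so the function can be called repeatedly
    local verbose=0 extension="" outdir="."
    while getopts "ve:d:" opt
    do
        case $opt in
            v) verbose=1 ;;
            e) extension=$OPTARG ;;
            d) outdir=$OPTARG ;;
            *) return 1 ;;       # unknown option, or a missing option value
        esac
    done
    echo "OPTIND=$OPTIND verbose=$verbose extension=$extension outdir=$outdir"
    shift $((OPTIND - 1))        # discard the arguments that getopts consumed
    echo "files: $*"
}

parse_opts -v -d mydir a.text b.text
```

With these arguments the options occupy the first three positions, so the first line of output reports OPTIND=4 and the second reports files: a.text b.text.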


bash debugging

The first and most basic technique is to print out key variables at selected points, with

    echo $myvar

or, better yet,

    echo "filename is $myvar"

or something else that will help you understand what you're looking at.

It is helpful to have these print out only if some variable, e.g. DEBUG, is set to 1:

DEBUG=1

if test $DEBUG -eq 1; then echo "filename is $myvar"; fi
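If there are many such messages, one option is to wrap the test in a small function. This is only a sketch: the function name debug and the "DEBUG:" prefix are my own choices here. It also assumes you want the messages on stderr, so that they stay out of any output that is redirected or piped from stdout.

```bash
#!/bin/bash
DEBUG=${DEBUG:-0}        # default to 0 unless DEBUG is already set in the environment

# debug is a hypothetical helper name; it writes to stderr so that debugging
# messages do not get mixed into the script's real output on stdout
debug() {
    if test "$DEBUG" -eq 1; then echo "DEBUG: $*" >&2; fi
}

myvar="notes.text"
debug "filename is $myvar"
echo "processing $myvar"
```

Running the script as DEBUG=1 bash myscript.sh turns the messages on without editing the file.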

debug mode

There is also debug mode, enabled by running your script as

    bash -x myscript.sh

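Debug mode prints out each command just before it is executed, after all expansions have been performed, preceded by a +. This trace is written to stderr.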

As an example let's look at numsum:

#!/bin/bash
# adds up the integers from 1 to $1

NUM=$1
SUM=0
while test $NUM -ne 0
do
    SUM=$(expr $SUM + $NUM)
    NUM=$(( $NUM - 1))        #alternative way to do arithmetic
done
echo $SUM

The output of bash -x numsum 4 looks like this (some comments added).

+ NUM=4
+ SUM=0
+ test 4 -ne 0            # first time through the loop
++ expr 0 + 4
+ SUM=4
+ NUM=3
+ test 3 -ne 0            # second time through the loop
++ expr 4 + 3
+ SUM=7
+ NUM=2
+ test 2 -ne 0            # third time through the loop
++ expr 7 + 2
+ SUM=9
+ NUM=1
+ test 1 -ne 0            # fourth time through the loop
++ expr 9 + 1
+ SUM=10
+ NUM=0
+ test 0 -ne 0
+ echo 10
10

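The lines beginning with ++ show commands that run one level down, inside a command substitution $(...); bash repeats the + once for each level of nesting. In this case, that's expr. The $((...)) arithmetic on the next line is done by the shell itself, so it produces no ++ line.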

Recall the example from earlier of linecounter where I piped the file into the while loop, instead of redirecting the input. Debug mode let me see that the NUMLINES variable was being regularly incremented, only to revert to 0 at the end. From that, and with a little help from StackExchange, I was able to figure out that each part of a pipeline runs in a subshell, and that changes to variables in a subshell never propagate back to the parent shell.
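The effect can be reproduced with the following sketch. It is not the original linecounter script: the function names are made up, and the input is three fixed lines rather than a file. It also assumes bash's default settings (in particular, that the lastpipe shell option has not been turned on).

```bash
#!/bin/bash
# count_piped and count_redirected are hypothetical names for this illustration.

count_piped() {
    local NUMLINES=0
    printf 'a\nb\nc\n' | while read -r LINE
    do
        NUMLINES=$((NUMLINES + 1))     # increments a copy inside the subshell
    done
    echo "$NUMLINES"                   # the parent's NUMLINES was never changed
}

count_redirected() {
    local NUMLINES=0
    while read -r LINE
    do
        NUMLINES=$((NUMLINES + 1))     # runs in the current shell
    done <<EOF
a
b
c
EOF
    echo "$NUMLINES"
}

echo "piped: $(count_piped)"
echo "redirected: $(count_redirected)"
```

Under those assumptions the piped version reports 0 and the redirected version reports 3.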

You can turn debug mode on and off from within your script dynamically with

    set  -x    # enable debugging

    set +x    # disable debugging

Note that minus, "-", is used to add debugging, and plus, "+", is used to take away debugging.
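As a sketch of tracing just one section, here is a variant of the numsum loop wrapped in a function, with set -x and set +x around the loop only. The function name sum_to is an assumption made for this example.

```bash
#!/bin/bash
# sum_to is a hypothetical function; only its loop is traced.
sum_to() {
    local NUM=$1 SUM=0
    set -x                      # start tracing here
    while test "$NUM" -ne 0
    do
        SUM=$((SUM + NUM))
        NUM=$((NUM - 1))
    done
    set +x                      # stop tracing; this command itself appears in the trace
    echo "$SUM"
}

sum_to 3
```

The trace goes to stderr and the sum goes to stdout, so redirecting with 2>/dev/null leaves only the answer.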

Enabling syntax highlighting in your text editor can also help here. In vi/vim, you do that with :syntax on, in command mode.

Syntax errors

Here are a few common ones:

Note that leaving off a do or then can be confusing, since bash does allow multiple commands as part of if/while tests, and the do/then marks the end of this list.

Shotts has a good section on this in Chapter 30.

Expansion problems

Recall that bash does line expansion whenever a command (even a built-in) is executed. This exposes you to issues with file-name globbing and variable expansion. The Shotts example is this:

number=            # empty string!
if test $number = 1
then
    echo something
fi

We should have written if test "$number" = 1, with quotes, but we didn't. So the second line expands to if test = 1, which is an illegal test expression.

Expansion problems in bash are legion. Watch out!

If we create a file with the above (so that the if test $number = 1 is line 7) and run it, we get

    line 7: test: =: unary operator expected

That's mysterious, since looking at the code we're clearly using = as a binary operator. But here it is with bash -x:

+ number=
+ test = 1
trouble: line 7: test: =: unary operator expected

Now it is clear that the arguments to test are = 1, and indeed something is missing.

If we put "" around $number (as we should), the original script runs fine (though as written it produces no output, because "" is not equal to 1, and there is no else clause).
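Here is the quoted form as a runnable sketch, with an else clause added so that both outcomes are visible. The function name is_one and the else clause are additions for illustration, not part of the Shotts example.

```bash
#!/bin/bash
# is_one is a hypothetical function name for this illustration.
is_one() {
    local number=$1
    if test "$number" = 1       # quotes keep an empty value as one (empty) argument
    then
        echo "one"
    else
        echo "not one"
    fi
}

is_one 1
is_one ""
```

The second call passes the empty string, and test sees three arguments ("", = and 1) rather than two, so there is no error.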

A couple more bash-specific issues are: