Comp 141 makeup notes for July 4, Part 1

Shotts: 

Many of the bash scripts used in this lecture are here.

Video at luc.zoom.us/rec/share/e8WerKk2qeFqkquZr1sbmpyLQiAoMPuKYep5eVUd0fGewXvesJtp5IQNiS3JcjVY.6kn0QVqbf-z7JbNR

Shell Scripting

This is an overview of the basics of shell scripting.

Control structures:

    expressions and test
        strings
        files
        numbers

    if

if test ! -e "$filename"                  # note where the ! goes, for not
then
    echo "file $filename not found"
fi

    case

case $MONTH in
    january)                    # no quotation marks
        MONTH=01;;       # you can also put the ;; on a line by itself
    february)
        MONTH=02;;
    *)                               # these are actually patterns; * matches anything
        MONTH=13;;
esac

    for

for i in *.text
do
    if grep --silent hello "$i"
    then
        echo "$i says hello"
    fi
done

    while

i=0
sum=0
while test $sum -lt 1000000
do
    i=$(( $i + 1))
    sum=$(($sum+$i))
done
echo $i

This adds consecutive numbers until the sum is at least a million, and prints how many were needed (1414).

bash arguments

Shell scripts can be given arguments on the command line. The first ten arguments are $0 through $9, with $0 being the command name itself. Arguments past the ninth must be written with braces, as ${10}, ${11}, etc.; a bare $10 is read as $1 followed by the character 0. The list of all the arguments is $* (though "$@", which behaves like a bash array of the arguments, is usually a better choice). The number of arguments is $#. Below is echo3.sh, which echoes arguments 0 through 3.
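Here is a small sketch of reaching an argument past the ninth. The function name show_tenth is made up for illustration; inside a function, $1, $2, etc. refer to the function's own arguments, which behave like a script's.

```shell
#!/bin/bash
# show_tenth is a hypothetical helper, called with ten arguments below.
show_tenth() {
    wrong="$10"      # parsed as ${1} followed by a literal 0
    right="${10}"    # braces are required for two-digit positions
    echo "wrong: $wrong"
    echo "right: $right"
}

show_tenth a b c d e f g h i j
```

The first line printed is "wrong: a0" and the second is "right: j".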

#!/bin/bash
# echo3.sh
echo $0
echo $1
echo $2
echo $3

Notice that in this script we are referencing arguments even if they are not supplied! (This will break with the set -u statement, which prohibits the use of undefined shell variables, even, say, $3.)

If more than three arguments are provided to echo3.sh, they are ignored. If only two are provided, then $3 is undefined, and is (without the set -u option) equivalent to the empty string, and is echoed as such, creating a blank line. Note the appearance of $0. Also, note what happens to ./echo3.sh foo 'bar baz' quux: argument two is the string 'bar baz'.

Next, consider this echoeach command:

#!/bin/bash

for i in $*
do
    echo $i
done

Try ./echoeach.sh foo bar baz. Try it also with fewer arguments, or more. Note that we are not accessing any undefined arguments.

A more interesting demo is ./echoeach.sh foo 'bar baz' quux. What does it do that is wrong? The loop runs four times, because the unquoted $* is split at every space, and that is not what we want. A fix is to use "$@", with the quotes. $@ behaves like a bash array of all the arguments (never mind exactly what this is), and putting quotes on it quotes each element in turn. This technique does not work with "$*", which joins all the arguments into one single string.

You are strongly encouraged to use "$@" for the list/array of command-line arguments when the arguments might have embedded spaces.
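The difference can be seen by counting how many arguments each form produces. This is only a sketch: count_words is a made-up helper, and set -- is used to simulate the command line foo 'bar baz' quux.

```shell
#!/bin/bash
# count_words is a hypothetical helper that prints how many arguments it received.
count_words() {
    echo $#
}

# Simulate the command line: foo 'bar baz' quux
set -- foo 'bar baz' quux

unquoted_star=$(count_words $*)     # split at spaces: foo bar baz quux
quoted_star=$(count_words "$*")     # one string: "foo bar baz quux"
quoted_at=$(count_words "$@")       # three arguments, embedded space preserved

echo "unquoted \$*: $unquoted_star"
echo "quoted \$*:   $quoted_star"
echo "quoted \$@:   $quoted_at"
```

The three counts are 4, 1 and 3; only "$@" gives back the arguments as they were typed.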

The number of arguments is $#, not including $0. It is very useful for determining if the right number of arguments have been supplied. See the "mycal" example in the case section.

You can use the command shift to pop off the first argument $1, and move $2 to $1, $3 to $2 etc. $# is updated. See the while section below.


bash if

Suppose we want our script to check for some condition, and have the result of that check affect its further action. The bash if command helps here. We also need bash conditions. Any command can be used as a condition. Bash looks at the command's exit code, with true represented by an exit code of 0 (the normal case) and false represented by anything else. (Note that this is the reverse of the C convention.) We will start with the test command, and the equivalent [ command:

    test expression

    [ expression ]

There is a newer version of test, [[   ]], that also includes support for regular-expression matching.
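As a sketch of the regular-expression support, here is a check for whether a string consists entirely of digits. The function name is_number is an assumption made for this example.

```shell
#!/bin/bash
# is_number is a hypothetical helper; its exit code is that of the [[ ]] test.
is_number() {
    [[ $1 =~ ^[0-9]+$ ]]
}

if is_number 2024; then first=yes; else first=no; fi
if is_number 20x4; then second=yes; else second=no; fi

echo "2024: $first"
echo "20x4: $second"
```

The regular expression on the right of =~ is left unquoted; quoting it would make bash treat it as a literal string.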

Here are some file tests:

    test -f file    # file is a regular file

    test -d file    # file is a directory

    test -e file    # file exists; useful for wildcard matching

    test -L file    # file is a symlink

    test -r file    # file is readable (also -w for writable)

    test file1 -nt file2    # file1 is newer than file2, in terms of the file-modification date
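The following sketch tries several of these tests on a scratch directory made with mktemp. The file names plain.txt, subdir, link and missing are arbitrary choices for the example.

```shell
#!/bin/bash
# Build a scratch directory so the tests have something to examine.
scratch=$(mktemp -d)
touch "$scratch/plain.txt"
mkdir "$scratch/subdir"
ln -s "$scratch/plain.txt" "$scratch/link"

if test -f "$scratch/plain.txt"; then is_file=yes; else is_file=no; fi
if test -d "$scratch/subdir";    then is_dir=yes;  else is_dir=no;  fi
if test -L "$scratch/link";      then is_link=yes; else is_link=no; fi
if test -e "$scratch/missing";   then exists=yes;  else exists=no;  fi

echo "regular file: $is_file, directory: $is_dir, symlink: $is_link, missing exists: $exists"
rm -r "$scratch"
```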

Here is nodirs.sh, which checks if there are no subdirectories of the current directory. It also uses the return code to signal this.

for i in *
do
    # echo $i
    if test -d "$i"
    then
        echo "directories found"
        exit 1
    fi
done

echo "no directories"
exit 0

Here are some string tests.

-n string        The length of string is greater than zero.
-z string        The length of string is zero.
string1 = string2
string1 == string2        string1 and string2 are equal. Single or double equal signs may be used. Most people use ==, but it is not Posix.
string1 != string2        string1 and string2 are not equal.
string1 > string2        string1 sorts after string2.  Warning: > must be escaped from the shell
string1 < string2        string1 sorts before string2.   Warning: < must be escaped from the shell
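A short sketch of the escaping warning, using two arbitrary sample strings. Inside [ ] the < needs a backslash; inside [[ ]] it does not.

```shell
#!/bin/bash
first="apple"
second="banana"

# Inside [ ], an unescaped < would be treated as input redirection.
if [ "$first" \< "$second" ]; then
    order="$first sorts before $second"
else
    order="$first does not sort before $second"
fi
echo "$order"

# Inside [[ ]], no escaping is needed.
if [[ "$first" < "$second" ]]; then
    echo "same result with [[ ]]"
fi

# -z is true for an empty string; the quotes matter when the value is empty.
empty=""
if [ -z "$empty" ]; then emptiness="empty"; else emptiness="nonempty"; fi
echo "$emptiness"
```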

Example: Shotts answer.sh, modified to take a command-line argument

#!/bin/bash
# test-string: evaluate the value of a string
ANSWER=$1
if [ -z "$ANSWER" ]; then
    echo "There is no answer." >&2
    exit 1
fi
if [ "$ANSWER" = "yes" ]; then
    echo "The answer is YES."
elif [ "$ANSWER" = "no" ]; then
    echo "The answer is NO."
elif [ "$ANSWER" = "maybe" ]; then
    echo "The answer is MAYBE."
else
    echo "The answer is UNKNOWN."
fi

Here are some numeric tests. These only work for integers.

    integer1 -eq integer2            integer1 is equal to integer2.
    integer1 -ne integer2            integer1 is not equal to integer2.
    integer1 -le integer2             integer1 is less than or equal to integer2.
    integer1 -lt integer2              integer1 is less than integer2.
    integer1 -ge integer2            integer1 is greater than or equal to integer2.
    integer1 -gt integer2             integer1 is greater than integer2.

((  )) evaluates an arithmetic expression; it is true if the result is nonzero and false if the result is zero.
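A minimal sketch of (( )) as an if condition, with a variable named count chosen for the example. The condition succeeds exactly when the arithmetic value is nonzero.

```shell
#!/bin/bash
count=0
if (( count )); then zero_case=true; else zero_case=false; fi

count=5
if (( count )); then five_case=true; else five_case=false; fi

# Comparisons inside (( )) evaluate to 1 or 0, so they work as conditions too.
if (( count > 3 )); then bigger=yes; else bigger=no; fi

echo "0 is $zero_case, 5 is $five_case, 5 > 3 is $bigger"
```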

You can make Boolean combinations with -a for and, -o for or, and ! for not. The ! goes immediately after the word test (or the opening [), before the expression being negated.
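Here is a sketch of these combinations on a scratch file (the name notes.text is arbitrary). The last test shows an alternative that is not from the notes above: joining two separate [ ] commands with the shell's && operator.

```shell
#!/bin/bash
scratch=$(mktemp -d)
file="$scratch/notes.text"
touch "$file"

# ! comes right after test, before the expression it negates.
if test ! -d "$file"; then not_dir=yes; else not_dir=no; fi

# -a means "and": the file exists and is readable.
if [ -e "$file" -a -r "$file" ]; then both=yes; else both=no; fi

# The same check written as two tests joined by &&.
if [ -e "$file" ] && [ -r "$file" ]; then both_again=yes; else both_again=no; fi

echo "not a directory: $not_dir, exists and readable: $both, again: $both_again"
rm -r "$scratch"
```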

Any command can be an if condition; bash goes by the return code of the command. See the for and while sections for examples.

Comments

A comment line begins with #. Such lines are "invisible" to bash, and so cannot be the entire body of an if statement. Use the empty statement (a colon) for that:

    if [ -e "$file" ]
    then
        :
    else
        echo "file not found"
    fi

This is another way to write

    if [ ! -e "$file" ]
    then
        echo "file not found"
    fi

Basic settings

I recommend always including these at the start.

set -eu
set -o pipefail

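The set built-in command lets you control the bash environment. set -e makes your script exit immediately if a command fails, that is, returns a nonzero exit code (commands used as the condition of an if or while are exempt). This is usually what you want. By default the exit code of a pipeline is that of its last stage only; the option setting set -o pipefail makes the pipeline fail if any stage fails, so that errors in earlier stages are not lost.

set -u causes an error if you try to use an unset variable.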
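The effect of each setting can be seen in a small experiment. This is a sketch only: each case runs in its own bash -c process so that a failure cannot stop the outer script, and no_such_variable is a name assumed to be unset.

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit code is that of its last stage.
plain=$(bash -c 'false | true; echo $?')

# With pipefail, any failing stage makes the whole pipeline fail.
strict=$(bash -c 'set -o pipefail; false | true; echo $?')

# With set -e, the script stops at the failing command; "after" is never printed.
stopped=$(bash -c 'set -e; false; echo after' || true)

# With set -u, expanding an unset variable is an error (the message goes to stderr).
unset_code=$(bash -c 'set -u; echo "$no_such_variable"' 2>/dev/null; echo $?)

echo "plain=$plain strict=$strict stopped='$stopped' unset_code=$unset_code"
```

Here plain is 0, strict is 1, stopped is empty, and unset_code is nonzero.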

command substitution

Sometimes we want to test the output of a subcommand. We use a special syntax that converts the output of the command to a string, which can be tested in the parent script. Just enclose the subcommand in $(    ). This is traditionally called "command substitution".

Example:

    ls -l $(which dash)

This gets "ls -l" information about the dash command: which dash prints the full path of dash, and that path becomes the argument to ls -l.

The expr command evaluates an arithmetic expression (where operators have to be separated from operands by spaces, and '*' must be escaped from the shell to avoid globbing). If we create a shell loop, here is how we can increment a variable:

    i=0

    i=$(expr $i + 1)

We can also use the double-parentheses trick, i=$(($i + 1)). Then the '*' does not need to be escaped, and you don't need spaces around each token, so this is better overall. The outer $ is part of the $((  )) construct. The inner $ gets the value of i, but it is optional here: inside $((  )) a bare variable name is replaced by its value, so i=$((i + 1)) also works.
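The two styles of arithmetic can be compared side by side. This is a minimal sketch using multiplication, since that is where the escaping difference shows up.

```shell
#!/bin/bash
i=6

with_expr=$(expr $i \* 7)     # expr needs spaces and an escaped *
with_parens=$((i*7))          # no escaping and no spaces; a bare variable name is accepted

i=$((i + 1))                  # increment, as one would in a loop

echo "$with_expr $with_parens $i"
```

This prints 42 42 7.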

The cut command is useful for extracting particular fields from the output of another command. It can extract a set of columns by character position, which allows parsing the output of ls, but cut is easier to use on output that is essentially in "csv" (comma-separated values) format (the separator does not have to be a comma).
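Here is a sketch of both modes of cut on one line of colon-separated text. The record is made up for the example, in the style of a line from /etc/passwd.

```shell
#!/bin/bash
# A made-up colon-separated record.
record="alice:x:1000:1000:Sample User:/home/alice:/bin/bash"

login=$(echo "$record" | cut -d: -f1)          # field 1
shell_path=$(echo "$record" | cut -d: -f7)     # field 7
ids=$(echo "$record" | cut -d: -f3,4)          # fields 3 and 4, delimiter kept between them

# Selecting by character position instead of by field.
first_three=$(echo "$record" | cut -c1-3)

echo "$login $shell_path $ids $first_three"
```

This prints alice /bin/bash 1000:1000 ali.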

Linux has a standard command basename (a separate program, not a bash builtin) that takes a filename like /usr/bin/dircmp and returns the "base" part, "dircmp". What if we want to remove the "extension" from a file; that is, convert foo.text or foo.pdf to just foo? We'll treat the string as two fields with separator '.', and use cut to get the first field. Note that fields in cut are numbered starting with 1, not 0.

    echo foo.pdf | cut -d. -f1

Or, if the filename is in $file and we want to set the "root" filename to the variable rootname,

    rootname=$(echo $file | cut -d. -f1)

What does this do to foo.bar.baz?

There's a fancier way:

    echo "${file%.*}"

From the manual

${parameter%word}

The word is expanded to produce a pattern [here .*] and matched according to the rules described below. If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern deleted. (With %% in place of %, the longest matching pattern is deleted instead.)
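The three approaches can be compared on a name with two dots. This is a short sketch; the variable names by_cut, shortest and longest are chosen for the example.

```shell
#!/bin/bash
file="foo.bar.baz"

by_cut=$(echo "$file" | cut -d. -f1)   # keeps only the first field
shortest="${file%.*}"                  # removes the shortest trailing match of .*
longest="${file%%.*}"                  # removes the longest trailing match of .*

echo "cut: $by_cut"
echo "%:   $shortest"
echo "%%:  $longest"
```

The cut version and the %% version both give foo, while the % version gives foo.bar, which is usually what is wanted when stripping a single extension.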

Useful Helpers

awk: this is the universal parser, but it's a programming language in its own right.

cut: cuts out part of a line. Lines can be split into fields by any delimiter

tr: translates one set of characters to another.
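A brief sketch of tr and awk on a sample line (the text of the line is arbitrary). The awk program here does nothing more than print one whitespace-separated field.

```shell
#!/bin/bash
line="Hello World 42"

upper=$(echo "$line" | tr 'a-z' 'A-Z')       # translate lowercase letters to uppercase
no_spaces=$(echo "$line" | tr ' ' '_')       # replace each space with an underscore
third=$(echo "$line" | awk '{ print $3 }')   # awk splits on whitespace; $3 is the third field

echo "$upper"
echo "$no_spaces"
echo "$third"
```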

bash case

My example will be mycal, below, which converts written-out months to numeric form for the cal command (actually, cal nowadays does take written-out months)

#!/bin/bash
# http://redsymbol.net/articles/unofficial-bash-strict-mode/
# mycal: convert 'oct 24' to '10 2024' for the cal command
# Actually, modern versions of cal do accept the month as "jan" or "january", etc.
# Try the month 9 1752

DEBUG=1

set -eu
set -o pipefail

if [ $# -ne 2 ]
then
    echo "usage: mycal month year"
    exit 1
fi

MONTH=$1
YEAR=$2

if test $YEAR -lt 100        # two-digit year
then
    YEAR=$(($YEAR + 2000))
fi

MONTH=$(echo $MONTH | tr '[A-Z]' '[a-z]')    # convert to lowercase; escape char classes from shell
if test $DEBUG -eq 1;then echo "MONTH is $MONTH"; fi

if [[ $MONTH =~ ^[a-z] ]]    # if MONTH starts with a letter; this is a regular expression
then
case $MONTH in
    january|jan)
        MONTH=01;;       
    february|feb)
        MONTH=02;;
    march|mar)
        MONTH=03;;
    april|apr)
        MONTH=04;;
    may)
        MONTH=05;;
    june|jun)
        MONTH=06;;
    july|jul)
        MONTH=07;;
    august|aug)
        MONTH=08;;
    september|sep|sept)
        MONTH=09;;
    october|oct)
        MONTH=10;;
    november|nov)
        MONTH=11;;
    december|dec)
        MONTH=12;;       
    *)            # this is a pattern match; "*" matches everything
        echo "$MONTH is not a month"
        exit 1;;
esac
fi

if test $DEBUG -eq 1;then echo "MONTH is $MONTH, YEAR is $YEAR"; fi
 
cal $MONTH $YEAR

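Some things to note here:

    the argument-count check with $#, which prints a usage message and exits with code 1
    the use of command substitution with tr to convert MONTH to lowercase
    the [[  ]] test with =~, so that a numeric month skips the case statement entirely
    the | in the case patterns, which lets one branch match several spellings
    the final *) pattern, which catches anything that is not a recognized month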

bash for

This is the workhorse of bash loops. Typically we are interested in looping over a set of globbed filenames:

for i in *.text
do
    if grep --silent hello "$i"
    then
        echo "$i says hello"
    fi
done

If a .text file has the string "hello" in it, it is included in the output.

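Note the use of grep as a test condition for if; there is no test command here! grep exits with code 0 if it finds a match and nonzero otherwise. Also note the --silent, which suppresses grep's normal output (it is the same as -q).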
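To try the loop without depending on whatever files are in the current directory, here is a sketch that builds two sample files in a scratch directory. The file names a.text and b.text and their contents are made up.

```shell
#!/bin/bash
scratch=$(mktemp -d)
echo "hello there" > "$scratch/a.text"
echo "goodbye"     > "$scratch/b.text"

matches=""
for i in "$scratch"/*.text
do
    # grep exits with 0 on a match and 1 on no match; -q prints nothing.
    if grep -q hello "$i"
    then
        matches="$matches $(basename "$i")"
    fi
done

echo "files saying hello:$matches"
rm -r "$scratch"
```

Only a.text is reported.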

bash while

The opening example was to find the first N such that 1+2+...+N >= 1,000,000. This can't in general be done with for loops. One good use of while is when reading from a file. Here is linecounter:

if [ $# -ne 1 ]
then
        echo "usage: linecounter filename"
        exit 1
else
        FILENAME=$1
fi

echo "FILENAME is $FILENAME"

NUMLINES=0

while IFS= read -r x        # IFS= keeps leading spaces; -r keeps backslashes
do
    NUMLINES=$(($NUMLINES + 1))
    echo "$NUMLINES: $x"
done < "$FILENAME"

echo $NUMLINES

Note the argument-count check ($#). Also note that we're reading each line into a shell variable x, and echoing it with its line number; the read command fails at end of file, which ends the loop.
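A plain read strips leading whitespace and treats backslashes specially. The sketch below uses the common idiom IFS= read -r to keep each line exactly as written; the sample file contents are made up to show the difference.

```shell
#!/bin/bash
scratch=$(mktemp)
# Three sample lines: one indented, one containing a backslash, one ordinary.
printf '  indented line\nback\\slash\nlast line\n' > "$scratch"

numlines=0
second=""
while IFS= read -r line
do
    numlines=$((numlines + 1))
    if [ "$numlines" -eq 2 ]; then second="$line"; fi
    echo "$numlines: $line"
done < "$scratch"

rm "$scratch"
```

The indentation of the first line and the backslash in the second both survive.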

Here's another use of while to print all the arguments, like echoeach.sh above:

#!/bin/bash
while test $# -gt 0
do
    echo $1
    shift
done

After each shift, the first command-line argument gets popped, and all the others move up a notch (so $2 becomes $1, etc). $# is also updated.