Comp 141 Class 3 notes

June 6

Shotts Chapters 5: Commands and 6: Redirection. Some material is from Chapter 7: Seeing the World as the Shell Sees It

Videos: 6. I/O Redirection, 7. Expressions and Substitutions

Review the basics: ls, cat, cd, pwd, rm, cp, mv, mkdir/rmdir

Use of wildcards like *.text, foo.?, [a-d], etc.

Expansion of wildcards is sometimes called globbing. It is done by the shell, before launching the command. An alternative design would be for every command to call a standard globbing library, but this is not how bash works.
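One way to see this is with echo, which simply prints whatever arguments it receives, i.e. whatever the shell hands it after expansion. In a directory containing .text files:

    echo *.text        # echo prints the already-expanded list of file names; it never sees the *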

Globbing can be disabled by quoting, either single or double quotes or the backslash:

    ls '*.text'

    ls "*.text"

    ls \*.text

None of these is the same as ls *.text.

Finally, if there is no match, globbing just leaves the wildcarded string alone. In a directory with no text files, try

    ls *.text

We can also turn off globbing entirely with set -f, and re-enable it with set +f, but normally this is never done.
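For the record, it looks like this:

    set -f             # disable filename expansion
    ls *.text          # ls now receives the literal string *.text
    set +f             # re-enable filename expansion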

Hard link examples

With files: ln orig.text link.text

The link.text file is a first-class file; it is not subordinate to orig.text. You could remove orig.text, and the data would still be reachable through link.text.

In fact this is essentially how mv works within a single filesystem: it links the file under the new name, and then deletes the original name.
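A rough sketch of that idea (the file names are just examples):

    ln orig.text new.text      # give the same file a second name
    rm orig.text               # remove the old name; the data survives under new.text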

How about linking directories: ln dir1 dir2. You are not allowed to create your own hard links for directories.

Note that '.' and '..' are created for you. Nothing else is allowed. Why not?

Suppose you could. Then consider mkdir foo; cd foo; mkdir bar; cd bar; ln ../../foo. Anything that tries to traverse the filesystem recursively will now loop forever: foo/bar/foo/bar/foo/bar/....

Backup programs, for example, traverse the filesystem recursively and would get caught in exactly this loop.

Symbolic links to directories are allowed, because a traversing program can detect that an entry is a symlink and decline to follow it.
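For example:

    ln -s dir1 dir2            # make dir2 a symbolic link to the directory dir1
    ls -l                      # the listing shows it as a link: dir2 -> dir1, with file type 'l'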

More commands (from chapter 5)

man ls    manual page (for shell built-in commands, use help instead)

apropos    hintwords    Try it. Try "apropos remove". This was an interesting early attempt at context-sensitive help.

help        Works for shell built-in commands

Many commands also have a --help option (like ls)

Commands can be executable programs, shell built-in commands, shell functions, or aliases.

The type command tells us which. The Shotts examples are type type, type ls and type cp.

If a command is the name of an executable file, the which command will tell us where it is.
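A couple of quick examples (the output will vary by system):

    type cd            # reports that cd is a shell builtin
    which ls           # prints the path of the ls executable, e.g. /bin/ls or /usr/bin/ls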

In the output of man and help, there are often syntax strings. An example:

    cd [-L |[-P [-e]]] [dir]

Things in [ ] are optional. Things on either side of | are either/or. What all this means is that the parameter flags here can either be omitted, or can be -L alone, or can be -P, the latter either with or without a following -e.

The man pages are organized into sections. For example, man sleep returns the section-1 manual page, which documents the sleep command. For the C library call, use man 3 sleep (or man -s 3 sleep); section 3 covers library functions.

Also, man bash | wc -l. (The second stage of the pipeline here is wc, which, with the -l option, counts the lines in its input.)

Shell Environment Variables

The shell allows setting of environment variables, which are strings. To access the value of a shell variable, put $ in front of the name. To display a string, use echo. To set a value, use '=', with no space on either side.

    FOO='hello'

    echo $FOO

Perhaps the best known is PATH. To print it, echo works:

    echo $PATH

PATH is used by the shell to look for commands, when one is typed into the shell. It's a list of directories, separated by ':'. It is very important for successful use of the command-line interface, and remains hard to work with in Windows (although it's easier than it used to be).

When the shell starts up, it reads commands from a designated file (often .bashrc). That's where people typically set PATH:

    PATH=$PATH:/home/pld/bin
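A common variant uses $HOME rather than a hard-coded home directory, and adds export so the setting is passed on to child processes (PATH is normally exported already, which is why the line above gets away without it):

    export PATH="$PATH:$HOME/bin"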

Shell command-line expansion

A command line consists of a command name followed by arguments, or parameters. The shell must expand wildcards (globbing), and also expand $ variables. There are actually seven kinds of expansion (see www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html, but probably not just yet). For now, note that $ expansion comes before * expansion.

Let's make an example where we rely on the order of evaluation. We need to set FOO to be '*.text'.
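That assignment looks like this; the quotes guarantee that the pattern is stored literally:

    FOO='*.text'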

Once we do this, and move if necessary to a directory with .text files, then

    ls $FOO

is equivalent to the following

    ls *.text            # without quotes

That is, the $FOO is expanded first, and then the *.text.

Shell variable names can't contain '*' or '?' or '[', so we can't create an example that would work only if wildcard expansion were done first and then $ expansion. But we can create an example where filename expansion produces something that looks like a valid $ variable reference, and observe that it does not then get evaluated.

    FOO=BAR            # set the variable FOO

    >'$FOO'            # create an empty file whose name is literally $FOO (the quotes block expansion)

    echo \$F?O         # the backslash blocks variable expansion; filename expansion then matches the file $FOO

    ls \$F?O           # likewise: lists the file named $FOO; neither command prints BAR

Both of the last two commands do filename expansion to get $FOO (because that's the file we created), but do not then expand that to BAR, because wildcard expansion comes after variable expansion.

The shell's expansion order is almost always exactly what you want.

stdin and stdout

Almost all Unix commands read from stdin and write to stdout. By default, these are the terminal. However, it is easy to redirect to read from or write to a file:

    ls -l /usr/bin > ls.out.text

    sort < file1 > file2

There is also stderr, a separate stream for error messages (because if you're redirecting stdout, you otherwise would not see them). Try it with a directory that does not exist:

    ls -l /bin/usr > bin.text

The error message from ls still appears on the terminal: it goes to stderr, which the > did not redirect.

Warning: redirecting to an existing file truncates it, deleting the previous contents without notice.

stdin, stdout and stderr correspond to Unix file descriptors, which are indexes into the per-process file-descriptor table. The file-descriptor indexes are 0, 1 and 2 respectively.
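The descriptor number can be written explicitly in a redirection; 0< and 1> mean the same thing as the plain < and > used above:

    sort 0< file1 1> file2     # identical to: sort < file1 > file2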

You can also append to a file with the >> operator.
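For example, to add another listing to the end of the file created earlier:

    ls -l /usr/bin >> ls.out.text      # appends instead of overwriting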

You can redirect stderr with 2> (2 being the file descriptor index for stderr). More likely, you will want to combine stderr and stdout: 2>&1.

Always redirect stdout before stderr; see Shotts. If you redirect stderr to stdout (with 2>&1) and only later redirect stdout to a file, stderr does not follow: it keeps going to wherever stdout pointed at the moment of the 2>&1.
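A sketch of the difference, reusing the nonexistent-directory example from above:

    ls -l /bin/usr > bin.text 2>&1     # stdout goes to the file, then stderr joins it: both end up in bin.text
    ls -l /bin/usr 2>&1 > bin.text     # stderr is pointed at the terminal (where stdout was); only stdout moves to the file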

You can also redirect either stdout or stderr to /dev/null.
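For example, to throw error messages away entirely:

    ls -l /bin/usr 2> /dev/null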


Redirection

Let's start with some handy commands for use with redirection: grep, sort, wc and less all appear in the examples below.

Mostly these get used in:

Pipelines

These are where the stdout of the first command becomes stdin of the second:

    ls -l /usr/bin | less

    ls -l /usr/bin | grep -i virtual    (-i is for case-insensitive matching; Shotts's example is grep zip).

    ls /bin /usr/bin | sort | less

    man bash | wc -l

    ls -l /usr/bin | wc -l    (compare ls | wc -l)

A pipeline like cmd1 | cmd2 is a bit like

    cmd1 > temp

    cmd2 < temp

except that no temporary file is created, and the two commands run at the same time, connected directly by the pipe.

fork() and exec()

Here's how Unix implements redirection. This mechanism has tremendously influenced the syscall interface for creating new processes: Windows, for example, has a single CreateProcess() call; Unix has no such thing, splitting the job between fork() and exec().

1. The shell reads a command, say foo <file1 >file2. It parses the line, so it knows that stdin is to come from file1 and stdout is to go to file2

2. The shell calls fork(), which creates a new child process. But the child is still running the shell! The parent shell now just calls wait(). The new process has its own virtual-memory space and its own process-scheduler entry.

3. The child shell opens file1 for reading. It gets file descriptor number 3. It also opens file2 for writing. This gets file descriptor number 4.

4. The child shell closes stdin, and then calls dup(3), which duplicates file descriptor 3 onto the lowest available file descriptor, which at this point is 0, or stdin. So now stdin will come from file1. More commonly, programs call dup2(3, 0), which makes it explicit that FD 3 is being duplicated onto FD 0. (A shell sketch of these descriptor manipulations follows the steps.)

5. Similarly, the child shell closes stdout and calls dup(4) (or dup2(4, 1)). The stdout file descriptor, 1, now refers to file2.

6. File descriptors 3 and 4 are closed; they are no longer needed.

7. Finally, the child shell calls exec("foo") (after looking up the full path name for command foo in PATH, if necessary). Exec keeps the same process, and in particular the same file-descriptor table, but maps in a different executable file to run. So when foo starts to run, its stdin has already been connected to file1 and its stdout to file2.
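Steps 3 through 6 can be imitated by hand with bash's exec builtin, which applies redirections to the current shell's own file descriptors. This is only a sketch of the descriptor manipulations, not of fork() and exec() themselves, and it assumes file1 exists; you would normally run it inside a script or subshell, since it rewires the shell's own stdin and stdout:

    exec 3< file1      # open file1 for reading on FD 3
    exec 0<&3          # make FD 0 (stdin) a copy of FD 3, like dup2(3, 0)
    exec 3<&-          # close FD 3; it is no longer needed
    exec 4> file2      # open (or create) file2 for writing on FD 4
    exec 1>&4          # make FD 1 (stdout) a copy of FD 4, like dup2(4, 1)
    exec 4>&-          # close FD 4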

The fork/exec model is how all new processes are started on Unix systems. It contains a significant inefficiency: when we fork the shell, we create a copy of the entire shell program in the new process, only to use almost none of it before the exec that follows wipes out the newly made copy. This is indeed an issue, but one we just live with; most of the overhead can be eliminated with virtual-memory tricks.

The way that virtual memory works is that physical memory is divided into 4K-sized chunks called pages. These are mapped in hardware to virtual addresses. The executable code deals only with virtual addresses; translation from virtual to physical addresses is done on every memory fetch and memory write.

The executable /bin/bash is mapped into memory, and fork just copies this virtual-memory map. The executable code doesn't get written to, so this is safe. As for the data segment of /bin/bash (and its stack segment), those get copied as virtual-memory mappings too, but with the copy-on-write flag set. So if the child shell modifies a page of data memory, that page is first reallocated and copied. Now the child shell has its own copy, and can modify it freely.