Comp 141 Class 2 notes

May 30

Review the basics: ls, cat, cd, pwd. Also: mv (move), cp (copy), rm (remove), and mkdir/rmdir.

rm is notoriously unforgiving, but at least it doesn't remove directories by default. rm -r recursively removes directories and files. rm -r / would remove everything, and you would need to rebuild your system. (Recent versions of GNU rm refuse to do this unless you add --no-preserve-root.)
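Here is a small sketch of these commands in action. It works in a throwaway directory made with mktemp, and the file and directory names (project, notes.text, copy.text) are made up for the demo.

```shell
# Work inside a scratch directory so nothing real is at risk.
demo=$(mktemp -d)
cd "$demo"

mkdir project                 # make a directory
echo "hello" > notes.text     # make a small file
cp notes.text copy.text       # copy: both names now exist
mv copy.text project/         # move the copy into the directory

# Plain rm refuses to remove a directory ...
if rm project 2>/dev/null; then plain_rm="removed"; else plain_rm="refused"; fi

# ... and rmdir refuses too, because the directory is not empty.
if rmdir project 2>/dev/null; then plain_rmdir="removed"; else plain_rmdir="refused"; fi

# rm -r removes the directory and everything in it, with no questions asked.
rm -r project
if [ -e project ]; then after_r="still there"; else after_r="gone"; fi

echo "rm: $plain_rm, rmdir: $plain_rmdir, after rm -r: $after_r"

cd /
rm -r "$demo"                 # clean up the scratch directory
```

Note that nothing here goes to a trash can; once rm -r returns, the files are gone.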

Fun games

  1. create a directory. cd to it with one terminal. Now, from another terminal, rename that directory.
  2. create a directory. cd to it with one terminal. Now, from another terminal, remove that directory. What does pwd say? What about /bin/pwd?
  3. create a file. Open it in an editor. Now, from another terminal, delete the file.

VirtualBox now works again for me!

Last week: starting a VM failed. I got a message to run /sbin/vboxconfig, which builds a new kernel driver. But that failed! It turns out vboxconfig tries to compile the driver, and there was a genuine compilation failure, because I had upgraded from Ubuntu 20.04 to 22.04. I just had to get the latest VirtualBox from Oracle, and then run vboxconfig again.

The Unix filesystem

There are no system-level extensions. For clarity, it is common to use extensions like .text, .c, .py, .odt [OpenDocument text, comparable to docx], .jpeg, .svg, .ogg. (What are all these?)

Filename extensions were a convention in 1961 under CTSS, and were official in DEC's PDP-6 operating system in 1964. MS-DOS inherited them from DEC by way of CP/M; in CP/M and MS-DOS a filename had an 8-character part and a 3-character extension.

Unix has always regarded extensions, if any are used, as just part of the file's name. Windows has also adopted this convention.
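Because the extension is just part of the name, a script that wants "the extension" has to split the string itself. A minimal sketch using shell parameter expansion; the names archive.tar.gz and README are hypothetical examples.

```shell
name="archive.tar.gz"     # hypothetical file name with two dots
ext="${name##*.}"         # strip the longest prefix ending in a dot
stem="${name%.*}"         # strip the shortest suffix starting with a dot
echo "stem=$stem ext=$ext"

plain="README"            # a name with no dot at all
case "$plain" in
    *.*) has_ext=yes ;;
    *)   has_ext=no ;;
esac
echo "README has an extension? $has_ext"
```

The system does nothing special with the part after the last dot; which dot counts as "the" extension separator is entirely the script's decision.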

Nominally, the file system is a tree. Each directory is a list of files and subdirectories. The root directory is /.

Most Unix filesystems support any characters in filenames except for "/" itself (and the null byte); most also support filenames of "arbitrary" length (often there is a limit of 255 bytes). Arbitrary characters can be really frustrating if a filename contains the "clear screen" byte sequence for your terminal. The ls -b or ls --escape options can be useful.

Because raw control characters could be incomprehensible to users, the ls default behavior (when writing to a terminal) adds quote marks as appropriate and converts control characters to '?'. You can disable this:

    ls -N --show-control-chars

You can create files with bad chars with, for example, touch $(echo -e '\033[2J.text')

We'll unpack later what the $( ... ) does, but \033[2J is the sequence to clear most terminal screens. (The \033, or ascii 27, is the escape character.)
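A safe way to try this is in a scratch directory, capturing what ls -b prints rather than sending the raw name to the terminal. This sketch uses printf instead of echo -e (a substitution on my part, since printf behaves more consistently across shells).

```shell
demo=$(mktemp -d)

# Create a file whose name starts with ESC [ 2 J, the clear-screen sequence.
touch "$demo/$(printf '\033[2J.text')"

# ls -b shows nongraphic characters as backslash escapes instead of raw bytes.
escaped=$(ls -b "$demo")
printf '%s\n' "$escaped"

rm -r "$demo"
```

With GNU ls the escape byte should show up as \033, followed by the harmless [2J.text.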

We'll go with this for a while, but note that the filesystem is not exactly a tree. A given file can appear in two different places, through the use of hard links. Directories themselves use this: the . and .. entries created in every directory are hard links. The . entry is a link to the directory itself, and .. is a link to the parent. You can see this with ls -i; the -i option shows each entry's inode number, which identifies the underlying file. To see this for directories, it's easiest to use -d as well.

The NTFS filesystem also supports hard links. NTFS hard links disallow two files in the same directory to be hard links of one another. Hard links can be created with the mklink /h command, but Windows users almost never do this. I am not sure . and .. in Windows are actual hard links, as opposed to a cmd convention.

Demo, using ln.
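A sketch of what such a demo might look like, in a scratch directory. The file names a.text, b.text and sub are made up.

```shell
demo=$(mktemp -d)
cd "$demo"

echo "original text" > a.text
ln a.text b.text                     # b.text is a second name for the same file

# ls -i prints the inode number first; the two numbers should match.
inode_a=$(ls -i a.text | awk '{print $1}')
inode_b=$(ls -i b.text | awk '{print $1}')
echo "a.text: $inode_a   b.text: $inode_b"

echo "more text" >> b.text           # write through one name ...
lines_via_a=$(wc -l < a.text)        # ... and the change shows up under the other

rm a.text                            # removing one name leaves the other intact
survivor=$(head -n 1 b.text)

# A directory's "." entry has the same inode as the directory itself.
mkdir sub
inode_sub=$(ls -di sub | awk '{print $1}')
inode_dot=$(ls -di sub/. | awk '{print $1}')
echo "sub: $inode_sub   sub/.: $inode_dot"

cd /
rm -r "$demo"
```

Neither name is "the original"; after ln, the two names are equal partners, and the file's data goes away only when the last name is removed.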

There are also soft links, or symbolic links, or symlinks. These are just files that say "psst! the real file is over there!". You create them with ln -s. Note that the link file is the second file named: ln -s realfile.text link.text. Getting this backwards, when directories are involved, can be a mess. For ordinary files, ln just tells you that the second file already exists.

Soft links are super useful for having, say, a large file one place, and needing it other places but not wanting to make multiple copies.

If you call open() on a symlink, the file pointed to is opened; you don't have to "recognize" the symlink at all. But there are system calls, such as lstat() and readlink(), that will tell you whether a name is a symlink and where it points.
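The shell-level counterparts are easy to try. In this sketch, cat plays the role of open(), while the -L test and the readlink command look at the link itself. The file names follow the ln -s example above; the scratch directory is an addition for safety.

```shell
demo=$(mktemp -d)
cd "$demo"

echo "the real contents" > realfile.text
ln -s realfile.text link.text        # target first, link name second

via_link=$(cat link.text)            # reading the link reads the real file

# [ -L ] is true only for symlinks; readlink prints where the link points.
if [ -L link.text ]; then is_link=yes; else is_link=no; fi
target=$(readlink link.text)
echo "symlink? $is_link   points to: $target"

# Remove the real file: the link still exists but now dangles.
rm realfile.text
if [ -e link.text ]; then dangling=no; else dangling=yes; fi
echo "dangling? $dangling"

cd /
rm -r "$demo"
```

This is the big difference from hard links: a symlink refers to a name, not to the file's data, so it can outlive its target.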

While we're on the subject, "filesystem" also has a second meaning: a disk partition formatted to hold a complete directory structure. You can have multiples. The root filesystem starts at /; other filesystems start at some subdirectory. Use mount to see them (and also some other, weird stuff).
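The output of mount can be noisy. A quieter way to ask "which filesystem is this directory on?" is df, sketched here for / and /tmp (the choice of /tmp is arbitrary; on many systems it lives on the root filesystem, on others it is separate).

```shell
# df -P prints one header line, then one line per filesystem, in a fixed format.
root_line=$(df -P / | tail -n 1)
tmp_line=$(df -P /tmp | tail -n 1)

printf 'root: %s\n' "$root_line"
printf '/tmp: %s\n' "$tmp_line"

# The first column names the device (or pseudo-device) holding the filesystem.
root_dev=$(printf '%s\n' "$root_line" | awk '{print $1}')
tmp_dev=$(printf '%s\n' "$tmp_line" | awk '{print $1}')
if [ "$root_dev" = "$tmp_dev" ]; then
    echo "/ and /tmp appear to be on the same filesystem"
else
    echo "/ and /tmp are on different filesystems"
fi
```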

Basic commands

ls, ls -l, ls -a

cat

date

Commands: options and arguments

The options are "flags", like -l and -a to ls. The arguments are things that come after:

    ls -l -a file1 file2 file3

    gcc -time -shared file1.c file2.c

There are waay too many single-letter options, so it has become popular to provide "long form" options as alternatives. These are much more readable. Single-letter options usually begin with a single '-', while long-form options usually begin with '--'. For example:

    ls -a     vs     ls --all
    ls -d     vs     ls --directory
    ls -F     vs     ls --classify    (this one is kind of neat)
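A quick check that the two forms really are the same thing, and that single-letter options can be run together. This sketch assumes GNU ls (the long forms are a GNU feature, and other versions of ls may not accept them).

```shell
demo=$(mktemp -d)
touch "$demo/visible.text" "$demo/.hidden"

short=$(ls -a "$demo")               # short form
long=$(ls --all "$demo")             # long form of the same option
if [ "$short" = "$long" ]; then same=yes; else same=no; fi
echo "-a and --all agree? $same"

# Single-letter options can be combined behind one dash.
combined=$(ls -al "$demo")
separate=$(ls -a -l "$demo")
if [ "$combined" = "$separate" ]; then merged=yes; else merged=no; fi
echo "-al and -a -l agree? $merged"

rm -r "$demo"
```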

Some more commands

Keeping up with new commands is hard. For years I continued to use ifconfig to get network interface information, but then learned about the ip command. The latter is much better; in fact, the former has proven to be very difficult to update.

file: this tries to identify a file's type. Is it text? Let's look at some examples (.text, .c, .html, .py, .png, .jpeg, .docx)

How does file tell? If you take foo.c, and delete the '{' character, what happens? What about '}'?

At what point does foo.html stop being html?

Zip files begin with the bytes (in hex) 504b 0304 (0x50 is 'P', and 0x4b is 'K'). But so do docx files. That, of course, is because .docx files are zip files (Yes! You can unzip them!). But file can still tell! (We can peek at bytes with xxd, combined with head -n 1.)
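To see the magic bytes without needing a real document on hand, this sketch writes just the four bytes to a file and dumps them. It uses od rather than xxd, only because od is installed nearly everywhere; fake.docx is a made-up name and is not a valid archive.

```shell
demo=$(mktemp -d)

# Write the four signature bytes: 'P', 'K', 0x03, 0x04 (octal 003 and 004).
printf 'PK\003\004' > "$demo/fake.docx"

# Dump the first four bytes as hex, with the spacing squeezed out.
magic=$(head -c 4 "$demo/fake.docx" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$magic"

rm -r "$demo"
```

On a real .docx, the same pipeline should print the same eight hex digits.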

less (or more if you are old-school): this displays a text file one screen at a time. The Space character advances a screenful, while Return advances one line. You can also go backwards. Use 'q' to quit. Be aware that lots of unrelated things (like the man command) invoke less.

reset fixes your terminal, in case it gets messed up by looking at binary output.

Selected newer commands

df

free    (who uses this? I never have)

Top-level directories

/bin

/boot

/dev

/etc

/home

/lib (and /lib32, /lib64)

/lost+found

/media

/opt

/proc

/root

/sbin

/tmp

/usr, and /usr/bin

/var

Files and Directories, and wildcards

How about ls *.text? The '*' matches any chars (including zero). The '?' matches exactly one. These are useful for looking for files when you know part of the name, or manipulating files when the name has spaces in it.

There is also, say, [a-m], which matches one character in the range a-m. [^a-m] (or, more portably, [!a-m]) matches any one character other than a-m. You can also use [abcd] instead of [a-d], or [aeiou], etc.
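Wildcards are expanded by the shell, not by ls, so echo is enough to see what a pattern matches. A sketch in a scratch directory with made-up file names; LC_ALL=C is set so that ranges and sort order are predictable.

```shell
export LC_ALL=C                      # plain ASCII ordering for ranges and sorting
demo=$(mktemp -d)
cd "$demo"
touch apple.text bat.text zebra.text "my notes.text" a1 a22

star=$(echo *.text)                  # '*' matches any run of characters
question=$(echo a?)                  # '?' matches exactly one character
range=$(echo [a-m]*)                 # first character in the range a-m
negated=$(echo [!a-m]*)              # first character NOT in the range a-m
spaces=$(echo my?notes.text)         # '?' stands in for the awkward space

echo "*.text        -> $star"
echo "a?            -> $question"
echo "[a-m]*        -> $range"
echo "[!a-m]*       -> $negated"
echo "my?notes.text -> $spaces"

cd /
rm -r "$demo"
```

If a pattern matches nothing, most shells pass it through unchanged, which is why a mistyped wildcard often produces a "No such file" message containing the pattern itself.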

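And then there are some prebuilt character classes, including:

    [:alnum:]    any alphanumeric character
    [:alpha:]    any letter
    [:digit:]    any digit
    [:lower:]    any lowercase letter
    [:upper:]    any uppercase letter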

So we can type ls -d [[:alnum:]]* to get all files and directories that begin with an alphanumeric. What if we use single [ ] instead of [[  ]]?
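One way to explore the single-bracket question is to try both patterns on a few carefully chosen names. In this sketch (made-up file names, scratch directory), the single-bracket form is just an ordinary bracket list containing the characters ':', 'a', 'l', 'n', 'u' and 'm'.

```shell
export LC_ALL=C
demo=$(mktemp -d)
cd "$demo"
touch apple zebra 9lives _under

# Double brackets: the class [:alnum:] inside a bracket expression.
double=$(echo [[:alnum:]]*)

# Single brackets: a plain list of the characters : a l n u m
single=$(echo [:alnum:]*)

echo "[[:alnum:]]* -> $double"
echo "[:alnum:]*   -> $single"

cd /
rm -r "$demo"
```

Here zebra and 9lives both start with an alphanumeric, but neither starts with one of the six listed characters, so only the double-bracket pattern finds them.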

More commands (from chapter 5)

type: reports how the shell interprets a command name (alias, shell builtin, function, or executable file)

which: where is the command

man ls    manual page (or help for shell built-in commands)

apropos    hintwords    Try it. Try "apropos remove"
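A short sketch of type in a script. Since which is a separate program that is not installed on every minimal system, this uses the shell's own command -v as a stand-in for it; the exact wording that type prints varies a little from shell to shell.

```shell
# cd has to be built into the shell: a separate program could not
# change the shell's own working directory.
cd_kind=$(type cd)
printf '%s\n' "$cd_kind"

# ls is an ordinary program; type reports where it was found.
ls_kind=$(type ls)
printf '%s\n' "$ls_kind"

# command -v prints just the path, much like which does.
ls_path=$(command -v ls)
printf '%s\n' "$ls_path"
```

In an interactive shell, type ls may instead report an alias (many distributions alias ls to ls --color=auto), which is something which cannot tell you.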

Commands can be: