Information on three common databases: Postgres, MySQL and Oracle. This file contains DB-specific installation and configuration information.

PostgreSQL

In the 1970s, the Ingres database project began at Berkeley. Ingres originally stood for Interactive Graphics Retrieval System, though shortly thereafter the "graphics" part became much less important.

After Ingres (which does still exist) came Postgres, for "post-ingres".

Later, to emphasize its support for SQL, Postgres became PostgreSQL.

As it is virtually impossible to pronounce PostgreSQL using the "sequel" form for SQL, most people pronounce it "postgres".

A decade ago, MySQL was the "fast one" and Postgres was the one that dutifully implemented the SQL standard. Since 2010, MySQL has become a little more "mainstream" (though it is still missing the intersection operator), but Postgres has become a lot faster. Postgres is also much more similar to Oracle (which is a little ironic because Oracle now owns MySQL, though the usual assumption is that Oracle doesn't want MySQL to compete with the family jewels).

Postgres can be found at postgresql.org.

Once installed, the postgres server runs in the background. The psql command on the command line sets up a connection to the server, usually via network port 5432. If you have multiple command lines running multiple instances of psql, they are all connecting to the same master instance of the server. It is possible to set up multiple independent servers on the same machine, each using a different network port for connection and each maintaining a different set of data files. There is seldom a need for this, however.

After you install the postgres server software, as with pretty much any database, you need to create a user, and a database for that user. The postgres account is the only account installed by default. 

Here's what I had to do on linux (similar to Macs; Windows is below). You will need the -p 5433 option if your PostgreSQL installation uses that alternative port.

1. From the shell:

    bash> createdb -U postgres pld

You will need to enter the postgres password.

2. Run the psql command as user postgres. The "bash>" prompt below is the shell prompt; the "postgres=#" prompt is the psql SQL prompt.

bash> psql -U postgres
postgres=# create user pld with password 'zanzibar';    

The system response here should be CREATE ROLE. If you don't get that, something went wrong (make sure you typed the final semicolon, though).

You now need to give permissions to your new user:

postgres=# grant all privileges on database pld to pld;
postgres=# grant all on schema public to pld;
postgres=# alter database pld owner to pld;


Then, as user pld, the command "psql" works. If you created several databases above, eg "pldoffice", then the command to connect to postgres using the pldoffice database would be psql pldoffice. What Postgres is doing here is noticing that my linux username is "pld", and assuming (unless I request otherwise) that I will be Postgres user "pld". Postgres also assumes that system user "pld" is authorized to use the Postgres account with the same name; this is called peer authentication.

Note that I did not actually give myself the right to create new databases (though I don't need it).
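If you do later want an account to be able to create databases, the postgres superuser can grant that right (substitute your own account name for pld):

```sql
-- run as user postgres
alter user pld createdb;
```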

When I first installed Postgres on linux, it set itself up to accept connections on port 5435, instead of the standard 5432. This was insufferably annoying as I had to specify -p 5435 in every command. I eventually discovered that I could change the port by setting port = 5432 in the postgresql.conf file, in (under linux) /etc/postgresql/9.5/main. The port number is more of an issue, though, when installing under MacOS.


Installing on Windows

Go to postgresql.org, and click the Download button, and then the Windows button. Pick the release you want (16, the most current as of 2024, should be the best choice.) Click the Download link.

Select "Download the ... installer". You can use either the installer from EnterpriseDB or the "graphical" installer from BigSQL. I used the former here. Be sure to choose 32-bit or 64-bit to match your system (you can check your system with Control Panel → System; it should be 64 bit).

After downloading (~162 MB in 2018), run the installer. It starts with "Welcome to the PostgreSQL Setup Wizard". At some point, you may see an alert telling you it is installing "MS Visual C++ Redistributable".

Eventually you get to the PostgreSQL installer itself. The default locations for everything should be fine. You will be asked

    Please provide a password for the database superuser (postgres).

Don't forget the password you provide here! Though for classroom use it doesn't have to be very secure.

You will also be asked a couple other questions; the default answers should be fine (though if the default network port is not 5432, that probably means you have a Mac, and the default will probably then be 5433). The installation begins when you click "next" on the screen that tells you Postgres is ready to install.

You can skip the offer to launch Stack Builder, though it's fine if you want to try it. It won't be used in Comp 305/405 though.

Postgres "peer" authentication means that if you have logged into your computer with username, say, pld, and you want to get into your PostgreSQL account with the same name, postgres just lets you, on the theory that you've already provided your password to get logged in. Windows sometimes has trouble with "peer" authentication, as in the Linux discussion above, because system usernames under Windows tend to be unwieldy; in particular they often contain spaces, which do not work well as Postgres usernames. If your system username and postgres username do not match, you cannot use peer authentication; just go ahead and use password authentication. The -h localhost option to the psql command usually forces password authentication. If the port is not 5432, you will also need to add -p 543x to every psql and createdb command, where 543x is the port used by your installation.

As of 2018, it looks like the Windows installation has abandoned "peer" authentication, which is a good thing. (At least there were no "peer" authentication methods listed in the authentication-configuration file pg_hba.conf.)


Installing on a Mac

Go to postgresql.org, and click the Download button, and then the macOS button. Pick the release you want (16, the most current as of 2024, should be the best choice.) Click the Download link.

I recommend the first option there, "interactive installer by EDB". I downloaded the file postgresql-16.1-1-osx.dmg.

You can also use the Postgres.app, which is convenient but which does not run "continuously". You'll have to verify that the app has started Postgres whenever you use it (normally this is not at all difficult).

I then went to the file in a Finder window, and double-clicked on it. This shows postgresql-16.1-1-osx, which I again double-clicked. This starts the actual installation, by default under /Library/PostgreSQL/16.1. (If it's not installing in the /Library directory, then something is amiss!)

You are asked to provide a database superuser password; that is, the password to the "postgres" account. Don't forget the password you supplied! For class use it does not have to be especially secure.

Check to be sure that "PostgreSQL Server" and "Command Line Tools" are selected for installation. The other tools are optional.

The README file accompanying the installation suggests increasing the shared-memory allotment. You should definitely do that if you're running a production service, but I did not.

Note that Apple may already have a Postgres installation on your mac, as part of the mac system. You can verify this by entering the following command in a terminal window and seeing if there's any significant output: ps axuww | grep postgres. It's fine to have two separate, unrelated installations, but if this is the case then your new installation will definitely have a new port number other than 5432 (it will probably be 5433). You need to use this new port in every command!



Postgres Accounts

Now that Postgres is installed, you will need to set up the basic account. How you proceed here is pretty much the same as far as Postgres is concerned on windows, macs and linux machines. Alas, it is not the same as far as Windows/Macs are concerned.

You need to create your own non-superuser (non-postgres) account, and then create a database. It's most convenient to have the database account name be the same as your computer login name, though login names containing spaces are a problem, and on Windows matching the names is sometimes pointless. To find out your windows login name, run psql with no options and it will tell you, after prompting for a password (even if it won't actually "connect").

On the Loyola computer I tested this on in 2020, I was "pdordal", my Loyola ID. I also tried creating a username/databasename of "pld"; the only difference was that with Windows username "pdordal" and Postgres username "pld" I had to use the -U pld option to psql.

To create the non-superuser account, open a command window, and try typing "psql --username=postgres -p 5433". If it tells you the command is not found, that means the psql command is not in your PATH. You can either fix this now, as in the PATH section below, or else work around it by giving the full path to the command (eg "\Program Files\PostgreSQL\16\bin\psql") or by first navigating to that bin directory.

Whichever you do, here are the steps. The option "--username=postgres" (or "-U postgres", without the equals sign) means you will be running the command as the PostgreSQL superuser, and you will need to supply the initial password above.

If you got a network port number other than 5432 during the installation,

    (a) You probably have a Mac
    (b) You will need to use that new port number in every psql and createdb command:

        psql --username=postgres -p 5433

What is going on is that you have two entirely independent installations of Postgres, with separate data. One is reachable via port 5432 and one via 5433. They don't talk to one another; they don't even know about one another.

In the example below the account name and database name will be "mydb". You can use whatever name you want, and should probably use some form of your username, first name or initials. On Windows I've tried both "pdordal" and "pld". On Linux and Macs, it is popular to have your Postgres account name match your login name.

The first step is actually to run createdb, rather than psql, again using the full path name, as above, or going to the bin directory, or setting up PATH. You will also need to add -p 5433 if your port is 5433; I will show this below, but you can leave it out if your port is 5432. The createdb command is typed at the Windows command shell, not within Postgres:

1. createdb --username=postgres -p 5433 mydb    (Supply the postgres password. This might take several seconds.)

This creates a database named "mydb", matching the user "mydb" we are about to create. On Windows in 2024, typing "createdb" in the Windows search box failed; I had to open up a cmd window and navigate to \Program Files\PostgreSQL\16\bin.

Next, create the matching account within PostgreSQL, using the postgres administrative account. (Here, on Windows, you can just type "psql" into the Windows search box; you'll be prompted for the username and port, and maybe host (use localhost)).

2. psql --username=postgres -p 5433      (supply the postgres password)

postgres=# create user mydb with password 'mypassword';    (choose your own password for mypassword; the response should be CREATE ROLE)
postgres=# grant all privileges on database mydb to mydb;
postgres=# grant all on schema public to mydb;
postgres=# alter database mydb owner to mydb;
  

Finally, you can log in as you:

3. psql --username=mydb -p 5433        ;; the "--username=mydb" isn't necessary if mydb matches your windows account name; supply the password chosen above

On Windows, if you're starting psql from the Windows search box, enter your user name at the appropriate prompt.

Here are a few quick commands:

psql=> \d    ;; \d gives a list of tables
psql=> create table demo (
        name varchar(200),
        idnum integer primary key
    );

psql=> insert into demo values('alice', 37);
psql=> insert into demo values('bob',   61);
psql=> select * from demo;
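The final select should print something like this (psql formats results as a text table):

```text
 name  | idnum
-------+-------
 alice |    37
 bob   |    61
(2 rows)
```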


Setting your PATH

On either Windows or Mac machines, you may want to be able to type the command "psql" in a cmd/bash window and have the Postgres client start running. The main benefit of this is that you can type "psql" in a specific directory, and load files from that directory into Postgres. The alternative is probably to copy the contents of the file and paste them into the psql window.

To do this, you have to modify your PATH, which is the list of directories searched when you type a command like "psql". The directories are separated internally  by a ';' character on windows, and a ':' character on mac and linux systems. On windows, the postgres directory is probably C:\Program Files\PostgreSQL\16\bin.

To change your PATH on Windows 11, go to

   System (control panel) → Change Settings (lower right) → Advanced (tab) → Environment Variables →
   → System Variables → select PATH → Edit

When you click Edit you get an editor view that shows the PATH as a list of directories with each in a separate text box. You should add an entry like this:

    c:\Program Files\PostgreSQL\16\bin

The space in "Program Files" should not matter.

On the Mac, the application directory is /Library/PostgreSQL/16/bin (where 16 is again the version number). To install this in your path, edit the file .zshrc in your home directory (this is executed whenever you log in). You can open it from within the TextEdit app. Add this line at the end:

    PATH=$PATH:/Library/PostgreSQL/16/bin

Then start a new terminal window (bash, or more likely zsh now) after the updated .zshrc is saved; the commands should work.
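In a terminal, the effect of that line can be checked immediately (the /Library path is the EDB installer's default; adjust the version number to match your installation):

```shell
# append the Postgres bin directory to the PATH for this shell session
PATH="$PATH:/Library/PostgreSQL/16/bin"
export PATH

# verify: list the PATH entries one per line and look for the new one
echo "$PATH" | tr ':' '\n' | grep PostgreSQL
```

Putting the same line in .zshrc just makes this happen automatically at every login.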

On Windows you can also create a small batch file containing the command

    set PATH=C:\Program Files\PostgreSQL\16\bin;%PATH%

You then run this before typing any postgres commands.


Login Troubles

If you are having trouble logging in, here are some possibilities.

1. Sometimes it helps to force a network login, rather than a "local" login. You can do this with

    psql --username=postgres --host=localhost -p 5433

The -W (or --password) flag might also help; it tells psql to prompt for a password.

This is especially likely to help with problems accessing the "postgres" account, but it sometimes helps with ordinary accounts as well.

2. For Mac and Linux users, one way to get access to the PostgreSQL postgres account is to first get access to the postgres system account. Do this with: sudo -u postgres bash, followed by "psql" (or sudo -u postgres psql). This doesn't work as well under Windows, which has a somewhat peculiar approach to system accounts.

3. Expired passwords: sometimes this does happen. If you can't log in to a user account, check this. Try looking at the output of '\du' while logged in as user postgres. If you see the account, check whether the attribute "Cannot login" is listed. To fix an expired password, execute this as user "postgres":

    ALTER USER username VALID UNTIL 'infinity';

4. Problems with the file pg_hba.conf. This is the postgres login configuration file. It should be in your postgres directory, in subdirectory 16/main or 16/data. It can be used to disable some forms of authentication, if necessary.

Here is a set of traditional entries. The "peer" entries have been deprecated in Windows since 2018 (I mean, so has md5, but md5 has simply been replaced with scram-sha-256).

local   all     postgres                        peer
local   all     all                             peer
host    all     all         127.0.0.1/32        md5
host    all     all         ::1/128             md5

These lines, each of which represents a way a user can log in, are applied in order until a match is found. The second column, "all" above, indicates which database the line applies to. The first column, "local" or "host" above, indicates whether the access attempt is a "local" psql command or a network login. If we just type "psql" we get the former, but we can force the latter with "psql --host=localhost", as in item 1 above.

The final column, "peer" or "md5", indicates what kind of credentials must be presented. "peer" means that the operating-system username is to be trusted, as in item 2 above, and no password needs to be supplied. "md5" means that a password must be entered, and then is sent in hashed (md5) form to the postgres server for validation. (Postgres 10.1 is moving away from md5, to scram-sha-256, but my Version-10 installation still shows "md5".)

The third column indicates which account the line applies to. In the table above, the first line says that user "postgres" must log in with "peer" credentials: PostgreSQL user "postgres" can only log in from system account "postgres", as in item 2 above. Password authentication is not allowed! We can bypass this, though, with the --host=localhost trick of item 1, which makes the login a "host" login, subject to the rules of line 3 or 4 above.

The last two entries above, for "host" login with "md5" authentication, also contain IP addresses. These are the IPv4 and IPv6 versions of "localhost", respectively. Logins from outside of your own machine will not be allowed unless you change these. Note that on Windows the IPv6 form is preferred internally for --host=localhost logins.

If you do change pg_hba.conf, you will need to restart PostgreSQL for the change to take effect.

Here are some problems and possible fixes:

(a) You can't log in to the postgres account with a password. Fix: use the method of item 1 above, or item 2 above, or else change "peer" to "md5" in the first line above of pg_hba.conf. (Change it to "trust" if you cannot remember the password at all). Restart PostgreSQL.

(b) Your ordinary-user password isn't working, even though you just reset it by executing the following as user postgres (and got back the response ALTER ROLE):
    alter user myname password 'achiote';
Try the method of item 1 above. If that fails, check whether the "host" authentication method in pg_hba.conf is md5; if a "host" entry has "peer" authentication, change "peer" to "md5". If absolutely necessary, change "md5" to "trust" to disable password checking entirely.



Command Line

You can run postgres queries from the command line using the following syntax:

    psql -U username -d databasename -c 'select * from employee'

You may have to add -W (or --password) to get psql to prompt for a password, if applicable (this is usually not necessary if your postgres username matches your operating-system username). On unix-like systems (linux and macs), the command can be either in single quotes, as above, or double quotes. The latter form allows shell variable expansion:

for table in employee works_on project department
do
    echo $table
    psql -U pld  -c "select count(*) from $table"
done

You can also run queries from within a program.
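For Postgres in Python, the usual driver is psycopg2; like every Python DB-API driver it uses the same connect/cursor/execute pattern as the standard library's sqlite3 module. Here is a sketch using sqlite3, so it runs without a server; for Postgres you would swap the connect() call for something like psycopg2.connect(dbname='pld', user='pld'). The table contents here are made up.

```python
import sqlite3

# For Postgres: conn = psycopg2.connect(dbname="pld", user="pld", password="...")
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee (fname text, lname text, salary integer)")
cur.execute("insert into employee values ('Franklin', 'Wong', 40000)")
# Parameter placeholders avoid SQL injection; psycopg2 uses %s where sqlite3 uses ?
cur.execute("select fname, lname from employee where lname = ?", ("Wong",))
rows = cur.fetchall()
print(rows)   # → [('Franklin', 'Wong')]
conn.close()
```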


 
Postgres string comparisons are case-sensitive. If, in the Company database, you run the query

    select fname, lname from employee where lname = 'wong';

you will get nothing (printed as "(0 rows)"). The correct query is

    select fname, lname from employee where lname = 'Wong';

One way to resolve this is with

    select fname, lname from employee where lower(lname) = 'wong';

However, applying lower() to a primary-key attribute (or any indexed attribute) will likely prevent that attribute's index from being used. Another alternative is to use ilike instead of =:

    select fname, lname from employee where lname ilike 'wong';

This may also in some cases mean the index goes unused. But don't worry about those cases for now. In the employee table, the index is on the numeric ssn field; there is no index on the lname field.



Some postgres commands

Your psql prompt should show you the current database.

SHOW ALL -- shows status variables only

\d             same as MySQL's show tables;

\d employee    similar to MySQL's describe employee; also gives all indexes and FK constraints, with their names

\d [NAME]      describe table, index, sequence, or view
\d{t|i|s|v|S} [PATTERN] (add "+" for more detail)
                 list tables/indexes/sequences/views/system tables
\da [PATTERN]  list aggregate functions
\db [PATTERN]  list tablespaces (add "+" for more detail)
\dc [PATTERN]  list conversions
\dC            list casts
\dd [PATTERN]  show comment for object
\dD [PATTERN]  list domains
\df [PATTERN]  list functions (add "+" for more detail)
\dg [PATTERN]  list groups
\dn [PATTERN]  list schemas (add "+" for more detail); \dnS includes system schemas
\do [NAME]     list operators
\dl            list large objects, same as \lo_list
\dp [PATTERN]  list table, view, and sequence access privileges
\dt [PATTERN]    list tables in a schema; eg \dt pg_catalog.*
\dT [PATTERN]  list data types (add "+" for more detail)
\du [PATTERN]  list users
\l             list all databases (add "+" for more detail)
\z [PATTERN]   list table, view, and sequence access privileges (same as \dp)

If you start psql with the -E option, postgres will tell you the underlying query it uses for each of the above commands.

Here's another way to list your tables:

select table_name from information_schema.tables where table_schema='public' and table_type='BASE TABLE';

To connect to a given database:
    \connect pld

To enable printing of query execution times:
    \timing

To load a csv file (comma-separated-values, though any delimiter may be used; the delimiter here is tab, \t) into a table:

copy big_project from '/tmp/bigproject.csv' delimiter E'\t' csv  null '\N';
copy big_employee from '/tmp/bigemployee.csv' delimiter E'\t' csv  null '\N';
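If the csv file is on your own machine rather than on the server (the copy command above reads a server-side path like /tmp), psql's \copy meta-command reads the file client-side, with the same options:

```sql
\copy big_project from 'bigproject.csv' delimiter E'\t' csv null '\N'
```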

To display the server version:
    select version();

To turn off that annoying paging:
    \pset pager off

To find indexes on table EMPLOYEE (though \d is probably all you will ever need)

prepare listindexes(text) as
select
    t.relname as table_name,
    i.relname as index_name,
    a.attname as column_name
from
    pg_class t,
    pg_class i,
    pg_index ix,
    pg_attribute a
where
    t.oid = ix.indrelid
    and i.oid = ix.indexrelid
    and a.attrelid = t.oid
    and a.attnum = ANY(ix.indkey)
    and t.relkind = 'r'
    and t.relname = $1
order by
    t.relname,
    i.relname;

Now run it with
    execute listindexes('employee')

This is an example of a prepared statement: a stored, parameterized query.

You can load a file of sql commands (eg company.alter.text) using the \i (include) directive:

\i company.alter.text

To do this, you must have started the psql client command while in the directory containing the file. Otherwise you have to give the full path. You can also open the file in an editor, copy it, and paste it in to a command window running psql, though for even moderately large files (eg dellstore.sql) that does not always work smoothly.

Intersect example

This works under Postgres. Here is the example (the select attributes must match in the two queries being intersected):

(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 3);


Postgres file storage

Your postgres files are in a designated directory (at least if you are using the standard "tablespace"). On my linux installation it is /var/lib/postgresql/9.5/main/base/. There is a directory for each database and, within these directories, a file for each table. The directories and files are named using the "oid" numbers of the corresponding databases and tables. My "pld" database on my laptop (2015) has oid 16385. I can find this with the oid2name command, or with the query

    select oid, datname from pg_database;

To find the oid of a table, say 'employee', you can use

    select pg_relation_filepath('employee');            -- note quotation marks!

or

select tableoid from employee limit 1;

The first returns

    base/16385/16386

The second returns 16386. For my password database on ulam.cs.luc.edu (a different installation from above!), the first is  base/16385/16387. The file size itself is just under half a gigabyte.

Very large tables are split over multiple files (by default 1 GB per file). Tables with very wide columns (blobs and long strings, for example) have an associated "TOAST" table; the primary table file then in effect has pointers to another location where the large field is stored. TOAST stands for The Oversized-Attribute Storage Technique.

Record layout

Postgres "tuples" are immutable. When a record needs to be updated, the old tuple is left in place, but marked as not current. The new tuple is created in a new location. This design decision does sometimes create some write bottlenecks, but we will address that later.

Each tuple has a system attribute called ctid, which is a pair of numbers that, informally, represents (block number, record number). Blocks are 8KB. What is Mr Wong's ctid?

select ctid, lname, ssn, salary from employee;

Now let's update Mr Wong's salary:

update employee set salary = salary+10 where lname = 'Wong';

What is Wong's ctid now? What happens if we repeat this a few times?

Actually, each block begins with an array of record pointers; the ctid record number points to one of these record pointers. That pointer then points to the actual block; this way, records within a block can be compacted without changing any ctids.
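A sketch of what you might see (the block and slot numbers here are made up; yours will differ):

```sql
select ctid, lname from employee where lname = 'Wong';
--   ctid  | lname
--  -------+-------
--   (0,3) | Wong

update employee set salary = salary + 10 where lname = 'Wong';

select ctid, lname from employee where lname = 'Wong';
--   ctid  | lname
--  -------+-------
--   (0,9) | Wong      -- a new tuple in a new slot; the old one is now dead
```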

Postgres tuples also have xmin and xmax system attributes: xmin is the id of the transaction that created the tuple, and xmax (set on deletion or update) is the id of the transaction that removed it. These play a role in concurrency control; we will come back to them later. Every transaction is numbered sequentially; the current transaction id is available via

select txid_current();

As soon as there are no remaining transactions in progress with txid <= xmax, that particular tuple can be deleted completely. This is done by the autovacuum process, which is sort of the garbage collector for Postgres.

Other system columns include cmin and cmax.

More at rachbelaid.com/introduction-to-postgres-physical-storage, interdb.jp/pg, and, of course, postgresql.org/docs/9.6/static/storage.html.


Postgres schemas

As a general database term, a "schema" is a set of tables together with table definitions. The University and Company databases would be good examples of separate schemas, except I did not set them up that way. Postgres allows schemas -- as a kind of namespace -- to be first-class objects within the database.

If, as user postgres, I enter \l (list databases), I get my three databases plus the three that come with every installation: postgres, template0 and template1.

The first three were created by me. The last three have no top-level tables! If I do the following,

    \c postgres
    \d

I get

    No relations found

Where are the tables for the postgres database? They are all within non-default schemas. These schemas can be listed with

    \dnS

(the S asks to include even the system schemas.) The list will include information_schema, pg_catalog, pg_toast and, at the end, public.

Databases are, in effect, divided into schemas. Every database has a schema named public (the last one above) which is (initially at least) the default schema for that database. Every database also has a schema pg_catalog; this particular schema is the same across all databases. Every database also has a schema information_schema as part of the SQL standard; this schema contains a more "standardized" (that is, non-Postgres-specific) version of the pg_catalog information.

Schemas can be thought of as directories within a database, where tables are actually stored, although you cannot have nested schemas. A table foo in schema bar can be thought of as having fully-qualified name bar.foo.
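A minimal sketch, using the bar.foo names from above:

```sql
create schema bar;                    -- roughly, a new directory
create table bar.foo (x integer);     -- table foo inside schema bar
insert into bar.foo values (1);
select * from bar.foo;                -- the fully-qualified name
```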

Schemas have owners. The "postgres" role owns the public schema, and all the system schemas. The owner of a table in a schema is not necessarily the same as the owner of the schema.

We can examine the tables in a schema with, eg

   \dt pg_catalog.*

We can then look in detail at, for example, the pg_database and pg_index tables with

    \d pg_catalog.pg_database
    \d pg_catalog.pg_index

We can read the table with

    select * from pg_catalog.pg_database;

Tables are described in the information_schema schema; for example we can run this:

select table_name from information_schema.tables where table_schema='public' and table_type='BASE TABLE';

But there is no table 'tables' in information_schema! (We can check this with \dt information_schema.*). It turns out that tables is a view. We can list all views with

    \dv information_schema.*

Or

\connect pld
select table_schema, table_name, table_type from information_schema.tables
where table_catalog='pld';

Why is this a view? Most of these views are of Postgres-specific tables (and Postgres-version-specific tables) in the pg_catalog schema; the information_schema views are more generic. The underlying tables are subject to change, but the view is not.

The information_schema schema is documented in http://www.postgresql.org/docs/current/static/information-schema.html.

Recall that if you run psql with the -E option, you get to see the underlying queries generated. This would be a good time to give that a try.

We can also run the following:

select table_name from information_schema.tables
where table_catalog='pld'
and table_schema='public'
and table_type = 'BASE TABLE';

This is like the previous query, but with more where clauses. It also has similarities to the listindexes example of the previous section.

The public schema is the default. We can run the following to find all tables in this schema:

SELECT * FROM information_schema.tables WHERE table_schema = 'public'

Alternatively, we can use \dt public.*. However, this will only list tables in the current database.

To list all the tables in the current database, we can use \dt *.*

Schemas and organization

Here are a few options for organizing a database with schemas.

1. Use only the public schema. There may be multiple users and multiple separate table ownerships.

2. Each user can be given his or her own schema, corresponding to their username. Each user's tables are potentially visible to other users (if permissions allow it), but this visibility is not "obvious" (the other schema tables won't show in a \dt, for example). Users can choose to restrict access by other users, or not. (If each user gets his or her own database, then no sharing between users is possible at all.)

With this strategy, it might be appropriate to deny everyone access to the public schema, to force them to use their individual schemas.

3. Create one schema for each company "grouping", eg for each store or each project. Then most queries within one store's schema will refer only to that store, but it is possible to write queries that collect data from different stores. If we create a separate database for each store, on the other hand, we can not write queries that involve data from different stores.

Schema search path

To avoid having to type "fully qualified" schema.table names, one can define a schema search path.

show search_path;
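The default is typically the schema matching your username (if one exists), followed by public:

```text
   search_path
-----------------
 "$user", public
(1 row)
```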

If we create a new schema (in database "pld")

create schema ignatius;
create schema ignatius authorization pld;    -- done as postgres, if pld lacks rights to create schema

and add two tables:

create table ignatius.foo (...);
create table ignatius.foo2 (...);

If I type \dt I don't get these. I can type \dt ignatius.*, of course. But I can also do this:

set search_path to ignatius, public;

Now the tables in the ignatius schema show up. If I reverse the search_path order, all that changes is the order in which the schemas are searched (and which schema receives newly created tables). But if I remove ignatius from the search path, I have to access the tables as ignatius.foo, etc.

Newly created tables go in the first schema on the path, if not fully qualified.


Postgres arrays

We can have arrays in postgres data fields; see http://www.postgresql.org/docs/9.5/static/arrays.html. This is often done as denormalization, though it definitely breaks the 1NF rule.

create table courses (name varchar, courses integer[]);

insert into courses values ('Dordal', '{ 305, 388, 417, 443 }');  -- note quote marks!
insert into courses values('Harrington', array[150, 170, 310]);

We can then access the array components using []:

select courses[2] from courses where name = 'Dordal';
update courses set courses[2]=488 where name='Dordal';

Searching is primitive: we can do this

select name from courses where 305 = any(courses);

but not this:

select name from courses where any(courses) = 305;

The general rule is that if you are spending a lot of time searching arrays, you probably should switch to the 1NF normalization; that is, a separate table of ⟨instructor,course⟩ indexed by both attributes.
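A sketch of that 1NF alternative (the table and index names are illustrative):

```sql
create table teaches (instructor varchar, course integer);
create index teaches_instructor_idx on teaches (instructor);
create index teaches_course_idx     on teaches (course);

insert into teaches values
    ('Dordal', 305), ('Dordal', 388), ('Dordal', 417), ('Dordal', 443);

-- both of these lookups can now use an index:
select course     from teaches where instructor = 'Dordal';
select instructor from teaches where course = 305;
```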

Two-dimensional arrays are also possible.


Postgres 9.5 new-features summary: 

MySQL

Some notes on installing MySQL

These assume that you will use a command-line interface.

Basically, you need to create a user, and then a database for that user. A database is a collection of tables.

Step 1: do the actual installation (eg apt-get install mysql-server, or run the .exe install file). During this step, you should be asked for a master password. Do not forget this password!

Downloads for the MySQL Community Server are at http://www.mysql.com/downloads/mysql. At a minimum, you will need MySQL Server.

The MySQL server runs in the background. The mysql command on the command line sets up a connection to the server. If you have multiple command lines running multiple instances of mysql, they are all connecting to the same master instance of the server.

Step 2. log in using the master password.

mysql --user root -p
(give master password)

Step 3. Now new users and databases can be created, eg user pld and database plddb. (Note that the password is in quotes.)

mysql> create user pld identified by 'cranberry';
(or: create user 'pld'@'localhost' identified by 'cranberry';)
mysql> create database plddb;  
mysql> grant all on plddb.* to pld;


We have created a user pld, and a database plddb, and given user pld full control over plddb, and only over plddb. If you wanted to allow pld to create databases, you could use

mysql> grant all on *.* to 'pld';

You can also grant individual actions:

grant SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON *.* TO 'pld';
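To review (or take back) what has been granted, the following are sometimes handy (shown for the user pld created above):

```sql
show grants for 'pld';           -- list pld's current privileges
revoke drop on *.* from 'pld';   -- withdraw a single privilege
```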

Step 4. Now log in as pld (after logging out as root):

mysql --user=pld --password=cranberry plddb

But let's assume you are user pld, in unix-world, and that you don't wish to type your password on the command line. Then use:

mysql -p plddb

Step 5: Create a table (while logged in as pld). Java/C users take note: the comma is a separator in the field declarations, not a terminator (ie there is no comma after the final declaration), and the field name precedes the type name.

mysql> create table foo (id int, name varchar(100));

mysql> insert into foo values (1, 'peter');
mysql> insert into foo values (2, 'paul');

mysql> show tables;

mysql> select * from foo;

The last is our first example of a SQL select statement.

If you are using a unix-based system, you might find the following helpful in your my.cnf file (eg in /etc/mysql). It must be in the [mysqld] section.

    lower_case_table_names=1

With this setting, the following are equivalent:
    select * from employee;
    select * from EMPLOYEE;

MySQL by default does case-insensitive comparisons. The query

    select fname, lname from employee where lname = 'wong';

returns

    Franklin Wong

Postgres does not do this; the query above would return nothing because 'wong' ≠ 'Wong'.
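If you do want a case-sensitive comparison in MySQL for a single query, one option (a real MySQL operator, shown here against the employee table above) is BINARY:

```sql
select fname, lname from employee where binary lname = 'wong';
-- returns nothing: 'wong' ≠ 'Wong' when compared byte-by-byte
```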

All privileges are stored in the mysql database (which you can access as root with "use mysql;"). The user table is where much of it resides. Examples:

use mysql;
show tables;
select user from user;
describe user;
select user, password from user;

Why are there so many root accounts? This becomes clearer with

select user, host from user;

The pair (user,host) is the primary key of this table.

You can load a file of sql commands (eg employee.alter.text) using the source command:

source company.alter.text

To do this, though, you must have started the mysql client command while in the directory containing the file. Otherwise you have to give the full path, and may have to "escape" backslash characters.


MySQL configuration

The MySQL configuration file is my.cnf (sometimes my.ini); on unix-based systems it is usually in directory /etc/mysql.

Server features are defined in the [mysqld] section. They include:

The port (normally 3306) and socket used for client communication with the server. This is mostly used by those who want MySQL to be a drop-in replacement for some other DB, and by those who want to run multiple instances of MySQL (or drop-in replacements such as MariaDB) on the same machine (each instance has to use different ports, sockets and data directories).

Network listening: the default configuration is now to listen to port 3306, but only for connections originating from localhost. Connections from outside are simply refused. Eventually one needs to change this, but likely only for connections from specific hosts that are part of your web-based front end.

The datadir: where do the database files go by default? On unix-based systems this is often /var/lib/mysql.

The temporary-files directory

Configuration settings such as lower_case_table_names=1 and innodb_file_per_table.

Security settings such as symbolic-links (should they be allowed?)

Rules for error logging (where, and to what extent, and what constitutes a "slow" query, etc)

It turns out (somewhat surprisingly?) that if you don't define a network-connection port in the configuration file then no network connections are allowed. The same goes for defining a so-called "unix socket" or "named pipe" for communicating with the mysqld server process: if it's not defined in the config file, then it doesn't happen.

A consequence of this is that if one starts up the MySQL server "mysqld" with a stripped-down configuration file, then the user connection tool "mysql" will not be able to connect. The server is running, but not talking to anyone.
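A minimal [mysqld] section illustrating the settings above might look like this (the paths and values here are examples, not recommendations):

```ini
[mysqld]
port                   = 3306
bind-address           = 127.0.0.1    # accept connections from localhost only
socket                 = /var/run/mysqld/mysqld.sock
datadir                = /var/lib/mysql
tmpdir                 = /tmp
lower_case_table_names = 1
innodb_file_per_table  = 1
symbolic-links         = 0            # disallow symlinked table files
```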


"Raw" installation

The above process assumes you are using an installer that sets everything up automatically. The actual steps are, roughly: first install the server software itself, and then initialize the data directory and system tables.

The latter is done with a script mysql_install_db. It creates, among other things, the root account. Some options are possible, but rare.


Where and how are MySQL files stored?

The default file format is set by the DB engine (eg InnoDB) and the default location is set in my.cnf. The DB engine determines the internal format (record length, etc) and the index format; MyISAM, for example, uses ISAM-type index files.

You can change the DB engine on a per-table basis by adding an engine specification to create table:

create table foo (
    ....
) engine = innodb;       

Alternatives include MyISAM, MEMORY and CSV.

Originally, MySQL kept all the innodb data in a single file, typically ibdata1. The default in MySQL is now file_per_table. My older MySQL tables all live in ibdata1, but tables created since I upgraded to MySQL 5.6 now get their own file. For example, I rebuilt the University database, and so directory plddb contains files
Conceptually, these files can now be on different devices. A frequently used file, for example, might be placed on a solid-state drive (SSD), or two files frequently traversed simultaneously (eg with a join) might be put on two different magnetic disks, so the disk heads wouldn't have to move back and forth from one file to the other.

MySQL does not support moving table files, eg with ALTER TABLE. However, tables can be created in any directory as follows:

create table foo (
    ....
) data directory = '/home2/mysql/data';       

Given table foo in /var/lib/mysql, one can do the following to "move" it to /ssd/mysql/data.
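One approach (a sketch; foo_new is an illustrative temporary name, and the column list is elided as in the example above) is to create a new table in the target directory, copy the rows, and rename:

```sql
create table foo_new (
    ....
) data directory = '/ssd/mysql/data';

insert into foo_new select * from foo;
rename table foo to foo_old, foo_new to foo;
drop table foo_old;
```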

Alternatively, one can just move the original foo.ibd to the new location, and create a symlink from the old location to the new. This requires turning on symlink support, however; there are also some warnings in the MySQL manual about such symlink use.


MySQL data dictionary

If I connect to my MySQL system, I see my own database named "pld", and also "mysql", "information_schema", and "performance_schema".

The "mysql" database contains information about users and about the system itself. For example, we can run the query

select user, password from user;

Try adding host to the select clause as well.

The information_schema database includes the TABLES table. For example (after use information_schema;), we can run these:

select table_name, table_schema, table_rows, engine from tables;
select table_name, table_rows, engine from tables where table_schema = 'pld';



MySQL Password hashes


Here is a snapshot of the mysql password table, generated with

use mysql;    -- must be root
select user, password from user;

Duplicates of the root account are deleted.

+------------------+-------------------------------------------+
| user             | password                                  |
+------------------+-------------------------------------------+
| root             | *1D9941EB89D26755CEFBB3D0A07498ECE4F8BE83 |
| debian-sys-maint | *E1ABF29B1CB5E9B02FB4E164322FF31E8406990A |
| pld              | *F1624510F3A5C73809D0E47DAECAA31C14D04910 |
| bob              | *C0316B761A961A800C37D888D19D23E1790C12DB |
+------------------+-------------------------------------------+

Bob's password is 'cranberry'.

The command openssl dgst -sha1 calculates the 20-byte SHA-1 secure hash of its input. Adding the flag -binary means that the output is in binary rather than hex. According to this site, the MySQL password hash is sha1(sha1(password)). Let us verify this.

echo -n "cranberry" | openssl dgst -sha1 -binary | openssl dgst -sha1
c0316b761a961a800c37d888d19d23e1790c12db

The echo -n cranberry generates as its output the string "cranberry", without a trailing newline. The first openssl command takes the SHA-1 hash of that, and passes it in binary form to the second, which repeats the operation and this time returns the result in hex. It does indeed match what's in the table.
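The same check can be done in a few lines of Python, using only the standard library; the expected values are the ones shown in the table and transcript above:

```python
import hashlib

def mysql_hash(password):
    # MySQL "new"-style hash: '*' followed by the uppercase hex
    # of sha1(sha1(password))
    inner = hashlib.sha1(password.encode()).digest()   # first hash, as 20 raw bytes
    return "*" + hashlib.sha1(inner).hexdigest().upper()

print(mysql_hash("cranberry"))    # *C0316B761A961A800C37D888D19D23E1790C12DB (bob's entry)
print(mysql_hash("gingerbread"))  # *402F36BCAB785DE4D3EB52B98C22E3DA45162B3F
```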

The initial '*' character in the table is a marker indicating that the above "new" password hash is used, versus the pre-4.1 16-byte "old" password hash.

Let's change Bob's password to 'gingerbread'. One approach is to generate the new string externally, via shell commands; 'ucase' is my own script for changing letters to uppercase:

echo -n "gingerbread" | openssl dgst -sha1 -binary | openssl dgst -sha1 | ucase
402F36BCAB785DE4D3EB52B98C22E3DA45162B3F

Alternatively, the password() function generates new password strings internally. To view the function result, we simply include it in a select statement:

select password('gingerbread');

Now we change bob's password, with either of the following:

update user set password = '*402F36BCAB785DE4D3EB52B98C22E3DA45162B3F' where user = 'bob';
update user set password = password('gingerbread') where user = 'bob';

MySQL must now be stopped and restarted, but after that bob should be able to log in with password 'gingerbread', and not with 'cranberry'. To restore:

update user set password = '*C0316B761A961A800C37D888D19D23E1790C12DB' where user = 'bob';

The restart is needed because MySQL keeps a copy of this table in memory, and does not expect the table to be written to directly; running flush privileges; also makes the server reload the grant tables, without a restart. In general, changing passwords with the standard SET PASSWORD command is much safer:

set password for 'bob' = password('gingerbread');
set password for 'bob'@'%' = password('gingerbread');



Oracle

Compared to MySQL, Oracle has more proprietary extensions. A few are there more to lock you into Oracle than because they are a good idea. That said, Oracle does do well with big, transactional DBs.

Note that, with Oracle's purchase of Sun Microsystems, Oracle now owns MySQL. Some people think this is why they bought Sun. (Others think it was for Java.)

Oracle tends to support more features than MySQL. Transactions are one, though this is now beginning to appear in MySQL. Another is key constraints, and foreign key constraints in particular. However, consider the following:

Note also that MySQL is dual-licensed: as open-source (which means your additions must also be open-source, if you release them) and as a proprietary product (meaning you can develop and sell proprietary extensions to MySQL, provided you pay the licensing fee).

Installing Oracle 12c (more later)

Some notes on the Oracle Developer Days virtual machine

If you install this, the machine login username/password is oracle/oracle.

Once logged in, create a terminal window, and start sqlplus as follows:
    sqlplus / as sysdba

From there:

alter user sys identified by "mynewpassword";     -- creating a stronger password is recommended
create user pld identified by "mypass";           -- use your own initials or name instead of 'pld'
grant connect to pld;                             -- give your account the important privileges
grant create table to pld;
grant alter any table to pld;
grant unlimited tablespace to pld;

At this point you should be able to log in as 'pld', from another terminal:

    sqlplus pld

You can create tables, etc. To get information about tables, the following work:

select table_name from user_tables;
select owner, table_name from all_tables;

describe employee;

To get constraints: P = primary key, R = foreign key, C = check (eg not null)

select constraint_name,constraint_type from user_constraints where table_name = 'WORKS_ON';
select constraint_name, table_name from user_constraints where constraint_type='R';




Demo of Oracle

To start (on the virtual machine)

    sqlplus / as sysdba
    sqlplus pld

To view tables:

select table_name from user_tables;
select owner, table_name from all_tables;
select * from user_objects where object_type = 'TABLE';
 
A schema is a collection of objects (eg tables) under common ownership. Thus, I could use different owners (and so different schemata) to represent what in MySQL corresponds to different "databases". But in MySQL, different databases don't have to have different owners. (Actually, MySQL tables don't really have "owners"; instead, users are GRANTed rights to the tables.)

One can in Oracle also create multiple tablespaces. These aren't really like MySQL databases either; they are there to assign storage space to particular sets of tables.

Question: what Oracle permissions will allow me to "factor" a table into two tables? In MySQL, if I own the database, I can do whatever I want with the tables in that database.

To load the company tables, I had to make a few changes. First, the default date format in MySQL is, eg, 2014-01-27; the default format in Oracle is 27-jan-14. There's a way to get Oracle to read in dates in a different format (eg the to_date() function, or the NLS_DATE_FORMAT setting), but I didn't bother.

Second, the insert into tablename values command of Oracle takes only a single row; I had to edit this part of the load file.

Also, I named all the constraints. The default Oracle names are things like sys_c0033725; it is not clear if such things are Foreign-Key constraints or something else.

To see the constraints on a table:

Constraints have the following one-letter codes: P = primary key, U = secondary key (unique), R = foreign key, C = check (eg not null)

select constraint_name,constraint_type from user_constraints where table_name = 'WORKS_ON';

select constraint_name, table_name from user_constraints where constraint_type='R';

To see the indexes on a table:

select index_name, index_type, table_name from user_indexes;

The index_name values are cryptic, and the index_type values are all 'NORMAL'.

Here are a few queries from before to run under Oracle:

select e.fname, e.lname , w.pno from employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or w.pno =3);

select e.fname, e.lname, s.fname, s.lname 
from employee e, employee s where e.super_ssn = s.ssn;

(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 3);



Oracle administrative accounts

(See Oracle Database Administrator's Guide.)

The two privileged accounts automatically created by Oracle are SYS and SYSTEM. Both have been granted privileges of the "DBA role"; we'll get to role-based privileges later, but this basically means the accounts have all privileges.

The SYS account is the actual owner for the data dictionary (below). (More precisely, the data dictionary lives in the SYS "schema".) Any user-initiated changes to these tables can result in DB corruption; the SYSTEM account is sometimes considered safer for this reason. But the SYSTEM account can still do plenty of damage.

There are two other privilege roles that offer powers not present in the DBA role: SYSDBA and SYSOPER. The SYSOPER privilege does not grant the right to view the data directly (eg with SELECT), though it does grant the right to back up the data, and the data can be read from those backups.

After (or at the same time as) an Oracle user connects with "standard" account privileges using the sqlplus command, he or she can elevate privileges with the CONNECT command: connect username as {SYSDBA|SYSOPER}. The username can be replaced with "/" to indicate that the underlying operating system is to determine privileges. Example:
    sqlplus / as sysdba

The operating-system groups OSDBA and OSOPER exist to allow privileges to be delegated by the OS: if you are in one of these groups and you request the corresponding SYSDBA or SYSOPER privilege, you will not have to authenticate with your Oracle password.


Oracle Data Dictionary

Earlier, we tried

select table_name from user_tables;

The user_tables structure is part of the Oracle data dictionary. Here are some additional pieces in Oracle 11g:

The user_catalog (or just cat) is a table of the current user's tables; additional information about these (mostly storage-related) is found in user_tables. The user_views table contains information about views (below); views are a sort of pseudo-table.

select * from cat;

An alias for cat is tab. The "official" way to get all tables is

select table_name from user_tables;
select object_name from user_objects where object_type = 'TABLE';

You can use describe employee to get information about the employee table. What is different from the MySQL version?

Earlier we looked at some information from user_constraints. I had used this information to find the existing foreign-key constraints and re-create them with sensible names:

select table_name, constraint_name from user_constraints where constraint_type = 'R';
select table_name, constraint_name, constraint_type from user_constraints;

If we run the following query:

select index_name, table_name from user_indexes;

in Oracle, we find that the tables project and department have two indexes. Why?

User_objects contains tables and indexes, and also (if defined) views, functions, procedures, synonyms, triggers.

These user_... "tables" are actually views, as discussed below. They list the things "owned" by the current user, or, more precisely, belonging to the current user's "schema".

If a user replaces the prefix user_ with all_, the tables (views) show all objects accessible by the current user. If one has sufficient privileges there is also a prefix dba_, which includes everything.

The data dictionary is the repository for all metadata. It is normally understood to be read-only. Some privileged accounts can update the data dictionary directly; this is often unwise.

A number of "scripts" are available for accessing information in the data dictionary. There are also some built-in commands, such as describe tablename. In MySQL, most metadata is accessed not from tables/views but via commands, eg show tables, show create table tablename, etc.

The table user_users lists the users visible to the current user; normally that is just the current user. The all_users table may be more relevant here.

How do we identify specific primary key attributes, foreign key references, etc?
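One answer uses the dictionary view user_cons_columns, which lists the columns participating in each constraint (a sketch; for foreign keys, r_constraint_name in user_constraints names the referenced constraint):

```sql
select c.table_name, c.constraint_name, c.constraint_type,
       cc.column_name, cc.position
from user_constraints c join user_cons_columns cc
     on c.constraint_name = cc.constraint_name
where c.constraint_type in ('P', 'R')
order by c.table_name, c.constraint_name, cc.position;
```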

How about

select * from dual;

See http://docs.oracle.com/cd/B10500_01/server.920/a96524/c05dicti.htm.


Oracle Schemas

A schema is owned by a user and represents, to a first approximation, the set of tables and other objects owned by that user.

Different users cannot necessarily "see" the objects in each others' schemas; they need to be granted permission.

Each schema has its own "namespace". In other words, users pld and bob can each have their own table "employee", in which case the tables pld.employee and bob.employee are not related.

If user bob is granted permission to view pld's table employee, then bob will access it as pld.employee. Such permission would be granted by pld as follows:

grant select on employee to bob;

For production use, the schema owner is usually an abstract entity rather than an individual, eg "admin_dept". Rights to create, view and modify tables can then be granted by the DBA to individual users.

In MySQL, one would give different users their own "database". A similar dotted notation, dbname.tablename, would be used to access tables in another database.


Oracle Materialized Views

In Oracle you can specifically ask that a view be materialized; that is, that it be a copy ("replica") of the underlying data. An early term for a materialized view was a data snapshot. You use the following syntax:

create materialized view ....

Materialized views are useful for data warehousing: creating a large (hundreds of attributes) table representing static data. Materialized views are also useful in networked databases: a frequently used view on one network node can reduce network traffic to other nodes. A materialized view can hold a local copy (or cache) of some other data.

A materialized view (any view, actually) of a single table can represent a subset of the columns (as specified in the SELECT clause), and also a subset of the rows (as specified in the WHERE clause).

Materialized views are updated (synchronized with the parent tables) in operations called refreshes. The fastest refreshes are obtained if one specifies REFRESH ON COMMIT; that is, the view table is updated whenever the underlying base tables are updated. This sometimes causes extensive CPU use and network traffic, and it requires that the view permit a "fast refresh". Not all materialized views support this; a fast refresh also requires the creation of a "materialized-view log", which keeps track of all changes made to the base tables since the previous refresh. Oracle supplies tools to help identify whether a materialized view is fast-refreshable, and, if not, what can be done to fix it.

The alternative is REFRESH ON DEMAND. This means that refreshes are not automatic, and must be scheduled by the DBA. Such refreshes are typically executed at periodic intervals. An on-demand refresh can either be COMPLETE or FAST (incremental).

A materialized view with REFRESH ON DEMAND will occasionally have out-of-date, or stale, data. Whether this is serious depends on the application. For data-warehousing applications, for example, it is seldom an issue; the data warehouse might contain long-term records of customer purchases. For an inventory application, a materialized view may or may not be appropriate. The out-of-date possibility means that an item that shows as available may in fact not be (or vice-versa). This may in turn mean that a few customers are disappointed, but retail customers are often easily mollified.

A materialized view represents a way to "pre-compute" joins. If joins are slowing down queries, a materialized view can improve performance dramatically. The same applies to calculation of aggregate (SUM, COUNT) values.
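As a sketch (assuming the usual company schema, in which employee.dno references department.dnumber):

```sql
create materialized view emp_dept_mv
refresh complete on demand
as select d.dname, e.fname, e.lname
   from employee e, department d
   where e.dno = d.dnumber;
```

Queries against emp_dept_mv then avoid recomputing the join, at the cost of possibly stale data until the next refresh.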