Information on three common databases: Postgres,
MySQL and Oracle. This file
contains DB-specific installation and configuration information.
PostgreSQL
In the 1970s, the Ingres database project began at Berkeley. Ingres
originally stood for Interactive graphics retrieval system, though shortly
thereafter the "graphics" part became much less important.
After Ingres (which does still exist) came Postgres, for "post-ingres".
Later, to emphasize its support for SQL, Postgres became PostgreSQL.
As it is virtually impossible to pronounce PostgreSQL using the "sequel"
form for SQL, most people pronounce it "postgres".
A decade ago, MySQL was the "fast one" and Postgres was the one that
dutifully implemented the SQL standard. Since 2010, MySQL has become a
little more "mainstream" (though it is still missing the intersection
operator), but Postgres has become a lot faster.
Postgres is also much more similar to Oracle (which is a little ironic
because Oracle now owns MySQL, though the usual assumption is
that Oracle doesn't want MySQL to compete with the family jewels).
Postgres can be found at postgresql.org.
Once installed, the postgres server runs in
the background. The psql
command on the command line sets up a connection to the
server, usually via network port 5432. If you have multiple command lines
running multiple instances of psql,
they are all connecting to the same master instance of the server. It is
possible to set up multiple independent servers on the same machine, each
using a different network port for connection and each maintaining a
different set of data files. There is seldom a need for this, however.
After you install the postgres server software, as with pretty much any
database, you need to create a user,
and a database for that user.
The postgres account is the only account installed by
default.
Here's what I had to do on linux (similar to macs; Windows is below). You
will need the -p 5433 option if your PostgreSQL installation uses that
alternative port.
1. From the shell:
bash> createdb -U postgres pld
You will need to enter the postgres password.
2. Run the psql command for user postgres. The
"bash>" prompt below is the shell prompt (as user postgres); the
"post=> " prompt is the psql SQL prompt.
bash> psql -U postgres
postgres=# create user pld with password 'zanzibar';
The system response here should be CREATE
ROLE. If you don't get that, something went wrong (make sure you
typed the final semicolon, though).
You now need to give permissions to your new user:
postgres=# grant all privileges on database pld to pld;
postgres=# grant all on schema public to pld;
postgres=# alter database pld owner to pld;
Then, as user pld, the command "psql" works. If you created several
databases above, eg "pldoffice", then the command to connect to postgres
using the pldoffice database would be psql pldoffice.
What Postgres is doing here is noticing that my linux username is "pld",
and assumes (unless I request otherwise) that I will be Postgres user
"pld". Postgres also assumes that system user "pld" is authorized to use
the Postgres account with the same name; this is called peer
authentication.
Note that I did not actually give myself the right to create new
databases (though I don't need it).
When I first installed Postgres on linux, it set itself up to accept
connections on port 5435, instead of the standard 5432.
This was insufferably annoying as I had to specify -p 5435 in every
command. I eventually discovered that I could change the port by setting
port = 5432 in the postgresql.conf file, in (under linux)
/etc/postgresql/9.5/main. The port number is more of an issue, though,
when installing under MacOS.
Installing on Windows
Go to postgresql.org, and click the
Download button, and then the Windows button. Pick the release you want
(16, the most current as of 2024, should be the best choice.) Click the
Download link.
Select "Download the ... installer". You can use either the installer from
EnterpriseDB or the "graphical" installer from BigSQL. I used the former
here. Be sure to choose 32-bit or 64-bit to match your system (you can
check your system with Control Panel → System; it should be 64
bit).
After downloading (~162 MB in 2018), run the installer. It starts with
"Welcome to the PostgreSQL Setup Wizard". At some point, you may see an
alert telling you it is installing "MS Visual C++ Redistributable".
Eventually you get to the PostgreSQL installer itself. The default
locations for everything should be fine. You will be asked
Please provide a password for the database superuser
(postgres).
Don't forget the password you provide here! Though for
classroom use it doesn't have to be very secure.
You will also be asked a couple other questions; the default answers
should be fine (though if the default network port is not 5432, that
probably means you have a Mac, and the default will probably then be 5433).
The installation begins when you click "next" on the screen that tells you
Postgres is ready to install.
You can skip the offer to launch Stack Builder, though it's fine if you
want to try it. It won't be used in Comp 305/405 though.
Postgres "peer" authentication means that if you have logged into your
computer with username, say, pld, and you want to get into your PostgreSQL
account with the same name, postgres just lets you, on the theory that
you've already provided your password to get logged in. Windows sometimes
has trouble with "peer" authentication, as in the Linux discussion above,
because system usernames under Windows tend to be unwieldy. In particular
they often have spaces, which do not work well as Postgres usernames. If
your system username and postgres username do not match, you cannot use
peer authentication. Just go ahead and use password authentication. The -h localhost option to the psql
usually forces password authentication. If the port is not 5432,
you will also need to add -p 543x
to every psql and createdb command, where 543x is the port used by your
installation.
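Putting these options together, a typical invocation might look like the
following (a sketch; it assumes Postgres username pld, database pld, and
port 5433, so substitute your own values):
psql -U pld -h localhost -p 5433 pld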
As of 2018, it looks like the Windows installation has abandoned "peer"
authentication, which is a good thing. (At least there were no "peer"
authentication methods listed in the authentication-configuration file pg_hba.conf.)
Installing on a Mac
Go to postgresql.org, and click the
Download button, and then the macOS button. Pick the release you want (16,
the most current as of 2024, should be the best choice.) Click the Download
link.
I recommend the first option there, "interactive installer by EDB". I
downloaded the file postgresql-16.1-1-osx.dmg.
You can also use the Postgres.app, which is
convenient but which does not run "continuously". You'll have to
verify that the app has started Postgres whenever you use it (normally
this is not at all difficult).
I then went to the file in a Finder window, and double-clicked on it. This
shows postgresql-16.1-1-osx, which I again double-clicked. This starts the
actual installation, by default under /Library/PostgreSQL/16.1. (If it's not
installing in the /Library directory, then something is amiss!)
You are asked to provide a database superuser password; that is, the
password to the "postgres" account. Don't forget the password you
supplied! For class use it does not have to be especially secure.
Check to be sure that "PostgreSQL Server" and "Command Line Tools" are
selected for installation. The other tools are optional.
The README file accompanying the installation suggests increasing the
shared-memory allotment. You should definitely do that if you're running a
production service, but I did not.
Note that Apple may already have a Postgres installation on your
mac, as part of the mac system. You can verify this by
entering the following command in a terminal window and seeing if there's
any significant output: ps axuww |
grep postgres. It's fine to have two separate, unrelated
installations, but if this is the case then your new installation will
definitely have a new port number other than 5432 (it will probably be
5433). You need to use this new port in every command!
Postgres Accounts
Now that Postgres is installed, you will need to set up the basic account.
As far as Postgres is concerned, how you proceed here is pretty much the
same on Windows, Mac and Linux machines. Alas, the operating-system side of
things is not quite the same.
You need to create your own non-superuser (non-postgres)
account, and then create a database. It's most convenient to have the
database account name be the same name as your computer login name, though
I'm never sure about spaces, and on Windows this is sometimes pointless.
To find out your windows login name, run psql with no options and it will
tell you, after prompting for a password (even if it won't actually
"connect").
On the Loyola computer I tested this on in 2020, I was "pdordal", my
Loyola ID. I also tried creating a username/databasename of "pld"; the
only difference was that with Windows username "pdordal" and Postgres
username "pld" I had to use the -U
pld option to psql.
To create the non-superuser account, open a command window, and try typing
"psql --username=postgres -p 5433". If it tells you the command is not
found, that means the psql command is not in your PATH. You can either fix
this now, as below, or else use one of these things:
- type the full path name: "c:\program files\postgresql\16\bin\psql"
--username=postgres -p 5433 (with quotes. The 16 here is the version
number).
- cd to the appropriate bin directory, and run commands from there (on
macs you'll need "./psql" rather than "psql")
- On Windows, type "psql" into the search box. You will be prompted for
a couple things (like the port number), and then you should connect.
- Likewise on Macs, but there may be more than one psql command; you
want the version-16 one.
Whichever you do, here are the steps. The option "--username=postgres"
(or "-U postgres", without the equals sign) means you will be
running the command as the PostgreSQL superuser, and will need to supply
the initial password above.
If you got a network port number other than 5432 during the
installation,
(a) You probably have a Mac
(b) You will need to use that new port number in every
psql and createdb command:
psql --username=postgres -p 5433
What is going on is that you have two entirely independent installations
of Postgres, with separate data. One is reachable via port 5432 and one
via 5433. They don't talk to one another; they don't even know about one
another.
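If you are ever unsure which of the two installations a given psql session
has actually connected to, you can ask the server directly (a quick check;
viewing data_directory may require connecting as user postgres):
show port;
show data_directory;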
In the example below the account name and database name will be "mydb".
You can use whatever name you want, and should probably use some form of
your username, first name or initials. On Windows I've tried both
"pdordal" and "pld". On Linux and Macs, it is popular to have your
Postgres account name match your login name.
The first step is actually to run createdb, rather than psql. Again, use
the full path name, as above, or go to the bin directory, or set up your
PATH. You will also need to add -p 5433 if your port is 5433; I will show
this below, but you can leave it out if it's 5432. The createdb command is
typed at the Windows command shell, not within Postgres:
1. createdb --username=postgres -p 5433 mydb     (Supply the postgres password. This might take several seconds.)
This creates a database named "mydb", to go with the userid "mydb" we are
about to create. On Windows in 2024, typing "createdb" in the Windows
search box failed; I had to open a cmd window and navigate to
\Program Files\PostgreSQL\16\bin.
Next, create the matching account within PostgreSQL, using the
postgres administrative account. (Here, on Windows, you can just type
"psql" into the Windows search box; you'll be prompted for the username
and port, and maybe host (use localhost)).
2. psql --username=postgres -p 5433     (supply the postgres password)
postgres=# create user mydb with password 'mypassword';
(choose your own password for mypassword; the response should be CREATE ROLE)
postgres=# grant all privileges on database mydb to mydb;
postgres=# grant all on schema public to mydb;
postgres=# alter database mydb owner to mydb;
Finally, you can log in as you:
3. psql --username=mydb -p 5433     ;; the "--username mydb" isn't necessary if mydb = your windows account name; supply the mypassword chosen above
On Windows, if you're starting psql from the Windows search box, enter
your user name at the appropriate prompt.
Here are a few quick commands:
psql=> \d                       ;; \d gives list of tables
psql=> create table demo (
           name varchar(200),
           idnum integer primary key
       );
psql=> insert into demo values('alice', 37);
psql=> insert into demo values('bob', 61);
psql=> select * from demo;
Setting your PATH
On either Windows or Mac machines, you may want to be able to type the
command "psql" in a cmd/bash window and have the Postgres client start
running. The main benefit of this is that you can type "psql" in a specific
directory, and load files from that directory into Postgres. The alternative
is to open the file in an editor, copy its contents, and paste them into the
psql window.
To do this, you have to modify your PATH, which is the list of directories
searched when you type a command like "psql". The directories are separated
internally by a ';' character on windows, and a ':' character on mac
and linux systems. On windows, the postgres directory is probably C:\Program
Files\PostgreSQL\16\bin.
To change your PATH on Windows 11, go to
System (control panel) → Change Settings (lower right) →
Advanced (tab) → Environment Variables →
→ System Variables → select PATH → Edit
When you click Edit you get an editor view that shows the PATH as a list of
directories with each in a separate text box. You should add an entry like
this:
c:\Program Files\PostgreSQL\16\bin
The space in "Program Files" should not matter.
On the Mac, the application directory is /Library/PostgreSQL/16/bin (where
16 is again the version number). To install this in your path, edit
the file .zshrc in your home directory (this is executed whenever
you log in). You can open it from within the TextEdit app. Add this line at
the end:
PATH=$PATH:/Library/PostgreSQL/16/bin
Then start a new terminal window (the default shell on recent macOS is zsh)
after the updated .zshrc is saved; the psql command should then work.
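To check that the PATH change took effect, open a new terminal window and
try the following; they should print the version-16 path and version,
respectively:
which psql
psql --version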
On Windows you can also create a small batch file containing the command
set PATH="\Program
Files\PostgreSQL\16\bin";%PATH%
You then run this before typing any postgres commands.
Login Troubles
If you are having trouble logging in, here are some possibilities.
1. Sometimes it helps to force a network login, rather than a
"local" login. You can do this with
psql --username=postgres --host=localhost -p 5433
The --password flag might also
help; this tells Postgres that you are expecting to be prompted for a
password.
This is especially likely to help with problems accessing the "postgres"
account, but it sometimes helps with ordinary accounts as well.
2. For Mac and Linux users, one way to get access to the PostgreSQL postgres
account is to first get access to the postgres system account. Do
this with: sudo -u postgres bash, followed by "psql" (or sudo -u postgres
psql). This doesn't work as well under Windows, which has a somewhat
peculiar approach to system accounts.
3. Expired passwords: sometimes this does happen. If you can't log in to a user
account, check this. Try looking at the output of '\du', while logged in as
user postgres. If you see the account, check if the attribute "cannot login"
is listed. To fix, execute this as user "postgres":
ALTER USER username VALID UNTIL 'infinity';
4. Problems with the file pg_hba.conf.
This is the postgres login configuration file. It should be in your postgres
directory, in subdirectory 16/main or 16/data. It can be used to disable
some forms of authentication, if necessary.
Here is a set of traditional entries. The "peer" entries have been
deprecated in Windows since 2018 (I mean, so has md5, but it's simply been
replaced with sha-something.)
local     all     postgres                      peer
local     all     all                           peer
host      all     all          127.0.0.1/32     md5
host      all     all          ::1/128          md5
These lines, which represent how a user logs in, are applied in order until
a match is found. The second column, "all" above, indicates which database. The
first column, "local" or "host" above, indicates whether the access attempt
is a "local" psql command or a network login. If we just type "psql" we get
the former, but we can force the latter with "psql --host=localhost", as in
item 1 above.
The final column, "peer" or "md5", indicates what kind of credentials must
be presented. "peer" means that the operating-system username is to be
trusted, as in item 2 above, and no password needs to be supplied. "md5"
means that a password must be entered, and then is sent in hashed (md5) form
to the postgres server for validation. (Postgres 10.1 is moving away from
md5, to scram-sha-256, but my Version-10 installation still shows "md5".)
The third column means which account this applies to. In the table above,
this means that user "postgres" must log in with "peer" credentials. That
means PostgreSQL user "postgres" can only log in from system account
"postgres", as in item 2 above. Password authentication is not allowed! We
can bypass this, though, with the --host=localhost trick of item 1, which
makes the login a "host" login, subject to the rules of line 3 or 4 above.
The last two entries above, for "host" login with "md5" authentication, also
contain IP addresses. These are the IPv4 and IPv6 versions of "localhost",
respectively. Logins from outside of your own machine will not be allowed
unless you change these. Note that on Windows the IPv6 form is preferred
internally for --host=localhost
logins.
If you do change pg_hba.conf, you will need to restart PostgreSQL for the
change to take effect.
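How to restart depends on the platform; as a rough sketch (service names and
data-directory paths may differ on your installation):
Linux (systemd):      sudo systemctl restart postgresql
Mac (EDB installer):  sudo -u postgres /Library/PostgreSQL/16/bin/pg_ctl restart -D /Library/PostgreSQL/16/data
Windows:              restart the postgresql-x64-16 service from the Services control panel (or net stop / net start it from an Administrator cmd window)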
Here are some problems and possible fixes:
(a) You can't log in to the postgres account with a password. Fix: use the
method of item 1 above, or item 2 above, or else change "peer" to "md5" in
the first line above of pg_hba.conf. (Change it to "trust" if you cannot
remember the password at all). Restart PostgreSQL.
(b). Your ordinary-user password isn't working, even though you just reset
the password by executing the following as user postgres, and you got back
the response ALTER ROLE:
alter user myname password 'achiote';
Try the method of item 1 above. If that fails, check to see
if the "host" authentication method in pg_hba.conf
is md5. If absolutely necessary, change "md5" to "trust" to disable
password checking entirely. But if a "host" entry has "peer"
authentication, change "peer" to "md5".
Command Line
You can run postgres queries from the command line using the following
syntax:
psql -U username -d databasename -c 'select * from employee'
You may have to add -W (or --password) to get psql to prompt for a
password, if applicable (this is usually not necessary if your postgres
username matches your operating-system username). On unix-like systems
(linux and macs), the command can be either in single quotes, as above, or
double quotes. The latter form allows shell variable expansion:
for table in employee works_on project department
do
    echo $table
    psql -U pld -c "select count(*) from $table"
done
You can also run queries from within a program.
Postgres string comparisons are case-sensitive. If, in
the Company database, you run the query
select fname, lname from employee where lname = 'wong';
you will get nothing (printed as "(0 rows)"). The correct query is
select fname, lname from employee where lname = 'Wong';
One way to resolve this is with
select fname, lname from employee where lower(lname) = 'wong';
However, applying lower() to a primary-key attribute (or any indexed
attribute) will mean that the attribute's index will likely fail to be
used. Another alternative is to use ilike instead of =:
select fname, lname from employee where lname ilike 'wong';
This may also in some cases mean the index goes unused. But don't worry
about those cases for now. In the employee table, the index is on the
numeric ssn field; there is no index on the lname field.
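If case-insensitive lookups on lname ever did become common, one fix (a
sketch; the index name is made up) would be an expression index, which both
gives lname an index and lets the lower() query above use it:
create index employee_lower_lname_idx on employee (lower(lname));
select fname, lname from employee where lower(lname) = 'wong';   -- can now use the index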
Some postgres commands
Your psql prompt should show you the current database.
SHOW ALL                  -- shows status variables only
\d                        same as MySQL show tables;
\d employee               similar to MySQL describe employee; gives all indexes and FK constraints with names
\d [NAME]                 describe table, index, sequence, or view
\d{t|i|s|v|S} [PATTERN]   list tables/indexes/sequences/views/system tables (add "+" for more detail)
\da [PATTERN]             list aggregate functions
\db [PATTERN]             list tablespaces (add "+" for more detail)
\dc [PATTERN]             list conversions
\dC                       list casts
\dd [PATTERN]             show comment for object
\dD [PATTERN]             list domains
\df [PATTERN]             list functions (add "+" for more detail)
\dg [PATTERN]             list groups
\dn [PATTERN]             list schemas (add "+" for more detail); \dnS includes system schemas
\do [NAME]                list operators
\dl                       list large objects, same as \lo_list
\dp [PATTERN]             list table, view, and sequence access privileges
\dt [PATTERN]             list tables in a schema; eg \dt pg_catalog.*
\dT [PATTERN]             list data types (add "+" for more detail)
\du [PATTERN]             list users
\l                        list all databases (add "+" for more detail)
\z [PATTERN]              list table, view, and sequence access privileges (same as \dp)
If you start psql with the -E option, postgres will tell you the
underlying query it uses for each of the above commands.
Here's another way to list your tables:
select table_name from information_schema.tables where table_schema='public' and table_type='BASE TABLE';
To connect to a given database:
\connect pld
To enable query-time printing:
\timing
To insert a csv file (comma-separated-values, though any delimiter
may be used; the delimiter here is tab, \t) into a table:
copy big_project from '/tmp/bigproject.csv' delimiter E'\t' csv null '\N';
copy big_employee from '/tmp/bigemployee.csv' delimiter E'\t' csv null '\N';
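The server-side copy command above reads the file with the server's
permissions, which sometimes fails. The psql client also has a \copy
meta-command that reads the file from the client side instead; a sketch with
the same options (the option syntax otherwise mirrors copy):
\copy big_project from '/tmp/bigproject.csv' delimiter E'\t' csv null '\N'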
To display the server version:
select version();
To turn off that annoying paging:
\pset pager off
To find indexes on table EMPLOYEE (though \d is probably all you
will ever need)
prepare listindexes(text) as
select
t.relname as table_name,
i.relname as index_name,
a.attname as column_name
from
pg_class t,
pg_class i,
pg_index ix,
pg_attribute a
where
t.oid = ix.indrelid
and i.oid = ix.indexrelid
and a.attrelid = t.oid
and a.attnum = ANY(ix.indkey)
and t.relkind = 'r'
and t.relname = $1
order by
t.relname,
i.relname;
Now run it with
execute listindexes('employee')
This is an example of a prepared statement (sometimes loosely called a stored query).
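The prepared statement exists only for the current psql session; it
disappears when you disconnect, and can be removed earlier with:
deallocate listindexes;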
You can load a file of sql commands (eg
employee.alter.text) using the \i (include) directive:
\i company.alter.text
To do this, you must have started the psql client command while in the
directory containing the file. Otherwise you have to give the full path. You
can also open the file in an editor, copy it, and paste it in to a command
window running psql, though for even moderately large files (eg
dellstore.sql) that does not always work smoothly.
Intersect example
This works, under postgres. Here is the example (the select attributes must
match in the two queries being intersected):
(select
e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and
w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn =
w.essn and w.pno = 3);
Postgres file storage
Your postgres files are in a designated directory (at least if you are
using the standard "tablespace"). On my linux installation it is
/var/lib/postgresql/9.5/main/base/. There is a directory for each
database, and, within these directories, a file for each table. Databases
and tables are usually named using their so-called "oid" numbers. My "pld"
database on my laptop (2015) has oid 16385. I can find this with the
oid2name command, or with the query
select oid, datname from pg_database;
To find the oid of a table, say 'employee', you can use
select pg_relation_filepath('employee');     -- note quotation marks!
or
select tableoid from employee limit 1;
The first returns
base/16385/16386
The second returns 16386. For my password database on ulam.cs.luc.edu (a
different installation from above!), the first is base/16385/16387. The file size itself is just under half a
gigabyte.
Very large tables are split over multiple files (by default 1 GB per
file). Tables with very wide columns (blobs and long strings, for example)
have an associated "TOAST" table; the primary table file then in effect
has pointers to another location where the large field is stored. TOAST
stands for The Oversized-Attribute Storage Technique.
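To see how much disk space a table is using, including its TOAST data and
indexes, you can ask Postgres directly (a quick check):
select pg_size_pretty(pg_total_relation_size('employee'));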
Record layout
Postgres "tuples" are immutable. When a record needs to
be updated, the old tuple is left in place, but marked as not current. The
new tuple is created in a new location. This design decision does
sometimes create some write bottlenecks, but we will address that later.
Each tuple has a system attribute called ctid, which is a pair of numbers
that, informally, represents (block number, record number). Blocks are
8KB. What is Mr Wong's ctid?
select ctid, lname, ssn, salary from employee;
Now let's update Mr Wong's salary:
update employee set salary = salary+10 where lname = 'Wong';
What is Wong's ctid now? What happens if we repeat this a few times?
Actually, each block begins with an array of record pointers; the ctid
record number points to one of these record pointers. That pointer then
points to the actual block; this way, records within a block can be
compacted without changing any ctids.
Postgres tuples also have xmin and xmax
system attributes, reflecting the minimum transaction id
that can view a record and (for deletions) the maximum. These play a role
in concurrency control; we will come back to them later.
Every transaction is numbered sequentially; the current transaction id is
available via
select txid_current();
As soon as there are no remaining transactions in progress with txid
<= xmax, that particular tuple can be deleted completely. This is done
by the autovacuum process, which is sort of the garbage
collector for Postgres.
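You can watch this happening: pg_stat_user_tables reports, per table, how
many dead tuples are waiting to be reclaimed and when autovacuum last ran (a
quick check; you can also trigger a cleanup by hand with vacuum):
select relname, n_live_tup, n_dead_tup, last_autovacuum from pg_stat_user_tables;
vacuum employee;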
Other system columns include cmin and cmax.
More at rachbelaid.com/introduction-to-postgres-physical-storage/,
interdb.jp/pg/, and, of course, postgresql.org/docs/9.6/static/storage.html.
Postgres schemas
As a general database term, a "schema" is a set of tables together with
table definitions. The University and Company databases would be good
examples of separate schemas, except I did not set them up that way.
Postgres allows schemas -- as a kind of namespace -- to be first-class
objects within the database.
If, as user postgres, I enter \l (list databases), I get these:
- dell
- pld
- usda
- postgres
- template0
- template1
The first three were created by me. The last three have no top-level
tables! If I do the following,
\c postgres
\d
I get
No relations found
Where are the tables for the postgres database? They
are all within non-default schemas. These schemas can be
listed with
\dnS
(the S asks to include even the system schemas.) Here is a partial list:
- information_schema
- pg_catalog
- pg_temp_1
- pg_toast
- pg_toast_temp_1
- public
Databases are, in effect, divided into schemas. Every database has a
schema named public (the last one above) which is
(initially at least) the default schema for that database. Every database
also has a schema pg_catalog; this particular schema is
the same across all databases. Every database also has a schema information_schema
as part of the SQL standard; this schema contains a more "standardized"
(that is, non-Postgres-specific) version of the pg_catalog information.
Schemas can be thought of as directories within a database, where tables
are actually stored, although you cannot have nested schemas. A table foo
in schema bar can be thought of as having fully-qualified name bar.foo.
Schemas have owners. The "postgres" role owns the public schema, and all
the system schemas. The owner of a table in a schema is not necessarily
the same as the owner of the schema.
We can examine the tables in a schema with, eg
\dt pg_catalog.*
We can then look in detail at the pg_database table with
\d pg_catalog.pg_database
\d pg_catalog.pg_index
We can read the table with
select * from pg_catalog.pg_database;
Tables are described in the information_schema schema; for example we can
run this:
select table_name from information_schema.tables where table_schema='public' and table_type='BASE TABLE';
But there is no table 'tables' in information_schema!
(We can check this with \dt information_schema.*). It turns out that tables
is a view. We can list all views with
\dv information_schema.*
Or
\connect pld
select table_schema, table_name, table_type from information_schema.tables
where table_catalog='pld';
Why is this a view? Most of these views are of Postgres-specific tables
(and Postgres-version-specific tables) in the pg_catalog schema;
the information_schema views are more generic. The underlying tables are
subject to change, but the view is not.
The information_schema schema is documented in http://www.postgresql.org/docs/current/static/information-schema.html.
Recall that if you run psql with the -E option, you get to see the
underlying queries generated. This would be a good time to give that a
try.
We can also run the following:
select table_name from
information_schema.tables
where table_catalog='pld'
and table_schema='public'
and table_type = 'BASE TABLE';
This is like the previous query, but with more where
clauses. It also has similarities to the listindexes example of the
previous section.
The public schema is the default. We can run the following to find all
tables in this schema:
SELECT * FROM information_schema.tables WHERE table_schema = 'public'
Alternatively, we can use \dt public.* However, this will only list
tables in the current database.
To list all the tables in the current database, we can use \dt *.*
Schemas and organization
Here are a few options for organizing a database with schemas.
1. Use only the public schema. There may be multiple
users and multiple separate table ownerships.
2. Each user can be given his or her own schema, corresponding to their
username. Each user's tables are potentially visible to other users (if
permissions allow it), but this visibility is not "obvious" (the other
schema tables won't show in a \dt, for example). Users can choose to
restrict access by other users, or not. (If each user gets his or her own
database, then no sharing between users is possible at
all.)
With this strategy, it might be appropriate to deny everyone access to
the public schema, to force them to use their individual
schemas.
3. Create one schema for each company "grouping", eg for each
store or each project. Then most queries within one store's schema will
refer only to that store, but it is possible to write queries that collect
data from different stores. If we create a separate database for each
store, on the other hand, we can not write queries that
involve data from different stores.
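As a sketch of option 3 (the schema, table and column names here are made
up), each store gets its own schema, but a cross-store query remains
possible:
create schema store1;
create schema store2;
create table store1.orders (id integer primary key, total numeric);
create table store2.orders (id integer primary key, total numeric);
select 'store1' as store, count(*) from store1.orders
union all
select 'store2', count(*) from store2.orders;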
Schema search path
To avoid having to type "fully qualified" schema.table names, one can
define a schema search path.
show search_path;
If we create a new schema (in database "pld")
create schema ignatius;
create schema ignatius authorization pld;     -- done as postgres, if pld lacks rights to create schema
and add two tables:
create table ignatius.foo (...);
create table ignatius.foo2 (...);
If I type \dt I don't get these. I can type \dt ignatius.*, of course. But I
can also do this:
set search_path to ignatius, public;
Now the tables in the ignatius schema show up. If I reverse the search_path
order, all that does is reverse the list order. But if I remove ignatius
from the search path, I have to access the tables as ignatius.foo, etc.
Newly created tables go in the first schema on the path, if not fully
qualified.
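The set command above lasts only for the current session. To make the
setting persistent, it can be attached to the account itself (a sketch,
using the pld account from above):
alter role pld set search_path = ignatius, public;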
Postgres arrays
We can have arrays in postgres data fields; see http://www.postgresql.org/docs/9.5/static/arrays.html.
This is often done as denormalization, though it
definitely breaks the 1NF rule.
create table courses (name varchar, courses integer[]);
insert into courses values ('Dordal', '{ 305, 388, 417, 443 }');    -- note quote marks!
insert into courses values('Harrington', array[150, 170, 310]);
We can then access the array components using []:
select courses[2] from courses where name = 'Dordal';
update courses set courses[2]=488 where name='Dordal';
Searching is primitive: we can do this
select name from courses where 305 = any(courses);
but not this:
select name from courses where any(courses) = 305;
The general rule is that if you are spending a lot of time searching arrays,
you probably should switch to the 1NF normalization; that is, a separate
table of ⟨instructor,course⟩ indexed by both attributes.
Two-dimensional arrays are also possible.
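If you do decide to switch to the 1NF form, the unnest() function expands an
array into rows, so the conversion can be done in one statement (a sketch;
the new table and index names are made up):
create table teaches as select name as instructor, unnest(courses) as course from courses;
create index teaches_instructor_idx on teaches (instructor);
create index teaches_course_idx on teaches (course);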
MySQL
Some notes on installing mysql
These assume that you will use a command-line interface.
Basically, you need to create a user,
and then a database for that user.
A database is a collection of tables.
Step 1: do the actual installation (eg apt-get install mysql-server, or run
the .exe install file). During this step, you should be asked for a master
password. Do not forget this password!
Downloads for the MySQL Community Server are at http://www.mysql.com/downloads/mysql.
At a minimum, you will need MySQL Server.
The MySQL server runs in the background. The
mysql command on the command
line sets up a connection to the server. If you have
multiple command lines running multiple instances of mysql,
they are all connecting to the same master instance of the server.
Step 2. log in using the master password.
mysql --user root -p
(give master password)
Step 3. Now new users and databases can be created, eg pld. (note password
is in quotes)
mysql> create user pld identified by 'cranberry';
(or: create user pld@localhost identified by 'cranberry';)
mysql> create database plddb;
mysql> grant all on plddb.* to pld;
We have created a user pld, and a database plddb, and given user pld full
control over plddb, and only over plddb. If you wanted to allow pld to
create databases, you could use
mysql> grant all on *.* to 'pld';
You can also grant individual actions:
grant SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON *.* TO 'pld';
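To check what privileges an account ended up with (if the account was
created as pld@localhost, use 'pld'@'localhost' instead):
mysql> show grants for 'pld';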
Step 4. Now log in as pld (after logging out as root):
mysql --user=pld --password=cranberry plddb
But let's assume you are user pld,
in unix-world, and that you don't wish to type your password on the command
line. Then use:
mysql -p plddb
Step 5: Create a table (while logged in as pld). Java/C users take note: the
comma is a separator in the field declarations, not a terminator (ie there
is no comma after the final declaration), and the field name precedes the
type name.
mysql> create table foo (id int, name varchar(100));
mysql> insert into foo values (1, 'peter');
mysql> insert into foo values (2, 'paul');
mysql> show tables;
mysql> select * from foo;
The last is our first example of a SQL select
statement.
If you are using a unix-based system, you might find the following helpful
in your my.cnf file (eg in /etc/mysql). It must be in the [mysqld] section.
lower_case_table_names=1
With this setting, the following are equivalent:
select * from employee;
select * from EMPLOYEE;
MySQL by default does case-insensitive comparisons. The
query
select fname, lname from employee where lname = 'wong';
returns
Franklin Wong
Postgres does not do this; the query above would return nothing
because 'wong' ≠ 'Wong'.
All privileges are stored in the mysql database (which you
can access as root with "use mysql;"). The user table is
where much of it resides. Examples:
use mysql;
show tables;
select user from user;
describe user;
select user, password from user;
Why are there so many root accounts? This becomes clearer
with
select user, host from user;
The pair (user,host) is the primary key of this table.
You can load a file of sql commands (eg
employee.alter.text) using the source command:
source company.alter.text
To do this, though, you must have started the mysql client command while in
the directory containing the file. Otherwise you have to give the full path,
and may have to "escape" backslash characters.
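Alternatively, the file can be fed to mysql directly from the shell, without
starting an interactive session (using the same login options as in Step 4):
mysql --user=pld --password=cranberry plddb < company.alter.text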
MySQL configuration
The MySQL configuration file is my.cnf (sometimes
my.ini); on unix-based systems it is usually in directory /etc/mysql.
Server features are defined in the [mysqld] section. They include:
- The port (normally 3306) and socket used for client communication with the
server. This is mostly used by those who want MySQL to be a drop-in
replacement for some other DB, and by those who want to run multiple
instances of MySQL (or drop-in replacements such as MariaDB) on the same
machine (each instance has to use different ports, sockets and data
directories).
- Network listening: the default configuration is now to listen to port
3306, but only for connections originating from localhost. Connections from
outside are simply refused. Eventually one needs to change this, but likely
only for connections from specific hosts that are part of your web-based
front end.
- The datadir: where do the database files go by default? On unix-based
systems this is often /var/lib/mysql.
- The temporary-files directory
- Configuration settings such as lower_case_table_names=1 and
innodb_file_per_table.
- Security settings such as symbolic-links (should they be allowed?)
- Rules for error logging (where, and to what extent, and what constitutes a
"slow" query, etc)
It turns out (somewhat surprisingly?) that if you don't define a
network-connection port in the configuration file then no network
connections are allowed. The same goes for defining a so-called "unix
socket" or "named pipe" for communicating with the mysqld
server process: if it's not defined in the config file, then it doesn't
happen.
A consequence of this is that if one starts up the mysql server "mysqld"
with a stripped-down configuration file, then the user connection tool
"mysql" will not be able to connect. The server is running, but not talking
to anyone.
"Raw" installation
The above process assumes you are using an installer that sets everything up
automatically. The actual steps are as follows:
- install the MySQL executables
- create the initial system tables
The latter is done with a script mysql_install_db. It creates, among
other things, the root account. Some options are possible, but rare.
Where and how are MySQL files stored?
The default file format is set by the DB engine (eg InnoDB) and the default
location is set in my.cnf. The DB engine determines the internal format
(record length, etc) and the index format; MyISAM, for example, uses
ISAM-type index files.
You can change the DB engine on a per-table basis by adding an engine
specification to create table:
create table foo (
....
) engine = innodb;
Alternatives include
- MyISAM: once upon a time the MySQL default
- memory: all tables are kept in RAM
- csv: comma-separated values, usually used only for data export
- Archive: non-indexed, used for archival data
- various cluster-oriented engines such as NDB/NDBCLUSTER and Federated
- Blackhole: very fast -- ahem -- write performance, but no persistence.
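To check which engine an existing table is actually using, you can query
information_schema (substitute your own database name for plddb):
select table_name, engine from information_schema.tables where table_schema = 'plddb';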
Originally, MySQL kept all the innodb data in a single file, typically ibdata1.
The default in MySQL is now innodb_file_per_table. My older MySQL
tables all live in ibdata1, but tables created since I upgraded to MySQL 5.6
now get their own file. For example, I rebuilt the University database, and
so directory plddb contains files
- course.ibd
- grade_report.ibd
- prerequisite.ibd
- section.ibd
- student.ibd
Conceptually, these files can now be on different devices. A frequently
used file, for example, might be placed on a solid-state drive (SSD), or two
files frequently traversed simultaneously (eg with a join) might be put on
two different magnetic disks, so the disk heads wouldn't have to move back
and forth from one file to the other.
MySQL does not support moving table files, eg with ALTER TABLE. However,
tables can be created in any directory as follows:
create table foo (
....
) data directory = '/home2/mysql/data';
Given table foo in /var/lib/mysql, one can do the following to "move" it to
/ssd/mysql/data.
- create table foocopy ( /* exactly like foo */ ) data directory '/ssd/mysql/data';    -- same table structure as foo
- insert into foocopy select * from foo;    -- copy the data from foo to foocopy
- drop table foo;
- rename table foocopy to foo;
Alternatively, one can just move the original foo.ibd to the new
location, and create a symlink from the old location to
the new. This requires turning on symlink support, however; there are also
some warnings in the MySQL manual about such symlink use.
MySQL data dictionary
If I connect to my MySQL system, I see my own database named "pld", and also
"mysql", "information_schema", and "performance_schema".
The "mysql" database contains information about users and about the system
itself. For example, we can run the query
select user, password from user;
Try adding host to the select clause as well.
The information_schema database includes the TABLES table. For example, we
can run this:
select table_name, table_schema, table_rows, engine from tables;
select table_name, table_rows, engine from tables where table_schema = 'pld';
MySQL Password hashes
Here is a snapshot of the mysql password table, generated with
use mysql; -- must be root
select user, password from user;
Duplicates of the root account are deleted.
+------------------+-------------------------------------------+
| user             | password                                  |
+------------------+-------------------------------------------+
| root             | *1D9941EB89D26755CEFBB3D0A07498ECE4F8BE83 |
| debian-sys-maint | *E1ABF29B1CB5E9B02FB4E164322FF31E8406990A |
| pld              | *F1624510F3A5C73809D0E47DAECAA31C14D04910 |
| bob              | *C0316B761A961A800C37D888D19D23E1790C12DB |
+------------------+-------------------------------------------+
Bob's password is 'cranberry'.
The command openssl dgst -sha1
calculates the 20-byte SHA-1 secure hash of its input. Adding the flag -binary means that the output is in
binary rather than hex. According to this
site, the MySQL password hash is sha1(sha1(password)). Let us verify
this.
echo -n "cranberry" | openssl dgst -sha1
-binary | openssl dgst -sha1
c0316b761a961a800c37d888d19d23e1790c12db
The echo -n cranberry
generates as its output the string "cranberry", without a trailing newline.
The first openssl command
takes the SHA-1 hash of that, and passes it in binary form to the second,
which repeats the operation and this time returns the result in hex. It does
indeed match what's in the table.
The initial '*' character in the table is a marker indicating that the above
"new" password hash is used, versus the pre-4.1 16-byte "old" password hash.
Let's change Bob's password to 'gingerbread'. One approach is to generate
the new string externally, via shell commands; 'ucase' is my own script for
changing letters to uppercase:
echo -n "gingerbread" | openssl dgst -sha1
-binary | openssl dgst -sha1 | ucase
402F36BCAB785DE4D3EB52B98C22E3DA45162B3F
Alternatively, the password() function generates new password strings
internally. To view the function result, we simply include it in a select
statement:
select password('gingerbread');
Now we change bob's password, with either of the following:
update user set password =
'*402F36BCAB785DE4D3EB52B98C22E3DA45162B3F' where user = 'bob';
update user set password =
password('gingerbread') where user = 'bob';
MySQL must now be stopped and restarted, but after that bob should be able
to log in with password 'gingerbread', and not with 'cranberry'. To restore:
update user set password =
'*C0316B761A961A800C37D888D19D23E1790C12DB' where user = 'bob';
Stopping and restarting is needed because MySQL keeps a copy of this table
in memory, and does not expect the table to be written to. In general,
changing passwords with the standard SET PASSWORD command is much safer.
set password for 'bob' =
password('gingerbread');
set password for 'bob'@'%' = password('gingerbread');
Oracle
Compared to MySQL, Oracle has more proprietary extensions. A few are
there more to lock you into Oracle than because they are a good idea. That
said, Oracle does do well with big, transactional DBs.
Note that, with Oracle's purchase of Sun Microsystems, Oracle now owns
MySQL. Some people think this is why they bought Sun. (Others think it was
for Java.)
Oracle tends to support more features than MySQL. Transactions
are one, though this is now beginning to appear in MySQL. Another is key
constraints, and foreign key constraints
in particular. However, consider the following:
- Application programs generally have to explicitly check for foreign
key values, anyway; otherwise, it is difficult to respond naturally to a
user error.
- Now that Oracle has acquired the innodb
database engine for MySQL, foreign key constraints are now implemented
whenever the DB administrator chooses the innodb engine (which is now
the default).
Note also that MySQL is dual-licensed: as open-source (which means your
additions must also be open-source, if you release them) and as a
proprietary product (meaning you can develop and
sell proprietary extensions to MySQL, provided you pay the
licensing fee).
Installing Oracle 12c (more later)
- bandwidth to download 2+ GB
- disk space
Some notes on the Oracle Developer Days virtual machine
If you install this, the machine login username/password is oracle/oracle.
Once logged in, create a terminal window, and start sqlplus as follows:
sqlplus / as sysdba
From there:
alter user sys identified by "mynewpassword";    -- creating a stronger password is recommended
create user pld identified by "mypass";          -- use your own initials or name instead of 'pld'
grant connect to pld;                            -- give your account the important privileges
grant create table to pld;
grant alter any table to pld;
grant unlimited tablespace to pld;
At this point you should be able to login as 'pld', from another terminal:
sqlplus pld
You can create tables, etc. To get information about tables, the following
work:
select table_name from user_tables;
select owner, table_name from all_tables;
describe employee;
To get constraints: P = primary key, R = foreign key, C = check (eg not null)
select constraint_name, constraint_type from user_constraints where table_name = 'WORKS_ON';
select constraint_name, table_name from user_constraints where constraint_type='R';
Demo of Oracle
To start (on the virtual machine)
sqlplus / as sysdba
sqlplus pld
To view tables:
select table_name from user_tables;
select owner, table_name from all_tables;
select * from user_objects where object_type =
'TABLE';
A schema is a collection of objects (eg tables) under
common ownership. Thus, I could use different owners (and so different
schemata) to represent what in MySQL corresponds to different "databases".
But in MySQL, different databases don't have to have different owners.
(Actually, MySQL tables don't really have "owners"; instead, users
are GRANTed rights to the tables.)
One can in Oracle also create multiple tablespaces. These
aren't really like MySQL databases either; they are there to assign storage
space to particular sets of tables.
Question: what Oracle permissions will allow me to "factor" a table into two
tables? In MySQL, if I own the database, I can do whatever I want with the
tables in that database.
To load the company tables, I had to make a few changes.
First, the default date format in MySQL is, eg, 27-01-2014; the default
format in Oracle is 27-jan-14. There's a way to get Oracle to read in dates
in a different format, but I didn't bother.
Second, the insert into tablename values
command of Oracle takes only a single row; I had to edit this part of the
load file.
Also, I named all the constraints. The default Oracle names are things like
sys_c0033725; it is not clear if such things are
Foreign-Key constraints or something else.
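For example, a foreign-key constraint on works_on can be given a meaningful
name like this (a sketch; the constraint name is made up, and any existing
unnamed constraint on the column would have to be dropped first):
alter table works_on add constraint works_on_essn_fk foreign key (essn) references employee(ssn);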
To see the constraints on a table:
Constraints have the following one-letter codes: P = primary key, U =
secondary key (unique), R = foreign key, C = check (eg not null)
select constraint_name, constraint_type from user_constraints where table_name = 'WORKS_ON';
select constraint_name, table_name from user_constraints where constraint_type='R';
To see the indexes on a table:
select index_name, index_type, table_name
from user_indexes;
The index_name values are cryptic, and the index_type values are all
'NORMAL'.
Here are a few queries from before to run under Oracle:
select e.fname, e.lname, w.pno from employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or w.pno = 3);
select e.fname, e.lname, s.fname, s.lname
from employee e, employee s where e.super_ssn = s.ssn;
(select
e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and
w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn
and w.pno = 3);
Oracle administrative accounts
(See Oracle Database Administrator's Guide.)
The two privileged accounts automatically created by Oracle are SYS and
SYSTEM. Both have been granted the privileges of the "DBA role"; we'll get to
role-based privileges later, but this basically means the accounts have all
privileges.
The SYS account is the actual owner for the data dictionary (below). (More
precisely, the data dictionary lives in the SYS "schema".) Any
user-initiated changes to these tables can result in DB corruption; the
SYSTEM account is sometimes considered safer for this reason. But the SYSTEM
account can still do plenty of damage.
There are two other privilege roles that offer powers not present in the DBA
role: SYSDBA and SYSOPER. The SYSOPER privilege does not grant the right to
view the data directly (eg with SELECT), though it does grant the right to
back up the data, and the data can be read from those backups.
After (or at the same time as) an Oracle user connects with "standard"
account privileges using the sqlplus command, he or she
can elevate privileges with the CONNECT command: connect username
as {SYSDBA|SYSOPER}. The username can be replaced with "/" to
indicate that the underlying operating system is to determine privileges.
Example:
sqlplus / as sysdba
The operating-system groups OSDBA and OSOPER exist to allow privileges to be
delegated by the OS: if you are in one of these groups and you request the
corresponding SYSDBA or SYSOPER privilege, you will not have to authenticate
with your Oracle password.
Oracle Data Dictionary
Earlier, we tried
select table_name from user_tables;
The user_tables structure is part of the Oracle data
dictionary. Here are some additional pieces in Oracle 11g:
- user_catalog
- user_tab_columns
- user_views
- user_constraints
- user_objects
- user_users
- ...
The user_catalog (or just cat) is a table of
the current user's tables; additional information about these (mostly
storage-related) is found in user_tables. The user_views table contains
information about views (below); views are a sort of pseudo-table.
select * from cat;
An alias for cat is tab. The
"official" way to get all tables is
select table_name from user_tables;
select object_name from user_objects where object_type = 'TABLE';
You can use describe employee to get information about
the employee table. What is different from the MySQL version?
Earlier we looked at some information from user_constraints. I had used
this information to find the existing foreign-key constraints and
re-create them with sensible names:
select table_name, constraint_name from
user_constraints where constraint_type = 'R';
select table_name, constraint_name, constraint_type from user_constraints;
If we run the following query:
select index_name, table_name from
user_indexes;
in Oracle, we find that the tables project and department
have two indexes. Why?
User_objects contains tables and indexes, and also (if defined) views,
functions, procedures, synonyms, triggers.
These user_... "tables" are actually views, as
discussed below. They list the things "owned" by the current user, or, more
precisely, belonging to the current user's "schema".
If a user replaces the prefix user_ with all_,
the tables (views) show all objects accessible by the current
user. If one has sufficient privileges there is also a prefix dba_,
which includes everything.
The data dictionary is the repository for all metadata. It is normally
understood to be read-only. Some privileged accounts can update the data
dictionary directly; this is often unwise.
A number of "scripts" are available for accessing information in the data
dictionary. There are also some built-in commands, such as describe
tablename. In MySQL, most metadata is accessed not from
tables/views but via commands, eg show tables, show
create table tablename, etc.
The table user_users lists the users visible to the current user (ie, just
the current user). The all_users table may be more relevant here.
How do we identify specific primary key attributes, foreign key references,
etc?
How about
select * from dual;
See http://docs.oracle.com/cd/B10500_01/server.920/a96524/c05dicti.htm.
Oracle Schemas
A schema is owned by a user and represents, to a first approximation, the
set of tables and other objects owned by that user.
Different users cannot necessarily "see" the objects in each others'
schemas; they need to be granted permission.
Each schema has its own "namespace". In other words, users pld and bob can
each have their own table "employee", in which case the tables pld.employee
and bob.employee are not related.
If user bob is granted permission to view pld's table employee, then bob
will access it as pld.employee. Such permission would be granted by pld as
follows:
grant select on employee to bob;
For production use, the schema owner is usually an abstract entity rather
than an individual, eg "admin_dept". Rights to create, view and modify
tables can then be granted by the DBA to individual users.
In MySQL, one would give different users their own "database". A similar
dotted notation, dbname.tablename, would be used to access tables in another
database.
Oracle Materialized Views
In Oracle you can specifically ask that a view be materialized;
that is, that it be a copy ("replica") of the underlying data. An early term
for a materialized view was a data snapshot. You use the following
syntax:
create materialized view ....
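A fuller sketch, using the Company database (the view name and refresh
options here are just one plausible choice):
create materialized view dept_salaries
    refresh complete on demand
    as select dno, sum(salary) as total_salary from employee group by dno;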
Materialized views are useful for data warehousing:
creating a large (hundreds of attributes) table representing static data.
Materialized views are also useful in networked databases: a frequently used
view on one network node can reduce network traffic to other nodes. A
materialized view can hold a local copy (or cache)
of some other data.
A materialized view (any view, actually) of a single table can represent a
subset of the columns (as specified in the SELECT clause), and also a subset
of the rows (as specified in the WHERE clause).
Materialized views are updated (synchronized with the parent tables) in
operations called refreshes. The fastest refreshes are
obtained if one specifies REFRESH ON COMMIT; that is, the view table is
updated whenever the underlying base tables are updated. This sometimes
causes extensive CPU use and network traffic, and requires that the view
permit a "fast refresh". Not all materialized views support this, and the
creation of a "materialized-view log", which keeps track of all changes made
to the base tables since the previous refresh, is required. Oracle supplies
tools to help identify whether a material view is fast-refreshable, and, if
not, what can be done to fix it.
The alternative is REFRESH ON DEMAND. This means that refreshes are not
automatic, and must be scheduled by the DBA. Such refreshes are typically
executed at periodic intervals. An on-demand refresh can either be COMPLETE
or FAST (incremental).
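An on-demand refresh is typically triggered from sqlplus with the DBMS_MVIEW
package (a sketch, reusing the view name from above; 'C' requests a complete
refresh, 'F' a fast one):
exec dbms_mview.refresh('DEPT_SALARIES', 'C');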
A materialized view with REFRESH ON DEMAND will occasionally have
out-of-date, or stale, data. Whether this is serious
depends on the application. For data-warehousing applications, for example,
it is seldom an issue; the data warehouse might contain long-term records of
customer purchases. For an inventory application, a materialized view may or
may not be appropriate. The out-of-date possibility means that an item that
shows as available may in fact not be (or vice-versa). This may in turn mean
that a few customers are disappointed, but retail customers are often easily
mollified.
A materialized view represents a way to "pre-compute" joins. If joins are
slowing down queries, a materialized view can improve performance
dramatically. The same applies to calculation of aggregate (SUM, COUNT)
values.