Using Information_Schema (and pg_catalog)

Due: Friday, April 27

In this last assignment you are to write two queries that make use of the two system-table collections (or schemas) information_schema (standard across different databases) and pg_catalog (specific to Postgres). We've seen that the pg_stats table, containing histogram information about the distribution of data in various columns, belongs to pg_catalog.

Where possible you should use information_schema, because it is more standard.

You can find Postgres documentation of information_schema at postgresql.org/docs/10/static/information-schema.html, and information about pg_catalog at postgresql.org/docs/10/static/catalogs.html.

For some query experiments, the \x option (typed at the psql prompt) will enable "vertical" format, which may improve readability.

To make your query easier to use, consider creating a "prepared" query (see here). You then use the sql command execute to run it. Here's an example:

prepare emps(varchar) as select fname, lname, ssn, salary from employee where lname = $1;
execute emps('Wong');

Query 1: list all tables with a serial primary key.

The goal is to be able to easily find such tables. Because of the likely need for a string match, below, your query might not be perfect at identifying serial keys, but that is unlikely to be an issue in practice. In any event, your query should err on the side of including too many results. (I have needed this query in the past.)

You can get information about a table's columns from information_schema.columns.

For extra credit, verify that the serial (autoincrementing) column is actually part of the primary key. For this you will need to join table columns to table table_constraints, probably involving a where clause "constraint_type='PRIMARY KEY'", and table key_column_usage, column column_name. 

I recommend proceeding as follows:

  1. create a sample table with a serial key, say named serialdemo.
  2. use select * from information_schema.columns where table_name='serialdemo' and column_name='keycolname' to find out how the serial key is described. It is here where the \x option may improve readability.
  3. Now write a query to identify this. You will probably need to use a match, of the form "LIKE 'something%'". Try to make your match as tight as you can.


Query 2: for a given table, list each column and all users who have read access (SELECT access) to that column.

Users are more properly called "roles". Your query should not list the owner of the table. For the basic query, use table column_privileges in information_schema. To identify the owner of the table, this should work:

select tableowner from pg_tables where tablename=$1

Test your query by creating a second role and granting it SELECT access to some but not all columns of a table. You create roles with, eg, create role alice;. This has to be done as user postgres. You can then grant permissions with, eg, grant select(dname, dnumber) on department to alice;. If you're using role inheritance (see the notes), only the parent role assigned the privilege will show up, not the "child" roles inheriting from that privilege. The information obtained by your query should match that obtained by \dp tablename.


Query 3: list each of your tables with a (possibly approximate) record count

The basic query here is

select relname, reltuples from pg_class;

This is potentially not exact because an operation adding or deleting a row may have just completed. You might also run into problems with Postgres not updating the count at all; I had this problem (in version 9.5) with pg_stat_user_tables.n_live_tup. If so, document the problem and try another method.

The above pg_class query includes all the system tables. Figure out a way to get rid of them! You might start with the relowner column, but this is in OID format so you'll have to join to another table to find your own OID. The best table here is probably pg_roles, columns rolname and oid.

Extra-credit options: Can you write the query so it lists all tables with non-system owners? Not just your own tables? You can get the name of the currently logged-in user (that is, you) with select current_user; Can you incorporate this into your query? The table pg_stat_user_tables might help get information about tables belonging to any user.