Comp 305/488: Database Administration, LT 410, 4:15 Tuesdays
Miscellaneous database administration notes
Mostly relating to INDEXES and QUERY
OPTIMIZATION (including EXPLAIN)
(And some on Oracle)
A Big Dataset
I downloaded 10,000,000 ⟨username,password⟩ pairs from xato.net/passwords/ten-million-passwords.
This is real password data; users chose their passwords with the expectation
that they would not be seen by others.
All the pairs are in Postgres in the table passwords.password on ulam, and
in MySQL on host sql1. I first modified the data to replace control
characters and non-Ascii Unicode characters with reasonable Ascii
equivalents.
Auxiliary goals:
- legitimize the data by putting it to a non-hacking use
- establish the legitimate use of bittorrent
The size of the Ascii table files is about 200 Mbyte. The actual MySQL
file with index is about four times that, but only a little over twice
that without the index (perhaps because I allowed varchar(80)).
-rw-rw---- 1 mysql mysql 801112064 Feb 17 12:22 password.ibd
-rw-rw---- 1 mysql mysql 478150656 Feb 17 12:11 password_noindex.ibd
For Postgres, the file size without index was 500,998,144 bytes. With
index, the base file didn't change size.
Copying the table within MySQL takes about 1.5 minutes. Copying in
Postgres took 164 seconds, or 2.75 minutes.
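As a sketch, such a copy can be made in one statement in both systems (whether this is exactly how the copies here were made is an assumption); note that create table ... as select copies the data but not the primary key or its index:

create table password2 as select * from password;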
The password table was defined as follows:
create table password (
    username varchar(80) not null,
    password varchar(80) not null,
    primary key(username,password)
);

The csv text files can be loaded into the table (as user postgres) using

copy password from 'filename' delimiter E'\t' csv;
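The MySQL counterpart is load data infile (a sketch: the file path here is hypothetical, and the account needs the FILE privilege):

load data infile '/path/to/passwords.csv' into table password
    fields terminated by '\t' lines terminated by '\n';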
How many entries?
select count(*) from password;

Whose password is the same as their username?

select count(*) from password p where p.username = p.password;

219058
1 row in set (7.94 sec)    (MySQL)
This query requires one pass through the data. The time here is typical of
such a pass. For Postgres, the time was just over 10 seconds. The command
\timing enables query timing in Postgres.
Next let's compare the results of two similar queries:
select * from password where username = A;
select * from password where password = B;
Here are a few (A,B) pairs to try, where in each case both the username and
password are unique.
badmole, 150super
corydeut, uxoodo
eurotags, Lambrech
hpgator, ncgator
kiraartmank, sundayfunday
megabit_tat, Rafit03
pasdebois123, gingerpasdecalai
sargis88ov, sarkisov
telo4ela, qazZZ1234
zypipofylyk, KArIHaFacEH
Times to look up a username are reported by MySQL as 0.00, and by
Postgres as in the range 0.5-20 ms. Times to look up a password are ~7.6
seconds in MySQL and ~10 seconds in Postgres. Why the difference?
The original password table has a primary key, and therefore, automatically,
an index on that key. An index greatly improves the speed
of a lookup given a key value.
Here, though, the key values have two attributes:
⟨username,password⟩. In this case, MySQL indexes support fast prefix
lookup: given just a username, we can quickly find all the
matching ⟨username,password⟩ pairs. This works just like using a phone book,
which is an index on ⟨lastname,firstname⟩ pairs, to look someone up using
only the last name.
In MySQL I also created a copy of the table, password_noindex, with no
key and therefore no index at all:
create table password_noindex (
    username varchar(80) not null,
    password varchar(80) not null
);
How do the lookup times compare with this table?
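For comparison, both of the following lookups should now take roughly the one-pass time above (several seconds), since password_noindex has no index on either column:

select * from password_noindex where username = 'badmole';
select * from password_noindex where password = '150super';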
We can add a second index to the original table, on just the password.
create index pwindex on ipassword(password);    -- takes a while; timings below
show index from ipassword;                      -- MySQL; in Postgres use \d ipassword
drop index pwindex on ipassword;
Index creation takes about 1 minute 50 seconds in MySQL, and 3 min 40
seconds in Postgres. I did this to a copy of table password, named password2
in MySQL and ipassword in Postgres, so we could have both forms of the table
simultaneously.
What effect does the presence of pwindex
have on the lookups?
Now let's count usernames and passwords matching a string condition:
select count(username) from password where username like 'david%';
select count(username) from password where username like '%david%';
select count(*) from password where password like 'david%';
select count(*) from password where password like '%david%';
The first search here is still a prefix search using the primary-key index;
the other searches are not. Why?
MySQL automatically uses an index on username for the first (username like
'david%') query. Postgres does not, by default, and so this query takes the
same ~10 seconds as the others, and is based on sequential search.
But we can configure the index so that it does use text
prefixes, with the varchar_pattern_ops
or text_pattern_ops options.
I added an index "username_prefix" using the varchar_pattern_ops option to
table ipassword, speeding up this username like 'david%' query from 10
seconds to 164 ms. Here's the create-table code:
create table ipassword (
    username varchar(80) not null,
    password varchar(80) not null,
    primary key(username,password)
);
create index password_index on ipassword (password);
create index username_prefix on ipassword (username varchar_pattern_ops, password);
This last index took 61 seconds to create. Back to the index-as-phonebook
analogy, if you know lastname='Smith', you turn to the Smith pages and start
searching for firstname. The same works, though, if you know lastname starts
with 'Smi'. The reason Postgres does not enable varchar_pattern_ops by
default, by the way, is that a pattern_ops index compares strings in strict
character-by-character order rather than in the locale's collation order;
the default locale-collated index supports ordinary ordering comparisons
but, in general, cannot be used for LIKE prefix searches.
How fast are these queries using table password versus table
password2/ipassword?
Counting distinct entries
Next we count the number of distinct usernames and distinct passwords,
without the secondary index:

select count(distinct username) from password;

8106113
1 row in set (21.57 sec)    -- Postgres took 140 sec

select count(distinct password) from password;

5054219
1 row in set (26.30 sec)    -- Postgres took 186 sec
Both these queries require a sort to eliminate duplicates. There is not much
penalty, however, for using the non-indexed value. An index doesn't help
with sorting.
Here we count usernames occurring >= 100 times:

select username, count(password) from password group by username
    having count(password) >= 100;

1205 rows in set (0.00 sec)
What about a similar count for passwords?
select password, count(username) from password group by password
    having count(username) >= 100;

5062 rows in set (4 min 2.23 sec)

If we change the comparison to >= 1000, we get 188 rows in the set (4 min 4.92 sec).
What happens to the time here if we re-create the password index?
Counting items that occur only once
How can we count usernames that occur only a single time? One way is to use
HAVING, along with an inner query. Selecting from an inner
query requires that we give the inner query a table-alias name, TEMP.
select count(*) from
    (select username, count(password)
     from password group by username having count(password) = 1) as TEMP;

7465978
MySQL: 18-25 seconds
Postgres: 142 seconds
But the second time we run this we get: 1 row in set (0.00 sec). Why the
speedup? This is MySQL's query cache: the result is cached under the exact
text of the query, so re-running it verbatim is essentially free. Change
TEMP to TEMP2, altering the query text, and the time is back up to 18
seconds. Cached results are kept until the underlying table is modified (or
the cache fills).
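For experiments like this, the cache can be sidestepped explicitly (MySQL 5.x only; the query cache was removed in MySQL 8.0):

reset query cache;    -- discard all cached results
select SQL_NO_CACHE count(*) from
    (select username, count(password)
     from password group by username having count(password) = 1) as TEMP;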
Another way to do this query is with NOT EXISTS. This is
slower in MySQL, but faster in Postgres:
select count(*) from password p where not exists (select * from password p2
    where p.username = p2.username and p.password <> p2.password);

7465978
MySQL: 1 min 1.31 sec
Postgres: 119 sec
Here is a slightly different formulation, again using a subquery but also
NOT EXISTS:
select count(*) from (select * from password p where not exists (select *
    from password p2 where p.username = p2.username
    and p.password <> p2.password)) as T;

7465978
MySQL: 1 min 17.96 sec
To an extent, the slowness is fundamental: for each p, we have to search the
table for records p2 that match.
Finally, we can use a LEFT JOIN. The idea is to join the
password table to itself on the condition that usernames match but passwords
do not, and then look only at records where the join failed. Note the double
join condition for a single join.
select count(*) from password p left join password p2
    on (p.username = p2.username and p.password <> p2.password)
where p2.username is null;

7465978
MySQL: 48.81 sec
Postgres: 129 sec
With the index on password2.password, the HAVING query takes about the same
amount of time. The NOT EXISTS query takes much longer, 10 min 37.57 sec!
The LEFT JOIN query also takes much longer: 9 min 23 sec. What happened? I'm
still working on this, but it appears that MySQL was misled by the
second index, and attempted to use it for individual lookups when that
strategy was ultimately slower.
Now let's try these to count unique passwords rather than
usernames. Does the lack of an index slow us down?
select count(*) from
    (select password, count(username)
     from password group by password having count(username) = 1) as TEMP;

select count(*) from password p where not exists (select * from password p2
    where p.password = p2.password and p.username <> p2.username);

select count(*) from password p left join password p2
    on (p.password = p2.password and p.username <> p2.username)
where p2.password is null;
If we run the first query here on table password, it takes MySQL 6 min 59
sec, returning a count of 4300984. If we run it on table password2, with
index pwindex, it takes 30 sec. This is a big speedup with the index!
Here's a more direct variant on counting usernames using the same password:
select password, count(username) from password group by password
    having count(username) >= 2000;
MySQL and Postgres each took between 4 and 4.5 minutes.
One caveat: the tables are big enough that accessing a
second table has a material effect on the buffer pool, which in turn can
have a big effect on timing. But, over multiple runs, I seldom saw
variations in excess of 10%.
Here are some older Postgres timing values from my old (iron-disk) laptop:

select * from password where password = 'uxoodo';
 corydeut | uxoodo
Time: 12910.541 ms

Not so impressive. Again:

select * from password where password = 'Lambrech';
Time: 4104.560 ms

Better! Again:

select * from password where password = 'ncgator';
Time: 2629.510 ms

How about uxoodo again?
Time: 2758.630 ms
The first time took ~13 seconds. Then 4, and then (pretty stably, even if
the queries get repeated a while) ~2.7 seconds.
What is going on?
Why is this speeding up? This is probably related
to a rather large disk cache.
Bigger Company
The Company has had a hiring spree. I created data for 300 more
employees, and a total of 20 departments and 40 projects. Each new
employee works on four projects. This data is in my database bigcompany,
available at pld.cs.luc.edu/database/bigcompany.
(Only tables employee, department, project and works_on are provided; you
can use the original versions of the other files, or leave them out.)
The new employee names all came courtesy of listofrandomnames.com
(who would have thought?). The new hires don't have addresses yet, though
I did give them random salaries.
Most of the employee data was created using C# programs that generated
appropriate insert/update statements using random values. The new
employees have ssns from 111220001 to 111220300.
select count(*) from employee;

returns 308.
select e.ssn, e.fname, e.lname, e.salary, e.dno from employee e
    order by e.lname limit 50;
The limit N SQL clause is
useful at the end of any query to view only some of the data.
A few notes on transferring data from MySQL to Postgres:
(Generally, export and import must be run as "root" or "superuser")
To export: select * from employee into outfile '/home/pld/bigemployee.csv';
To import into postgres, run the following. To use the copy command, you
have to be the postgres user.
\connect bigcompany

copy employee from '/home/pld/305/bigcompany/bigemployee.csv' delimiter E'\t' csv null '\N';
copy department from '/home/pld/305/bigcompany/bigdepartment.csv' delimiter E'\t' csv null '\N';
copy project from '/home/pld/305/bigcompany/bigproject.csv' delimiter E'\t' csv null '\N';
copy works_on from '/home/pld/305/bigcompany/bigworks_on.csv' delimiter E'\t' csv null '\N';
A performance experiment
Here is table foo:
describe foo;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(100) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
select * from foo;
+------------+-------+
| id | name |
+------------+-------+
| 1 | peter |
| 3 | ralph |
| 2147483647 | fred |
+------------+-------+
I then ran 1000 update queries from the command line, using a shell script,
and measured the time. These measurements were done with MySQL 5.5, in 2013.
n=0    # initialize the loop counter
time while [ $n -lt 1000 ]
do
    n=$(expr $n + 1)
    mysql --password="ZZZ" -D plddb -e "update foo set id=id+1 where name='peter'"
done

real 0m43.659s
user 0m3.280s
sys  0m2.640s
That's 23 updates per second.
If I do this just with a query:
n=0    # initialize the loop counter
time while [ $n -lt 1000 ]
do
    n=$(expr $n + 1)
    mysql --password="blue" -D plddb -e "select name from foo where id=3"
done > /dev/null
Then the time is about 9.35 seconds (11.9 seconds if output is not sent to
/dev/null). That's about 1 query every 10 ms.
I really should run the queries from, say, a PHP program. Still, this
illustrates a pretty dramatic difference between reading and writing.
Explain in MySQL
Most SQL implementations provide an explain command to
report how a database organizes a query's execution. This section has
examples from MySQL; the following section has examples from Postgres.
Official documentation for MySQL explain is
at https://dev.mysql.com/doc/refman/5.6/en/explain-output.html.
Here is an example. Suppose the query is the following triple
join:
select e.lname, p.pname, w.hours
from employee e, project p, works_on w
where e.ssn = w.essn and w.pno = p.pnumber
If we run this on the bigcompany data using MySQL 5.7:
explain select e.lname, p.pname, w.hours
from employee e, project p, works_on w where e.ssn = w.essn and w.pno =
p.pnumber;
we get a table:

id | select_type | table | type   | possible_keys | key     | key_len | ref   | rows | Extra
---+-------------+-------+--------+---------------+---------+---------+-------+------+------
 1 | simple      | e     | ALL    | primary       | NULL    | NULL    | NULL  |  308 | NULL
 1 | simple      | w     | ref    | primary,pno   | primary | 9       | e.ssn |    1 | NULL
 1 | simple      | p     | eq_ref | primary       | primary | 4       | w.pno |    1 | NULL
The rows here refer to the individual tables of the join, identified by
their table aliases in the table column. Other columns
are as follows:
- select_type: simple means a simple
join, with no nested subquery.
- type:
- index means the index was scanned in full. This is
slightly faster than scanning the table row-by-row.
- ref means that all matching rows (in a join) were
retrieved. An index is used, but it is not a key index. Above, given a
project record, all matching works_on rows with that project.pnumber
are retrieved using the works_on.pno index.
- eq_ref means that MySQL knew there would be only
one matching row (eg the lookup of w.essn in table employee). This is
only used if there is a suitable key index available. MySQL looks up
the matching row using the index. Above, given a works_on record w,
MySQL looks up w.essn in employee using the index on employee.ssn.
- all: all records were examined
- const: MySQL knows there is a single match
- possible_keys: list of keys that might have been used
- key: the actual key that was used
- key_len: the length of the key, in bytes. The
department.dname key is of type varchar(15); the extra two bytes are
presumably a length field.
- ref: what value was looked up in this table (for
example, p.pnumber was looked up in the works_on table using the pno
index)
- rows: An estimate of the number of rows to be examined. When looking up
w.essn in table employee, the 1 here is exact, as ssn is the primary key of
employee. When looking up p.pnumber in works_on, the 1 is a gross
underestimate. It is not exactly clear why this happens; the fact that the
type column is ref rather than eq_ref shows that MySQL knows one p.pnumber
value may lead to multiple works_on matches.
- extra: Additional information as appropriate. The
"using index" here means that each project name is looked up via the
index, rather than simply by traversing the project table row-by-row.
What MySQL did here was to
- First retrieve every employee (the ALL in the type
column), using the employee.ssn (primary-key) index.
- For each e.ssn value (column ref), retrieve matching works_on records
using the works_on.essn index (the PRIMARY index). The ref in the type
column means that this is a one-to-many lookup (as only one of the two
primary-key fields is used). The 1 in the rows column is misleading.
- For each record so far, look up the pno (column ref)
in the project table. The eq_ref in the type column
shows that MySQL knows these matches must be unique.
Here is a closely related query:
select e.lname from employee e, works_on w, project p
where e.ssn = w.essn and w.pno = p.pnumber and p.pname = 'Computerization';
Here is the output of explain:

id | select_type | table | type   | possible_keys | key     | key_len | ref    | rows | Extra
---+-------------+-------+--------+---------------+---------+---------+--------+------+------------
 1 | simple      | p     | const  | primary,pname | pname   | 17      | const  |    1 | using index
 1 | simple      | w     | ref    | primary,pno   | pno     | 4       | const  |   31 | using index
 1 | simple      | e     | eq_ref | primary       | primary | 9       | w.essn |    1 | NULL
This is a join similar to the previous query, joining employee, works_on
and project the same way, but it is done in a different order.
MySQL first looks up "Computerization" in the project table, obtaining
pnumber = 4. MySQL knows only 1 record will be retrieved. The "using
index" in the extra column is supposed to indicate this.
In the second row all works_on records matching this pnumber are
retrieved using the key on pno (key column). Here,
however, MySQL gets a larger number for its estimate of the number of rows
in works_on returned: 31 rather than 1 as in the previous example.
Finally, for each w.essn (column ref), MySQL looks up
the matching employee record using the primary key on employee.
Here is another example, including a join and an additional where-clause:
select e.lname, d.dname from employee e,
department d where e.dno = d.dnumber and e.salary >=40000;
Here is the result of explain:

id | select_type | table | type  | possible_keys    | key         | key_len | ref       | rows | Extra
---+-------------+-------+-------+------------------+-------------+---------+-----------+------+------------
 1 | simple      | d     | index | primary          | dname       | 17      | null      |    3 | using index
 1 | simple      | e     | ref   | FK_employee_dept | FK_emp_dept | 4       | d.dnumber |    1 | using where
Here MySQL first retrieves each department record, using the
primary-key index (column type). Note the key
length of 17 for department.dname, which is in fact of type varchar(15).
For each department, MySQL then looks up the employees in that
department, using the index on employee.dno. This is the index created by
the foreign-key constraint employee.dno references department.dnumber.
Employees with salary less than $40,000 are removed at this point.
Note that there is no index above on e.salary. What happens if the query
uses a field on which we do have an index?
select d.dname from employee e, department d
where e.dno = d.dnumber and e.ssn='111220266';
The explain output is

id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra
---+-------------+-------+-------+---------------+---------+---------+-------+------+------
 1 | simple      | e     | const | primary,dno   | primary | 9       | const |    1 | NULL
 1 | simple      | d     | const | primary       | primary | 4       | const |    1 | NULL
Here MySQL first retrieves the single relevant employee
record, using the primary-key index (column key).
The next step is to look up the matching department, using the key on
department.dnumber (taken from e.dno).
The type const means that MySQL knows there is a single
matching row in each table, and it looks up that row using the index.
Here is a simple where, on an indexed field:
select e.lname from employee e where e.ssn
between 111220100 and 111220150;
MySQL reports the following:

id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra
---+-------------+-------+------+---------------+------+---------+------+------+------------
 1 | simple      | e     | all  | primary       | null | null    | null |  308 | using where
This answer suggests that the BTREE index on e.ssn is not used;
the type=all means that row-by-row traversal is done. This could be done
using the index, but the key column is "null". Even when we are searching
for a single ssn
select lname from employee where
ssn=111220266;
the explain output suggests that MySQL uses type=all
search, that is, without indexing. If this is true, it likely reflects a
simple lack of optimization.
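One way to investigate is to force the index and compare; force index is real MySQL syntax, though whether it actually helps this query is left as an experiment:

explain select e.lname from employee e force index (primary)
    where e.ssn between 111220100 and 111220150;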
Generally, the primary use of explain is to find when
indexes are needed (or, in some cases, when indexes are not
needed). If a query either is run very often or takes a long time when it is
run, we can use explain to try to figure out how it can be
improved. If the output shows that the query is doing an inefficient join,
where there is not an index on either of the joined attributes, we can try
adding an index to see if that helps.
Sometimes, however, the result of the explain analysis is
to decide to use straight_join (forcing a join in a
particular order) or else force index (forcing use of an
index). Here is an example of straight_join introduced into the first query
above:
select e.lname, p.pname, w.hours
from employee e straight_join works_on w on e.ssn = w.essn, project p
where w.pno = p.pnumber;
The result of explain is now

id | select_type | table | type   | possible_keys | key     | key_len | ref   | rows | Extra
---+-------------+-------+--------+---------------+---------+---------+-------+------+------
 1 | simple      | e     | all    | primary       | null    | null    | null  |  308 | NULL
 1 | simple      | w     | ref    | primary,pno   | primary | 9       | e.ssn |    1 | NULL
 1 | simple      | p     | eq_ref | primary       | primary | 4       | w.pno |    1 | NULL
This is identical to the version above. Now let us try to use straight_join
to force a different query ordering:
select e.lname, p.pname, w.hours
from project p straight_join works_on w on p.pnumber =
w.pno, employee e
where e.ssn = w.essn;
We now get

id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra
---+-------------+-------+--------+---------------+---------+---------+-----------+------+------------
 1 | simple      | p     | index  | primary       | pname   | 17      | null      |   40 | using index
 1 | simple      | w     | ref    | primary,pno   | pno     | 4       | p.pnumber |   15 | NULL
 1 | simple      | e     | eq_ref | primary       | primary | 9       | w.essn    |    1 | NULL
Sure enough, the steps are now
- MySQL looks up pnames from project
- MySQL joins these to works_on records, matching p.pnumber to w.pno,
using the works_on key on pno
- MySQL joins this to employee by looking up w.essn as the employee.ssn
value.
Explain in Postgres
The explain command works the same way in Postgres, but generates very
different output. We will start with simpler examples, mostly using the bigcompany
database. For further information, see www.postgresql.org/docs/9.5/static/using-explain.html
and www.postgresql.org/docs/9.5/static/planner-optimizer.html.
There is an interesting viewer at http://tatiyants.com/pev/#/plans/new.
Let's start with this query:
explain select lname from employee where ssn='111220266';

Here is the output of explain:

QUERY PLAN
--------------------------------------------------------
Seq Scan on employee  (cost=0.00..7.85 rows=1 width=7)
  Filter: (ssn = '111220266'::bpchar)
In this example, Postgres scans employee sequentially; the index, somewhat
surprisingly, is not used. The parenthesized numbers are:
- 0.00: estimated startup cost, in relatively
arbitrary cost units that are supposed to relate to page
accesses
- 7.85: estimated total cost
- 1: rows generated. We are looking up one employee, by ssn
- 7: average size, in bytes, of the records returned
What is going on here?
We can also run explain analyze, which actually runs the
query and collects data:
explain analyze select lname from employee where ssn='111220266';

We get a little more information:

Seq Scan on employee  (cost=0.00..7.85 rows=1 width=7) (actual time=0.083..0.093 rows=1 loops=1)
  Filter: (ssn = '111220266'::bpchar)
  Rows Removed by Filter: 307
Planning time: 0.056 ms
Execution time: 0.110 ms
Postgres is reluctant to use the index when the number of blocks in the file
is small. For the bigcompany database, it's still only three blocks. Let's
use the trick of sql3.html to create a table of 10000 (id, val) entries,
randos10000.
insert into randos10000 select generate_series(1,10000) as id, 1000000*random();
How many blocks in this file? Use ctid to tell!
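Two ways to check, as a sketch (ctid is a (block, offset) pair, and 8192 bytes is the default Postgres block size):

select ctid from randos10000 order by ctid desc limit 1;    -- (block, offset) of the last tuple
select pg_relation_size('randos10000') / 8192 as blocks;    -- file size in blocks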
Now try

explain select * from randos10000 where id=7654;
Postgres should use an index scan.
Let's try the dell database:
explain select lastname from customers where customerid = 11807;
explain select * from orders where orderid=6666;
We can try the same using the passwords database:
explain select password from password where username = 'corydeut';

Here Postgres uses the index:

Index Only Scan using password_pkey on password  (cost=0.56..4.61 rows=3 width=8)
  Index Cond: (username = 'corydeut'::text)
Here is another example:
explain select lname from employee where salary > 40000;

QUERY PLAN
-----------------------------------------------------------
Seq Scan on employee  (cost=0.00..7.85 rows=2 width=7)
  Filter: (salary > 40000::numeric)
Here a sequential scan is pretty much the only choice;
there is no useful index. The rows=2 is the actual count of employees in
this salary category. If we lower the salary, the number of rows climbs as
more employees are included in the cutoff. If we raise the salary, fewer
employees are included. Either way, the query plan does not change.
Next is an example involving a join. We run it on the
"bigcompany" database:
explain select e.lname, d.dname from employee e join department d
    on e.dno = d.dnumber where e.salary >= 40000;

QUERY PLAN
----------------------------------------------------------------------------
Nested Loop  (cost=0.15..24.20 rows=2 width=55)
  ->  Seq Scan on employee e  (cost=0.00..7.85 rows=2 width=11)
        Filter: (salary >= '40000'::numeric)
  ->  Index Scan using department_pkey on department d  (cost=0.15..8.17 rows=1 width=52)
        Index Cond: (dnumber = e.dno)
But now let's try it on a smaller salary!
explain select e.lname, d.dname from employee e join department d
    on e.dno = d.dnumber where e.salary >= 30000;

Hash Join  (cost=24.62..34.61 rows=155 width=55)
  Hash Cond: (e.dno = d.dnumber)
  ->  Seq Scan on employee e  (cost=0.00..7.85 rows=155 width=11)
        Filter: (salary >= '30000'::numeric)
  ->  Hash  (cost=16.50..16.50 rows=650 width=52)
        ->  Seq Scan on department d  (cost=0.00..16.50 rows=650 width=52)
The first join here is implemented via the nested-loop join algorithm.
There is a sequential scan on employee, implementing the selection
operation "e.salary >= 40000", followed by an index scan on department:
each employee record that survives the scan has its department looked up
using its e.dno value. Note that the e.salary >= 40000 criterion is quite
selective: only two employees qualify. In the second example, with the
salary cutoff lowered to 30,000, the where clause is much less selective:
it matches 155 rows, and at that point a hash join is faster than repeated
index lookups. Postgres knows how selective e.salary >= 30000 actually is.
Now let's do a join with a very selective where clause:
explain select d.dname from employee e, department d
    where e.dno = d.dnumber and e.ssn='111220266';

QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop  (cost=0.00..16.13 rows=1 width=48)
  ->  Seq Scan on employee e  (cost=0.00..7.85 rows=1 width=4)
        Filter: (ssn = '111220266'::bpchar)
  ->  Index Scan using department_pkey on department d  (cost=0.00..8.27 rows=1 width=52)
This is another nested-loop join, but this time the added where condition
-- a specific value for e.ssn -- is even more selective: it matches exactly
one employee. (Note that Postgres still finds that employee with a
sequential scan rather than the ssn index; the table is small.) We then
look up that employee's department in the department table. It is a very
"small" join.
explain select e.lname from employee e
    where e.ssn between '111220100' and '111220150';

QUERY PLAN
---------------------------------------------------------------------------------
Seq Scan on employee e  (cost=0.00..8.62 rows=49 width=7)
  Filter: ((ssn >= '111220100'::bpchar) AND (ssn <= '111220150'::bpchar))
This is another sequential scan.
Hash Joins
Here's a simple join on employees and departments:
select e.fname, e.lname, d.dname from employee e join department d
    on e.dno = d.dnumber;
Here's the explain output (using database bigcompany):
Hash Join  (cost=24.62..35.94 rows=308 width=61)
  Hash Cond: (e.dno = d.dnumber)
  ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=17)
  ->  Hash  (cost=16.50..16.50 rows=650 width=52)
        ->  Seq Scan on department d  (cost=0.00..16.50 rows=650 width=52)
This means that the department table is scanned and divided into hash
buckets (almost certainly in memory), and then the employee table is
scanned. For each employee e, a hash is performed on e.dno and the
appropriate department bucket is checked. It makes sense to have the smaller
table be the one that gets "bucketed", as above.
Here's another example from the examscores database, below. Table tests has
3 records; table grades has 24000 records:
select * from grades g join tests t on g.test_id = t.test_id;
The output shows that table tests (the smaller table) is bucketed, and then
grades (the (much) larger table) is scanned.
Hash Join  (cost=19.00..719.00 rows=24000 width=186)
  Hash Cond: (g.test_id = t.test_id)
  ->  Seq Scan on grades g  (cost=0.00..370.00 rows=24000 width=14)
  ->  Hash  (cost=14.00..14.00 rows=400 width=172)
        ->  Seq Scan on tests t  (cost=0.00..14.00 rows=400 width=172)
What if we reverse the join order?
select * from tests t join grades g on t.test_id = g.test_id;
We get the same hash-join order: the smaller table, tests,
is the one bucketed.
Sometimes the order calculated by Postgres is mysterious:
select e.fname, e.lname, d.dependent_name
from employee e join dependent d on e.ssn = d.essn;

Hash Join  (cost=10.93..32.80 rows=500 width=61)
  Hash Cond: (d.essn = e.ssn)
  ->  Seq Scan on dependent d  (cost=0.00..15.00 rows=500 width=88)
  ->  Hash  (cost=7.08..7.08 rows=308 width=23)
        ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=23)
The dependent table is much smaller than the employee table, and yet the
latter is the one bucketed. This probably has something to do with poor
table statistics. When we run explain analyze, we get
Hash Join  (cost=10.93..32.80 rows=500 width=61) (actual time=0.205..0.212 rows=7 loops=1)
  Hash Cond: (d.essn = e.ssn)
  ->  Seq Scan on dependent d  (cost=0.00..15.00 rows=500 width=88) (actual time=0.015..0.016 rows=7 loops=1)
  ->  Hash  (cost=7.08..7.08 rows=308 width=23) (actual time=0.177..0.177 rows=308 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 25kB
        ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=23) (actual time=0.006..0.078 rows=308 loops=1)
Planning time: 0.163 ms
Execution time: 0.242 ms
Note the marked discrepancy between dependent's estimated rows=500 and
actual rows=7.
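If stale statistics really are the problem, refreshing them is easy; whether it changes this particular plan is something to check:

analyze dependent;    -- recompute planner statistics for table dependent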
Now let's introduce a three-table join.
explain select e.lname, p.pname, w.hours from employee e, project p, works_on w
    where e.ssn = w.essn and w.pno = p.pnumber;
explain select e.lname, p.pname, w.hours from employee e join works_on w
    on e.ssn = w.essn join project p on w.pno = p.pnumber;

Both these lead to:

QUERY PLAN
-----------------------------------------------------------------------------------
Hash Join  (cost=34.43..88.03 rows=1216 width=60)
  Hash Cond: (w.pno = p.pnumber)
  ->  Hash Join  (cost=10.93..47.81 rows=1216 width=16)
        Hash Cond: (w.essn = e.ssn)
        ->  Seq Scan on big_works_on w  (cost=0.00..20.16 rows=1216 width=19)
        ->  Hash  (cost=7.08..7.08 rows=308 width=17)
              ->  Seq Scan on big_employee e  (cost=0.00..7.08 rows=308 width=17)
  ->  Hash  (cost=16.00..16.00 rows=600 width=52)
        ->  Seq Scan on big_project p  (cost=0.00..16.00 rows=600 width=52)
Postgres starts out with a hash join on w and e (the inner Hash Join
above), followed by a hash join of that result with p (at the top). MySQL
used index joins, and also did the e-w join first.
If we reverse the join order, though,
explain select e.lname, p.pname, w.hours from project p join works_on w
    on w.pno = p.pnumber join employee e on e.ssn = w.essn;

We get a different query plan:

Hash Join  (cost=34.43..88.03 rows=1216 width=60)
  Hash Cond: (w.essn = e.ssn)
  ->  Hash Join  (cost=23.50..60.38 rows=1216 width=63)
        Hash Cond: (w.pno = p.pnumber)
        ->  Seq Scan on works_on w  (cost=0.00..20.16 rows=1216 width=19)
        ->  Hash  (cost=16.00..16.00 rows=600 width=52)
              ->  Seq Scan on project p  (cost=0.00..16.00 rows=600 width=52)
  ->  Hash  (cost=7.08..7.08 rows=308 width=17)
        ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=17)
Now the tables w and p are joined first, and then the employee e. (4.json
and 4a.json demos)
What if we use a different set of three tables? Say
explain select * from employee e join department d on e.dno = d.dnumber
    join project p on d.dnumber = p.dnum;

Sure enough, we again get a hash join:

Hash Join  (cost=56.15..67.28 rows=924 width=273)
  Hash Cond: (e.dno = d.dnumber)
  ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=73)
  ->  Hash  (cost=48.65..48.65 rows=600 width=200)
        ->  Hash Join  (cost=24.40..48.65 rows=600 width=200)
              Hash Cond: (p.dnum = d.dnumber)
              ->  Seq Scan on project p  (cost=0.00..16.00 rows=600 width=104)
              ->  Hash  (cost=16.40..16.40 rows=640 width=96)
                    ->  Seq Scan on department d  (cost=0.00..16.40 rows=640 width=96)
Now let's add a where clause to our earlier triple join:
explain select * from employee e join works_on w on e.ssn = w.essn
    join project p on w.pno = p.pnumber where p.pname='Computerization';

We get a hash join inside a nested loop:

Nested Loop  (cost=8.28..33.63 rows=2 width=196)
  ->  Hash Join  (cost=8.28..33.02 rows=2 width=123)
        Hash Cond: (w.pno = p.pnumber)
        ->  Seq Scan on works_on w  (cost=0.00..20.16 rows=1216 width=19)
        ->  Hash  (cost=8.27..8.27 rows=1 width=104)
              ->  Index Scan using project_pname_key on project p  (cost=0.00..8.27 rows=1 width=104)
                    Index Cond: ((pname)::text = 'Computerization'::text)
  ->  Index Scan using employee_pkey on employee e  (cost=0.00..0.29 rows=1 width=73)
        Index Cond: (ssn = w.essn)
Note that we look up p.pname='Computerization' using the (non-primary) key
on pname.
Now let's try this query with two where clauses:
explain select * from employee e join works_on w on e.ssn = w.essn
    join project p on w.pno = p.pnumber
    where p.pname = 'Computerization' and e.salary > 40000;

QUERY PLAN
-----------------------------------------------------------------------------------------------------
Nested Loop  (cost=4.28..32.15 rows=1 width=196)
  Join Filter: (w.pno = p.pnumber)
  ->  Nested Loop  (cost=4.28..29.57 rows=8 width=92)
        ->  Seq Scan on employee e  (cost=0.00..7.85 rows=2 width=73)
              Filter: (salary > 40000::numeric)
        ->  Bitmap Heap Scan on works_on w  (cost=4.28..10.81 rows=4 width=19)
              Recheck Cond: (essn = e.ssn)
              ->  Bitmap Index Scan on works_on_pkey  (cost=0.00..4.28 rows=4 width=0)
                    Index Cond: (essn = e.ssn)
  ->  Index Scan using project_pname_key on project p  (cost=0.00..0.31 rows=1 width=104)
        Index Cond: ((pname)::text = 'Computerization'::text)
Here we see a bitmap heap scan. This involves an index
scan on works_on_pkey, followed by a bitmap sort
in order to organize record retrievals by block number. We scan through the
index on works_on, finding w.essn values and corresponding block pointers.
For each block encountered, we mark that in the bitmap, which is an array of
one bit for each block. Some blocks will be marked multiple times; some
blocks might not (in general) get marked at all. After that comes the
block-reading pass, in which only marked blocks are read. This ensures that,
when we look up these blocks in table Employee, we access records in the
order they are grouped into pages; that is, we access records a page at a
time.
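One way to experiment with this plan choice is Postgres's session-local planner toggles, e.g. enable_bitmapscan:

set enable_bitmapscan = off;    -- discourage bitmap scans for this session only
explain select * from employee e join works_on w on e.ssn = w.essn
    join project p on w.pno = p.pnumber
    where p.pname = 'Computerization' and e.salary > 40000;
set enable_bitmapscan = on;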
But what if we reduce the salary threshold to 4000?
explain select * from employee e join works_on w on e.ssn = w.essn
    join project p on w.pno = p.pnumber
    where p.pname = 'Computerization' and e.salary > 4000;
The inner join changes from nested-loop to hash,
because the e.salary criterion is now much less selective, and Postgres
knows this. So it joins works_on to project first,
and then joins to employee knowing that most employee records will be
included.
Nested Loop  (cost=8.28..33.64 rows=2 width=196)
  ->  Hash Join  (cost=8.28..33.02 rows=2 width=123)
        Hash Cond: (w.pno = p.pnumber)
        ->  Seq Scan on works_on w  (cost=0.00..20.16 rows=1216 width=19)
        ->  Hash  (cost=8.27..8.27 rows=1 width=104)
              ->  Index Scan using project_pname_key on project p  (cost=0.00..8.27 rows=1 width=104)
                    Index Cond: ((pname)::text = 'Computerization'::text)
  ->  Index Scan using employee_pkey on employee e  (cost=0.00..0.30 rows=1 width=73)
        Index Cond: (ssn = w.essn)
        Filter: (salary > 4000::numeric)
(5a.json and 5b.json)
More fun with statistics
Let's create two tables:
create table rand (
id int primary key,
rval int
);
insert into rand select generate_series(1,100000) as id,
100000000*random();
create table tbl (
id integer primary key,
thou integer,
hun integer,
ten1 integer,
ten2 integer,
ten3 integer,
ten4 integer
);
insert into tbl (id) select id from rand order by id;
update tbl set thou = id%1000, hun=id%100,ten1=id%10;
update tbl set ten2=10-ten1,ten3=ten1+1,ten4=ten1+10;
Note that the ten1 through ten4 columns are not statistically
independent. Knowing one value, you can easily figure out the
others. In particular, if ten1 = 1, then ten2=9, ten3=2, and ten4=11.
We will now join these tables on r.id = t.id. If we do this without a
where clause, we get a hash join.
Question 1: how selective does a where-clause have to be to get Postgres
to switch from hash join to a nested-loop join?
explain select * from rand r join tbl t on r.id = t.id where t.ten1=2;

QUERY PLAN
--------------------------------------------------------------------------
Hash Join  (cost=2677.46..5679.23 rows=10277 width=40)
  Hash Cond: (r.id = t.id)
  ->  Seq Scan on rand r  (cost=0.00..2524.00 rows=100000 width=12)
  ->  Hash  (cost=2549.00..2549.00 rows=10277 width=28)
        ->  Seq Scan on tbl t  (cost=0.00..2549.00 rows=10277 width=28)
              Filter: (ten1 = 2)
This is the least selective; a hash join is still used.
explain select * from rand r join tbl t on r.id = t.id where hun=1;

QUERY PLAN
--------------------------------------------------------------------------
Hash Join  (cost=2561.66..5470.79 rows=1013 width=40)
  Hash Cond: (r.id = t.id)
  ->  Seq Scan on rand r  (cost=0.00..2524.00 rows=100000 width=12)
  ->  Hash  (cost=2549.00..2549.00 rows=1013 width=28)
        ->  Seq Scan on tbl t  (cost=0.00..2549.00 rows=1013 width=28)
              Filter: (hun = 1)
This is still a hash join! Note that the rows=1013 estimate on the first
line is pretty accurate, but not perfect.
explain select * from rand r join tbl t on r.id=t.id where thou=1;

QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop  (cost=0.42..3360.61 rows=98 width=40)
  ->  Seq Scan on tbl t  (cost=0.00..2549.00 rows=98 width=28)
        Filter: (thou = 1)
  ->  Index Scan using tbl2_pkey on rand r  (cost=0.42..8.27 rows=1 width=12)
        Index Cond: (id = t.id)
Finally we've forced Postgres to use a nested loop join!
Question 2: how good is Postgres at estimating the selectivity of where
clauses?
Let's try three in sequence:
explain select * from tbl where ten1 = 1;

QUERY PLAN
-----------------------------------------------------------
Seq Scan on tbl  (cost=0.00..2549.00 rows=10093 width=28)
  Filter: (ten1 = 1)
Look at the rows=10093 estimate. The actual value is 10,000; Postgres'
estimate is pretty good.
explain select * from tbl where ten1 = 1 and ten2 = 9;

Seq Scan on tbl  (cost=0.00..2799.00 rows=1019 width=28)
The actual answer is again 10,000 rows, but Postgres doesn't know that. It
has assumed the two where-conditions are independent, and so estimates
that only 1/100 of the rows will be included.
explain select * from tbl where ten1 = 1 and ten2 = 9 and ten3 = 2;

Seq Scan on tbl  (cost=0.00..3049.00 rows=103 width=28)
Now Postgres assumes that with three where-conditions each 10% selective,
the end result is 0.1% selective. It is not; it's still 10%.
Question 3: can we use this trick to make Postgres use the wrong join?
We know from Question 1 that a where-condition with a selectivity of
1/1000 will force a nested-loop join. Let's try it:
explain select * from rand r join tbl t on r.id=t.id where ten1 = 1;

QUERY PLAN
-------------------------------------------------------------------------
Hash Join  (cost=2675.16..4594.09 rows=10093 width=36)
  Hash Cond: (r.id = t.id)
  ->  Seq Scan on rand r  (cost=0.00..1443.00 rows=100000 width=8)
  ->  Hash  (cost=2549.00..2549.00 rows=10093 width=28)
        ->  Seq Scan on tbl t  (cost=0.00..2549.00 rows=10093 width=28)
              Filter: (ten1 = 1)
This is expected.
explain select * from rand r join tbl t on r.id=t.id
    where ten1 = 1 and ten2 = 9 and ten3 = 2;

QUERY PLAN
------------------------------------------------------------------------------
Nested Loop  (cost=0.29..3801.96 rows=103 width=36)
  ->  Seq Scan on tbl t  (cost=0.00..3049.00 rows=103 width=28)
        Filter: ((ten1 = 1) AND (ten2 = 9) AND (ten3 = 2))
  ->  Index Scan using tbl3_pkey on rand r  (cost=0.29..7.30 rows=1 width=8)
        Index Cond: (id = t.id)
This is not the choice Postgres should be making! It is the "wrong" plan;
Postgres has been fooled by the fact that ten1, ten2 and ten3 are not
independent.
Question 4: is there a fix? In Postgres 10 and later, yes: extended
statistics, created with

create statistics s1 (dependencies) on ten1, ten2, ten3 from tbl;

But this does not work in Postgres 9.5, which is what I'm using.
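One detail for anyone trying this on Postgres 10 or later (untested here, since 9.5 lacks the feature): the statistics object is only populated the next time the table is analyzed:

analyze tbl;    -- extended statistics are computed during analyze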
We will see where Postgres keeps its statistical knowledge in sql3.html#pg_stats.
Semi-joins and Anti-joins
A semi-join is a join where we only request attributes from one of the two
tables:
select distinct e.fname, e.lname from employee e
    join department d on e.ssn = d.mgr_ssn;
Note that we don't ask anything about the department, other than to verify
that the employee is a department manager. (Why do we ask for
select distinct?) This can also be written as
select e.fname, e.lname from employee e
    where EXISTS (select * from department d where e.ssn = d.mgr_ssn);
Here's an ordinary join query about dependents (not yet a semi-join, as it
outputs d.dependent_name):

select e.fname, e.lname, d.dependent_name
from employee e join dependent d on e.ssn = d.essn
where e.fname = d.dependent_name and e.sex = d.sex;
The explain output (for the bigcompany database) is

Hash Join  (cost=12.47..33.11 rows=1 width=61)
  Hash Cond: ((d.essn = e.ssn) AND ((d.dependent_name)::text = (e.fname)::text) AND (d.sex = e.sex))
  ->  Seq Scan on dependent d  (cost=0.00..15.00 rows=500 width=96)
  ->  Hash  (cost=7.08..7.08 rows=308 width=25)
        ->  Seq Scan on employee e  (cost=0.00..7.08 rows=308 width=25)
Now let's try a semi-join: we get the employee records of
those employees that have a dependent, but we don't generate any
dependent attributes in the output:
select e.fname, e.lname from employee e
    where EXISTS (select * from dependent d where e.ssn = d.essn);

The explain output is:

Hash Join  (cost=17.81..20.81 rows=12 width=96)
  Hash Cond: (d.essn = e.ssn)
  ->  HashAggregate  (cost=16.25..18.25 rows=200 width=40)
        Group Key: d.essn
        ->  Seq Scan on dependent d  (cost=0.00..15.00 rows=500 width=40)
  ->  Hash  (cost=1.25..1.25 rows=25 width=136)
        ->  Seq Scan on employee e  (cost=0.00..1.25 rows=25 width=136)
Postgres understands that this EXISTS query is a join (a
semi-join). Postgres does not attempt to execute it as a
double nested loop; that is, postgres does not check, for each employee e,
the dependent table d to see if a match can be found. Instead, Postgres
forms the join of the two tables, and uses that join to answer the query.
Here's an example with two EXISTS:
select e.fname, e.lname from employee e
    where EXISTS (select * from dependent d where e.ssn = d.essn)
    and EXISTS (select * from department d where d.mgr_ssn = e.ssn);

The Postgres explain output identifies it as a semi-join:

Nested Loop Semi Join  (cost=19.84..27.04 rows=6 width=96)
  Join Filter: (e.ssn = d.essn)
  ->  Hash Join  (cost=19.69..22.69 rows=12 width=176)
        Hash Cond: (d_1.mgr_ssn = e.ssn)
        ->  HashAggregate  (cost=18.12..20.12 rows=200 width=40)
              Group Key: d_1.mgr_ssn
              ->  Seq Scan on department d_1  (cost=0.00..16.50 rows=650 width=40)
        ->  Hash  (cost=1.25..1.25 rows=25 width=136)
              ->  Seq Scan on employee e  (cost=0.00..1.25 rows=25 width=136)
  ->  Index Only Scan using dependent_pkey on dependent d  (cost=0.15..0.42 rows=2 width=40)
        Index Cond: (essn = d_1.mgr_ssn)
A NOT EXISTS query is sometimes called an anti-join. It is
of the general form
select a.foo from TableA a left join TableB b on a.bar = b.bar
    where b.bar is null;
This represents all records in TableA that do not have
corresponding matches in TableB. While it is often convenient to write these
using NOT EXISTS, it does come down to the left join.
select a.foo from TableA a where NOT EXISTS
    (select * from TableB b where a.bar = b.bar);
Here's our employee example:
select e.fname, e.lname from employee e
    where NOT EXISTS (select * from dependent d where e.ssn = d.essn);

The explain output is

Hash Anti Join  (cost=21.25..22.95 rows=12 width=96)
  Hash Cond: (e.ssn = d.essn)
  ->  Seq Scan on employee e  (cost=0.00..1.25 rows=25 width=136)
  ->  Hash  (cost=15.00..15.00 rows=500 width=40)
        ->  Seq Scan on dependent d  (cost=0.00..15.00 rows=500 width=40)
Postgres has recognized this as an anti-join, and has a query plan
specifically for this.
Password DB Explain
Now let's look at some of the queries involving the password database. The
first is a simple lookup:
explain select * from password where username = 'badmole';

Not surprisingly, Postgres uses the index:

Index Only Scan using password_pkey on password  (cost=0.56..4.61 rows=3 width=17)
  Index Cond: (username = 'badmole'::text)
What about looking up a username given the password? The index doesn't
allow a direct lookup.
explain select * from password where password = 'uxoodo';

Postgres still uses the index! (At least on my laptop; on a disk-based
machine it used a sequential scan.) How?

Index Only Scan using password_pkey on password  (cost=0.56..259442.36 rows=8 width=17)
  Index Cond: (password = 'uxoodo'::text)
Note the much higher total-cost estimate. Recall that the
index is actually on all pairs (username, password). To traverse this index
is to traverse the file, but perhaps much more quickly. This is
like using a phone book to look up someone by first name, by searching
linearly through every page. That is slow, but it might be faster than
searching through the same number of file-cabinet folders.
With an index available on password, Postgres uses it:
Bitmap Heap Scan on ipassword  (cost=5.92..182.82 rows=45 width=17)
  Recheck Cond: ((password)::text = 'uxoodo'::text)
  ->  Bitmap Index Scan on pwindex  (cost=0.00..5.91 rows=45 width=0)
        Index Cond: ((password)::text = 'uxoodo'::text)
Index Creation
Databases all support some mechanism of creating indexes, of specific types.
For example, MySQL allows
CREATE INDEX indexname ON employee(ssn) USING BTREE;    -- or USING HASH
Indexes can be ordered or not; ordered indexes help us find employees with
e.age >= 59. Ordered and nonordered indexes both support lookup; ordered
indexes also in effect support return of an iterator
yielding all DB records starting at the specified point. BTREE and ISAM
indexes are ordered; HASH indexes are not.
MySQL Indexes
MySQL by default creates indexes for the following attributes:
- The primary key
- If the primary key has two attributes, eg (essn, pno), then the
index can also be used as an index on (essn) alone
- Any other keys (declared unique in Create Table)
- Any attributes subject to foreign-key constraints, eg employee.dno and
employee.super_ssn (why?)
In the first few weeks I created an index on project.plocation so that I
could introduce a foreign-key constraint requiring that each
project.plocation appear in the table dept_locations:

alter table project add foreign key (plocation) references dept_locations(dlocation);

The necessary index was created with

create index dloc_index on dept_locations(dlocation);
For example, the table employee has three indexes, on ssn, super_ssn and
dno. The table works_on has indexes on (essn,pno) (doubling as an index on
(essn) alone) and on (pno). And the table project has indexes on pnumber,
pname, dnum and plocation.
To view MySQL indexes, use show index from tablename.
A Very Slow MySQL Query
I discovered a wonderful example of a very slow query at http://www.slideshare.net/manikandakumar/mysql-query-and-index-tuning,
due to Keith Murphy, co-author of MySQL
Administrator's Bible.
The goal is to force the database (MySQL here) to use a quadratic
algorithm to form a join. This is the worst-case algorithm, where for each
record of the first table the database has to do a sequential
scan of the second table. For MySQL, the main thing is to force
it not to use an index scan. The two primary "tricks"
are
- To create a table with no index, just don't declare any primary key!
Or, alternatively, declare a primary key that is an unrelated (perhaps
synthetic) column.
- To force MySQL not to use an index on the other table
involved in the join, use a left join (or,
alternatively, an inner exists query).
When Postgres is faced with this situation, it uses a hash join,
which performs reasonably well (very well, usually!) even when no useful
index is available. But MySQL doesn't support that (at least not through
version 5.7; hash joins did not arrive until MySQL 8.0). However, a
similarly slow Postgres query appears below.
The database (querydemo) consists of information about
students and their scores on one of three tests. There are three tables:
Here are the tables (on server sql1.cs.luc.edu)
create table students (
student_id integer not null primary key,
name char(40)
);
create table tests(
test_id integer not null primary key,
total integer,
name char(40)
);
create table grades (
test_id integer,
student_id integer,
score decimal(5,2)
);
There are only three tests; this is the smallest table:
insert into tests values (1,100,'test 1');
insert into tests values(2,100,'test 2');
insert into tests values (3,300, 'test 3');
The students2 table has 10,000 entries generated through a unix shell script:
for i in `seq 1 10000`
do
echo "insert into students values ($i, 'student$i');" >> insert_students.sql
done
(The students table has ~50,000 entries). The insert statements above are
loaded into MySQL with
bash> mysql -p dbname < insert_students.sql

or

mysql> source insert_students.sql
The grades table is then generated within MySQL using random values (this
works in Postgres too, but with random() instead of rand()). The rand()
function chooses a random real between 0 and 1; multiplying it by 100
gives a real in the range 0 to 100, and then inserting it into the score
column of table grades coerces it into the format decimal(5,2).
insert into grades select 1, s.student_id, rand()*100 from students s order by rand() limit 8000;
insert into grades select 2, s.student_id, rand()*100 from students s order by rand() limit 8000;
insert into grades select 3, s.student_id, rand()*300 from students s order by rand() limit 8000;
As an example of what is going on here, the query
select s.student_id from students s limit 20;

selects the first 20 student_id values. Adding order by rand() (before the
limit 20) selects 20 random students; it has the same effect as

select s.student_id, rand() as randval from students s order by randval limit 20;
Adding a test_id value of 1, 2 or 3 and a score of rand()*100 fills out
the grades record; the random score has no relationship to the order by
rand() used to choose random records.
What all this means is that while we have 10,000 students, for each test we
choose only 8,000 students at random who took that test (the "limit 8000"
cuts off the insertions after 8,000, and the records being inserted are
randomly ordered, so this in effect chooses 8000 students at random).
In my MySQL database I called the tables students2 and grades2, so I'll use
those names from now on.
If we try a join
select count(*) from students2 s join grades2 g
    on s.student_id = g.student_id where g.test_id = 2;
select count(*) from students s join grades g
    on s.student_id = g.student_id where g.test_id = 2;
this runs quite fast. If we explain it, we see why:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 23521
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: s
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: querydemo.g.student_id
rows: 1
Extra: Using index
MySQL runs through the indexless table grades2 and, for each record, looks
up the matching record in students2 with the latter's index.
But with a left join, we can force MySQL to run through the table students2
and, for each record, look up the matching grades2 records. This is slow!
Here is Murphy's query:
select count(*) from students2 s left join grades2 g on
    (s.student_id = g.student_id and g.test_id = 2)
where g.student_id is null;

select count(*) from students s left join grades g on
    (s.student_id = g.student_id and g.test_id = 2)
where g.student_id is null;
This is something of an odd query for us. Why is there the "g.test_id=2" in
the join condition? And there are no records in grades2 where
student_id is null. But remember that a left join contains all students who
either had a matching record in grades2 for test_id =2, or who did
not; for the latter, g.student_id will be null. Thus, the above query asks
for those students who did not take exam 2. It is sometimes known
as an anti-join.
It runs rather slowly: 1 minute 7 seconds on my system with MySQL 5.5, ±1
second. It ran faster using MySQL 5.7, but it still took 28 seconds. With a
database 5 times larger, and a slower system (Murphy presented this in
2007), he gave up after over 4200 seconds.
We can also formulate the query using exists:
select count(*) from students2 s where not exists
    (select * from grades2 g where s.student_id = g.student_id and g.test_id = 2);
This takes about 39 seconds under MySQL 5.5 (and 59 seconds under MySQL
5.7!). I was surprised at the MySQL 5.5 relative numbers; traditionally the
outer-join form is faster than the nested-query-with-exists form.
Here is the explain from the first (left-join) query:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 9976
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 23521
Extra: Using where
As intended, we start with table s, using the primary key, and then search
through g using the ALL method. That means that for every
record of students2 (as enumerated by the index on students2) we search the
entire table grades2.
Could we do better? Probably not; we can not do this query in the
g-then-s order of the earlier query because we need to find all records s in
students2 for which there isn't a matching g.
A database could implement a special algorithm here for left outer joins,
one that allowed the efficient use of an index on the lefthand table (here
students2); that is, an indexed join based on the framework

    for each g in grades2, use the index on students2 to find the matching s record
The idea is to note that the left join of students2 and grades2 is simply
the disjoint union of
- the ordinary join (which we can easily do in the above g-first order)
- the set of those students2.student_id values that do not appear in
grades2. (Each student_id value in this set would be augmented with null
values for the grades2 columns when forming the final join.)
The first set can be computed as an efficient g-first indexed join, as
above. The second -- the so-called anti-join -- can be computed
efficiently by going through each record of grades2 in a single pass and
marking the matching record s in students2; the second set is then just
the unmarked records in students2. This algorithm is similar to, and as
efficient as, the g-first indexed join above.
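In SQL terms this second set is just a set difference, which Postgres can compute directly with except (historically missing from MySQL):

select s.student_id from students2 s
except
select g.student_id from grades2 g where g.test_id = 2;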
But MySQL does not implement such a special-case anti-join algorithm,
perhaps because it would complicate the query optimizer. While the
strategy might be signaled by the presence of the outer join, in the
majority of cases it is unnecessary, as grades2 would normally be expected
to have a suitable index of its own.
Postgres, on the other hand,
does implement anti-join as
a query algorithm, and it is fully integrated with the query optimizer. As
a result, our "very slow query" runs quite fast on Postgres: in half a
millisecond.
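Running explain on the same query in Postgres should show the anti-join plan (compare the Hash Anti Join in the dependent example earlier; exact costs will of course differ):

explain select count(*) from students2 s left join grades2 g
    on (s.student_id = g.student_id and g.test_id = 2)
where g.student_id is null;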
Here is the MySQL explain for the exists query:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: s
type: index
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9976
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: g
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 23521
Extra: Using where
This time we go through every record of s (via an index scan), and for
each one the dependent subquery searches through every record of grades2.
Let's try the first query again. The second time, it took 0.00 seconds!
This is the query cache talking. The easiest way to defeat it is to add
and then delete a record, which invalidates the cached results:
insert into students2 values(10000, 'jezebel');
delete from students2 where student_id = 10000;
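In MySQL 5.x one can also sidestep the cache directly; the following
should work (note that the query cache, and with it these commands, was
removed entirely in MySQL 8.0):

-- sql_no_cache tells the server not to serve this query from the cache
select sql_no_cache count(*) from students2 s
left join grades2 g on (s.student_id = g.student_id and g.test_id = 2)
where g.student_id is null;

reset query cache;   -- discards all cached results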
Adding an Index
Now let's see what happens when we add an index to grades2 on student_id
(as usual, the command to drop the index later is shown just below):
create index grades2_student_id on grades2(student_id);
drop index grades2_student_id on grades2;
With the index, our left-join query runs in 0.04 seconds! That's almost
2,000 times faster! The explain tells us why:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 9976
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: grades2_student_id
key: grades2_student_id
key_len: 5
ref: querydemo.s.student_id
rows: 1
Extra: Using where
In the "2. row" section, we've gone from type: ALL to type:
ref and the use of an index.
As another query, here is how we could find the students who took no tests:
select count(*) from students2 s
left join grades2 g on (s.student_id = g.student_id)
where g.student_id is null;
It took 0.03 seconds with the index, and 57 seconds without.
Should we also add an index on test_id? Murphy did, but this index is not
very selective: there are only three different values of
test_id. If we do this,
create index grades2_test_id on grades2(test_id);
drop index grades2_test_id on grades2;
With just this index, the query runs in 52 seconds, down maybe 16 seconds
from the indexless version.
Here is the explain:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 9634
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: g
type: ref
possible_keys: grades2_test_id
key: grades2_test_id
key_len: 5
ref: const
rows: 7922
Extra: Using where
The query does indeed use the index, but the index does not offer much help.
We can also add a "useless" index. Consider this table:
create table grades3 (
test_id int(11) default null,
student_id int(11) default null,
score decimal(5,2) default null,
ident int primary key
);
The key ident is there, but doesn't help look up student_id or test_id. We
fill the table from grades with
insert into grades3 select test_id, student_id, score,
100000*test_id + student_id from grades;
The idea is that, since test_id ranges from 1 to 3 and student_id ranges
from 1 to 10000, the values 100000*test_id + student_id will all be unique
(100001 to 110000, 200001 to 210000, etc). So now table grades3 has a
primary key (we could have used sequential integers as a "synthetic" key),
but it is of no use to us.
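As a quick check (a sketch; the exact numbers will vary), explain should
confirm that the primary key on ident is no help here:

explain select * from grades3 where student_id = 1234;
-- expect type: ALL and key: NULL -- a full scan, despite the key on ident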
Postgres
What happens here in Postgres? Much better performance,
for one thing. Murphy's query runs in 18 ms. Here's the explain output:
Aggregate  (cost=794.25..794.26 rows=1 width=0)
  ->  Hash Anti Join  (cost=530.00..789.25 rows=2000 width=0)
        Hash Cond: (s.student_id = g.student_id)
        ->  Seq Scan on students s  (cost=0.00..194.00 rows=10000 width=4)
        ->  Hash  (cost=430.00..430.00 rows=8000 width=4)
              ->  Seq Scan on grades g  (cost=0.00..430.00 rows=8000 width=4)
                    Filter: (test_id = 2)
Note the Hash Anti Join. An "anti join" is the result of taking an outer
join and then keeping those records where the "other" record is null
(here g.student_id, meaning there is no match for s in table grades).
MySQL does not handle this as a special case, but Postgres does. The not
exists form of the query generates essentially the same explain output;
Postgres clearly recognizes the two forms as equivalent.
A slow Postgres Query
For our coverage of transactions, I created the following table (the goal
was to allow later batching of updates to the parts table):
create table partupdate (
    invnum integer,
    partnum integer,
    quan_sold integer,
    constraint partupdate_invnum foreign key (invnum) references invoice(invnum),
    constraint partupdate_partnum foreign key (partnum) references part(partnum)
);
There are also definitions of the customer, invoice, invitem and part
tables. The customer and part tables are populated with random data. The
invoice/invitem tables are then populated by random purchases as made by
the Python3 program conndemo.py.
The concern in conndemo.py is that every invoice has to update the
appropriate rows of the part table, in order to update the inventory on
hand. Multiple simultaneous invoices for the same part may lead to
multiple updates of the same row of the part table at the same time, a
potential bottleneck. The idea of the partupdate table is that each
invoice can add one row to this table for each part ordered; there is never
an issue with locked rows. The Python3 program conndemo3.py implements
this. Setting the program's variables NUMTHREADS = 1 and PERTHREAD = 1000
simulates 1000 serialized (that is, not concurrent) transactions.
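The batching step itself might look something like the following sketch.
The part table's quantity column is not shown in these notes, so the name
quan_in_stock here is a guess; in practice the whole step would run as a
single transaction:

begin;
-- fold the accumulated sales into the part table
-- (quan_in_stock is a hypothetical column name)
update part p
set quan_in_stock = quan_in_stock -
    (select coalesce(sum(pu.quan_sold), 0)
     from partupdate pu where pu.partnum = p.partnum);
-- then clear the log; under read committed, rows inserted concurrently
-- could be discarded unapplied, so stricter isolation is safer here
delete from partupdate;
commit;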
The primary key for partupdate probably should have been the
first two items, (invnum,partnum), though that is not entirely clear as in
some ways the partupdate table simply represents a log of all updates to
do later, and in theory such logs can have duplicates. In any event, the
table as written has no indexes at all.
For the examples below, I saved a snapshot of partupdate (which I have to
delete records from regularly) as partupdate_copy:
insert into partupdate_copy select * from partupdate;
Anyway, let's find the invoices having at least 12 parts:
select pu.invnum from partupdate_copy pu
group by pu.invnum having count(*) >= 12;
QUERY PLAN
-------------------------------------------------------------------------
HashAggregate (cost=533.62..558.62 rows=2000 width=4)
  Group Key: invnum
  Filter: (count(*) >= 12)
  ->  Seq Scan on partupdate pu (cost=0.00..434.08 rows=19908 width=4)
This took, with about 10,000 records in partupdate, about 6 ms.
But now here it is with an inner query; the inner (correlated) query
counts the number of parts corresponding to that invnum:
select distinct pu.invnum from partupdate_copy pu
where (select count(*) from partupdate_copy p where p.invnum = pu.invnum) >= 12;
3.8 seconds!
QUERY PLAN
-----------------------------------------------------------------------------------
Seq Scan on partupdate pu (cost=0.00..9633666.43 rows=6636 width=4)
  Filter: ((SubPlan 1) >= 12)
  SubPlan 1
    ->  Aggregate (cost=483.88..483.88 rows=1 width=0)
          ->  Seq Scan on partupdate p (cost=0.00..483.85 rows=10 width=0)
                Filter: (invnum = pu.invnum)
A sequential scan within a sequential scan! That is, a nested loop, with
no optimization tricks to speed it up! But the problem does have something
to do with the lack of a proper index. If I add one:
create index partupdate_index on partupdate(invnum,partnum);
drop index partupdate_index;
The query now takes 104 ms to run, roughly a 36-fold speedup! And creation
of the index took less than 40 ms.
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on partupdate pu (cost=0.00..763686.34 rows=6636 width=4)
  Filter: ((SubPlan 1) >= 12)
  SubPlan 1
    ->  Aggregate (cost=38.33..38.34 rows=1 width=0)
          ->  Bitmap Heap Scan on partupdate p (cost=4.37..38.30 rows=10 width=0)
                Recheck Cond: (invnum = pu.invnum)
                ->  Bitmap Index Scan on partupdate_index (cost=0.00..4.36 rows=10 width=0)
                      Index Cond: (invnum = pu.invnum)
Still a sequential scan on partupdate, at the top level. But the inner
query now uses a bitmap heap scan, based on a bitmap index scan. Yay
indexes!
A simpler version
Now let's try to extract the essential feature of this example. The only
attribute of partupdate that we refer to is invnum, so let's use just
that:
create table nums (invnum integer);
Note that we don't make invnum a primary key. Now let's add 30,000 values:
insert into nums select generate_series(1, 30000);
There are no duplicate values! But let's repeat the query asking for the
invnum values that appear at least 12 times. The count, of course, must be
zero:
select invnum from nums group by invnum having count(*) >= 12;
And here is the very slow version:
select p.invnum from nums p
where (select count(*) from nums q where p.invnum = q.invnum) >= 12;
Time: 35036 ms
Postgres has no way to optimize away that inner query, and running it once
per outer row takes quadratic time overall.
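As with partupdate, a plain index should rescue the correlated version; a
sketch (not timed here, but the inner count should now become an index
lookup rather than a scan of all of nums):

create index nums_invnum on nums(invnum);
select p.invnum from nums p
where (select count(*) from nums q where p.invnum = q.invnum) >= 12;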
Blog post on Postgres indexes: eftimov.net/postgresql-indexes-first-principles
Selecting Indexes
Much of this information comes from www.slideshare.net/myxplain/mysql-indexing-best-practices-for-mysql
and www.slideshare.net/billkarwin/how-to-design-indexes-really.
The first step is identifying which queries are underperforming. If a
query is slow, but you don't run it often, it doesn't matter. The MySQL
slow-query log (/var/log/mysql/mysql-slow.log) can be useful here.
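The log can be enabled at runtime; these are standard MySQL system
variables (the file path shown is the Debian/Ubuntu default):

set global slow_query_log = ON;
set global slow_query_log_file = '/var/log/mysql/mysql-slow.log';
set global long_query_time = 1;   -- log queries taking more than 1 second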
If a query is not running fast enough, options available include
- adding options to force the query to be run a certain way
- restructuring the query, eg from a nested query to a join
- adding suitable indexes
The straight_join option (above) is often useful for the first category,
as can be a USE INDEX (index_name) or FORCE INDEX (index_name) table
hint. Most query-optimizing efforts, however, are in the last category:
adding indexes.
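For the first category, the MySQL syntax looks like the following sketch,
reusing the grades2 index created earlier:

-- ask the optimizer to use (or, with force index, insist on) a given index
select count(*) from grades2 use index (grades2_student_id)
where student_id = 1234;

-- straight_join forces the tables to be joined in the order listed
select straight_join count(*)
from students2 s join grades2 g on s.student_id = g.student_id;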
Be aware that indexes add write costs for all updates; don't create more
than you need.
A B-Tree index can also be used to look up any prefix of the key. The
prefix might be a single attribute A in an index on two attributes (A,B),
or else a string prefix: e.ssn LIKE '55566%'.
B-tree indexes can also be used for range queries: key < 100, key
BETWEEN 50 AND 80, for example.
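As a sketch, using the employee table from the examples below:

-- with an index on (dno, salary), the dno prefix alone is usable:
select * from employee where dno = 5;
-- with an index on ssn (a string), a string prefix is usable:
select * from employee where ssn like '55566%';
-- with an index on salary, so is a range:
select * from employee where salary between 30000 and 40000;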
Join comparisons do best when the fields being compared are both strings
and are both the same length. Joining on ssn where one table declares ssn
to be an Integer and the other a CHAR(9) is not likely to work well.
A non-key index has high selectivity if each index
value corresponds to only a few records. An index on lname might qualify.
The standard example of a low-selectivity index is one on sex.
Many MySQL indexes on a non-key field A are really dual-attribute indexes
on (A,PK) where PK is the primary key.
In a multi-attribute index, MySQL can use attributes up through the first
one involved in a non-equality (range) comparison; attributes after that
point cannot be used. If the index is on attributes (A,B,C), then the
index can be used to answer:
A > 10
A = 10 and B between 30 and 50
A = 12 and B = 40 and C <> 200
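In explain terms, key_len reveals how much of the index is actually being
used; here is a sketch with a hypothetical table t:

create index abc_index on t(a, b, c);
-- a is used by equality and b by range; the c part of the index cannot
-- be used once b is a range, so c = 7 is checked row-by-row instead
explain select * from t
where a = 10 and b between 30 and 50 and c = 7;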
The telephone book is an index on (lname, fname). It can find people with
lname='dordal' quickly, but is not very useful in finding people with
fname='peter'.
Sometimes a query can be answered by using ONLY the index. In that case,
the index is said to be a covering index for that query.
For example, if we have an index for EMPLOYEE on (dno,ssn), then we can
answer the following query using the index alone:
select e.ssn from employee e where e.dno = 5
The (dno,ssn) index is a covering index for this query.
For the phone book, an index on (lname, fname, phone) would be a covering
index for phone-number lookups, but not, say, for address lookups.
For the works_on table, an index on (essn,pno,hours) (where (essn,pno) is
the primary key) is a covering index for the whole table.
If the query is
select e.lname from employee e
where e.salary = 30000 and e.dno = 5;
then while the ideal index is a single index on either (salary,dno) or
(dno,salary), two separate indexes on (salary) alone and on (dno) alone
are almost as useful, as we can look up the employees with salary = 30000
and the employees in dno 5 and then intersect the two sets. The strategy
here is sometimes called "index merge".
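A sketch of the two-index approach; MySQL's explain should then show type
index_merge, with Extra reporting something like Using
intersect(emp_salary,emp_dno):

create index emp_salary on employee(salary);
create index emp_dno on employee(dno);
explain select e.lname from employee e
where e.salary = 30000 and e.dno = 5;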
Sometimes, if the key is a long string (eg varchar(30)), we build the
index on just a prefix of the string (maybe the first 10 characters). The
prefix length should be big enough that it is "selective": there should be
few records corresponding to each prefix. It should also be short enough
to be fast. If a publisher database has an index for book_title, of max
length 100 characters, what is a good prefix length? 10 chars may be too
short: "An Introdu" matches a lot of books. How about 20 characters? "An
Introduction to C" is only marginally more specific.
Query Analysis on Oracle
A standard reference is the Oracle
Database SQL Tuning Guide.
Oracle has more tools than MySQL, and also a great many more back-end
database-tuning mechanisms. In the latter category there are things like
- row packing
- caching
- optimizer settings
- disk/ram tuning
Mostly we will ignore these. The first step, in any case, is with the
queries.
Oracle provides the SQL Performance Analyzer (SPA) tool, which takes a set
of queries known as the SQL Tuning Set (STS) and figures out what the DBA
should consider optimizing. In many cases the SPA tool will recommend the
addition of specific indexes.
The Oracle version of MySQL's explain is the explain
plan command. I have not worked with this, but there is an
article describing it at http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-explain-the-explain-plan-052011-393674.pdf.
Aside from creation of indexes, Oracle also supports collection of a
wide-ranging set of table statistics, beyond MySQL's basic row counts.
Analysis of these statistics is sort of what the SPA tool does.
Adaptive Query Optimization
Here is a brief example from the tuning guide above, at http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_optcncpt.htm#TGSQL221.
The sample query is (slightly edited):
SELECT product_name
FROM order_items o join product_information p on p.product_id = o.product_id
WHERE o.unit_price = 15
AND o.quantity > 1
If applying the two where clauses leaves just a few order_items rows, then
the best plan is a nested-loops join (that is, for each remaining
order_items row o, look up o.product_id in the product_information table).
However, if a large set of rows remains, a hash join becomes more
efficient and worth the effort to set up. The optimizer can recognize this
situation just by looking at the query, but cannot tell which plan to use
without actually starting to run it. What Oracle does is make actual
measurements of the number of rows during execution of the query, and it
may then switch from one mechanism to the other as it discovers that,
whoa, there are lots of rows to look at.
Optimizer Hints
Oracle supports supplying hints to the optimizer for query execution. Here
are some kinds of hints:
- single-table
  - specification of what INDEX to use
  - USE_NL to request a nested-loops join
- multiple-table
  - LEADING: allows a suggestion of which tables to start with (eg to join
    first)
- query block
  - UNNEST: applies to nested queries
  - STAR_TRANSFORMATION: one table is the center of the "star" and
    multiple other tables are accessed relative to the center, but not
    relative to one another
- ALL_ROWS
Here are a couple of Oracle's examples. The first is an index suggestion:
SELECT /*+ INDEX (employees emp_department_ix)*/ employee_id, department_id
FROM employees
WHERE department_id > 50;
Here is an example of LEADING, which suggests that the join begin with
tables e and j:
SELECT /*+ LEADING(e j) */ *
FROM employees e, departments d, job_history j
WHERE e.department_id = d.department_id
AND e.hire_date = j.start_date;
On what basis might the DBA make such a recommendation?