Comp 353/453: Database Programming, Corboy
L08, 4:15 Mondays
Week 4, Feb 11
Read in Elmasri & Navathe (EN)
- Chapter 4: Basic SQL
- Chapter 5: More-complex SQL (section 5.1 especially)
Homework 2: E&N exercises 5.5,
5.6, 5.7, 5.8 (actual text on sakai), due Friday, Mar 1
Select-expressions as an alternative to outer joins entirely (week
3 notes)
General conversion of an intersect to a join (week
3 notes)
Nested queries
Queries can be nested; this is how we will construct complex queries where
inclusion of a record in the result depends on other records in the
database.
When queries are nested, we use the term outer
query for the enclosing query, and inner
query (queries) for the enclosed query/queries.
Single-row nested queries
The simplest case is perhaps when only a single row and column is retrieved.
SQL automatically converts a query result that is a 1×1 table to the value
that 1×1 table contains; this value can be used in a boolean or arithmetic
expression. We should really call these single-value queries.
Note that the number of columns in a query result is determined
syntactically, by the form of the select clause. The number of rows,
however, usually depends on the data.
As an example, we can find the ssn of the big boss (ie the one with no
supervisor) with
select e.ssn from employee e where
e.super_ssn is null;
Now let's find everyone who works for
the boss:
select e.fname, e.lname from employee e
where e.super_ssn =
(select e.ssn from employee e where e.super_ssn is
null);
Note that we reused the letter e above; the two uses (inner and outer) don't
interact at all.
How can we do this as a "traditional" query? We want employees e where there
is a supervising employee s with a null super_ssn. We can do this as a
self-join, just like the table of employees and their supervisors:
select e.fname, e.lname from employee e,
employee s
where e.super_ssn = s.ssn and s.super_ssn is null;
Which is easier?
Now let's find everyone who makes at least half of what the boss makes:
select e.fname, e.lname from employee e
where 2*e.salary >
(select e.salary from employee e where e.super_ssn is
null);
How about:
select e.fname, e.lname from employee e
where e.salary >
(select e.salary from employee e where e.super_ssn is
null)/2;
For any given employee with ssn X, we can find the ssn of the employee's
manager with
select e1.super_ssn from employee e1 where
e1.ssn = X;
We can find the supervisor's salary with
select e1.salary from employee e1 where
e1.ssn = X's super_ssn
Here's a query with two inner single-row queries. Note that Wong and Wallace
are the two managers reporting directly to Mr Borg. My main point here is
that the results of the inner selects can be used inside arithmetic
expressions.
select e.fname, e.lname, e.salary from
employee e
where e.salary >= 0.6*(
(select e.salary from employee e where e.lname =
'Wong')
+
(select e.salary from employee e where e.lname =
'Wallace')
);
Now let's list all employees who make at least 70% of what their own manager earns.
select e.fname, e.lname, e.salary from
employee e
where e.salary >= 0.7* (select e1.salary from employee e1 where e1.ssn
= e.super_ssn);
The query above is correlated: the
inner query refers to the outer
one. In the earlier examples, we could do query evaluation from the inside
out; here we can not. Instead, our
evaluation strategy turns out to be more like an inner loop.
Example of failure if multiple values are returned:
select e.fname, e.lname from employee e
where e.salary >
(select e.salary from employee e where e.lname LIKE
'W%');
Finally, we can use the COUNT() function. First, here is an example of its
standalone use, to get the number of employees in department 5; the count(*)
means to count all records returned.
select count(*) from employee e where e.dno
= 5;
Next, here is an example where the inner query contains count(). This lists
all departments with at least 3 members:
select d.dname from department d
where (select count(*) from employee e where e.dno = d.dnumber) >=3;
Finally, here's an example where we use the count in the select
part as a select expression, to list all departments and
their number of employees:
select d.dname, (select count(*) from
employee e where e.dno = d.dnumber) AS dept_size
from department d;
Note the use of AS here, to name the computed column; it is very
different from the (usually omitted) use of AS in the from
section.
This is not the most common way to use count(), though; more on the COUNT()
function follows.
What we can not do with single-value queries is to save
the value in a numeric variable and then use it in other expressions.
Grouping functions
Sometimes the inner query returns multiple rows (usually numeric), but we
reduce those rows to a single value using one of the aggregate
functions (E&N §5.1.7):
count
sum
avg
max
min
For example, here is how we would find all employees who make at least 25%
of their department total:
select e.fname, e.lname, e.salary from
employee e
where e.salary >= 0.25 *(select sum(e1.salary) from employee e1 where
e1.dno = e.dno);
Notice we can not do the following.
Aggregation functions must be used within the select clause, not on the
query result as a whole.
select e.fname, e.lname, e.salary from
employee e
-- WRONG!
where e.salary >= 0.25 * sum(select
e1.salary from employee e1 where e1.dno = e.dno);
Let's take a look at E&N 4.12(c):
For each section taught by Professor King,
retrieve the course number, semester, year, and number of students who
took the section
The easiest way to do this is to use nested queries, where you include in
the select part of the outer query
a computed inner-query column of the form
(select count(*) from grade_report g where
g.section_identifier = s.section_identifier) AS NUM_STUDENTS
Note that in the query above, the table variable s is a free
variable, to be supplied by context in the outer query.
You can do this query a few other ways, too. First, note the following
peculiar query:
select s.course_number, s.semester, s.year
from section s, grade_report g
where g.section_identifier = s.section_identifier and s.instructor='King';
It is peculiar because the involvement of grade_report
is unclear; all it seems to do (all it does
do) is to generate a separate copy of the fields
⟨s.course_number,s.semester,s.year⟩ for each separate student (each record
in grade_report for that particuar section_identifier).
One way to use COUNT(*) which is technically wrong
but which works in MySQL is:
// wrong
select s.course_number, s.semester, s.year, COUNT(*)
as NUM_STUDENTS
from section s, grade_report g
where g.section_identifier = s.section_identifier and s.instructor='King';
The right way to do this is to use the GROUP BY option. We have multiple
records for each ⟨s.course_number,s.semester,s.year⟩; we group these common
fields together (by s.section_identifier) for counting:
select s.course_number, COUNT(*) as
NUM_STUDENTS
from section s, grade_report g
where g.section_identifier = s.section_identifier and s.instructor='King'
GROUP BY s.course_number;
We'll look into the GROUP BY in more detail later; for now, note that if
aggregation is used in some of the
select columns, then all the
non-aggregated ("ordinary") columns must be included in the GROUP BY clause.
GROUP BY
Suppose we want to count the employees in each department. We could run the
following (sorting if preferred) and count each department by hand:
select e.dno from EMPLOYEE e;
But we really want SQL to do this; that is, we want a table of⟨dept_no,
count⟩. Here's how:
select e.dno, count(*)
from EMPLOYEE e
group by dno;
What is going on here is that we select the e.dno records, and then group
them by dno, and then select a single
row for each group. The selection values can either be conventional
attributes that are known to be constant
for the group, or else one of the aggregate
functions count, sum, max,min, avg applied to a non-constant attribute. For
count(), we can use the form count(*) to count all records; for the others,
we must apply them to specific numeric attributes.
Conventional attributes are known to be constant for the group if they
appear in the group by clause. In
other words, every non-aggregate attribute in the select
clause must also be in the group by
clause. Usually all attributes in
the group by clause are used in the
select as well.
Here's the department information with average salaries added (and also the
birthdate of the oldest employee) (Query 24)
select e.dno, count(*), avg(e.salary),
min(bdate)
from EMPLOYEE e
group by dno;
What is going on conceptually is that, after the rows are selected(there is
no where clause in the example
above, so this row-selection is trivial), the remaining rows are then
partitioned according to the group by
condition. Then, the select results
are given for each group: they
will be either constant for the group, or else an aggregate function that
will now be computed for the group. Recall again that any non-aggregate
entry in the select clause must be
listed in the group by clause; this
is what guarantees that non-aggregate entries are constant for each group.
If we want to add department names to the above we can do a simplejoin. Note
that we must (except in MySQL) add
dname to the group
by clause.
select d.dname, e.dno, count(*),
avg(e.salary), min(bdate)
from EMPLOYEE e, DEPARTMENT d
where e.dno = d.dnumber
group by dno, d.dname;
Here's another example: Query 25:
for each project, give the project number, the project name, and the number
of employees who worked on it. Note that it is usually appropriate to add an
as title for count(*) columns to
identify what is being counted.
select p.pnumber, p.pname, count(*) as
employees
from PROJECT p, WORKS_ON w
where p.pnumber = w.pno
group by p.pnumber, p.pname
The output is:
pnumber
|
pname
|
employees
|
1
|
ProductX
|
2
|
2
|
ProductY
|
4
|
3
|
ProductZ
|
2
|
10
|
Computerization
|
3
|
20
|
Reorganization
|
3
|
30
|
Newbenefits
|
3
|
Demo: this fails if we leave either attribute out of the group
by clause (at least in postgres). (Note that, nonetheless, it is
clear that if we leave out p.name then this attribute is still constant for
p.pnumber groups;p.pnumber is after all the key
for the PROJECT table.)
Also note that this might be a candidate for leaving p.pnumber off the select clause, if p.pname is sufficient.
Here's a tabular representation of the full join table here, with all
attributes from works_on; it
represents Fig 5.1(b) of E&N. The group
by option means we take each group,
identified in the virtual final column; count(*) refers to the number of
rows in that group.
p.pname
|
p.pnumber
|
w.essn
|
w.pno
|
w.hours
|
|
ProductX |
1 |
123456789 |
1 |
32.5 |
group 1
|
ProductX |
1 |
453453453 |
1 |
20 |
ProductY |
2 |
453453453 |
2 |
20 |
group 2
|
ProductY |
2 |
333445555 |
2 |
10 |
ProductY |
2 |
987654321 |
2 |
7.5 |
ProductY |
2 |
123456789 |
2 |
7.5 |
ProductZ |
3 |
333445555 |
3 |
10 |
group 3
|
ProductZ |
3 |
666884444 |
3 |
40 |
Computerization |
10 |
987987987 |
10 |
35 |
group 10
|
Computerization |
10 |
333445555 |
10 |
10 |
Computerization |
10 |
999887777 |
10 |
10 |
Reorganization |
20 |
333445555 |
20 |
10 |
group 20
|
Reorganization |
20 |
987654321 |
20 |
15 |
Reorganization |
20 |
888665555 |
20 |
0 |
Newbenefits |
30 |
999887777 |
30 |
30 |
group 30
|
Newbenefits |
30 |
987987987 |
30 |
5 |
Newbenefits |
30 |
987654321 |
30 |
20 |
All the aggregation functions (count, sum, min, max, avg) can be applied in
select clauses to ungrouped queries,
but then if any aggregation functions appear, then everything
in the select clause must be an
aggregation function. What we get are various "statistics" for the entire
table.Example:
select count(*), min(e.bdate),
sum(e.salary), avg(e.dno) from employee e;
This is a relatively limited situation; it is more common for the
aggregation functions to be combined with group
by.
Remember
E&N
4.12(c)?
For each section taught by Professor King,
retrieve the course number, semester, year, and number ofstudents who took
the section
Here is a solution that works in MySQL, but is not
standard SQL. Oracle and postgres do
not allow this:
select s.course_number, s.semester, s.year,
COUNT(*) as NUM_STUDENTS
from SECTION s, GRADE_REPORT g
where g.section_identifier = s.section_identifier and s.instructor='King';
The COUNT(*) counts all records, and when it is present, everything else
in the select clause should be
"constant" (either there should be
nothing else, or else the query should use group
by for all the non-aggregated attributes.
Here's a set of examples that use count(*) appropriately (and work under
postgres):
- select student_number from
GRADE_REPORT;
-- no aggregation
- select count(*) from
GRADE_REPORT;
-- no non-aggregated attributes
- select count(student_number) from GRADE_REPORT;
- select count(distinct student_number) from GRADE_REPORT;
- select count(*) from (select distinct student_number,
section_identifier from grade_report) as dummy;
Here are some examples that are wrong,
though, as usual, some work in MySQL
- select student_number, count(*) from
GRADE_REPORT;
-- needs GROUP BY
- select count(distinct student_number, section_identifier) from
GRADE_REPORT
- select count(distinct *) from GRADE_REPORT;
/* fails in mysql*/
select count(distinct ____) ... requires a single
column attribute!
Multi-value nested queries
In these queries, the inner query returns (or potentially returns) a set of
rows, or rows with multiple columns. In this case, the following operations
are allowed:
- IN
- EXISTS / NOT EXISTS
- UNIQUE
- = ANY / = SOME
- op ALL (op is typically a comparison like > or >=)
Suppose we want to find the names of employees who have worked on project
10. We can do this as a traditional join, but we can also do:
select e.fname, e.lname from employee e
where e.ssn in
(select w.essn from works_on w where w.pno = 10);
E&N Q4A, p 118: List all project
numbers for projects that involve an employee whose last name is 'Smith',
either as worker or manager.
We did this before with a standard join; here is a nested version:
select distinct p.pnumber
from project p
where p.pnumber IN
(select p1.pnumber from project p1,
department d, employee e
where p1.dnum = d.dnumber and d.mgr_ssn = e.ssn and e.lname = 'Smith')
or
p.pnumber IN
(select w.pno from works_on w, employee e
where w.essn = e.ssn and e.lname = 'Smith');
This one is not correlated; the
inner queries can be evaluated stand-alone.
In the middle of p 118, E&N6 states
SQL allows the use of tuples
of values in comparisons by placing them within parentheses. To illustrate
this, consider the following query [which I edited slightly -- pld]
select distinct w.essn
from works_on w
where (w.pno, w.hours) IN
(select w1.pno, w1.hours from works_on w1 where w1.essn = '123456789');
Just what question does this answer?
Query 16: an example of a nested correlated query:
Retrieve the name of each employee who has
a dependent with the same first name and is the same sex as the employee:
Here is E&N6's solution:
select e.fname, e.lname from employee e
where e.ssn in
(select d.essn from dependent d where e.fname = d.dependent_name and e.sex
= d.sex);
Note that, in the inner query, e is a "free variable",
referring to the employee e declared in the outer query.
Let's add a row to make this nonempty:
insert into dependent values ('123456789',
'John', 'M', '1990-02-24', 'son');
delete from dependent where
bdate='1990-02-24';
-- to delete the record above
Note that this query only returns rows where e.ssn = d.essn, although this
is not stated as an equality check (if we don't have e.ssn = d.essn, then
e.ssn will not be in the set of
d.essn). But note that if we add the above son John to another employee,
'987654321', then the inner query for employee John Smith, '123456789',
would amount to
select d.essn from dependent d where 'John'
= d.dependent_name and 'M' = d.sex;
and would return '987654321'. But the in
clause would then not match (which is the correct behavior).
It can be confusing for an inner query to return such "irrelevant" results,
even if the end result is correct.
Another, possibly better, way to write this query might be:
select e.fname, e.lname from employee e
join dependent d on e.ssn = d.essn
where e.fname = d.dependent_name and e.sex = d.sex;
Note that queries using = or in to
connect the inner and outer queries can always
be done without nesting, using an additional join. Is this more or less
efficient?
Nested queries with Exists and Unique
Here's the above example again of a query to find all employees with a
dependent of the same name and sex, this time done with exists
instead of in:
select e.fname, e.lname from EMPLOYEE e
where exists
(select * from DEPENDENT d where e.ssn = d.essn and e.fname =
d.dependent_name and e.sex = d.sex);
(Note here that the inner query will return no records if the result is
empty, unlike the in version.)
What about the names of employees with no
dependents (Query 6 of E&N)? We
can do a query with COUNT(*) to check if an employee e qualifies:
select count(*) from DEPENDENT d where d.essn = e.ssn;
/* e is a "free variable" here */
The full query is:
select e.fname, e.lname from EMPLOYEE e
where (select count(*) from DEPENDENT d where d.essn = e.ssn) = 0;
But here it is with not exists:
select e.fname, e.lname from EMPLOYEE e
where not exists
(select * from DEPENDENT d where e.ssn = d.essn);
Query 7: list the names of managers with at
least one dependent:
select e.fname, e.lname from EMPLOYEE e
where
exists (select * from DEPENDENT d where d.essn = e.ssn)
and
exists( select * from DEPARTMENT d where e.ssn = d.mgr_ssn);
Or perhaps
select e.fname, e.lname from employee e,
department d
where
e.ssn = d.mgr_ssn
and
exists (select * from dependent dd where dd.essn = e.ssn);
Generally speaking, joins are slightly preferred, simply because it is
likely they are better optimized.
Suppose we want to find all employees who
have worked on project 3. We can do this with a straightforward join:
select e.fname, e.lname from employee e,
works_on w
where e.ssn = w.essn and w.pno = 3;
or we can write:
select e.fname, e.lname from EMPLOYEE e
where exists (select * from
WORKS_ON w where w.essn = e.ssn and w.pno = 3);
But now suppose we want all employees who have not
worked on project 3. If this can be done with a straightforward join, it is
beyond me. But it is easy with the exists
pattern above:
select e.fname, e.lname from EMPLOYEE e
where not exists (select * from
WORKS_ON w where w.essn = e.ssn and w.pno = 3);
Application of not exists to the results of a subquery is
a powerful technique.
How about all the employees who have worked more hours on some project than
anyone in department 4?
select e.fname, e.lname from EMPLOYEE e,
works_on w
where e.ssn = w.essn and w.hours >all (
select w.hours from WORKS_ON w, EMPLOYEE e2 where
w.essn= e2.ssn and e2.dno = 4
);
We could also write that
where e.ssn = w.essn and w.hours >
(select max(w.hours) from WORKS_ON w, EMPLOYEE e2 where w.essn = e2.ssn
and e2.dno = 4)
Here's yet another way to do the intersect
example from last week, without using intersect.
Recall that the point was to list all employees who worked on project 2 and project 3.
select e.fname, e.lname
from EMPLOYEE e
where
e.ssn in (select e.ssn from EMPLOYEE e,
WORKS_ON w where e.ssn = w.essn and w.pno = 2)
and
e.ssn in (select e.ssn from EMPLOYEE e, WORKS_ON w where e.ssn = w.essn
and w.pno = 3);
What about the name of the oldest employee? This nested query isn't even
correlated.
select e.fname, e.lname from EMPLOYEE e
where e.bdate = (select min(e2.bdate) from EMPLOYEE e2);
And if we want the names of all employees older than everyone in department
5:
select e.fname, e.lname from EMPLOYEE e
where e.bdate < all (select
e.bdate from EMPLOYEE e where e.dno = 5);
What about all employees who have worked on a dept 5 project and also a dept
4? This can be done as a join and also as:
select e.fname, e.lname from EMPLOYEE e
where
exists (select * from WORKS_ON w, PROJECT p where e.ssn = w.essn and w.pno
= p.pnumber and p.dnum = 4)
and
exists (select * from WORKS_ON w, PROJECT p where e.ssn = w.essn and w.pno
= p.pnumber and p.dnum = 5);
(Wong has worked on projects 2 & 3 in dept 5 and 10 in dept 4, and
Wallace has worked on projects 30 in dept 4 and the following adds her to
project 2 in dept 5)
insert into works_on values ('987654321', 2,
3);
-- project 2 is in dept 5; the 3 is the hours/week worked
delete from works_on where essn='987654321' and pno=2;
How about employees who have worked on a project in a department other than
their "home" department? Again, this can be done as a join but also as:
select e.fname, e.lname from EMPLOYEE e
where exists (select * from WORKS_ON w, PROJECT p where e.ssn = w.essn and
w.pno = p.pnumber and e.dno <> p.dnum);
That last <> isn't really a join condition; we already have enough of
those.
Queries with ALL (Universal
Quantification)
Here's a harder one: list the employees who have worked for ALL departments,
or, in more detail, list all employees e
such that for every department d,
e worked for a project of d. You might try to approach this by
listing employees for which the set of departments they have worked for
equals the set of all departments:
select e.lname from employee e
where
(select distinct p.dnum from project p, works_on w
where w.pno = p.pnumber and w.essn = e.ssn order by
p.dnum) -- projects e has worked on
=
(select d.dnumber from department d);
However, SQL does not allow arbitrary set comparisons: MySQL yields the
error "subquery returns more than 1 row". Postgres complains "ERROR:
more than one row returned by a subquery used as an expression".
The second department set above, (select d.dnumber from department d), is
the set of all departments. Let us call this D2. The first department set is
the set of departments that employee e has worked for; let us call this D1.
It is automatic that D1 ⊆ D2; what we really want to know is whether there
is anything in D2 that is not in D1. The set-theoretic MINUS operator works
as follows:
A MINUS B = {x∈A | x∉B}
So what we really need to know is whether D2 MINUS D1 is nonempty. We can
actually do this; in Oracle we'd use "MINUS" but in postgres the set-minus
operator is called "EXCEPT":
select e.lname from employee e
where NOT EXISTS (
(select d.dnumber from department d)
EXCEPT
(select distinct p.dnum from project p, works_on w
where w.pno = p.pnumber and w.essn = e.ssn order by
p.dnum) -- projects e has worked on
)
Alas, there is no MINUS / EXCEPT operator in MySQL.
A way to think of this that MySQL can
handle is by is asking for all employees such that there is not
a department that the employee has not worked on any of that department's
projects, or list
all employees e such that it is there is no
department d such that e did not
work for any project of d.
The following two ways of describing a set are the same, where D is the
domain of y:
- { x | for all y ∈ D, foo(x,y) }
- { x | there does not exist y ∈ D for which ¬foo(x,y) }
That is, "for every d, foo(d) is true" is to say that "there is no d
such that foo(d) is false". We might call the latter the double-negative
formulation.
We cannot phrase the first bulleted form directly in SQL, but we can
express the second form, using at least one and usually two not
exist clauses.
Back to the example "list all employees e such that for every department d,
e worked for a project of d". By the above, this is the same as "list all
employees e such that there does not exist a department d for which e did
not work on any project of d". To list the projects that e worked on
in department d, we could use the following (free variables e and d
highlighted):
select * from works_on w, project p
where e.ssn = w.essn and w.pno =
p.pnumber and p.dnum = d.dnumber
We can say "e did not work on any project of d" by saying the above list is
empty:
not exists (select * from works_on w,
project p where e.ssn = w.essn
and w.pno = p.pnumber and p.dnum = d.dnumber)
So here is the final query: there does not exist a department e did not work
for
select e.fname, e.lname from EMPLOYEE e
where not exists (
select d.dnumber from department d where
not exists (select * from WORKS_ON
w, PROJECT p where e.ssn = w.essn and w.pno = p.pnumber and p.dnum =
d.dnumber)
);
Queries involving ALL in the qualification (as opposed to "list ALL senior
students") tend to be hard to follow, at least until the "double negative"
reformulation becomes more natural.
As another example, suppose we want the names of each employee who works on
all the projects controlled by
department 4 (Query Q3). (Those
projects are pnumbers 10 and 30 using the E&N original data.) To do this
one, we need to select the department-4 projects.
Here is E&N's version of this query, Q3B (p 122, 6th ed), slightly
rearranged
select e.fname, e.lname from EMPLOYEE e
where
not exists (
select * from WORKS_ON w1 where
w1.pno in (select p.pnumber from PROJECT p where p.dnum = 4)
and
not exists (select * from
WORKS_ON w2 where w2.essn = e.ssn and w2.pno = w1.pno)
But here is an arguably more "natural" version, or at least more directly
using the double-negative style:
select e.fname, e.lname from EMPLOYEE e
where
not exists (
select * from project p where p.dnum = 4
and
not exists (select * from WORKS_ON w where w.essn = e.ssn and w.pno =
p.pnumber) /* e did not work on p */
);
The idea here is that someone has worked on all
department-4 projects if there is no
department-4 project that the employee did not
work on.
Now suppose we want employees who have worked on at
least two projects in department 4. This is easier.
select e.fname, e.lname from EMPLOYEE e
where
(select count(*) from WORKS_ON w, PROJECT p where e.ssn = w.essn and w.pno
= p.pnumber and p.dnum = 4) >= 2;
One of the homework-2 exercises asks you to list all students who have
received all A's. This is another example of this pattern: it is equivalent
to asking for all students where there does not
exist a grade that is not
A. Here, though, the inner query would be the set of grades received by the
student that are ≠ A; there would not be a second not
exist clause.
Except / Minus
Like for intersect, there is a general way to take a query
that uses the except (or minus)
operator, and reimplement it as a join. In this case, however, an outer
join is needed. Suppose we have
(select a.col1, a.col2 from TableA a)
except
(select b.col1, b.col2 from TableB b)
We can convert this as follows using NOT IN:
select distinct a.col1, a.col2 from TableA a
where (a.col1, a.col2) NOT IN
(select b.col1, b.col2 from TableB b);
We can also do this as a left outer join:
select distinct a.col1, a.col2 from TableA a
left join TableB b on a.col1=b.col1 and a.col2 = b.col2
where b.col1 is null
Recall that the left outer join will include in the join above all records
that represent "real" matches, and also all records in a that have no match
in b.
As a demo, let us first create a smaller version of the employee table:
create view employee2 as
select * from employee e where e.dno <> 5;
Now the join:
select distinct e.lname, e.ssn, e.dno from
employee e left join employee2 e2 on e.ssn = e2.ssn
where e2.ssn is null;