Previous Page Next Page Up One Level

Lecture Slides available: PDF PowerPoint

JOINs and VIEWs

Contents

Multiple source tables

Sometimes you will need to write a query which uses more than a single table. This is perfectly acceptable in SQL, but needs a little care... It is very easy to produce multi-table queries which produce mostly rubbush.

The basic concept for producing multi-table queries is that all the tables you need must be listed in the FROM clause of the query. For example, lets try to write a query which lists the owner name, date of birth, and the registration number, for each car in the database. REGNO is in CAR, but DOB is in DRIVER. Therefore both tables are needed. The basic query looks like:

SELECT name,dob,regno
FROM   car,driver
;

The order in which the tables appear in the FROM line are irrelevent. However, this query does not produce the right answer. The reason for this is that the DBMS does not understand how to relate one table to the other.

JOIN condition

In order to usefully join multiple tables together we need to explain to the database how they are joined. The FROM clause takes all rows in all the tables listed, and forms a new table which contains all combinations of the original rows. Most of the time this results in rubbish. Look at this example.

SELECT *
FROM   car
;
REGNOMAKECOLOURPRICEOWNER
F611 AAAFORDRED12000Jim Smith
J111 BBBSKODABLUE11000Jim Smith
A155 BDEMERCEDESBLUE22000Bob Smith
K555 GHTFIATGREEN6000Bob Jones
SC04 BFESMARTBLUE13000
SELECT *
FROM   driver
;
NAMEDOB
Jim Smith11 Jan 1980
Bob Smith23 Mar 1981
Bob Jones3 Dec 1986
SELECT *
FROM   car,driver
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Jim Smith11 Jan 1980
K555 GHTFIATGREEN6000Bob Jones Jim Smith11 Jan 1980
SC04 BFESMARTBLUE13000 Jim Smith11 Jan 1980
F611 AAAFORDRED12000Jim Smith Bob Smith23 Mar 1981
J111 BBBSKODABLUE11000Jim Smith Bob Smith23 Mar 1981
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Smith23 Mar 1981
SC04 BFESMARTBLUE13000 Bob Smith23 Mar 1981
F611 AAAFORDRED12000Jim Smith Bob Jones3 Dec 1986
J111 BBBSKODABLUE11000Jim Smith Bob Jones3 Dec 1986
A155 BDEMERCEDESBLUE22000Bob Smith Bob Jones3 Dec 1986
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986
SC04 BFESMARTBLUE13000 Bob Jones3 Dec 1986

In our query, we are only interested in table combinations which obey the rules of the FOREIGN KEY relationship which joins these two tables. If you remember, the PRIMARY KEY of DRIVER (NAME) was copied into CAR as a FOREIGN KEY (named OWNER). Thus this FROM generated table needs to be filtered so that only the rows where NAME = OWNER appear. Note that this FROM generated table containing all the combinations of the listed tables is known as the cartesian cross product. We will return to the subject of the cross product in the relational algebra chapter.

Now, in order to get our query working properly, we put in the JOIN condition NAME = OWNER. There are two basic ways to do this, which we will call traditional and modern. Both ways are usually referred to as an INNER JOIN.

Traditional JOIN

To put the join condition NAME = OWNER into a query using the traditional approach is simply to list it in the WHERE clause as a rule. So...

SELECT *
FROM   car,driver
WHERE  owner = name
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986

Modern JOIN

To put the join condition NAME = OWNER into a query using the modern approach, you rewrite the FROM line to say:

FROM table1 JOIN table2 ON (rules)
So in our example:

SELECT *
FROM   car JOIN driver ON (owner = name)
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986

OUTER JOIN

You might have noticed a result in the previous query (when there were no join conditions) where there was a NULL in the OWNER field. This is for a car with no current owner. Once the join condition was inserted into the query the rows with NULL owners were filtered out. This is usually exactly what is desired, but sometimes we want the join condition to be obeyed if the fields are not NULL, and the rules to be broken when there is a NULL. Such JOINs are called OUTER JOINS. In the modern JOIN syntax you simply insert either the word LEFT or the word RIGHT in front of the word JOIN.

To decide if the right word is LEFT of RIGHT, you have to consider where the NULL values will be. In our example query, the NULL value is in the OWNER field, which belongs to the CAR table. The current JOIN is:

FROM car JOIN driver on (owner = name)
      ^    ^   ^
      |    |   |
      |    |   +-------- To the right of JOIN
      |    +--------------- The JOIN statement
      +------------------------- To the left of JOIN
As the CAR table has the NULL values, and CAR appears to the left of the word JOIN in the query, the right keyword to use is LEFT JOIN. The query becomes:

SELECT *
FROM   car LEFT JOIN driver ON (owner = name)
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986
SC04 BFESMARTBLUE13000

The OUTER JOIN fills in the missing data (for the things which do not satisfy the rules) with NULLs. Note that if you swap CAR and DRIVER around in the JOIN statement you can write it as a RIGHT JOIN just as easily...

SELECT *
FROM   driver RIGHT JOIN car ON (owner = name)
;

The order of the rules in ON have no significance in deciding what is right and what is left.

FULL OUTER JOIN

First, assume that we have added a new row to DRIVER, so that it now reads as:

NAMEDOB
Jim Smith11 Jan 1980
Bob Smith23 Mar 1981
Bob Jones3 Dec 1986
David Davis1 Oct 1975

Now, David Davis does not own a car, and thus never appears in a normal inner JOIN. In an outer join, we can have:

SELECT *
FROM   car LEFT JOIN driver ON (owner = name)
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986
SC04 BFESMARTBLUE13000
SELECT *
FROM   car RIGHT JOIN driver ON (owner = name)
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986
David Davis 1 Oct 1975

In some relatively unusual queries, it might be useful if we see all the rows which obey the join condition, followed by the rows left over from each of the tables involved in the join. This is called a FULL OUTER JOIN and is written in SQL as FULL JOIN.

SELECT *
FROM   car FULL JOIN driver ON (owner = name)
;
REGNOMAKECOLOURPRICEOWNER NAMEDOB
F611 AAAFORDRED12000Jim Smith Jim Smith11 Jan 1980
J111 BBBSKODABLUE11000Jim Smith Jim Smith11 Jan 1980
A155 BDEMERCEDESBLUE22000Bob Smith Bob Smith23 Mar 1981
K555 GHTFIATGREEN6000Bob Jones Bob Jones3 Dec 1986
SC04 BFESMARTBLUE13000
David Davis 1 Oct 1975

Naming

In some complex queries the DBMS may not be able to identify what table an attribute belongs to. For instance, joining two tables ALPHA and BRAVO, where both tables have a column called NAME. Running the following:

SELECT name from ALPHA,BRAVO
would produce an error. The problem is when you say "name" is it the one in ALPHA or the one in BRAVO? Instead you have to make the query more explicit.

What you are allowed to do is in front of a column name you can say with table that column belongs to. If you wanted to say "name" in ALPHA, you could say alpha.name. Now it is clear what table the column belongs to, and the query will work:

SELECT alpha.name from ALPHA,BRAVO

Aliases

Sometimes you can be dealing with large table names, and finding you have to put the table name in front of many of the attribute names. This can be a lot of typing. SQL allows you to pretend that a table is called something else for the duration of your query. This is called aliasing. So instead of

SELECT car.owner from car
you can write
SELECT c.owner FROM car c
In this way aliases provide a shorthand way to refer to tables. In a more complex example:

SELECT c.regno,c.owner,d.dob
FROM   car c JOIN driver d ON (c.owner = d.name)
;

Remember you only have to use aliases if you want to, and decorate attributes with the table names when the computer cannot work out which table attribute you are talking about.

Self Joins

Self-joins, or Equijoins, are where you want the query to use the same table more than once, but each time you use it for a different purpose.

Consider the question "Who drives a car the same colour as Bob Smith"?

SELECT colour FROM car WHERE owner = 'Bob Smith';
colour
BLUE
SELECT owner FROM car
WHERE colour = 'BLUE'
AND   owner != 'Bob Smith'
AND   owner NOT NULL
owner
Jim Smith

To run this query, we need to use CAR twice. First to find the colour, and then to find the other owners. Thus CAR is used for two different purposes. To combine these queries together, we need to use CAR twice. To make this work, we need to use table aliases to make CAR appear to be two different tables. After that, its easy!

SELECT other.owner
FROM  car bobsmith, car other
WHERE bobsmith.colour = other.colour -- join on the colour
AND   bobsmith.owner = 'Bob Smith'   -- In bobsmith look only for Bob Smith
AND   bobsmith.owner != other.owner  -- OTHER cannot be Bob Smith
AND   other.owner NOT NULL           -- Exclude cars without owners
owner
Jim Smith

VIEWs

When writing queries, the query can get longer and longer. In addition, sometimes you find that a single query uses the same rules in two different parts of the query. In programming languages you would move the duplicated code into some sort of library of subroutines. In SQL, the idea of a subroutine is called a VIEW.

A VIEW can be created in the DBMS, and each view represents a particular SQL query. Once the view is created, the query it represents is hidden from the user, and instead the view appears to be just another table. The contents of the VIEW always remains identical to the result of running the query which the view represents.

Lets say you want a query to tell you how many drivers there are and how many cars exist in the database. You could run two different queries:

SELECT count(*) from DRIVER;
SELECT count(*) from CAR;

Instead, lets put each of them in a VIEW

CREATE VIEW count1 (total) AS SELECT count(*) from DRIVER;
CREATE VIEW count2 (total) AS SELECT count(*) from CAR;

SELECT * from count1;
total
3
SELECT * from count2;
total
5
SELECT count1.total,count2.total from count1,count2;
totaltotal
35

DROP View

Once you are finished with your VIEW, you can delete it. The command to do this is DROP VIEW viewname. Continuing our count1 and count2 example, to delete the count1 view you would type:

DROP VIEW count1;

Previous Page Next Page Up One Level