JOINs and VIEWs

Sometimes you will need to write a query which uses more than a single table. This is perfectly acceptable in SQL, but needs a little care... It is very easy to produce multi-table queries which produce mostly rubbush.

The basic concept for producing multi-table queries is that all the tables you need must be listed in the FROM clause of the query. For example, lets try to write a query which lists the owner name, date of birth, and the registration number, for each car in the database. REGNO is in CAR, but DOB is in DRIVER. Therefore both tables are needed. The basic query looks like:

SELECT name,dob,regno
FROM   car,driver
;

The order in which the tables appear in the FROM line are irrelevent. However, this query does not produce the right answer. The reason for this is that the DBMS does not understand how to relate one table to the other.

JOIN condition

In order to usefully join multiple tables together we need to explain to the database how they are joined. The FROM clause takes all rows in all the tables listed, and forms a new table which contains all combinations of the original rows. Most of the time this results in rubbish. Look at this example.

SELECT *
FROM   car
;

REGNO	MAKE	COLOUR	PRICE	OWNER
F611 AAA	FORD	RED	12000	Jim Smith
J111 BBB	SKODA	BLUE	11000	Jim Smith
A155 BDE	MERCEDES	BLUE	22000	Bob Smith
K555 GHT	FIAT	GREEN	6000	Bob Jones
SC04 BFE	SMART	BLUE	13000

SELECT *
FROM   driver
;

NAME	DOB
Jim Smith	11 Jan 1980
Bob Smith	23 Mar 1981
Bob Jones	3 Dec 1986

SELECT *
FROM   car,driver
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Jim Smith	11 Jan 1980
K555 GHT	FIAT	GREEN	6000	Bob Jones	Jim Smith	11 Jan 1980
SC04 BFE	SMART	BLUE	13000		Jim Smith	11 Jan 1980
F611 AAA	FORD	RED	12000	Jim Smith	Bob Smith	23 Mar 1981
J111 BBB	SKODA	BLUE	11000	Jim Smith	Bob Smith	23 Mar 1981
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Smith	23 Mar 1981
SC04 BFE	SMART	BLUE	13000		Bob Smith	23 Mar 1981
F611 AAA	FORD	RED	12000	Jim Smith	Bob Jones	3 Dec 1986
J111 BBB	SKODA	BLUE	11000	Jim Smith	Bob Jones	3 Dec 1986
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Jones	3 Dec 1986
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986
SC04 BFE	SMART	BLUE	13000		Bob Jones	3 Dec 1986

In our query, we are only interested in table combinations which obey the rules of the FOREIGN KEY relationship which joins these two tables. If you remember, the PRIMARY KEY of DRIVER (NAME) was copied into CAR as a FOREIGN KEY (named OWNER). Thus this FROM generated table needs to be filtered so that only the rows where NAME = OWNER appear. Note that this FROM generated table containing all the combinations of the listed tables is known as the cartesian cross product. We will return to the subject of the cross product in the relational algebra chapter.

Now, in order to get our query working properly, we put in the JOIN condition NAME = OWNER. There are two basic ways to do this, which we will call traditional and modern. Both ways are usually referred to as an INNER JOIN.

Traditional JOIN

To put the join condition NAME = OWNER into a query using the traditional approach is simply to list it in the WHERE clause as a rule. So...

SELECT *
FROM   car,driver
WHERE  owner = name
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986

Modern JOIN

To put the join condition NAME = OWNER into a query using the modern approach, you rewrite the FROM line to say:

SELECT *
FROM   car JOIN driver ON (owner = name)
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986

OUTER JOIN

You might have noticed a result in the previous query (when there were no join conditions) where there was a NULL in the OWNER field. This is for a car with no current owner. Once the join condition was inserted into the query the rows with NULL owners were filtered out. This is usually exactly what is desired, but sometimes we want the join condition to be obeyed if the fields are not NULL, and the rules to be broken when there is a NULL. Such JOINs are called OUTER JOINS. In the modern JOIN syntax you simply insert either the word LEFT or the word RIGHT in front of the word JOIN.

To decide if the right word is LEFT of RIGHT, you have to consider where the NULL values will be. In our example query, the NULL value is in the OWNER field, which belongs to the CAR table. The current JOIN is:

SELECT *
FROM   car LEFT JOIN driver ON (owner = name)
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986
SC04 BFE	SMART	BLUE	13000

The OUTER JOIN fills in the missing data (for the things which do not satisfy the rules) with NULLs. Note that if you swap CAR and DRIVER around in the JOIN statement you can write it as a RIGHT JOIN just as easily...

SELECT *
FROM   driver RIGHT JOIN car ON (owner = name)
;

The order of the rules in ON have no significance in deciding what is right and what is left.

FULL OUTER JOIN

Now, David Davis does not own a car, and thus never appears in a normal inner JOIN. In an outer join, we can have:

SELECT *
FROM   car LEFT JOIN driver ON (owner = name)
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986
SC04 BFE	SMART	BLUE	13000

SELECT *
FROM   car RIGHT JOIN driver ON (owner = name)
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986
					David Davis	1 Oct 1975

In some relatively unusual queries, it might be useful if we see all the rows which obey the join condition, followed by the rows left over from each of the tables involved in the join. This is called a FULL OUTER JOIN and is written in SQL as FULL JOIN.

SELECT *
FROM   car FULL JOIN driver ON (owner = name)
;

REGNO	MAKE	COLOUR	PRICE	OWNER	NAME	DOB
F611 AAA	FORD	RED	12000	Jim Smith	Jim Smith	11 Jan 1980
J111 BBB	SKODA	BLUE	11000	Jim Smith	Jim Smith	11 Jan 1980
A155 BDE	MERCEDES	BLUE	22000	Bob Smith	Bob Smith	23 Mar 1981
K555 GHT	FIAT	GREEN	6000	Bob Jones	Bob Jones	3 Dec 1986
SC04 BFE	SMART	BLUE	13000
					David Davis	1 Oct 1975

Naming

In some complex queries the DBMS may not be able to identify what table an attribute belongs to. For instance, joining two tables ALPHA and BRAVO, where both tables have a column called NAME. Running the following:

What you are allowed to do is in front of a column name you can say with table that column belongs to. If you wanted to say "name" in ALPHA, you could say alpha.name. Now it is clear what table the column belongs to, and the query will work:

Aliases

Sometimes you can be dealing with large table names, and finding you have to put the table name in front of many of the attribute names. This can be a lot of typing. SQL allows you to pretend that a table is called something else for the duration of your query. This is called aliasing. So instead of

SELECT c.regno,c.owner,d.dob
FROM   car c JOIN driver d ON (c.owner = d.name)
;

Remember you only have to use aliases if you want to, and decorate attributes with the table names when the computer cannot work out which table attribute you are talking about.

Self Joins

Self-joins, or Equijoins, are where you want the query to use the same table more than once, but each time you use it for a different purpose.

SELECT colour FROM car WHERE owner = 'Bob Smith';

colour
BLUE

SELECT owner FROM car
WHERE colour = 'BLUE'
AND   owner != 'Bob Smith'
AND   owner NOT NULL

owner
Jim Smith

To run this query, we need to use CAR twice. First to find the colour, and then to find the other owners. Thus CAR is used for two different purposes. To combine these queries together, we need to use CAR twice. To make this work, we need to use table aliases to make CAR appear to be two different tables. After that, its easy!

SELECT other.owner
FROM  car bobsmith, car other
WHERE bobsmith.colour = other.colour -- join on the colour
AND   bobsmith.owner = 'Bob Smith'   -- In bobsmith look only for Bob Smith
AND   bobsmith.owner != other.owner  -- OTHER cannot be Bob Smith
AND   other.owner NOT NULL           -- Exclude cars without owners

owner
Jim Smith

VIEWs

When writing queries, the query can get longer and longer. In addition, sometimes you find that a single query uses the same rules in two different parts of the query. In programming languages you would move the duplicated code into some sort of library of subroutines. In SQL, the idea of a subroutine is called a VIEW.

A VIEW can be created in the DBMS, and each view represents a particular SQL query. Once the view is created, the query it represents is hidden from the user, and instead the view appears to be just another table. The contents of the VIEW always remains identical to the result of running the query which the view represents.

Lets say you want a query to tell you how many drivers there are and how many cars exist in the database. You could run two different queries:

CREATE VIEW count1 (total) AS SELECT count(*) from DRIVER;
CREATE VIEW count2 (total) AS SELECT count(*) from CAR;

SELECT * from count1;

total
3

SELECT * from count2;

total
5

SELECT count1.total,count2.total from count1,count2;

total	total
3	5

DROP View

Once you are finished with your VIEW, you can delete it. The command to do this is DROP VIEW viewname. Continuing our count1 and count2 example, to delete the count1 view you would type: