JOINs and VIEWs
Sometimes you will need to write a query which uses more than a single table. This is perfectly acceptable in SQL, but needs a little care... It is very easy to produce multi-table queries which produce mostly rubbush.
The basic concept for producing multi-table queries is that all the tables you need must be listed in the FROM clause of the query. For example, lets try to write a query which lists the owner name, date of birth, and the registration number, for each car in the database. REGNO is in CAR, but DOB is in DRIVER. Therefore both tables are needed. The basic query looks like:
SELECT name,dob,regno FROM car,driver ;
The order in which the tables appear in the FROM line are irrelevent. However, this query does not produce the right answer. The reason for this is that the DBMS does not understand how to relate one table to the other.
In order to usefully join multiple tables together we need to explain to the database how they are joined. The FROM clause takes all rows in all the tables listed, and forms a new table which contains all combinations of the original rows. Most of the time this results in rubbish. Look at this example.
SELECT * FROM car ;
SELECT * FROM driver ;
SELECT * FROM car,driver ;
In our query, we are only interested in table combinations which obey the rules of the FOREIGN KEY relationship which joins these two tables. If you remember, the PRIMARY KEY of DRIVER (NAME) was copied into CAR as a FOREIGN KEY (named OWNER). Thus this FROM generated table needs to be filtered so that only the rows where NAME = OWNER appear. Note that this FROM generated table containing all the combinations of the listed tables is known as the cartesian cross product. We will return to the subject of the cross product in the relational algebra chapter.
Now, in order to get our query working properly, we put in the JOIN condition NAME = OWNER. There are two basic ways to do this, which we will call traditional and modern. Both ways are usually referred to as an INNER JOIN.
To put the join condition NAME = OWNER into a query using the traditional approach is simply to list it in the WHERE clause as a rule. So...
SELECT * FROM car,driver WHERE owner = name ;
To put the join condition NAME = OWNER into a query using the modern approach, you rewrite the FROM line to say:
FROM table1 JOIN table2 ON (rules)So in our example:
SELECT * FROM car JOIN driver ON (owner = name) ;
You might have noticed a result in the previous query (when there were no join conditions) where there was a NULL in the OWNER field. This is for a car with no current owner. Once the join condition was inserted into the query the rows with NULL owners were filtered out. This is usually exactly what is desired, but sometimes we want the join condition to be obeyed if the fields are not NULL, and the rules to be broken when there is a NULL. Such JOINs are called OUTER JOINS. In the modern JOIN syntax you simply insert either the word LEFT or the word RIGHT in front of the word JOIN.
To decide if the right word is LEFT of RIGHT, you have to consider where the NULL values will be. In our example query, the NULL value is in the OWNER field, which belongs to the CAR table. The current JOIN is:
FROM car JOIN driver on (owner = name) ^ ^ ^ | | | | | +-------- To the right of JOIN | +--------------- The JOIN statement +------------------------- To the left of JOINAs the CAR table has the NULL values, and CAR appears to the left of the word JOIN in the query, the right keyword to use is LEFT JOIN. The query becomes:
SELECT * FROM car LEFT JOIN driver ON (owner = name) ;
The OUTER JOIN fills in the missing data (for the things which do not satisfy the rules) with NULLs. Note that if you swap CAR and DRIVER around in the JOIN statement you can write it as a RIGHT JOIN just as easily...
SELECT * FROM driver RIGHT JOIN car ON (owner = name) ;
The order of the rules in ON have no significance in deciding what is right and what is left.
First, assume that we have added a new row to DRIVER, so that it now reads as:
Now, David Davis does not own a car, and thus never appears in a normal inner JOIN. In an outer join, we can have:
SELECT * FROM car LEFT JOIN driver ON (owner = name) ;
SELECT * FROM car RIGHT JOIN driver ON (owner = name) ;
In some relatively unusual queries, it might be useful if we see all the rows which obey the join condition, followed by the rows left over from each of the tables involved in the join. This is called a FULL OUTER JOIN and is written in SQL as FULL JOIN.
SELECT * FROM car FULL JOIN driver ON (owner = name) ;
In some complex queries the DBMS may not be able to identify what table an attribute belongs to. For instance, joining two tables ALPHA and BRAVO, where both tables have a column called NAME. Running the following:
SELECT name from ALPHA,BRAVOwould produce an error. The problem is when you say "name" is it the one in ALPHA or the one in BRAVO? Instead you have to make the query more explicit.
What you are allowed to do is in front of a column name you can say with table that column belongs to. If you wanted to say "name" in ALPHA, you could say alpha.name. Now it is clear what table the column belongs to, and the query will work:
SELECT alpha.name from ALPHA,BRAVO
Sometimes you can be dealing with large table names, and finding you have to put the table name in front of many of the attribute names. This can be a lot of typing. SQL allows you to pretend that a table is called something else for the duration of your query. This is called aliasing. So instead of
SELECT car.owner from caryou can write
SELECT c.owner FROM car cIn this way aliases provide a shorthand way to refer to tables. In a more complex example:
SELECT c.regno,c.owner,d.dob FROM car c JOIN driver d ON (c.owner = d.name) ;
Remember you only have to use aliases if you want to, and decorate attributes with the table names when the computer cannot work out which table attribute you are talking about.
Self-joins, or Equijoins, are where you want the query to use the same table more than once, but each time you use it for a different purpose.
Consider the question "Who drives a car the same colour as Bob Smith"?
SELECT colour FROM car WHERE owner = 'Bob Smith';
SELECT owner FROM car WHERE colour = 'BLUE' AND owner != 'Bob Smith' AND owner NOT NULL
To run this query, we need to use CAR twice. First to find the colour, and then to find the other owners. Thus CAR is used for two different purposes. To combine these queries together, we need to use CAR twice. To make this work, we need to use table aliases to make CAR appear to be two different tables. After that, its easy!
SELECT other.owner FROM car bobsmith, car other WHERE bobsmith.colour = other.colour -- join on the colour AND bobsmith.owner = 'Bob Smith' -- In bobsmith look only for Bob Smith AND bobsmith.owner != other.owner -- OTHER cannot be Bob Smith AND other.owner NOT NULL -- Exclude cars without owners
When writing queries, the query can get longer and longer. In addition, sometimes you find that a single query uses the same rules in two different parts of the query. In programming languages you would move the duplicated code into some sort of library of subroutines. In SQL, the idea of a subroutine is called a VIEW.
A VIEW can be created in the DBMS, and each view represents a particular SQL query. Once the view is created, the query it represents is hidden from the user, and instead the view appears to be just another table. The contents of the VIEW always remains identical to the result of running the query which the view represents.
Lets say you want a query to tell you how many drivers there are and how many cars exist in the database. You could run two different queries:
SELECT count(*) from DRIVER; SELECT count(*) from CAR;
Instead, lets put each of them in a VIEW
CREATE VIEW count1 (total) AS SELECT count(*) from DRIVER; CREATE VIEW count2 (total) AS SELECT count(*) from CAR; SELECT * from count1;
SELECT * from count2;
SELECT count1.total,count2.total from count1,count2;
Once you are finished with your VIEW, you can delete it. The command to do this is DROP VIEW viewname. Continuing our count1 and count2 example, to delete the count1 view you would type:
DROP VIEW count1;