Lecture Slides available: PDF PowerPoint

Subqueries and Schema

Subqueries

One SELECT statement can be used inside another, allowing the result of executing one query to be used in the WHERE rules of the other SELECT statement. Where one SELECT statement appears within another SELECT statement's WHERE clause it is known as a SUBQUERY.

One limitation of subqueries is that it can only return one attribute. This means that the subquery can only have one attribute in its SELECT line. If you supply more than one attribute the system will report an error.

Subqueries are generally used in situations where one might normally use a self join or a view. Subqueries tend to be much easier to understand.

Simple Example

Who in the database is older than Jim Smith?

SELECT dob FROM driver WHERE name = 'Jim Smith'

dob
11 Jan 1980

SELECT name FROM driver WHERE dob > '11 Jan 1980';

name
Bob Smith
Bob Jones

SELECT name 
FROM driver
WHERE dob > (SELECT dob FROM driver WHERE name = 'Jim Smith')

This subquery works well, and is simple to understand, but you must take care that the subquery returns only 1 row. If there were two people called Jim Smith, the query would return two different dates of birth, and this would break the query. To get around this problem, we use ANY or ALL.

ANY and ALL

This allows us to handle subqueries which return multiple rows. You still must only have a subquery which has only a single column. If you put ANY in front of a query, then the rule you provide must be true for at least 1 of the rows returned. If you put ALL in front of the subquery, then your rule must be true for all the rows returned.

Question: What cars are the same colour as a car owned by Jim Smith?

Jim Smith owns two cars, and their colours are RED and BLUE. We want to know what cars are EITHER RED or BLUE...

SELECT regno FROM car
WHERE colour = ANY (SELECT colour FROM car WHERE owner = 'Jim Smith')
;

Question: List the drivers younger than all the people who own a blue car.

This is really looking for the age of people who own a BLUE car (2 people) and listing drivers who are younger than both of these people.

SELECT name,dob
FROM driver
WHERE dob > ALL (
              SELECT dob
              FROM car join driver on (owner=name)
              WHERE colour = 'BLUE'
      )
;

IN and NOT IN for subqueries

Just like IN could be used with something like ('BLUE','BLACK'), a subquery returns a similar construct which can similarly be accessed using IN

Question: Which cars the same colour as one of Jim Smith's cars?

SELECT regno FROM car
WHERE colour IN (SELECT colour FROM car WHERE owner = 'Jim Smith')
;

Question: Which cars do not have the same colour as one of Jim Smith's cars?

SELECT regno FROM car
WHERE colour NOT IN (SELECT colour FROM car WHERE owner = 'Jim Smith')
;

EXISTS

In almost all cases, when a question involves uniqueness then you can do it with a subquery and EXISTS or NOT EXISTS. The EXISTS operator is a simple test, which is TRUE if the subquery returns at least 1 row, and FALSE if it return 0 rows. NOT EXISTS does the opposite.

Question: List the colours which are only used once in the database.

SELECT colour
FROM car a
WHERE exists (
        select colour             -- does not matter what is selected
        from car b                -- As we use CAR twice, call this one b
        where a.colour = b.colour -- CAR rows with the same colour as a
        and   a.regno != b.regno  -- but a car different to the one in a
     );

Remember that the rules are processed for each row of a. So the query looks at row 1 of a, runs the subquery, and decides if the colour is unique. It then moves to row 2 of a and reruns the subquery.

UNION

Sometimes it is desirable to merge the results of two queries together to form a single output table. This is known as UNION. UNION only works if each query in the statement has the same number of columns, and each of the corresponding columns are of the same type.

Question: List all the drivers in the DRIVER table, and show how many cars each of them own. If a driver owns no cars, the total should be 0. We will assume that David Davis has been added to the DRIVER table, but that he owns no cars.

SELECT name,count(*)
FROM   driver JOIN car on (name = owner)

NAME	count(*)
Jim Smith	2
Bob Smith	1
Bob Jones	1

This does not show David Davis, but we could write a query to find people who own no cars using NOT IN and a subquery.

SELECT name,0
FROM   driver
WHERE  name not in (select owner from car);

name	0
David Davis	0

Now, we can merge these two results together using UNION, and thus:

SELECT name,count(*)
FROM   driver JOIN car on (name = owner)
UNION
SELECT name,0
FROM   driver
WHERE  name not in (select owner from car)

NAME	count(*)
Jim Smith	2
Bob Smith	1
Bob Jones	1
David Davis	0

Changing Data

So far we have just looked at SELECT but we need to be able to do other operations as follows:

INSERT - which writes new rows into a database
DELETE - which deletes rows from a database
UPDATE - which changes values in existing rows

INSERT

The INSERT command allows you to put new rows into a table.

INSERT INTO table_name
 [(column_list)] VALUES (value_list)

The column_list lists columns to be assigned values. It can be omitted if every column is to be assigned a value. The value_list is a set of literal values giving the value for each column in column_list or CREATE TABLE order.

insert into driver
  values ('Jessie James','31 Nov 1892');
insert into driver (name,dob)
  values ('John Johnstone','1 Aug 1996');

Usually you do not have to specify the columns in the insert statement, but doing so is useful in case someone changes the table at some point in the future. By mentioning the column names you are certain that the values specified are going into the correct columns.

DELETE

The DELETE command allows you to remove rows from a table.

 DELETE FROM table_name [WHERE condition];

the rows of table_name which satify the condition are deleted.

Example:

DELETE FROM car        -- Delete all rows from CAR
;

DELETE from car
WHERE  owner is null   -- Delete any row where a car has no owner
;

UPDATE

UPDATE allows you to write queries which change data already in a table. It cannot add more rows or take rows away.

UPDATE table_name
 SET column_name = expression,{column_name=expression}
 [WHERE condition]

For example, lets set all BLUE cars to GREEN.

UPDATE car SET colour = 'GREEN'
WHERE colour = 'BLUE';

This next example shows how the update calculation can be an expression. Lets add VAT of 17.5% to all prices in the CAR table. This is equivalent to multiplying the car price by 1.175.

UPDATE car SET price = price * 1.175

View Manipulation

When is the contents of a view calculated? The process of the DBMS calculating the contents of a view is called 'materialising the view'. In theory this could be:

When it is defined or
when it is accessed

If it is the former then subsequent inserts, deletes and updates would not be visible. If the latter then changes will be seen.

Some systems allow you to chose when views are materialised. Most do not, and views are materialised whenever they are accessed, thus all changes to the tables on which the view query is based can instantly be seen.

VIEW update, insert and delete

Can we change the data viewed through a view?

Yes, provided the primary key of all the base tables which make up the view are present in the view.

: Figure : VIEW which can be updated

The following view cannot be changed because we have no means of knowing which row of B to modify

: Figure : VIEW which cannot be updated

Controlling Schema

Up to this point we have assumed that the database has already been created. However, someome must be able to create the schema to allow table structures to be defined. In the ER diagram design phase, the process will take you from written specifications to sets of relations, including foreign key definitions. At that point the relations have to be rewritten into schema creation queries.

CREATE TABLE

CREATE TABLE allows the user to create the table schemas. It has a relatively simple structure, consisting of the column names and the type of each column. We have not really mentioned column types, but there are quite a few different types in an DBMS. The important ones are:

INTEGER - A column to hold numbers. Numbers with decimal points are not permitted. Examples: 5, 6, 10006.
REAL - A column to hold numbers which have decimal points. Examples could be pounds and pence. Examples 5.6, 1000.35567.
DECIMAL - A column to hold numbers which can have decimal points. It is used as DECIMAL(n) or DECIMAL(n,m), where n is the size of the number allowed before the decimal point, and m is the size allowed after the decimal point. If you do not specify m, it is assumed to be 0.
VARCHAR - ASCII characters with a length ranging from 0 characters up to some limit. It is usually used as VARCHAR(n), where n is the maximum number of characters which can be stored. Examples include 'Hello' and 'surprise birthday'.
CHAR - ASCII characters with a fixed length. It is usually used as CHAR(n), where n is the fixed length of the string. If you try to write a string into a CHAR which is shorter than n, spaces are added to the end of your string. For example, with CHAR(5), storing 'Hia' results in 'Hia '.
DATE - A column which holds a day/month/year date. Examples include '1 Jan 2003' and '31 Dec 1885'.

The actual syntax of the statement is:

CREATE TABLE tablename (
   colname type Optionaladditionalinfo
   ,colname type Optionaladditionalinfo
   ,colname type Optionaladditionalinfo
   ,optionaladditionalinfo
);

At the end of each column definition you can have some additional info. This could be range rules or key information. Common ones to use include

REFERENCES - This field is a foreign key which refers to the specified table and key. An example could be
A INTEGER REFERENCES B(C)
which would indicate that column A (an integer) is a foreign key which refers to a table called B, and relates to column C in B (C should be the primary key in a properly designed database).
PRIMARY KEY - This column is the primary key.
NOT NULL - This column must have a value.

At the end of the definition you can have some other types of optional additional information. There is a significant number of possibilities here, but the main ones include:

PRIMARY KEY (column1,column2,...) - If the table is has a composite primary key (more than 1 column makes up the key) then you must define the key in this way at the end of the definition.
FOREIGN KEY (column1,column2,...) REFERENCES othertable - If the table has a relationship with another table which has a composite key, then the columns in this table which form the foreign keys must be listed using this syntax.

Example

As some examples, lets defined the DRIVER and CAR tables.

CREATE TABLE driver (
  name		varchar(30)	PRIMARY KEY
 ,dob		DATE		NOT NULL
);

CREATE TABLE car (
  regno		VARCHAR(8)	PRIMARY KEY
 ,make		VARCHAR(20)
 ,colour	VARCHAR(30)
 ,price		DECIMAL(8,2)
 ,owner		VARCHAR(30)	REFERENCES driver(name)
);

Or, using the additional information aspects of the syntax, the following statements create the same table structures.

CREATE TABLE driver (
  name		varchar(30)
 ,dob		DATE		NOT NULL
 ,PRIMARY KEY (name)
);

CREATE TABLE car (
  regno		VARCHAR(8)
 ,make		VARCHAR(20)
 ,colour	VARCHAR(30)
 ,price		DECIMAL(8,2)
 ,owner		VARCHAR(30)
 ,PRIMARY KEY(regno)
 ,FOREIGN KEY(owner) REFERENCES driver
);

DROP TABLE

Eventually there may come a time when you want to remove a table. The basic syntax is:

DROP TABLE tablename

The only difficulty in dropping tables is that you cannot drop a table if another table refers to it via a foreign key relationship. This would break the referential integrity rules. Thus in our example we can drop CAR and then DRIVER, but we cannot drop DRIVER first.

DROP TABLE car;
DROP TABLE driver;

ALTER TABLE

Most database management systems allow you to alter the definition of a table after it has been constructed, using the ALTER TABLE command. There are many variants to this, and far too much to discuss in this introduction. One simple example would be if there was a need to add a column to DRIVER to indicate the driver's address. This could be done by:

ALTER TABLE driver ADD address varchar(50);

Order of Evaluation

In summary, consider the following information, which depicts the various options of the SELECT statement, and the approximate order in which each statement is evaluated:

 SELECT   [distinct] column_names	7,6 eliminate unwanted data
 FROM     table_list			1   Cartesian Product
[WHERE    conditions ]			2   Filter rows
[GROUP BY colum_list			3   Group Rows
  [HAVING conditions ]] 		4   eliminate unwanted groups
[ORDER BY column_list [DESC]]		5   Sort rows

Database Notes

Tutorial Activities

Future Stuff

Online Relational Algebra

News

Subqueries and Schema

Contents

Example