Lecture Slides available: PDF PowerPoint

Structured Query Language

In the other chapters of this course consideration is given to producing a good design for a database structure or schema. In this chapter the focus is on applying this schema to a database management system, and then using that DBMS to allow storage and retrieval of data.

To communicate with the database system itself we need a language. SQL is an international standard language for manipulating relational databases. It is based on an IBM product. SQL is short for Structured Query Language.

SQL can create schemas, delete them, and change them. It can also put data into schemas and remove data. It is a data handling language, but it is not a programming language.

SQL is a DSL (Data Sub Language), which is really a combination of two languages. These are the Data Definition Language (DDL) and the Data Manipulation Language (DML). Schema changes are part of the DDL, while data changes are part of the DML. We will consider both parts of the DSL in this discussion of SQL.

Database Models

A data model comprises

a data structure
a set of integrity constraints
operations associated with the data structure

Examples of data models include:

hierarchic
network
relational

Models other than the relational database module used to be quite popular. Each model type is appropriate to particular types of problem. The Relational model type is the most popular in use today, and the other types are not discussed further.

Relational Databases

The relational data model comprises:

relational data structure
relational integrity constraints
relational algebra or equivalent (SQL)

SQL is an ISO language based on relational algebra

relational algebra is a mathematical formulation

Relational Data Structure

A relational data structure is a collection of tables or relations.

A relation is a collection of rows or tuples
A tuple is a collection of columns or attributes
A domain is a pool of values from which the actual attribute values are taken.

: Figure : Tuples and Domains

Domain and Integrity Constraints

Domain Constraints
- limit the range of domain values of an attribute
- specify uniqueness and `nullness' of an attribute
- specify a default value for an attribute when no value is provided.
Entity Integrity
- every tuple is uniquely identified by a unique non-null attribute, the primary key.
Referential Integrity
- rows in different tables are correctly related by valid key values (`foreign' keys refer to primary keys).

Structure of a Table

In the design process tables are defined, and the relationships between tables identified. Remember a relationship is just a link between two concepts. Consider a table holding "drivers" and a table holding "car" information... Each car is owned by a driver, and therefore there is a link between "car" and "driver" to indicate which driver owns which car.

In the subsequent pages we will refer back to this driver and car arrangement. To make the examples easier, lets create some example data.

CAR

The CAR table has the following structure:

REGNO : The registration number of the car
MAKE : The manufacturer of the car
COLOUR: The colour of the car
PRICE : The price of the car when it was bought new

DRIVER

The DRIVER table has the following structure:

NAME : The full name of the driver
DOB : The data of birth of the driver

Relationship between CAR and DRIVER

The DRIVER and the CAR has a relationship between them of N:1. This indicates that a CAR can have only 1 DRIVER, but that a DRIVER can own more than 1 CAR simultaneously.

: Figure : ER Diagram of DRIVER and CAR

In the design section we can see that this requires a FOREIGN KEY in the CAR end of the relationship. This foreign key allows us to implement the relationship in the database. We will call this field OWNER.

Example Data

DRIVER

NAME	DOB
Jim Smith	11 Jan 1980
Bob Smith	23 Mar 1981
Bob Jones	3 Dec 1986

CAR

REGNO	MAKE	COLOUR	PRICE	OWNER
F611 AAA	FORD	RED	12000	Jim Smith
J111 BBB	SKODA	BLUE	11000	Jim Smith
A155 BDE	MERCEDES	BLUE	22000	Bob Smith
K555 GHT	FIAT	GREEN	6000	Bob Jones
SC04 BFE	SMART	BLUE	13000

Columns or Attributes

Each column is given a name which is unique within a table

Each column holds data of one specified type. E.g.

 integer               decimal
 character text        data
  -- the range of values can be further constrained

If a column of a row contains no data, we say it is NULL. For example, a car just off the production line might not have an owner in the database until someone buys the car. A NULL value may also indicate that the value is unavailable or inappropriate . This might be the case for a car which is being destroyed or a car where two people are arguing in court that they are both the owner.

Some important rules:

All rows of a table must be different in some way from all other rows.
Sometimes a row is referred to as a Tuple.
Cardinality is the number of ROWS in a table.
Arity is the number of COLUMNS in a table.

Primary Keys

A table requires a key which uniquely identifies each row in the table. This is entity integrity.

The key could have one column, or it could use all the columns. It should not use more columns than necessary. A key with more than one column is called a composite key.

A table may have several possible keys, the candidate keys, from which one is chosen as the primary key.

No part of a primary key may be NULL.

If the rows of the data are not unique, it is necessary to generate an artificial primary key.

In our example, DRIVER has a primary key of NAME, and CAR has a primary key of REGNO. This database will break if there are two drivers with the same name, but it gives you an idea what the primary key means...

Note that if for some reason JIM SMITH decided to change his name to "BRIAN SMITH", then not only would this have to be changed in DRIVER, but it would also have to be changed in CAR. If you changed it only in DRIVER, there would be some foreign keys pointing to DRIVER looking for a driver who does not exist. This would be an error called a REFERENTIAL INTEGRITY error, and the DBMS stops you making changes to the database which would result in such an error.

SQL Basics

Basic SQL Statements include:

CREATE - a data structure
SELECT - read one or more rows from a table
INSERT - one or more rows into a table
DELETE - one or more rows from a table
UPDATE - change the column values in a row
DROP - a data structure

In the remainder of this section only simple SELECT statements are considered.

Simple SELECT

The syntax of a SELECT statement is :

SELECT column FROM tablename

This would produce all the rows from the specified table, but only for the particular column mentioned. If you want more than one column shown, you can put in multiple columns separating them with commas, like:

SELECT column1,column2,column3 FROM tablename

If you want to see all the columns of a particular table, you can type:

SELECT * FROM tablename

Lets see it in action on CAR...

SELECT * FROM car;

REGNO	MAKE	COLOUR	PRICE	OWNER
F611 AAA	FORD	RED	12000	Jim Smith
J111 BBB	SKODA	BLUE	11000	Jim Smith
A155 BDE	MERCEDES	BLUE	22000	Bob Smith
K555 GHT	FIAT	GREEN	6000	Bob Jones
SC04 BFE	SMART	BLUE	13000

SELECT regno FROM car;

REGNO
F611 AAA
J111 BBB
A155 BDE
K555 GHT
SC04 BFE

SELECT colour,owner FROM car;

COLOUR	OWNER
RED	Jim Smith
BLUE	Jim Smith
BLUE	Bob Smith
GREEN	Bob Jones
BLUE

In SQL, you can put extra space characters and return characters just about anywhere without changing the meaning of the SQL. SQL is also case-insensitive (except for things in quotes). In addition, SQL in theory should always end with a ';' character. You need to include the ';' if you have two different SQL queries so that the system can tell when one SQL statement stops and another one starts. If you forget the ';' the online interface will put one in for you. For these reasons all of the following statements are identical and valid.

SELECT REGNO FROM CAR;

SELECT REGNO FROM CAR

Select REGNO from CAR

select regno FROM car


SELECT 
    regno
 FROM             car;

Comments

Sometimes you might want to write a comment in somewhere as part of an SQL statement. A comment in this case is a simple piece of text which is meaningful to yourself, but should be ignored by the database. The characters '--', when they appear in a query, indicate the start of a comment. Everything after that point is ignored until the end of that line. The following queries are all equivalent.

SELECT regno
FROM   car;

SELECT regno -- The registration number
FROM   car   -- The car storage table
;

Warning: You cannot put a comment immediately after a ';'. Comments are only supported within the text of an SQL statement. The following will cause SQL errors:

SELECT regno  
FROM   car;   -- Error here as comment is after the query

-- Error here as comment is before the start of the query
SELECT regno  
FROM   car;

SELECT filters

Displaying all the rows of a table can be handy, but if we have tables with millions of rows then this type of query could take hours. Instead, we can add "filters" onto a SELECT statement to only show specific rows of a table. These filters are written into an optional part of the SELECT statement, known as a WHERE clause.

SELECT columns
FROM table
WHERE rule

The "rule" section of the WHERE clause is checked for every row that a select statement would normally show. If the whole rule is TRUE, then that row is shown, whereas if the rule is FALSE, then that row is not shown.

The rule itself can be quite complex. The simplest rule is a single equality test, such as "COLOUR = 'RED'".

Without the WHERE rule would show:

SELECT regno from CAR;

REGNO
F611 AAA
J111 BBB
A155 BDE
K555 GHT
SC04 BFE

From the database we know that only F611 AAA is RED, and the rest of the cars are either BLUE or GREEN. Thus a rule COLOUR = 'RED' is only true on the row with F611 AAA, and false elsewhere. With everything in a query:

SELECT regno from CAR
WHERE colour = 'RED';

REGNO
F611 AAA

An important point to note is that queries are case sensitive between the quotes. Thus 'RED' will work, but 'red' will produce nothing. The case used in the quotes must match perfectly the case stored in the table. SQL is not forgiving and if you forget you can be scratching you head for hours trying to fix it.

Note also that "colour" does not have to appear on the SELECT line as a column name. It can if you want to see the colour, but there is no requirement for it to be there. Therefore this will work too:

SELECT regno,colour from CAR
WHERE colour = 'RED';

REGNO	COLOUR
F611 AAA	RED

Comparisons

SQL supports a variety of comparison rules for use in a WHERE clause. These include =,!=,<>, <, <=, >, and >=.

Examples of a single rule using these comparisons are:

WHERE colour = 'RED'	The colour attribute must be RED
WHERE colour != 'RED'	The colour must be a colour OTHER THAN RED
WHERE colour <> 'RED'	The same as !=
WHERE PRICE > 10000	The price of the car is MORE THAN 10000
WHERE PRICE >= 10000	The price of the car is EQUAL TO OR MORE THAN 10000
WHERE PRICE < 10000	The price of the car is LESS THAN 10000
WHERE PRICE <= 10000	The price of the car is EQUAL TO OR LESS THAN 10000

Note that when dealing with strings, like RED, you must say 'RED'. When dealing with numbers, like 10000, you can say '10000' or 10000. The choice is yours.

Dates

Date rules are some of the hardest rules to get right in writing SQL, yet there is nothing particularly complex about them. The hard part is working out what it means to be GREATER THAN a particular date.

In date calculations, you can use all the normal comparators.

SELECT name,dob from driver

NAME	DOB
Jim Smith	11 Jan 1980
Bob Smith	23 Mar 1981
Bob Jones	3 Dec 1986

SELECT name,dob from driver
WHERE DOB = '3 Dec 1986'

NAME	DOB
Bob Jones	3 Dec 1986

In other comparators, it is important to realise that a date gets bigger as you move into the future, and smaller as you move into the past. Thus to say 'DATE1 < DATE2' you are stating that DATE1 occurs before DATE2 on a calender. For example, to find all drivers who were born on or after the 1st Jan 1981 you would do:

SELECT name,dob from driver
WHERE DOB >= '1 Jan 1981'

NAME	DOB
Bob Smith	23 Mar 1981
Bob Jones	3 Dec 1986

The syntax for dates does change slightly on difference database systems, but the syntax '1 Jan 2000' works in general on all systems. Oracle also allows dates like '1-Jan-2000' and '1-Jan-00'. If you specify a year using only the last two digits, Oracle uses the current date to compute the missing parts of the year, converting '00' to '2000'. Do not get confused by saying '87' for '1987' and ending up with '2087'!

BETWEEN

Sometimes when you are dealing with dates you want to specify a range of dates to check. The best way of doing this is using BETWEEN. For instance, to find all the drivers born between 1995 and 1999 you could use:

SELECT name,dob from driver
WHERE DOB between '1 Jan 1985' and '31 Dec 1999'

NAME	DOB
Bob Jones	3 Dec 1986

Note that the dates have day of the month and month in them, and not just the year. In SQL, all dates must have a month and a year. If you try to use just a year the query will fail.

BETWEEN works for other things, not just dates. For instance, to find cars worth between 5000 and 10000, you could execute:

SELECT regno
FROM car
where price between 5000 and 10000;

REGNO	PRICE
K555 GHT

NULL

The NULL value indicates that something has no real value. For this reason the normal value comparisons will always fail if you are dealing with a NULL. If you are looking for NULL, for instance looking for cars without owners using OWNER of CAR, all of the following are wrong!

SELECT regno from CAR WHERE OWNER = NULL            WRONG!
SELECT regno from CAR WHERE OWNER = 'NULL'          WRONG!

Instead SQL has a special comparison operator called IS which allows us to find NULL values. There is also an opposite to IS, called IS NOT, which finds all the values which are not NULL. So finding all the regnos of cars with current owners would be (note that if they have an owner, then the owner has a value and thus is NOT NULL):

SELECT REGNO from CAR
WHERE OWNER is not NULL

REGNO
F611 AAA
J111 BBB
A155 BDE
K555 GHT

And finding cars without owners would be:

SELECT REGNO from CAR
WHERE OWNER is NULL

REGNO
SC04 BFE

LIKE

When dealing with strings, sometimes you do not want to match on exact strings like ='RED', but instead on partial strings, substrings, or particular patterns. This could allow you, for instance, to find all cars with a colour starting with 'B'. The LIKE operator provides this functionality.

The LIKE operator is used in place of an '=' sign. In its basic form it is identical to '='. For instance, both of the following statements are identical:

SELECT regno FROM car WHERE colour = 'RED';
SELECT regno FROM car WHERE colour LIKE 'RED';

The power of LIKE is that it supports two special characters, '%' and '-'. These are equivalent to the '*' and '?' wildcard characters of DOS. Whenever there is an '-' character in the string, any character will match. Whenever there is an '%' character in the string, 0 or more characters will match. Consider these rules:

name LIKE 'Jim Smith'	Matches 'Jim Smith'
name LIKE '_im Smith'	Matches things like 'Jim Smith' or 'Tim Smith'
name LIKE '___ Smith'	Matches 'Jim Smith' and 'Bob Smith'
name LIKE '% Smith'	Matches 'Jim Smith' and 'Bob Smith'
name LIKE '% S%'	Matches 'Jim Smith' and 'Bob Smith'
name LIKE 'Bob %'	Matches 'Bob Jones' and 'Bob Smith'
name LIKE '%'	Matches anything not null

Note however that LIKE is more powerful than a simple '=' operator, and thus takes longer to run. If you are not using any wildcard characters in a LIKE operator then you should always replace LIKE with '='.

Database Notes

Tutorial Activities

Future Stuff

Online Relational Algebra

News

Structured Query Language

Contents

DRIVER

CAR