Structured Query Language
In the other chapters of this course consideration is given to producing a good design for a database structure or schema. In this chapter the focus is on applying this schema to a database management system, and then using that DBMS to allow storage and retrieval of data.
To communicate with the database system itself we need a language. SQL is an international standard language for manipulating relational databases. It is based on an IBM product. SQL is short for Structured Query Language.
SQL can create schemas, delete them, and change them. It can also put data into schemas and remove data. It is a data handling language, but it is not a programming language.
SQL is a DSL (Data Sub Language), which is really a combination of two languages. These are the Data Definition Language (DDL) and the Data Manipulation Language (DML). Schema changes are part of the DDL, while data changes are part of the DML. We will consider both parts of the DSL in this discussion of SQL.
A data model comprises
Examples of data models include:
Models other than the relational database module used to be quite popular. Each model type is appropriate to particular types of problem. The Relational model type is the most popular in use today, and the other types are not discussed further.
The relational data model comprises:
A relational data structure is a collection of tables or relations.
In the design process tables are defined, and the relationships between tables identified. Remember a relationship is just a link between two concepts. Consider a table holding "drivers" and a table holding "car" information... Each car is owned by a driver, and therefore there is a link between "car" and "driver" to indicate which driver owns which car.
In the subsequent pages we will refer back to this driver and car arrangement. To make the examples easier, lets create some example data.
The CAR table has the following structure:
The DRIVER table has the following structure:
The DRIVER and the CAR has a relationship between them of N:1. This indicates that a CAR can have only 1 DRIVER, but that a DRIVER can own more than 1 CAR simultaneously.
In the design section we can see that this requires a FOREIGN KEY in the CAR end of the relationship. This foreign key allows us to implement the relationship in the database. We will call this field OWNER.
Each column is given a name which is unique within a table
Each column holds data of one specified type. E.g.
integer decimal character text data -- the range of values can be further constrained
If a column of a row contains no data, we say it is NULL. For example, a car just off the production line might not have an owner in the database until someone buys the car. A NULL value may also indicate that the value is unavailable or inappropriate . This might be the case for a car which is being destroyed or a car where two people are arguing in court that they are both the owner.
Some important rules:
A table requires a key which uniquely identifies each row in the table. This is entity integrity.
The key could have one column, or it could use all the columns. It should not use more columns than necessary. A key with more than one column is called a composite key.
A table may have several possible keys, the candidate keys, from which one is chosen as the primary key.
No part of a primary key may be NULL.
If the rows of the data are not unique, it is necessary to generate an artificial primary key.
In our example, DRIVER has a primary key of NAME, and CAR has a primary key of REGNO. This database will break if there are two drivers with the same name, but it gives you an idea what the primary key means...
Note that if for some reason JIM SMITH decided to change his name to "BRIAN SMITH", then not only would this have to be changed in DRIVER, but it would also have to be changed in CAR. If you changed it only in DRIVER, there would be some foreign keys pointing to DRIVER looking for a driver who does not exist. This would be an error called a REFERENTIAL INTEGRITY error, and the DBMS stops you making changes to the database which would result in such an error.
Basic SQL Statements include:
In the remainder of this section only simple SELECT statements are considered.
The syntax of a SELECT statement is :
SELECT column FROM tablename
This would produce all the rows from the specified table, but only for the particular column mentioned. If you want more than one column shown, you can put in multiple columns separating them with commas, like:
SELECT column1,column2,column3 FROM tablename
If you want to see all the columns of a particular table, you can type:
SELECT * FROM tablename
Lets see it in action on CAR...
SELECT * FROM car;
SELECT regno FROM car;
SELECT colour,owner FROM car;
In SQL, you can put extra space characters and return characters just about anywhere without changing the meaning of the SQL. SQL is also case-insensitive (except for things in quotes). In addition, SQL in theory should always end with a ';' character. You need to include the ';' if you have two different SQL queries so that the system can tell when one SQL statement stops and another one starts. If you forget the ';' the online interface will put one in for you. For these reasons all of the following statements are identical and valid.
SELECT REGNO FROM CAR; SELECT REGNO FROM CAR Select REGNO from CAR select regno FROM car SELECT regno FROM car;
Sometimes you might want to write a comment in somewhere as part of an SQL statement. A comment in this case is a simple piece of text which is meaningful to yourself, but should be ignored by the database. The characters '--', when they appear in a query, indicate the start of a comment. Everything after that point is ignored until the end of that line. The following queries are all equivalent.
SELECT regno FROM car; SELECT regno -- The registration number FROM car -- The car storage table ;
Warning: You cannot put a comment immediately after a ';'. Comments are only supported within the text of an SQL statement. The following will cause SQL errors:
SELECT regno FROM car; -- Error here as comment is after the query -- Error here as comment is before the start of the query SELECT regno FROM car;
Displaying all the rows of a table can be handy, but if we have tables with millions of rows then this type of query could take hours. Instead, we can add "filters" onto a SELECT statement to only show specific rows of a table. These filters are written into an optional part of the SELECT statement, known as a WHERE clause.
SELECT columns FROM table WHERE rule
The "rule" section of the WHERE clause is checked for every row that a select statement would normally show. If the whole rule is TRUE, then that row is shown, whereas if the rule is FALSE, then that row is not shown.
The rule itself can be quite complex. The simplest rule is a single equality test, such as "COLOUR = 'RED'".
Without the WHERE rule would show:
SELECT regno from CAR;
From the database we know that only F611 AAA is RED, and the rest of the cars are either BLUE or GREEN. Thus a rule COLOUR = 'RED' is only true on the row with F611 AAA, and false elsewhere. With everything in a query:
SELECT regno from CAR WHERE colour = 'RED';
An important point to note is that queries are case sensitive between the quotes. Thus 'RED' will work, but 'red' will produce nothing. The case used in the quotes must match perfectly the case stored in the table. SQL is not forgiving and if you forget you can be scratching you head for hours trying to fix it.
Note also that "colour" does not have to appear on the SELECT line as a column name. It can if you want to see the colour, but there is no requirement for it to be there. Therefore this will work too:
SELECT regno,colour from CAR WHERE colour = 'RED';
SQL supports a variety of comparison rules for use in a WHERE clause. These include =,!=,<>, <, <=, >, and >=.
Examples of a single rule using these comparisons are:
Note that when dealing with strings, like RED, you must say 'RED'. When dealing with numbers, like 10000, you can say '10000' or 10000. The choice is yours.
Date rules are some of the hardest rules to get right in writing SQL, yet there is nothing particularly complex about them. The hard part is working out what it means to be GREATER THAN a particular date.
In date calculations, you can use all the normal comparators.
SELECT name,dob from driver
SELECT name,dob from driver WHERE DOB = '3 Dec 1986'
In other comparators, it is important to realise that a date gets bigger as you move into the future, and smaller as you move into the past. Thus to say 'DATE1 < DATE2' you are stating that DATE1 occurs before DATE2 on a calender. For example, to find all drivers who were born on or after the 1st Jan 1981 you would do:
SELECT name,dob from driver WHERE DOB >= '1 Jan 1981'
The syntax for dates does change slightly on difference database systems, but the syntax '1 Jan 2000' works in general on all systems. Oracle also allows dates like '1-Jan-2000' and '1-Jan-00'. If you specify a year using only the last two digits, Oracle uses the current date to compute the missing parts of the year, converting '00' to '2000'. Do not get confused by saying '87' for '1987' and ending up with '2087'!
Sometimes when you are dealing with dates you want to specify a range of dates to check. The best way of doing this is using BETWEEN. For instance, to find all the drivers born between 1995 and 1999 you could use:
SELECT name,dob from driver WHERE DOB between '1 Jan 1985' and '31 Dec 1999'
Note that the dates have day of the month and month in them, and not just the year. In SQL, all dates must have a month and a year. If you try to use just a year the query will fail.
BETWEEN works for other things, not just dates. For instance, to find cars worth between 5000 and 10000, you could execute:
SELECT regno FROM car where price between 5000 and 10000;
The NULL value indicates that something has no real value. For this reason the normal value comparisons will always fail if you are dealing with a NULL. If you are looking for NULL, for instance looking for cars without owners using OWNER of CAR, all of the following are wrong!
SELECT regno from CAR WHERE OWNER = NULL WRONG! SELECT regno from CAR WHERE OWNER = 'NULL' WRONG!
Instead SQL has a special comparison operator called IS which allows us to find NULL values. There is also an opposite to IS, called IS NOT, which finds all the values which are not NULL. So finding all the regnos of cars with current owners would be (note that if they have an owner, then the owner has a value and thus is NOT NULL):
SELECT REGNO from CAR WHERE OWNER is not NULL
And finding cars without owners would be:
SELECT REGNO from CAR WHERE OWNER is NULL
When dealing with strings, sometimes you do not want to match on exact strings like ='RED', but instead on partial strings, substrings, or particular patterns. This could allow you, for instance, to find all cars with a colour starting with 'B'. The LIKE operator provides this functionality.
The LIKE operator is used in place of an '=' sign. In its basic form it is identical to '='. For instance, both of the following statements are identical:
SELECT regno FROM car WHERE colour = 'RED'; SELECT regno FROM car WHERE colour LIKE 'RED';
The power of LIKE is that it supports two special characters, '%' and '-'. These are equivalent to the '*' and '?' wildcard characters of DOS. Whenever there is an '-' character in the string, any character will match. Whenever there is an '%' character in the string, 0 or more characters will match. Consider these rules:
Note however that LIKE is more powerful than a simple '=' operator, and thus takes longer to run. If you are not using any wildcard characters in a LIKE operator then you should always replace LIKE with '='.