Next Page Up One Level

Lecture Slides available: PDF PowerPoint

Structured Query Language

Contents

In the other chapters of this course consideration is given to producing a good design for a database structure or schema. In this chapter the focus is on applying this schema to a database management system, and then using that DBMS to allow storage and retrieval of data.

To communicate with the database system itself we need a language. SQL is an international standard language for manipulating relational databases. It is based on an IBM product. SQL is short for Structured Query Language.

SQL can create schemas, delete them, and change them. It can also put data into schemas and remove data. It is a data handling language, but it is not a programming language.

SQL is a DSL (Data Sub Language), which is really a combination of two languages. These are the Data Definition Language (DDL) and the Data Manipulation Language (DML). Schema changes are part of the DDL, while data changes are part of the DML. We will consider both parts of the DSL in this discussion of SQL.

Database Models

A data model comprises

  • a data structure
  • a set of integrity constraints
  • operations associated with the data structure

Examples of data models include:

  • hierarchic
  • network
  • relational

Models other than the relational database module used to be quite popular. Each model type is appropriate to particular types of problem. The Relational model type is the most popular in use today, and the other types are not discussed further.

Relational Databases

The relational data model comprises:

  • relational data structure
  • relational integrity constraints
  • relational algebra or equivalent (SQL)
  • SQL is an ISO language based on relational algebra
  • relational algebra is a mathematical formulation
  • Relational Data Structure

    A relational data structure is a collection of tables or relations.

    • A relation is a collection of rows or tuples
    • A tuple is a collection of columns or attributes
    • A domain is a pool of values from which the actual attribute values are taken.
    Tuples and Domains
    Figure : Tuples and Domains

    Domain and Integrity Constraints

    • Domain Constraints
      • limit the range of domain values of an attribute
      • specify uniqueness and `nullness' of an attribute
      • specify a default value for an attribute when no value is provided.
    • Entity Integrity
      • every tuple is uniquely identified by a unique non-null attribute, the primary key.
    • Referential Integrity
      • rows in different tables are correctly related by valid key values (`foreign' keys refer to primary keys).

    Structure of a Table

    In the design process tables are defined, and the relationships between tables identified. Remember a relationship is just a link between two concepts. Consider a table holding "drivers" and a table holding "car" information... Each car is owned by a driver, and therefore there is a link between "car" and "driver" to indicate which driver owns which car.

    In the subsequent pages we will refer back to this driver and car arrangement. To make the examples easier, lets create some example data.

    CAR

    The CAR table has the following structure:

    • REGNO : The registration number of the car
    • MAKE : The manufacturer of the car
    • COLOUR: The colour of the car
    • PRICE : The price of the car when it was bought new

    DRIVER

    The DRIVER table has the following structure:

    • NAME : The full name of the driver
    • DOB : The data of birth of the driver

    Relationship between CAR and DRIVER

    The DRIVER and the CAR has a relationship between them of N:1. This indicates that a CAR can have only 1 DRIVER, but that a DRIVER can own more than 1 CAR simultaneously.

    ER Diagram of DRIVER and CAR
    Figure : ER Diagram of DRIVER and CAR

    In the design section we can see that this requires a FOREIGN KEY in the CAR end of the relationship. This foreign key allows us to implement the relationship in the database. We will call this field OWNER.

    Example Data

    DRIVER

    NAMEDOB
    Jim Smith11 Jan 1980
    Bob Smith23 Mar 1981
    Bob Jones3 Dec 1986

    CAR

    REGNOMAKECOLOURPRICEOWNER
    F611 AAAFORDRED12000Jim Smith
    J111 BBBSKODABLUE11000Jim Smith
    A155 BDEMERCEDESBLUE22000Bob Smith
    K555 GHTFIATGREEN6000Bob Jones
    SC04 BFESMARTBLUE13000

    Columns or Attributes

    Each column is given a name which is unique within a table

    Each column holds data of one specified type. E.g.

     integer               decimal
     character text        data
      -- the range of values can be further constrained
    

    If a column of a row contains no data, we say it is NULL. For example, a car just off the production line might not have an owner in the database until someone buys the car. A NULL value may also indicate that the value is unavailable or inappropriate . This might be the case for a car which is being destroyed or a car where two people are arguing in court that they are both the owner.

    Some important rules:

    • All rows of a table must be different in some way from all other rows.
    • Sometimes a row is referred to as a Tuple.
    • Cardinality is the number of ROWS in a table.
    • Arity is the number of COLUMNS in a table.

    Primary Keys

    A table requires a key which uniquely identifies each row in the table. This is entity integrity.

    The key could have one column, or it could use all the columns. It should not use more columns than necessary. A key with more than one column is called a composite key.

    A table may have several possible keys, the candidate keys, from which one is chosen as the primary key.

    No part of a primary key may be NULL.

    If the rows of the data are not unique, it is necessary to generate an artificial primary key.

    In our example, DRIVER has a primary key of NAME, and CAR has a primary key of REGNO. This database will break if there are two drivers with the same name, but it gives you an idea what the primary key means...

    Note that if for some reason JIM SMITH decided to change his name to "BRIAN SMITH", then not only would this have to be changed in DRIVER, but it would also have to be changed in CAR. If you changed it only in DRIVER, there would be some foreign keys pointing to DRIVER looking for a driver who does not exist. This would be an error called a REFERENTIAL INTEGRITY error, and the DBMS stops you making changes to the database which would result in such an error.

    SQL Basics

    Basic SQL Statements include:

    • CREATE - a data structure
    • SELECT - read one or more rows from a table
    • INSERT - one or more rows into a table
    • DELETE - one or more rows from a table
    • UPDATE - change the column values in a row
    • DROP - a data structure

    In the remainder of this section only simple SELECT statements are considered.

    Simple SELECT

    The syntax of a SELECT statement is :

    SELECT column FROM tablename
    

    This would produce all the rows from the specified table, but only for the particular column mentioned. If you want more than one column shown, you can put in multiple columns separating them with commas, like:

    SELECT column1,column2,column3 FROM tablename
    

    If you want to see all the columns of a particular table, you can type:

    SELECT * FROM tablename
    

    Lets see it in action on CAR...

    SELECT * FROM car;
    
    REGNOMAKECOLOURPRICEOWNER
    F611 AAAFORDRED12000Jim Smith
    J111 BBBSKODABLUE11000Jim Smith
    A155 BDEMERCEDESBLUE22000Bob Smith
    K555 GHTFIATGREEN6000Bob Jones
    SC04 BFESMARTBLUE13000
    SELECT regno FROM car;
    
    REGNO
    F611 AAA
    J111 BBB
    A155 BDE
    K555 GHT
    SC04 BFE
    SELECT colour,owner FROM car;
    
    COLOUROWNER
    REDJim Smith
    BLUEJim Smith
    BLUEBob Smith
    GREENBob Jones
    BLUE

    In SQL, you can put extra space characters and return characters just about anywhere without changing the meaning of the SQL. SQL is also case-insensitive (except for things in quotes). In addition, SQL in theory should always end with a ';' character. You need to include the ';' if you have two different SQL queries so that the system can tell when one SQL statement stops and another one starts. If you forget the ';' the online interface will put one in for you. For these reasons all of the following statements are identical and valid.

    SELECT REGNO FROM CAR;
    
    SELECT REGNO FROM CAR
    
    Select REGNO from CAR
    
    select regno FROM car
    
    
    SELECT 
        regno
     FROM             car;
    

    Comments

    Sometimes you might want to write a comment in somewhere as part of an SQL statement. A comment in this case is a simple piece of text which is meaningful to yourself, but should be ignored by the database. The characters '--', when they appear in a query, indicate the start of a comment. Everything after that point is ignored until the end of that line. The following queries are all equivalent.

    SELECT regno
    FROM   car;
    
    SELECT regno -- The registration number
    FROM   car   -- The car storage table
    ;
    

    Warning: You cannot put a comment immediately after a ';'. Comments are only supported within the text of an SQL statement. The following will cause SQL errors:

    SELECT regno  
    FROM   car;   -- Error here as comment is after the query
    
    -- Error here as comment is before the start of the query
    SELECT regno  
    FROM   car;   
    

    SELECT filters

    Displaying all the rows of a table can be handy, but if we have tables with millions of rows then this type of query could take hours. Instead, we can add "filters" onto a SELECT statement to only show specific rows of a table. These filters are written into an optional part of the SELECT statement, known as a WHERE clause.

    SELECT columns
    FROM table
    WHERE rule
    

    The "rule" section of the WHERE clause is checked for every row that a select statement would normally show. If the whole rule is TRUE, then that row is shown, whereas if the rule is FALSE, then that row is not shown.

    The rule itself can be quite complex. The simplest rule is a single equality test, such as "COLOUR = 'RED'".

    Without the WHERE rule would show:

    SELECT regno from CAR;
    
    REGNO
    F611 AAA
    J111 BBB
    A155 BDE
    K555 GHT
    SC04 BFE

    From the database we know that only F611 AAA is RED, and the rest of the cars are either BLUE or GREEN. Thus a rule COLOUR = 'RED' is only true on the row with F611 AAA, and false elsewhere. With everything in a query:

    SELECT regno from CAR
    WHERE colour = 'RED';
    
    REGNO
    F611 AAA

    An important point to note is that queries are case sensitive between the quotes. Thus 'RED' will work, but 'red' will produce nothing. The case used in the quotes must match perfectly the case stored in the table. SQL is not forgiving and if you forget you can be scratching you head for hours trying to fix it.

    Note also that "colour" does not have to appear on the SELECT line as a column name. It can if you want to see the colour, but there is no requirement for it to be there. Therefore this will work too:

    SELECT regno,colour from CAR
    WHERE colour = 'RED';
    
    REGNOCOLOUR
    F611 AAARED

    Comparisons

    SQL supports a variety of comparison rules for use in a WHERE clause. These include =,!=,<>, <, <=, >, and >=.

    Examples of a single rule using these comparisons are:

    WHERE colour = 'RED'The colour attribute must be RED
    WHERE colour != 'RED'The colour must be a colour OTHER THAN RED
    WHERE colour <> 'RED'The same as !=
    WHERE PRICE > 10000 The price of the car is MORE THAN 10000
    WHERE PRICE >= 10000 The price of the car is EQUAL TO OR MORE THAN 10000
    WHERE PRICE < 10000 The price of the car is LESS THAN 10000
    WHERE PRICE <= 10000 The price of the car is EQUAL TO OR LESS THAN 10000

    Note that when dealing with strings, like RED, you must say 'RED'. When dealing with numbers, like 10000, you can say '10000' or 10000. The choice is yours.

    Dates

    Date rules are some of the hardest rules to get right in writing SQL, yet there is nothing particularly complex about them. The hard part is working out what it means to be GREATER THAN a particular date.

    In date calculations, you can use all the normal comparators.

    SELECT name,dob from driver
    
    NAMEDOB
    Jim Smith11 Jan 1980
    Bob Smith23 Mar 1981
    Bob Jones3 Dec 1986
    SELECT name,dob from driver
    WHERE DOB = '3 Dec 1986'
    
    NAMEDOB
    Bob Jones3 Dec 1986

    In other comparators, it is important to realise that a date gets bigger as you move into the future, and smaller as you move into the past. Thus to say 'DATE1 < DATE2' you are stating that DATE1 occurs before DATE2 on a calender. For example, to find all drivers who were born on or after the 1st Jan 1981 you would do:

    SELECT name,dob from driver
    WHERE DOB >= '1 Jan 1981'
    
    NAMEDOB
    Bob Smith23 Mar 1981
    Bob Jones3 Dec 1986

    The syntax for dates does change slightly on difference database systems, but the syntax '1 Jan 2000' works in general on all systems. Oracle also allows dates like '1-Jan-2000' and '1-Jan-00'. If you specify a year using only the last two digits, Oracle uses the current date to compute the missing parts of the year, converting '00' to '2000'. Do not get confused by saying '87' for '1987' and ending up with '2087'!

    BETWEEN

    Sometimes when you are dealing with dates you want to specify a range of dates to check. The best way of doing this is using BETWEEN. For instance, to find all the drivers born between 1995 and 1999 you could use:

    SELECT name,dob from driver
    WHERE DOB between '1 Jan 1985' and '31 Dec 1999'
    
    NAMEDOB
    Bob Jones3 Dec 1986

    Note that the dates have day of the month and month in them, and not just the year. In SQL, all dates must have a month and a year. If you try to use just a year the query will fail.

    BETWEEN works for other things, not just dates. For instance, to find cars worth between 5000 and 10000, you could execute:

    SELECT regno
    FROM car
    where price between 5000 and 10000;
    
    REGNOPRICE
    K555 GHT

    NULL

    The NULL value indicates that something has no real value. For this reason the normal value comparisons will always fail if you are dealing with a NULL. If you are looking for NULL, for instance looking for cars without owners using OWNER of CAR, all of the following are wrong!

    SELECT regno from CAR WHERE OWNER = NULL            WRONG!
    SELECT regno from CAR WHERE OWNER = 'NULL'          WRONG!
    

    Instead SQL has a special comparison operator called IS which allows us to find NULL values. There is also an opposite to IS, called IS NOT, which finds all the values which are not NULL. So finding all the regnos of cars with current owners would be (note that if they have an owner, then the owner has a value and thus is NOT NULL):

    SELECT REGNO from CAR
    WHERE OWNER is not NULL
    
    REGNO
    F611 AAA
    J111 BBB
    A155 BDE
    K555 GHT

    And finding cars without owners would be:

    SELECT REGNO from CAR
    WHERE OWNER is NULL
    
    REGNO
    SC04 BFE

    LIKE

    When dealing with strings, sometimes you do not want to match on exact strings like ='RED', but instead on partial strings, substrings, or particular patterns. This could allow you, for instance, to find all cars with a colour starting with 'B'. The LIKE operator provides this functionality.

    The LIKE operator is used in place of an '=' sign. In its basic form it is identical to '='. For instance, both of the following statements are identical:

    SELECT regno FROM car WHERE colour = 'RED';
    SELECT regno FROM car WHERE colour LIKE 'RED';
    

    The power of LIKE is that it supports two special characters, '%' and '-'. These are equivalent to the '*' and '?' wildcard characters of DOS. Whenever there is an '-' character in the string, any character will match. Whenever there is an '%' character in the string, 0 or more characters will match. Consider these rules:

    name LIKE 'Jim Smith'Matches 'Jim Smith'
    name LIKE '_im Smith'Matches things like 'Jim Smith' or 'Tim Smith'
    name LIKE '___ Smith'Matches 'Jim Smith' and 'Bob Smith'
    name LIKE '% Smith'Matches 'Jim Smith' and 'Bob Smith'
    name LIKE '% S%'Matches 'Jim Smith' and 'Bob Smith'
    name LIKE 'Bob %'Matches 'Bob Jones' and 'Bob Smith'
    name LIKE '%'Matches anything not null

    Note however that LIKE is more powerful than a simple '=' operator, and thus takes longer to run. If you are not using any wildcard characters in a LIKE operator then you should always replace LIKE with '='.

    Next Page Up One Level