Previous Page Next Page Up One Level

Lecture Slides available: PDF PowerPoint

Logical Operators and Aggregation

Contents

Logical Operators

In the previous section we saw how a single rule could be added to a query using a WHERE clause. While this is useful, usually more than a single rule is required to produce the correct result. To support multiple rules we need to make use of NOT, AND, OR and parentheses.

AND

The basic way of supporting multiple rules in a single query is by making use of AND. AND provides a way of connecting two rules together such that ALL the rules must be true before the row is shown. Lets make use again of the CAR table:

REGNOMAKECOLOURPRICEOWNER
F611 AAAFORDRED12000Jim Smith
J111 BBBSKODABLUE11000Jim Smith
A155 BDEMERCEDESBLUE22000Bob Smith
K555 GHTFIATGREEN6000Bob Jones
SC04 BFESMARTBLUE13000

Consider the case where a police eye witness spots a car driving away from a crime. The witness reports that the car was BLUE and had the character '5' somewhere in the REGNO field. Taking these rules seperately...

SELECT regno from CAR
WHERE colour = 'BLUE';
REGNO
J111 BBB
A155 BDE
SC04 BFE
SELECT regno from CAR
WHERE regno LIKE '%5%'
REGNO
A155 BDE
K555 GHT

We are looking for a REGNO in common to both these results, which means the car we are looking for is 'A155 BDE'. Rather than doing this ourselves we want the computer to identify the right car in a single query. The two rules in question are linked together with an AND.

SELECT regno from CAR
WHERE colour = 'BLUE' AND regno LIKE '%5%'
;
REGNO
A155 BDE

Remember that the layout of the SQL is independent of spaces and newlines, so this query is identical to:

SELECT regno
FROM   CAR
WHERE  colour = 'BLUE' 
AND    regno LIKE '%5%'
;

You can link as many rules together as you like. So for instance if the witness said that the car was BLUE, had a 5 in the registration number, and that someone said the car was owned by Bob, we could write a query:

SELECT regno
FROM   CAR
WHERE  colour = 'BLUE' 
AND    regno LIKE '%5%'
AND    owner LIKE 'Bob %'
;

OR

AND allows us to link rules together such that all rules must be true to see that row. Think of AND as 'As well as'. Sometimes we want to say 'or that' or 'either' rather than 'As well as'. To do this we use OR. For instance, lets say that the police witness said that the car colour was either RED or BLUE, and they were not sure which. If you said:

WHERE colour = 'RED' AND colour = 'BLUE'

then no rows would be produced, as you are saying you want rows where the colour is both RED and BLUE at the same time (RED as well as BLUE). What we need is either RED OR BLUE.

SELECT REGNO,COLOUR from CAR
WHERE colour = 'RED'
OR    colour = 'BLUE';
REGNOCOLOUR
F611 AAARED
J111 BBBBLUE
A155 BDEBLUE
SC04 BFEBLUE

NOT

The NOT operator does the opposite of whatever comparison is being done. NOT is not frequently needed, as there is usually an opposite operator already. For instance, if you wanted the opposite of:

WHERE colour = 'RED'
You could say
WHERE colour != 'RED'
Using NOT you could also say
WHERE NOT colour = 'RED'
While not particularly useful in these simple examples, NOT comes into its own once you start to use parentheses.

Precedence

AND, OR, and NOT become more complex to understand when you mix them together in a single query. The problem is that the rules are combined together, not in the order you write them, but in the order of their precedence. This states that NOT is done first, then AND, and finally OR. This can make a BIG difference to your queries!

Consider the case of the police witness. Lets say that the car being looked for had a 5 in the registration number, and was either RED or BLUE.

SELECT REGNO,COLOUR from CAR
WHERE colour = 'RED'      -- 1
OR    colour = 'BLUE'     -- 2
AND   regno LIKE '%5%'    -- 3
;
REGNOCOLOUR
F611 AAARED
A155 BDEBLUE

In this query, rule 3 and rule 2 and ANDed together first, as they have a higher precedent. Only then is rule 1 ORed in. Thus this query says "The car is BLUE with a 5 in the regno" OR "the car is RED". What was wanted was to have rules 1 and 2 done first, and then 3, so that the query says "The car is either RED or BLUE" AND "the car had a 5 in the regno". To do this we need to use parenthesis.

Parenthesis

Parenthesis, or brackets, are used to instruct the database which rules should be done first. The database uses a simple ruleset to understand brackets. If you have any brackets, then the rule in the brackets is done first. If you have brackets within brackets, then the inner brackets are done first. In the example above, the right query can be generated as:

SELECT REGNO,COLOUR from CAR
WHERE (colour = 'RED'  
OR    colour = 'BLUE')
AND   regno LIKE '%5%'
;
REGNOCOLOUR
A155 BDEBLUE

The following queries are all identical in function to the above query...

SELECT REGNO,COLOUR from CAR
WHERE (colour = 'RED' OR colour = 'BLUE')
AND   regno LIKE '%5%';

SELECT REGNO,COLOUR from CAR
WHERE ( (colour = 'RED' OR colour = 'BLUE')
AND   regno LIKE '%5%');

Do not use brackets where they are not needed, as it makes the query harder for users to understand whats going on.

DISTINCT

Lets say you want a list of all the colours of cars in the database. The COLOUR field of CAR gives you this, and thus:

SELECT colour FROM car;
COLOUR
RED
BLUE
BLUE
GREEN
BLUE

This result was not the ideal one wanted. BLUE for some reason appears 3 times. It does this because BLUE appears 3 times in the original data. Sometimes this duplication is what is wanted, other times we want only to see the colours appearing once. To tell the database to show the rows only once, you can use the keyword DISTINCT. This appears immediately after the word SELECT. DISTINCT effectively means that all rows which appear must be unique, and any duplicate rows will be deleted.

SELECT DISTINCT colour FROM car;
COLOUR
BLUE
GREEN
RED

ORDER BY

When a query is executed the results are displayed in an almost random order. The order is dependent on how the database management system was written. This is fine usually, but sometimes giving the data out in a particular order would make the data must more useful. There is a special clause, ORDER BY, which can be added to the end of a query to give the data a particular order.

SELECT make FROM car;
MAKE
FORD
SKODA
MERCEDES
FIAT
SMART

To order alphabetically (which in SQL is known as ascending or ASC) you can use ORDER BY or ORDER BY ASC.

SELECT make FROM car
ORDER BY make;
MAKE
FIAT
FORD
MERCEDES
SKODA
SMART

This is identical to

SELECT make FROM car
ORDER BY make ASC;

To order things in the reverse ordering, you can use ORDER BY DESC.

SELECT make FROM car
ORDER BY make DESC;
MAKE
SMART
SKODA
MERCEDES
FORD
FIAT

For complex orderings involving more than one column, you can specify multiple columns in the ORDER BY statement, simply by separating each column name with a comma. Thus a query to sort cars by colour and then make would look like:

SELECT make,colour FROM car
ORDER BY colour,make;
MAKECOLOUR
SKODABLUE
SMARTBLUE
MERCEDESBLUE
FIATGREEN
FORDRED

IN

  • IN (list of values) determines whether a specified value is in a set of one or more listed values.

List the registration numbers of cars which are either SKODA or SMART

SELECT regno,make
FROM car
WHERE make = 'SKODA' or make='SMART'
;
REGNOMAKE
J111 BBBSKODA
SC04 BFESMART

This can be rewritten using IN.

SELECT regno,make
FROM car
WHERE make IN ('SKODA','SMART')
;

A good way to think of IN is to consider it as "is one of the following".

Aggregate Functions

Operators exist in SQL to give results based on the statistics of a group of values stored in the database. Such operators include "what is the maximum number" and "what is the average". These functions are called SET or AGGREGATE functions.

AVERAGE

To calculate the average of a column you use the AVG function.

SELECT price FROM car;
PRICE
12000
11000
22000
6000
13000
SELECT avg(price) FROM car;
avg(price)
12800

SUM

To calculate the SUM of all values in a column you use the SUM function.

SELECT sum(price) FROM car;
sum(price)
64000

MAX

To calculate the maximum or biggest value present in a particular column you can use the MAX function.

SELECT max(price) FROM car;
sum(price)
22000

MIN

To calculate the minimum or smallest value present in a particular column you can use the MIN function.

SELECT min(price) FROM car;
sum(price)
6000

COUNT

To work out how many rows are in a particular query result you can use the COUNT function.

Using "count(column)" counts how many rows exist in the answer where that column is NOT NULL. Using "count(*)" counts how many rows exist independent of NULL values.

SELECT count(price) FROM car;
sum(price)
5

In this case, the following SQL produces the same answer.

SELECT count(*) FROM car;

COUNT DISTINCT

Sometimes you do not want to count how many rows are in a particular column, but how many different values are stored in a column. There is a special variation of count which allows you to do that, known as COUNT DISTINCT. Its syntax is a little unusual...

SELECT count(colour) from car;
sum(price)
5
SELECT count(DISTINCT colour) from car;
sum(price)
3

GROUP BY aggregation

The aggregate functions are excellent when all you want is a single number answer. Frequently what is needed is statistical analysis in groups. For instance, what is the maximum cost of a car given its colour. Here we are wanting two columns, one the car colour, and the second column the highest cost. Intuitively one might think:

SELECT colour,max(price)
FROM   car
;

If you were to run this query it would produce a "group by" error.

Instead, what you have to do is consider all aggregate functions in your query, and over which columns they are going to be grouped. In this case we are grouping on colour, and want the maximum price within each "colour" group. To tell the computer this we use GROUP BY.

SELECT colour,price
FROM   car
;
COLOURPRICE
RED12000
BLUE11000
BLUE22000
GREEN6000
BLUE13000
SELECT colour,max(price)
FROM   car
GROUP BY colour
;
COLOURmax(PRICE)
RED12000
BLUE22000
GREEN6000

If you are ever confused by what to put in the GROUP BY, then here is a simple rule which is 99% accurate... If you have a SELECT line with aggregate functions, then you need a GROUP BY listing all the column names from the SELECT line which are not used by the functions. In this example "price" and "colour" are columns from SELECT, but as "price" is used in MAX, only "colour" needs to go into the GROUP BY statement.

HAVING

One annoying feature of SQL is that aggregate functions are executed at almost the last stage of the query process. This makes writing queries like "Which owners own more than 1 car" quite complex. Ideally we would like to write:

SELECT owner from car where count(owner) > 1;
The problem is that this does not work! Aggregate functions cannot appear in a WHERE clause, so this query is illegal... To get around this you can have the HAVING clause. HAVING works in an identical way to WHERE, except that it runs very late in the process and allows aggregate functions. It is also VERY expensive for the database to use, so do not use it until it is absolutely essential.

Our query can now be rewritten thus:

SELECT owner,count(regno)
FROM car
GROUP BY owner
HAVING count(regno)>1;

This query also shows how many cars the owner owns. You do not have to have the function in the HAVING on the SELECT line. The following also works:

SELECT owner
FROM car
GROUP BY owner
HAVING count(regno)>1;

If you remember count(*) counts how many rows there are in the answer. With a GROUP BY, it counts how many rows are in each group. The difference between a count with * or with a column name is that using a column name makes the count ignore NULL entries in that column, whereas with * NULL entries are counted too. In our example, REGNO is never NULL, so the query is also identical to:

SELECT owner
FROM car
GROUP BY owner
HAVING count(*)>1;

Previous Page Next Page Up One Level