Logical Operators and Aggregation

In the previous section we saw how a single rule could be added to a query using a WHERE clause. While this is useful, usually more than a single rule is required to produce the correct result. To support multiple rules we need to make use of NOT, AND, OR and parentheses.

AND

The basic way of supporting multiple rules in a single query is by making use of AND. AND provides a way of connecting two rules together such that ALL the rules must be true before the row is shown. Lets make use again of the CAR table:

Consider the case where a police eye witness spots a car driving away from a crime. The witness reports that the car was BLUE and had the character '5' somewhere in the REGNO field. Taking these rules seperately...

SELECT regno from CAR
WHERE colour = 'BLUE';

REGNO
J111 BBB
A155 BDE
SC04 BFE

SELECT regno from CAR
WHERE regno LIKE '%5%'

REGNO
A155 BDE
K555 GHT

We are looking for a REGNO in common to both these results, which means the car we are looking for is 'A155 BDE'. Rather than doing this ourselves we want the computer to identify the right car in a single query. The two rules in question are linked together with an AND.

SELECT regno from CAR
WHERE colour = 'BLUE' AND regno LIKE '%5%'
;

REGNO
A155 BDE

Remember that the layout of the SQL is independent of spaces and newlines, so this query is identical to:

SELECT regno
FROM   CAR
WHERE  colour = 'BLUE' 
AND    regno LIKE '%5%'
;

You can link as many rules together as you like. So for instance if the witness said that the car was BLUE, had a 5 in the registration number, and that someone said the car was owned by Bob, we could write a query:

SELECT regno
FROM   CAR
WHERE  colour = 'BLUE' 
AND    regno LIKE '%5%'
AND    owner LIKE 'Bob %'
;

AND allows us to link rules together such that all rules must be true to see that row. Think of AND as 'As well as'. Sometimes we want to say 'or that' or 'either' rather than 'As well as'. To do this we use OR. For instance, lets say that the police witness said that the car colour was either RED or BLUE, and they were not sure which. If you said:

then no rows would be produced, as you are saying you want rows where the colour is both RED and BLUE at the same time (RED as well as BLUE). What we need is either RED OR BLUE.

SELECT REGNO,COLOUR from CAR
WHERE colour = 'RED'
OR    colour = 'BLUE';

REGNO	COLOUR
F611 AAA	RED
J111 BBB	BLUE
A155 BDE	BLUE
SC04 BFE	BLUE

NOT

The NOT operator does the opposite of whatever comparison is being done. NOT is not frequently needed, as there is usually an opposite operator already. For instance, if you wanted the opposite of:

Precedence

AND, OR, and NOT become more complex to understand when you mix them together in a single query. The problem is that the rules are combined together, not in the order you write them, but in the order of their precedence. This states that NOT is done first, then AND, and finally OR. This can make a BIG difference to your queries!

Consider the case of the police witness. Lets say that the car being looked for had a 5 in the registration number, and was either RED or BLUE.

SELECT REGNO,COLOUR from CAR
WHERE colour = 'RED'      -- 1
OR    colour = 'BLUE'     -- 2
AND   regno LIKE '%5%'    -- 3
;

REGNO	COLOUR
F611 AAA	RED
A155 BDE	BLUE

In this query, rule 3 and rule 2 and ANDed together first, as they have a higher precedent. Only then is rule 1 ORed in. Thus this query says "The car is BLUE with a 5 in the regno" OR "the car is RED". What was wanted was to have rules 1 and 2 done first, and then 3, so that the query says "The car is either RED or BLUE" AND "the car had a 5 in the regno". To do this we need to use parenthesis.

Parenthesis

Parenthesis, or brackets, are used to instruct the database which rules should be done first. The database uses a simple ruleset to understand brackets. If you have any brackets, then the rule in the brackets is done first. If you have brackets within brackets, then the inner brackets are done first. In the example above, the right query can be generated as:

SELECT REGNO,COLOUR from CAR
WHERE (colour = 'RED'  
OR    colour = 'BLUE')
AND   regno LIKE '%5%'
;

REGNO	COLOUR
A155 BDE	BLUE

SELECT REGNO,COLOUR from CAR
WHERE (colour = 'RED' OR colour = 'BLUE')
AND   regno LIKE '%5%';

SELECT REGNO,COLOUR from CAR
WHERE ( (colour = 'RED' OR colour = 'BLUE')
AND   regno LIKE '%5%');

Do not use brackets where they are not needed, as it makes the query harder for users to understand whats going on.

DISTINCT

Lets say you want a list of all the colours of cars in the database. The COLOUR field of CAR gives you this, and thus:

SELECT colour FROM car;

COLOUR
RED
BLUE
BLUE
GREEN
BLUE

This result was not the ideal one wanted. BLUE for some reason appears 3 times. It does this because BLUE appears 3 times in the original data. Sometimes this duplication is what is wanted, other times we want only to see the colours appearing once. To tell the database to show the rows only once, you can use the keyword DISTINCT. This appears immediately after the word SELECT. DISTINCT effectively means that all rows which appear must be unique, and any duplicate rows will be deleted.

SELECT DISTINCT colour FROM car;

COLOUR
BLUE
GREEN
RED

ORDER BY

When a query is executed the results are displayed in an almost random order. The order is dependent on how the database management system was written. This is fine usually, but sometimes giving the data out in a particular order would make the data must more useful. There is a special clause, ORDER BY, which can be added to the end of a query to give the data a particular order.

SELECT make FROM car;

MAKE
FORD
SKODA
MERCEDES
FIAT
SMART

To order alphabetically (which in SQL is known as ascending or ASC) you can use ORDER BY or ORDER BY ASC.

SELECT make FROM car
ORDER BY make;

MAKE
FIAT
FORD
MERCEDES
SKODA
SMART

SELECT make FROM car
ORDER BY make ASC;

SELECT make FROM car
ORDER BY make DESC;

MAKE
SMART
SKODA
MERCEDES
FORD
FIAT

For complex orderings involving more than one column, you can specify multiple columns in the ORDER BY statement, simply by separating each column name with a comma. Thus a query to sort cars by colour and then make would look like:

SELECT make,colour FROM car
ORDER BY colour,make;

MAKE	COLOUR
SKODA	BLUE
SMART	BLUE
MERCEDES	BLUE
FIAT	GREEN
FORD	RED

SELECT regno,make
FROM car
WHERE make = 'SKODA' or make='SMART'
;

REGNO	MAKE
J111 BBB	SKODA
SC04 BFE	SMART

SELECT regno,make
FROM car
WHERE make IN ('SKODA','SMART')
;

Aggregate Functions

Operators exist in SQL to give results based on the statistics of a group of values stored in the database. Such operators include "what is the maximum number" and "what is the average". These functions are called SET or AGGREGATE functions.

AVERAGE

SELECT price FROM car;

PRICE
12000
11000
22000
6000
13000

SELECT avg(price) FROM car;

avg(price)
12800

SUM

SELECT sum(price) FROM car;

sum(price)
64000

MAX

To calculate the maximum or biggest value present in a particular column you can use the MAX function.

SELECT max(price) FROM car;

sum(price)
22000

MIN

To calculate the minimum or smallest value present in a particular column you can use the MIN function.

SELECT min(price) FROM car;

sum(price)
6000

COUNT

To work out how many rows are in a particular query result you can use the COUNT function.

Using "count(column)" counts how many rows exist in the answer where that column is NOT NULL. Using "count(*)" counts how many rows exist independent of NULL values.

SELECT count(price) FROM car;

sum(price)
5

In this case, the following SQL produces the same answer.

SELECT count(*) FROM car;

COUNT DISTINCT

Sometimes you do not want to count how many rows are in a particular column, but how many different values are stored in a column. There is a special variation of count which allows you to do that, known as COUNT DISTINCT. Its syntax is a little unusual...

SELECT count(colour) from car;

sum(price)
5

SELECT count(DISTINCT colour) from car;

sum(price)
3

GROUP BY aggregation

The aggregate functions are excellent when all you want is a single number answer. Frequently what is needed is statistical analysis in groups. For instance, what is the maximum cost of a car given its colour. Here we are wanting two columns, one the car colour, and the second column the highest cost. Intuitively one might think:

SELECT colour,max(price)
FROM   car
;

Instead, what you have to do is consider all aggregate functions in your query, and over which columns they are going to be grouped. In this case we are grouping on colour, and want the maximum price within each "colour" group. To tell the computer this we use GROUP BY.

SELECT colour,price
FROM   car
;

COLOUR	PRICE
RED	12000
BLUE	11000
BLUE	22000
GREEN	6000
BLUE	13000

SELECT colour,max(price)
FROM   car
GROUP BY colour
;

COLOUR	max(PRICE)
RED	12000
BLUE	22000
GREEN	6000

If you are ever confused by what to put in the GROUP BY, then here is a simple rule which is 99% accurate... If you have a SELECT line with aggregate functions, then you need a GROUP BY listing all the column names from the SELECT line which are not used by the functions. In this example "price" and "colour" are columns from SELECT, but as "price" is used in MAX, only "colour" needs to go into the GROUP BY statement.

HAVING

One annoying feature of SQL is that aggregate functions are executed at almost the last stage of the query process. This makes writing queries like "Which owners own more than 1 car" quite complex. Ideally we would like to write:

SELECT owner,count(regno)
FROM car
GROUP BY owner
HAVING count(regno)>1;

This query also shows how many cars the owner owns. You do not have to have the function in the HAVING on the SELECT line. The following also works:

SELECT owner
FROM car
GROUP BY owner
HAVING count(regno)>1;

If you remember count(*) counts how many rows there are in the answer. With a GROUP BY, it counts how many rows are in each group. The difference between a count with * or with a column name is that using a column name makes the count ignore NULL entries in that column, whereas with * NULL entries are counted too. In our example, REGNO is never NULL, so the query is also identical to:

SELECT owner
FROM car
GROUP BY owner
HAVING count(*)>1;

Logical Operators and Aggregation

Contents