When joining tables on common fields, if you’re not diligent in the use of AND-OR combinations in the criteria clause, be prepared for unexpected results! Learn how to break filtering criteria into individual steps and how to group AND-OR statements to most effectively isolate the data that you wish to retrieve.
The following code can be used to create and
populate the employees and shops tables. If you’ve downloaded
the table data from my recent articles, you don’t need these as they are
the same:
# # Table structure for table 'employees' # CREATE TABLE /*!32312 IF NOT EXISTS*/ "employees" ( "id" int(11) NOT NULL AUTO_INCREMENT, "shop_id" int(11) NOT NULL DEFAULT '0', "gender" enum('m','f') NOT NULL, "name" varchar(32) NOT NULL, "salary" int(11) DEFAULT NULL, PRIMARY KEY ("id","shop_id"), KEY "Foreign Key" ("shop_id") ) AUTO_INCREMENT=15; # # Data for table 'employees' # LOCK TABLES "employees" WRITE; /*!40000 ALTER TABLE "employees" DISABLE KEYS;*/ INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (1,1,'m','Jon Simpson',4500); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (2,1,'f','Barbara Breitenmoser',NULL); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (3,2,'f','Kirsten Ruegg',5600); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (4,3,'m','Ralph Teller',5100); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (5,3,'m','Peter Jonson',5200); /*!40000 ALTER TABLE "employees" ENABLE KEYS;*/ UNLOCK TABLES; # # Table structure for table 'shops' # CREATE TABLE /*!32312 IF NOT EXISTS*/ "shops" ( "shop_id" int(11) NOT NULL AUTO_INCREMENT, "shop" varchar(32) DEFAULT NULL, PRIMARY KEY ("shop_id") ) AUTO_INCREMENT=4; # # Data for table 'shops' # LOCK TABLES "shops" WRITE; /*!40000 ALTER TABLE "shops" DISABLE KEYS;*/ INSERT INTO "shops" ("shop_id", "shop") VALUES (1,'Zurich'); INSERT INTO "shops" ("shop_id", "shop") VALUES (2,'New York'); INSERT INTO "shops" ("shop_id", "shop") VALUES (3,'London'); /*!40000 ALTER TABLE "shops" ENABLE KEYS;*/ UNLOCK TABLES;
Here’s a simple enough query that attempts to seek employees
who work in Zurich or have a monthly salary in excess of $5,500:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Monthly Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND shops.shop = 'Zurich' OR employees.salary > 5500;
This result set is probably not what the query writer had
in mind (notice the three rows of ‘Kristen Ruegg’):
name |
gender |
Monthly Salary |
Jon |
m |
$4,500.00 |
Barbara |
m |
(NULL) |
Kirsten |
f |
$5,600.00 |
Kirsten |
f |
$5,600.00 |
Kirsten |
f |
$5,600.00 |
So, what went wrong?
It all comes down to something called
Operator Precedence, which is a concept that applies to any discipline that
utilizes operators; the two main ones being mathematics and computer languages. Precedence
rules, also known as the order of operations, are used to clarify which procedures
should be performed first. In math for
instance, multiplication and division take precedence over addition and subtraction. Hence,
the expression 10 + 100 / 5 equals 30 and not 22. The same idea applies
to SQL, where precedence rules govern which operator will be applied in a
given expression before the others. Here is the operator list for MySQL,
from first to last:
INTERVAL BINARY, COLLATE ! - (unary minus), ~ (unary bit inversion) ^ *, /, DIV, %, MOD -, + <<, >> & | =, <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, IN BETWEEN, CASE, WHEN, THEN, ELSE NOT &&, AND XOR ||, OR :=
Please before Thank You, AND Before OR
While both are quite low on the precedence scale, the AND
does come just before the Exclusive OR (XOR) and OR operators. So what does
that mean for our query above? To answer that question, let’s take a look
at the statement again, but with the implied parentheses:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Monthly Salary' FROM employees, shops WHERE (employees.shop_id = shops.shop_id AND shops.shop = 'Zurich') OR employees.salary > 5500;
In last week’s Identifying
and Eliminating the Dreaded Cartesian Product article, we saw how improper
or missing table joins can wreak devastating effects on our results. And
that’s exactly what is happening here. The AND clause is sticking to the
line above it, causing the join to be limited to it, while leaving the OR
part of the statement left standing on its own. After including the implied
parentheses around the first two criteria statements, we can see that the
query is fetching employees that work in the Zurich office, but it is also
retrieving an employee for every row in the shops table whose salary is over
$5500. Careful analysis of the data confirms that “Kirsten Ruegg” is in
fact the only employee who makes that amount. The fact that she works in
New York is irrelevant because the OR statement is not linked to the shops
table. That, my friends, is a Cartesian Product.
Linking the OR statement to the
shops table removes the Cartesian Product and returns the correct results:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Monthly Salary' FROM employees, shops WHERE (employees.shop_id = shops.shop_id AND shops.shop = 'Zurich') OR (employees.shop_id = shops.shop_id AND employees.salary > 5500;
name |
gender |
Monthly Salary |
Jon |
m |
$4,500.00 |
Barbara |
m |
(NULL) |
Kirsten |
f |
$5,600.00 |
An easier way to express the same criteria statement is to
enclose the filtering criteria in parentheses so that it is separate from
the table join:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Monthly Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND (shops.shop = 'Zurich' OR employees.salary > 5500);
Here again are the correct results:
name |
gender |
Monthly Salary |
Jon |
m |
$4,500.00 |
Barbara |
m |
(NULL) |
Kirsten |
f |
$5,600.00 |
The Proper Use of Parentheses
The moral of this story is clear: Always Enclose Criteria
Selections in Parentheses. In fact, the more filtering criteria that
you have, the more important that it becomes to use parentheses to isolate
specific criteria, especially when mixing operators of differing precedence. The
following SELECT query also includes the IS and NOT operators.
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND shops.shop = 'Zurich' AND employees.salary IS NOT NULL OR employees.salary > 5500;
name |
gender |
Monthly Salary |
Jon |
m |
$4,500.00 |
Kirsten |
f |
$5,600.00 |
Kirsten |
f |
$5,600.00 |
Kirsten |
f |
$5,600.00 |
While the above statement does succeed in eliminating
the row that contained the NULL salary, it still produces the same Cartesian
Product as the original query:
Unfortunately, knowing that we need parentheses
is only half the battle. The other half is to know where to put them! Guess
wrong, and you’ll retrieve the wrong data. This is where it helps to be
really clear on what exactly you want to see. Watch what happens when we
include different lines within the parentheses. First, the same AND-OR combination
as the previous example:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND shops.shop = 'Zurich' AND (employees.salary IS NOT NULL OR employees.gender = 'f');
The above statement says:
“Show me the employee info for those
who work in the Zurich shop.
Of those, only include employees
whose salary is above $5500
OR whose gender is female.”
We can follow the order of operations
to see what should come back. To begin, we’ll execute the first statement
above to see the employees who work in the Zurich shop. The easiest way
to accomplish this is to comment out the lines that we don’t want. There
are two kinds of comments to choose from:
A double dash comments out one line: –This
is a comment.
/* and */ enclose one or more lines
of comments. Here’s the same query as above with the last two lines commented
out:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND shops.shop = 'Zurich' /* AND (employees.salary IS NOT NULL OR employees.gender = 'f'); */
The query now returns only employees
who work in the Zurich office:
name |
gender |
Salary |
Jon |
m |
$4,500.00 |
Barbara |
f |
(NULL) |
Now we’ll run the same query with
the shop filter commented out instead:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id -- AND shops.shop = 'Zurich' AND (employees.salary IS NOT NULL OR employees.gender = 'f');
That produces a list of employees
who either have any non-null salary value OR are female:
name |
gender |
Salary |
Jon |
m |
$4,500.00 |
Barbara |
f |
(NULL) |
Kirsten |
f |
$5,600.00 |
Ralph |
m |
$5,100.00 |
Peter |
m |
$5,200.00 |
Since the two filtering criteria
are joined by an AND, only records that are found in both record sets show
up in the final results:
name |
gender |
Salary |
Jon |
m |
$4,500.00 |
Barbara |
f |
(NULL) |
The alternative is to include the
three filtering criteria within parentheses:
SELECT name, gender, CONCAT('$', FORMAT(salary, 2)) AS 'Salary' FROM employees, shops WHERE employees.shop_id = shops.shop_id AND (shops.shop = 'Zurich' AND employees.salary IS NOT NULL OR employees.gender = 'f');
Due to the higher operator precedent
of the AND keyword, this SELECT statement says:
“Show me the employee info for those
who work in the Zurich shop
AND whose salary is any non-null value.
In addition to those, show me employees
whose gender is female.”
This causes Kirsten Ruegg to be picked up, as
she satisfies the female gender criteria, even though she does not work in
the Zurich shop:
name |
gender |
Salary |
Jon |
m |
$4,500.00 |
Barbara |
f |
(NULL) |
Kirsten |
f |
$5,600.00 |
As you can see, operator precedence can have
subtle but important effects on your SELECT query result sets. It pays to
go through the trouble of breaking complex criteria into individual steps
and to include parentheses wherever you want to group multiple criteria together.