SHARE

Three Ways to Identify Non-matching Records in MySQL

Written By

May 6, 2011

4 minute read

There are many times as a database developer and administrator that you’ll want to know what entities are being excluded from a straight table join. For instance, it would be instructive for management to identify sales people who haven’t received any bonuses, as these might be deemed to be the underperformers. To provide the requestor with the statistics, you’ll have to determine which sales people are absent from the bonuses table. This is commonly referred to as a Non-matching query, although it is sometimes called a Subtract or Set Difference query as well. MySQL provides not one but three standard means of producing non-matching results. I will be presenting each here, along with another that you might not have considered.

Here are the two sample tables that we’ll execute our queries against:

employees table:

id	dept_id	gender	name	salary	emp_id_number
1	2	m	Jon Simpson	4500	1234
2	4	f	Barbara Breitenmoser	(NULL)	9999
3	3	f	Kirsten Ruegg	5600	3462
4	1	m	Ralph Teller	5100	6543
5	2	m	Peter Jonson	5200	9747
6	2	m	Allan Smithie	4900	6853
7	4	f	Mary Parker	5600	2223
8	3	f	Kirsten Ruegg	4400	2765

bonuses table:

bonus_id	emp_id	amount	award_date
1	3	500	2009-10-28
2	5	1000	2009-01-03
3	4	250	2009-04-22
4	2	900	2010-01-14
5	1	450	2009-01-03
6	5	1000	2009-10-28
7	8	300	2007-10-01
8	8	300.5	2009-08-12
9	2	250	2009-03-16
10	2	500	2009-07-30
11	6	350	2009-02-12
12	5	300	2009-01-15

While there is only one instance of each employee in the employees table, there are potentially multiple instances of each in the bonuses table, depending on how many bonuses he or she has received. What we want to see is a list of names of employees’ IDs that do not appear in the bonuses table.

Solution 1: LEFT JOIN / IS NULL

One way to select values present in one table but missing in another is to use a combination of a Left Join with an “IS NULL” test. Here’s the syntax for that:

SELECT field list

FROM left_table

LEFT JOIN

right_table

ON right_table.id = left_table.id

WHERE right_table.id IS NULL

Plugging in our data produces the following Select query:

SELECT emp.name

FROM employees emp

LEFT JOIN

bonuses b

ON b.emp_id = emp.id

WHERE b.emp_id IS NULL;

Running the above statement confirms that Mary Parker is the only employee who has not received a bonus.

Solution 2: NOT IN

The second way to find non-matching records between tables is to use NOT in conjunction with the IN operator. The IN operator allows you to specify multiple values in a WHERE clause, much like a string of ORs chained together. It accepts either a comma-delimited list (value1, value2, etc…) or the results from another Select statement. The latter is what we will be using here.

The syntax for that form is as follows:

SELECT field list

FROM left_table

WHERE left_table.id NOT IN

(

SELECT right_table.id

FROM right_table

)

Here it is again with our specific criteria:

SELECT name

FROM employees

WHERE id NOT IN

(

SELECT emp_id

FROM bonuses

);

It also shows that Mary Parker is the only employee who has not received a bonus.

Solution 3: NOT EXISTS

The main difference between the IN and EXISTS predicate in subquery is the way in which the query gets executed.

IN: The inner query is executed first and the list of values obtained in the result is used by the outer query. The inner query is executed only once.
EXISTS: Each row from the outer query result is selected, and compared against the inner query. Hence, the inner query is executed as many times as there are rows in the outer query result.

Here is the syntax for the Not Exists Select statement:

SELECT field list

FROM left_table

WHERE NOT EXISTS

(

right_table.id

FROM right_table

WHERE right_table.id = left_table.id

)

We would plug in our criteria like so:

SELECT name

FROM employees emp

WHERE NOT EXISTS

(

SELECT emp_id

FROM bonuses b

WHERE b.emp_id = emp.id

);

Once again, poor Mary Parker is without a single bonus. This may not bode well for her next performance evaluation.

Bonus Solution: Group By with Union All

As promised, here is a fourth solution, using Group By with UNION ALL. The first step is to Union the two tables that we want to compare using Union ALL (the ALL retains duplicates). Then we group the result on the column that we want to compare. This will cause columns that appear in both inner SELECTS to have a count of two while unmatched records will only be counted once.

Here is the syntax for the Not Exists Select statement:

SELECT field list

FROM (

SELECT DISTINCT field list FROM left_table

UNION ALL

SELECT DISTINCT field list FROM right_table

) AS tbl

GROUP BY field list HAVING COUNT(*) = 1

Be sure to group by the ID field so that different employees with the same name won’t be treated as the same person:

SELECT name

FROM (

SELECT DISTINCT id, name FROM employees

UNION ALL

SELECT DISTINCT emp_id, ” FROM bonuses

) AS tbl

GROUP BY id HAVING COUNT(*) = 1;

You guessed it; Mary Parker is once again left out in the cold.

I have no specific statistics that show how fast this style performs but anecdotal data would suggest that it compares quite favourably.

Which Style Should I Use?

Because of how the MySQL optimizer handles EXISTS, that method is about 30% less efficient than the others. If speed is your primary concern, then the LEFT JOIN / IS NULL and NOT IN styles are your best bets. Otherwise, you really can’t go wrong with whichever style suits you.

See all articles by Robert Gravelle

Robert Gravelle

Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Three Ways to Identify Non-matching Records in MySQL

Robert Gravelle

Company

Categories