Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted May 6, 2011

Three Ways to Identify Non-matching Records in MySQL

By Rob Gravelle

There are many times as a database developer and administrator that you'll want to know what entities are being excluded from a straight table join. For instance, it would be instructive for management to identify sales people who haven't received any bonuses, as these might be deemed to be the underperformers. To provide the requestor with the statistics, you'll have to determine which sales people are absent from the bonuses table. This is commonly referred to as a Non-matching query, although it is sometimes called a Subtract or Set Difference query as well. MySQL provides not one but three standard means of producing non-matching results. I will be presenting each here, along with another that you might not have considered.

Here are the two sample tables that we'll execute our queries against:

 

employees table:

id

dept_id

gender

name

salary

emp_id_number

1

2

m

Jon Simpson

4500

1234

2

4

f

Barbara Breitenmoser

(NULL)

9999

3

3

f

Kirsten Ruegg

5600

3462

4

1

m

Ralph Teller

5100

6543

5

2

m

Peter Jonson

5200

9747

6

2

m

Allan Smithie

4900

6853

7

4

f

Mary Parker

5600

2223

8

3

f

Kirsten Ruegg

4400

2765

 

bonuses table:

bonus_id

emp_id

amount

award_date

1

3

500

2009-10-28

2

5

1000

2009-01-03

3

4

250

2009-04-22

4

2

900

2010-01-14

5

1

450

2009-01-03

6

5

1000

2009-10-28

7

8

300

2007-10-01

8

8

300.5

2009-08-12

9

2

250

2009-03-16

10

2

500

2009-07-30

11

6

350

2009-02-12

12

5

300

2009-01-15

 

While there is only one instance of each employee in the employees table, there are potentially multiple instances of each in the bonuses table, depending on how many bonuses he or she has received. What we want to see is a list of names of employees' IDs that do not appear in the bonuses table.

Solution 1: LEFT JOIN / IS NULL

One way to select values present in one table but missing in another is to use a combination of a Left Join with an "IS NULL" test.  Here's the syntax for that:

SELECT  field list

FROM    left_table

LEFT JOIN   

       right_table

       ON  right_table.id = left_table.id

WHERE   right_table.id IS NULL

 

Plugging in our data produces the following Select query:

 

SELECT  emp.name

FROM    employees emp

LEFT JOIN   

       bonuses b

       ON   b.emp_id = emp.id

WHERE   b.emp_id IS NULL;

 

Running the above statement confirms that Mary Parker is the only employee who has not received a bonus.

Solution 2: NOT IN

The second way to find non-matching records between tables is to use NOT in conjunction with the IN operator.  The IN operator allows you to specify multiple values in a WHERE clause, much like a string of ORs chained together.  It accepts either a comma-delimited list (value1, value2, etc...) or the results from another Select statement.  The latter is what we will be using here. 

The syntax for that form is as follows:

 

SELECT field list

FROM    left_table

WHERE   left_table.id NOT IN      

(        

       SELECT  right_table.id   

       FROM    right_table       

)

 

Here it is again with our specific criteria:

 

SELECT  name

FROM    employees

WHERE   id NOT IN      

(        

       SELECT  emp_id    

       FROM    bonuses       

);

 

It also shows that Mary Parker is the only employee who has not received a bonus.

Solution 3: NOT EXISTS

The main difference between the IN and EXISTS predicate in subquery is the way in which the query gets executed.

  • IN: The inner query is executed first and the list of values obtained in the result is used by the outer query. The inner query is executed only once.
  • EXISTS: Each row from the outer query result is selected, and compared against the inner query. Hence, the inner query is executed as many times as there are rows in the outer query result.  

Here is the syntax for the Not Exists Select statement:

 

SELECT  field list

FROM    left_table

WHERE   NOT EXISTS     

(     

       right_table.id   

       FROM    right_table       

       WHERE   right_table.id = left_table.id     

)

 

We would plug in our criteria like so:

 

SELECT  name

FROM    employees emp

WHERE   NOT EXISTS       

(     

       SELECT  emp_id    

       FROM    bonuses b   

       WHERE   b.emp_id = emp.id      

);

 

Once again, poor Mary Parker is without a single bonus.  This may not bode well for her next performance evaluation.

Bonus Solution: Group By with Union All

As promised, here is a fourth solution, using Group By with UNION ALL. The first step is to Union the two tables that we want to compare using Union ALL (the ALL retains duplicates). Then we group the result on the column that we want to compare. This will cause columns that appear in both inner SELECTS to have a count of two while unmatched records will only be counted once.

Here is the syntax for the Not Exists Select statement:

 

SELECT field list

FROM (

       SELECT DISTINCT field list FROM left_table

       UNION ALL

       SELECT DISTINCT field list FROM right_table

) AS tbl

GROUP BY field list HAVING COUNT(*) = 1

 

Be sure to group by the ID field so that different employees with the same name won't be treated as the same person:

 

SELECT name

FROM (

       SELECT DISTINCT id, name FROM employees

       UNION ALL

       SELECT DISTINCT emp_id, '' FROM bonuses

) AS tbl

GROUP BY id HAVING COUNT(*) = 1;

 

You guessed it; Mary Parker is once again left out in the cold.

I have no specific statistics that show how fast this style performs but anecdotal data would suggest that it compares quite favourably.

Which Style Should I Use?

Because of how the MySQL optimizer handles EXISTS, that method is about 30% less efficient than the others. If speed is your primary concern, then the LEFT JOIN / IS NULL and NOT IN styles are your best bets. Otherwise, you really can't go wrong with whichever style suits you.

See all articles by Robert Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM