Many existing queries make use of what are called subselects (selects within selects). For example, try the following query, which returns all employees who've worked with a tourgroup of more than 10 people:
SELECT employee_name
FROM tour_guides
WHERE employee_number IN
(SELECT employee_name
FROM tour_expeditions
WHERE tourgroup_size>10)
The results are
| employee_name |
| Siyabonge Nomvete |
| Mathew Booth |
This query resolves in 2 steps - first the inner query (which returns 923 and 978) is resolved. Then we are left with
SELECT employee_name
FROM tour_guides
WHERE employee_number IN
(923,978)
which resolves to the results above. But I've just demonstrated another way to do this, and which is usually a better way - the join. You can rewrite this query as:
SELECT employee_name
FROM tour_guides,tour_expeditions
WHERE tourgroup_size>10 AND
tour_guides.employee_number=tour_expeditions.employee_number
Why do I say this is better? 2 reasons. One is that many DBMS's (such as early versions of MySQL) do not support nested selects. And the second reason is that more often they can be rewritten as a join, and the join is usually more efficient. On those big, heavily used tables, where performance is vital, you will want to do without nested selects as much as possible.
Let's take another example. How could we find all tour_guides who have not yet given a tour? We could write
SELECT employee_name
FROM tour_guides
WHERE employee_number NOT IN
(SELECT employee_number
FROM tour_expeditions)
And this would return
| employee_name |
| Nelson Jiranga |
But, using the same principle as before, we could rewrite this as a join, in this case a LEFT JOIN (remembering that LEFT JOINS return values that are not present). Try the following:
SELECT employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
WHERE tour_expeditions.employee_number IS NULL
Now we see an advantage of declaring employee_number NOT NULL. It allows us to use this kind of query, which is often more efficient than the nested select, and it also saves space (the DBMS does not have to waste space telling if the field is NULL or not--by definition it's not)