Joins
Up till now we have focussed on grabbing
data from a single table in the database. However, the true
power of the relational database comes from the fact that
data is stored in multiple tables that are related by values
within them. Thus, performing a uni-table SELECT is only the
most trivial of operations afforded by SQL.
The true power of the SQL database lies in
its ability to create views based upon the data in multiple tables
using the JOIN operation.
To create a JOIN, you simply define
two or more tables in your SELECT-FROM statement. For example, consider
the following which joins the
CLIENTS and
SALES tables into a single view:
SELECT C_NAME,
S_NUM,
S_QUANTITY
FROM SALES, CLIENTS;
The previous SQL code would yield the following
results:
C_NAME S_NUM S_QUANTITY
---------------------------------
Jason Lim 001 1
Rick Tan 001 1
Stephen Petersen 001 1
Jason Lim 002 2
Rick Tan 002 2
Stephen Petersen 002 2
Jason Lim 003 1
Rick Tan 003 1
Stephen Petersen 003 1
---------------------------------
Notice that the results will contain every
combination of the two tables.
But what happens if two tables share the same
name? SQL assigns name space by specifying that when two column
names are the same
that you should specify the table name along with the column name
with a period in between. Thus, consider the following example in which
column C_NUM is shared by both the CLIENTS and the SALES tables.
SELECT C_NAME,
S_NUM,
SALES.C_NUM
FROM SALES, CLIENTS;
In this case, SQL will draw the data from the C_NUM
column in the SALES table and not from the CLIENTS table. Being able
to specify columns exactly is important because it helps us sculpt
complex and more useful joins using the
WHERE clause.
As you saw above, a raw JOIN returns perhaps too
much data. For example, you might want to see unique sales information
tied to the name of the client. In this case, you would use the
WHERE clause to limit the results as follows:
SELECT C_NAME,
S_NUM,
S_AMOUNT
FROM SALES, CLIENTS
WHERE CLIENTS.C_NUM = SALES.C_NUM;
C_NAME S_NUM S_AMOUNT
---------------------------------
Rick Tan 001 99.99
Jason Lim 002 199.98
Stephen Petersen 003 865.99
---------------------------------
Note that all of the operators for
SELECT can work
in JOINS including
LIKE/NOT LIKE,
IN/NOT IN,
NULL/NOT NULL,
AND/OR/NOT,
COUNT,
AVG, etc....
Subqueries
As we have already discussed, relational databases
are table based. That is, all SQL commands are applied
to the contents of tables and then those results are displayed
as tables themselves. One great consequence of that is that
it is fairly easy to perform queries upon the virtual tables created by
other queries. Or, in jargonese, to perform subqueries.
Consider this example in which we want
to get a listing of all the
clients
who made a transaction of greater than $150.00.
SELECT DISTINCT C_NAME, C_PHONE
FROM CLIENTS
WHERE EMP_NUM IN
(
SELECT C_NUM
FROM SALES
WHERE S_AMOUNT > 150.00
);
We would expect the following results:
C_NAME C_PHONE
--------------------------
Jason Lim 456-7890
Stephen Petersen 167-3333
--------------------------
As you can see, the subquery in the
WHERE clause would return a set
containing "101" and "103". Next, the top-level query would return
the two clients. In this case, the
DISTINCT operator was not necessary
since we don't have many sales in our SALES table. However, for most
subqueries, it is useful to include the distinct so that you do not
get a client repeated for multiple sales. Of course, you
might want that. :)