To minimize headaches that can accompany long field lists, DBAs employ a range of techniques. The specific techniques covered today are used to disambiguate between duplicate column names. Don’t be too quick to judge this as basic stuff. You just might find that there are a few things you don’t know on this subject.
In queries
involving multiple tables, there’s often more to column selection than listing
their names. To minimize headaches that can accompany long field lists,
database administrators employ a range of techniques. The specific techniques
that we will look at today are used to disambiguate between duplicate column
names. Think that this is basic stuff? You might be right, but don’t be too quick
to judge. You just might find that there are a few things you don’t know on
this subject.
The Test Data
Of course,
you can’t try your hand at SELECT queries without some data. For consistency,
you should use the same data that I am. The following code will create and
populate two tables named employees and shops. You may already
have this table data if you’ve read any of my recent articles.
# # Table structure for table 'employees' # CREATE TABLE /*!32312 IF NOT EXISTS*/ "employees" ( "id" int(11) NOT NULL AUTO_INCREMENT, "shop_id" int(11) NOT NULL DEFAULT '0', "gender" enum('m','f') NOT NULL, "name" varchar(32) NOT NULL, "salary" int(11) DEFAULT NULL, PRIMARY KEY ("id","shop_id"), KEY "Foreign Key" ("shop_id") ); # # Data for table 'employees' # LOCK TABLES "employees" WRITE; /*!40000 ALTER TABLE "employees" DISABLE KEYS;*/ INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (1,1,'m','Jon Simpson',4500); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (2,1,'f','Barbara Breitenmoser',NULL); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (3,2,'f','Kirsten Ruegg',5600); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (4,3,'m','Ralph Teller',5100); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (5,3,'m','Peter Jonson',5200); INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") VALUES (15,2,'f',' Kirsten Ruegg',5550); /*!40000 ALTER TABLE "employees" ENABLE KEYS;*/ UNLOCK TABLES; # # Table structure for table 'shops' # CREATE TABLE /*!32312 IF NOT EXISTS*/ "shops" ( "shop_id" int(11) NOT NULL AUTO_INCREMENT, "shop" varchar(32) DEFAULT NULL, PRIMARY KEY ("shop_id") ) AUTO_INCREMENT=4; # # Data for table 'shops' # LOCK TABLES "shops" WRITE; /*!40000 ALTER TABLE "shops" DISABLE KEYS;*/ INSERT INTO "shops" ("shop_id", "shop") VALUES (1,'Zurich'); INSERT INTO "shops" ("shop_id", "shop") VALUES (2,'New York'); INSERT INTO "shops" ("shop_id", "shop") VALUES (3,'London'); /*!40000 ALTER TABLE "shops" ENABLE KEYS;*/ UNLOCK TABLES;
Have you
ever executed a query, only to receive the following error?
MySQL>SELECT name, --> shop_id --> FROM employees, --> shops --> WHERE employees.shop_id = shops.shop_id --> AND shops.shop = 'New York'; MySQL>SQL Error: Column 'shop_id' in field list is ambiguous.
This
happens because the shop_id field appears in both the target tables. Therefore,
we need a way to tell the database which field we want. I’ll tell you how in a
moment, but first, I’d like to provide some background as to why tables
sometimes share the same column name.
Establishing Table
Relationships
It is
common practice to assign the same name to fields involved in table joins to
better demonstrate a relationship. Thus, the shop_id field appears in both the
employees and shops tables because both columns refer to the same data. The
shops and employees tables have a parent/child relationship in that the
employees table depends on the values that are stored in the shops table. They
also possess what is called a “one-to-many” relationship. That is to say that
a shop_id from the shops table may appear multiple times in the employees
table. In fact, the type of relationship that is shared by the two tables is
largely determined by the role of the shop_id column of the shops table. It is
the primary key. That field must belong to the parent table because the primary
key must be a unique field. The following diagram shows how each unique
shop_id in the shops table relates to numerous records of the employees table:
Table Referencing
Now that
we’ve gone over why tables may share the same column name, let’s examine what
that means for the column list of our SELECT statements.
Wherever
you reference a common field in the column list, you have to preface it with
the table name and a dot (.). Here’s the same query again, but with the
employees table provided as the source table for the shop_id field:
MySQL>SELECT name, --> employees.shop_id --> FROM employees, --> shops --> WHERE employees.shop_id = shops.shop_id --> AND shops.shop = 'New York';
Executing
the above code produces the following result set:
name |
shop_id |
Kirsten |
2 |
I chose
the employees table because we are looking for employee details, but note that,
because the name in question refers to the same data, either table would
produce the same results. Go ahead and try it! Although it’s easy for us to
deduce, the database can’t determine the fields’ relationship from the SQL
alone. Furthermore, we can’t just assume that they are related. I have seen databases
that use the same name for completely unrelated fields. The bigger your
database, the more chances of redundancy there are! Even the use of good
naming conventions can’t always prevent redundancies in column naming to occur.
Column Redundancy and the
Asterisk (*) Symbol
As you may
already be aware, the asterisk (*) symbol is short form for “give me all of the
columns in the table list”, a simple enough affair for one table:
SELECT * FROM shops;
…produces:
shop_id |
shop |
1 |
Zurich |
2 |
New |
3 |
London |
Things
start to get a lot more interesting when we introduce more tables, as seen here.
SELECT * FROM employees, shops WHERE employees.shop_id = shops.shop_id;
This time
the database doesn’t complain that the shop_id field appears in both tables.
Instead, it handles the duplicate by appending a “_1” to the field name:
id |
shop_id |
gender |
name |
salary |
shop_id_1 |
shop |
1 |
1 |
m |
Jon |
4500 |
1 |
Zurich |
2 |
1 |
f |
Barbara |
(NULL) |
1 |
Zurich |
3 |
2 |
f |
Kirsten |
5600 |
2 |
New |
15 |
2 |
f |
Kirsten |
5550 |
2 |
New |
4 |
3 |
m |
Ralph |
5100 |
3 |
London |
5 |
3 |
m |
Peter |
5200 |
3 |
London |
There
would be no point in differentiating between the two fields because the
asterisk indicates that both are to be displayed.
Having
said that, you can still effectuate some degree of control over the display of
duplicate fields by using the asterisk in conjunction with a table prefix. To
eliminate one of the shop_ids from the result set, we can select all fields
from the employees table along with the shop field from the shops table. It’s
better to use the asterisk with the table with more fields as that will create
a shorter field list:
SELECT employees.*, shops.shop FROM employees, shops WHERE employees.shop_id = shops.shop_id;
No more
shop_id_1 field:
id |
shop_id |
gender |
name |
salary |
shop |
1 |
1 |
m |
Jon |
4500 |
Zurich |
2 |
1 |
f |
Barbara |
(NULL) |
Zurich |
3 |
2 |
f |
Kirsten |
5600 |
New |
15 |
2 |
f |
Kirsten |
5550 |
New |
4 |
3 |
m |
Ralph |
5100 |
London |
5 |
3 |
m |
Peter |
5200 |
London |
Name Aliasing
It’s a
good habit to always preface field names by their table. Even if there are no
naming conflicts, it’ll be a lot easier to identify where the fields are coming
from. With a maximum name length of 64 characters, there may be a lot of
typing involved, and who wants that? Luckily, you can use aliases for both the
table and column names to reduce they amount of typing. All you need to do is
include the “AS” keyword followed by the alias name. Here is a query that
displays pertinent information for each employee, including the name of the
shop at which they work:
SELECT emp.shop_id AS 'Shop ID', emp.name AS 'Full Name', emp.gender AS 'Gender', emp.salary AS 'Rate', sh.shop AS 'Store' FROM employees AS emp, shops AS sh WHERE emp.shop_id = sh.shop_id;
Once you
use a table alias, you must use it everywhere that you would normally use the
full table name, including in the WHERE clause. Here are the results of the
above statement:
Shop ID |
Full Name |
Gender |
Rate |
Store |
1 |
Jon |
m |
4500 |
Zurich |
1 |
Barbara |
f |
(NULL) |
Zurich |
2 |
Kirsten |
f |
5600 |
New |
3 |
Ralph |
m |
5100 |
London |
3 |
Peter |
m |
5200 |
London |
Following
these simple tips will help you to better identify which fields you are
referring to in your SELECT queries as well as eliminate duplicates that you
don’t want to see in your result sets.