Disambiguating between Duplicate Column Names in MySQL


To minimize headaches that can accompany long field lists, DBAs employ a range of techniques. The specific techniques covered today are used to disambiguate between duplicate column names. Don’t be too quick to judge this as basic stuff. You just might find that there are a few things you don’t know on this subject.

In queries
involving multiple tables, there’s often more to column selection than listing
their names. To minimize headaches that can accompany long field lists,
database administrators employ a range of techniques. The specific techniques
that we will look at today are used to disambiguate between duplicate column
names. Think that this is basic stuff? You might be right, but don’t be too quick
to judge. You just might find that there are a few things you don’t know on
this subject.

The Test Data

Of course,
you can’t try your hand at SELECT queries without some data. For consistency,
you should use the same data that I am. The following code will create and
populate two tables named employees and shops. You may already
have this table data if you’ve read any of my recent articles.

#
# Table structure for table 'employees'
#
CREATE TABLE /*!32312 IF NOT EXISTS*/ "employees" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "shop_id" int(11) NOT NULL DEFAULT '0',
  "gender" enum('m','f') NOT NULL,
  "name" varchar(32) NOT NULL,
  "salary" int(11) DEFAULT NULL,
  PRIMARY KEY ("id","shop_id"),
  KEY "Foreign Key" ("shop_id")
);
#
# Data for table 'employees'
#
LOCK TABLES "employees" WRITE;
/*!40000 ALTER TABLE "employees" DISABLE KEYS;*/
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (1,1,'m','Jon Simpson',4500);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (2,1,'f','Barbara Breitenmoser',NULL);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (3,2,'f','Kirsten Ruegg',5600);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (4,3,'m','Ralph Teller',5100);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (5,3,'m','Peter Jonson',5200);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (15,2,'f',' Kirsten Ruegg',5550);

/*!40000 ALTER TABLE "employees" ENABLE KEYS;*/
UNLOCK TABLES;
#
# Table structure for table 'shops'
#
CREATE TABLE /*!32312 IF NOT EXISTS*/ "shops" (
  "shop_id" int(11) NOT NULL AUTO_INCREMENT,
  "shop" varchar(32) DEFAULT NULL,
  PRIMARY KEY ("shop_id")
) AUTO_INCREMENT=4;

#
# Data for table 'shops'
#
LOCK TABLES "shops" WRITE;
/*!40000 ALTER TABLE "shops" DISABLE KEYS;*/
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (1,'Zurich');
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (2,'New York');
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (3,'London');
/*!40000 ALTER TABLE "shops" ENABLE KEYS;*/
UNLOCK TABLES;

Have you
ever executed a query, only to receive the following error?

MySQL>SELECT name, 
-->          shop_id
-->   FROM   employees,
-->          shops
-->   WHERE  employees.shop_id = shops.shop_id
-->   AND    shops.shop = 'New York';
MySQL>SQL Error: Column 'shop_id' in field list is ambiguous.

This
happens because the shop_id field appears in both the target tables. Therefore,
we need a way to tell the database which field we want. I’ll tell you how in a
moment, but first, I’d like to provide some background as to why tables
sometimes share the same column name.

Establishing Table
Relationships

It is
common practice to assign the same name to fields involved in table joins to
better demonstrate a relationship. Thus, the shop_id field appears in both the
employees and shops tables because both columns refer to the same data. The
shops and employees tables have a parent/child relationship in that the
employees table depends on the values that are stored in the shops table. They
also possess what is called a “one-to-many” relationship. That is to say that
a shop_id from the shops table may appear multiple times in the employees
table. In fact, the type of relationship that is shared by the two tables is
largely determined by the role of the shop_id column of the shops table. It is
the primary key. That field must belong to the parent table because the primary
key must be a unique field. The following diagram shows how each unique
shop_id in the shops table relates to numerous records of the employees table:

Table Referencing

Now that
we’ve gone over why tables may share the same column name, let’s examine what
that means for the column list of our SELECT statements.

Wherever
you reference a common field in the column list, you have to preface it with
the table name and a dot (.). Here’s the same query again, but with the
employees table provided as the source table for the shop_id field:

MySQL>SELECT name, 
-->          employees.shop_id
-->   FROM   employees,
-->          shops
-->   WHERE  employees.shop_id = shops.shop_id
-->   AND    shops.shop = 'New York';

Executing
the above code produces the following result set:

name

shop_id

Kirsten
Ruegg

2

I chose
the employees table because we are looking for employee details, but note that,
because the name in question refers to the same data, either table would
produce the same results. Go ahead and try it! Although it’s easy for us to
deduce, the database can’t determine the fields’ relationship from the SQL
alone. Furthermore, we can’t just assume that they are related. I have seen databases
that use the same name for completely unrelated fields. The bigger your
database, the more chances of redundancy there are! Even the use of good
naming conventions can’t always prevent redundancies in column naming to occur.

Column Redundancy and the
Asterisk (*) Symbol

As you may
already be aware, the asterisk (*) symbol is short form for “give me all of the
columns in the table list”, a simple enough affair for one table:

SELECT * FROM shops; 

…produces:

shop_id

shop

1

Zurich

2

New
York

3

London

Things
start to get a lot more interesting when we introduce more tables, as seen here.

SELECT *
FROM   employees,
       shops
WHERE  employees.shop_id = shops.shop_id;

This time
the database doesn’t complain that the shop_id field appears in both tables.
Instead, it handles the duplicate by appending a “_1” to the field name:

id

shop_id

gender

name

salary

shop_id_1

shop

1

1

m

Jon
Simpson

4500

1

Zurich

2

1

f

Barbara
Breitenmoser

(NULL)

1

Zurich

3

2

f

Kirsten
Ruegg

5600

2

New
York

15

2

f

Kirsten
Ruegg

5550

2

New
York

4

3

m

Ralph
Teller

5100

3

London

5

3

m

Peter
Jonson

5200

3

London

There
would be no point in differentiating between the two fields because the
asterisk indicates that both are to be displayed.

Having
said that, you can still effectuate some degree of control over the display of
duplicate fields by using the asterisk in conjunction with a table prefix. To
eliminate one of the shop_ids from the result set, we can select all fields
from the employees table along with the shop field from the shops table. It’s
better to use the asterisk with the table with more fields as that will create
a shorter field list:

SELECT employees.*,
       shops.shop
FROM   employees,
       shops
WHERE  employees.shop_id = shops.shop_id;

No more
shop_id_1 field:

id

shop_id

gender

name

salary

shop

1

1

m

Jon
Simpson

4500

Zurich

2

1

f

Barbara
Breitenmoser

(NULL)

Zurich

3

2

f

Kirsten
Ruegg

5600

New
York

15

2

f

Kirsten
Ruegg

5550

New
York

4

3

m

Ralph
Teller

5100

London

5

3

m

Peter
Jonson

5200

London

Name Aliasing

It’s a
good habit to always preface field names by their table. Even if there are no
naming conflicts, it’ll be a lot easier to identify where the fields are coming
from. With a maximum name length of 64 characters, there may be a lot of
typing involved, and who wants that? Luckily, you can use aliases for both the
table and column names to reduce they amount of typing. All you need to do is
include the “AS” keyword followed by the alias name. Here is a query that
displays pertinent information for each employee, including the name of the
shop at which they work:

SELECT emp.shop_id AS 'Shop ID',
       emp.name    AS 'Full Name',
       emp.gender  AS 'Gender', 
       emp.salary  AS 'Rate',
       sh.shop     AS 'Store'
FROM   employees   AS emp,
       shops       AS sh
WHERE  emp.shop_id = sh.shop_id;

Once you
use a table alias, you must use it everywhere that you would normally use the
full table name, including in the WHERE clause. Here are the results of the
above statement:

Shop ID

Full Name

Gender

Rate

Store

1

Jon
Simpson

m

4500

Zurich

1

Barbara
Breitenmoser

f

(NULL)

Zurich

2

Kirsten
Ruegg

f

5600

New
York

3

Ralph
Teller

m

5100

London

3

Peter
Jonson

m

5200

London

Following
these simple tips will help you to better identify which fields you are
referring to in your SELECT queries as well as eliminate duplicates that you
don’t want to see in your result sets.

»


See All Articles by Columnist

Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles