dcsimg

Disambiguating between Duplicate Column Names in MySQL

September 21, 2010

To minimize headaches that can accompany long field lists, DBAs employ a range of techniques. The specific techniques covered today are used to disambiguate between duplicate column names. Don’t be too quick to judge this as basic stuff. You just might find that there are a few things you don’t know on this subject.

In queries involving multiple tables, there’s often more to column selection than listing their names. To minimize headaches that can accompany long field lists, database administrators employ a range of techniques. The specific techniques that we will look at today are used to disambiguate between duplicate column names. Think that this is basic stuff? You might be right, but don’t be too quick to judge. You just might find that there are a few things you don’t know on this subject.

The Test Data

Of course, you can’t try your hand at SELECT queries without some data. For consistency, you should use the same data that I am. The following code will create and populate two tables named employees and shops. You may already have this table data if you’ve read any of my recent articles.

#
# Table structure for table 'employees'
#
CREATE TABLE /*!32312 IF NOT EXISTS*/ "employees" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "shop_id" int(11) NOT NULL DEFAULT '0',
  "gender" enum('m','f') NOT NULL,
  "name" varchar(32) NOT NULL,
  "salary" int(11) DEFAULT NULL,
  PRIMARY KEY ("id","shop_id"),
  KEY "Foreign Key" ("shop_id")
);
#
# Data for table 'employees'
#
LOCK TABLES "employees" WRITE;
/*!40000 ALTER TABLE "employees" DISABLE KEYS;*/
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (1,1,'m','Jon Simpson',4500);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (2,1,'f','Barbara Breitenmoser',NULL);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (3,2,'f','Kirsten Ruegg',5600);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (4,3,'m','Ralph Teller',5100);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (5,3,'m','Peter Jonson',5200);
INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
    VALUES (15,2,'f',' Kirsten Ruegg',5550);

/*!40000 ALTER TABLE "employees" ENABLE KEYS;*/
UNLOCK TABLES;
#
# Table structure for table 'shops'
#
CREATE TABLE /*!32312 IF NOT EXISTS*/ "shops" (
  "shop_id" int(11) NOT NULL AUTO_INCREMENT,
  "shop" varchar(32) DEFAULT NULL,
  PRIMARY KEY ("shop_id")
) AUTO_INCREMENT=4;

#
# Data for table 'shops'
#
LOCK TABLES "shops" WRITE;
/*!40000 ALTER TABLE "shops" DISABLE KEYS;*/
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (1,'Zurich');
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (2,'New York');
INSERT INTO "shops" ("shop_id", "shop") 
    VALUES (3,'London');
/*!40000 ALTER TABLE "shops" ENABLE KEYS;*/
UNLOCK TABLES;

Have you ever executed a query, only to receive the following error?

MySQL>SELECT name, 
-->          shop_id
-->   FROM   employees,
-->          shops
-->   WHERE  employees.shop_id = shops.shop_id
-->   AND    shops.shop = 'New York';
MySQL>SQL Error: Column 'shop_id' in field list is ambiguous.

This happens because the shop_id field appears in both the target tables. Therefore, we need a way to tell the database which field we want. I’ll tell you how in a moment, but first, I’d like to provide some background as to why tables sometimes share the same column name.

Establishing Table Relationships

It is common practice to assign the same name to fields involved in table joins to better demonstrate a relationship. Thus, the shop_id field appears in both the employees and shops tables because both columns refer to the same data. The shops and employees tables have a parent/child relationship in that the employees table depends on the values that are stored in the shops table. They also possess what is called a “one-to-many” relationship. That is to say that a shop_id from the shops table may appear multiple times in the employees table. In fact, the type of relationship that is shared by the two tables is largely determined by the role of the shop_id column of the shops table. It is the primary key. That field must belong to the parent table because the primary key must be a unique field. The following diagram shows how each unique shop_id in the shops table relates to numerous records of the employees table:

Table Referencing

Now that we’ve gone over why tables may share the same column name, let’s examine what that means for the column list of our SELECT statements.

Wherever you reference a common field in the column list, you have to preface it with the table name and a dot (.). Here’s the same query again, but with the employees table provided as the source table for the shop_id field:

MySQL>SELECT name, 
-->          employees.shop_id
-->   FROM   employees,
-->          shops
-->   WHERE  employees.shop_id = shops.shop_id
-->   AND    shops.shop = 'New York';

Executing the above code produces the following result set:

name

shop_id

Kirsten Ruegg

2

I chose the employees table because we are looking for employee details, but note that, because the name in question refers to the same data, either table would produce the same results. Go ahead and try it! Although it’s easy for us to deduce, the database can’t determine the fields’ relationship from the SQL alone. Furthermore, we can’t just assume that they are related. I have seen databases that use the same name for completely unrelated fields. The bigger your database, the more chances of redundancy there are! Even the use of good naming conventions can’t always prevent redundancies in column naming to occur.

Column Redundancy and the Asterisk (*) Symbol

As you may already be aware, the asterisk (*) symbol is short form for “give me all of the columns in the table list”, a simple enough affair for one table:

SELECT * FROM shops; 

...produces:

shop_id

shop

1

Zurich

2

New York

3

London

Things start to get a lot more interesting when we introduce more tables, as seen here.

SELECT *
FROM   employees,
       shops
WHERE  employees.shop_id = shops.shop_id;

This time the database doesn’t complain that the shop_id field appears in both tables. Instead, it handles the duplicate by appending a “_1” to the field name:

id

shop_id

gender

name

salary

shop_id_1

shop

1

1

m

Jon Simpson

4500

1

Zurich

2

1

f

Barbara Breitenmoser

(NULL)

1

Zurich

3

2

f

Kirsten Ruegg

5600

2

New York

15

2

f

Kirsten Ruegg

5550

2

New York

4

3

m

Ralph Teller

5100

3

London

5

3

m

Peter Jonson

5200

3

London

There would be no point in differentiating between the two fields because the asterisk indicates that both are to be displayed.

Having said that, you can still effectuate some degree of control over the display of duplicate fields by using the asterisk in conjunction with a table prefix. To eliminate one of the shop_ids from the result set, we can select all fields from the employees table along with the shop field from the shops table. It’s better to use the asterisk with the table with more fields as that will create a shorter field list:

SELECT employees.*,
       shops.shop
FROM   employees,
       shops
WHERE  employees.shop_id = shops.shop_id;

No more shop_id_1 field:

id

shop_id

gender

name

salary

shop

1

1

m

Jon Simpson

4500

Zurich

2

1

f

Barbara Breitenmoser

(NULL)

Zurich

3

2

f

Kirsten Ruegg

5600

New York

15

2

f

Kirsten Ruegg

5550

New York

4

3

m

Ralph Teller

5100

London

5

3

m

Peter Jonson

5200

London

Name Aliasing

It’s a good habit to always preface field names by their table. Even if there are no naming conflicts, it’ll be a lot easier to identify where the fields are coming from. With a maximum name length of 64 characters, there may be a lot of typing involved, and who wants that? Luckily, you can use aliases for both the table and column names to reduce they amount of typing. All you need to do is include the “AS” keyword followed by the alias name. Here is a query that displays pertinent information for each employee, including the name of the shop at which they work:

SELECT emp.shop_id AS 'Shop ID',
       emp.name    AS 'Full Name',
       emp.gender  AS 'Gender', 
       emp.salary  AS 'Rate',
       sh.shop     AS 'Store'
FROM   employees   AS emp,
       shops       AS sh
WHERE  emp.shop_id = sh.shop_id;

Once you use a table alias, you must use it everywhere that you would normally use the full table name, including in the WHERE clause. Here are the results of the above statement:

Shop ID

Full Name

Gender

Rate

Store

1

Jon Simpson

m

4500

Zurich

1

Barbara Breitenmoser

f

(NULL)

Zurich

2

Kirsten Ruegg

f

5600

New York

3

Ralph Teller

m

5100

London

3

Peter Jonson

m

5200

London

Following these simple tips will help you to better identify which fields you are referring to in your SELECT queries as well as eliminate duplicates that you don’t want to see in your result sets.

» See All Articles by Columnist Rob Gravelle








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers