Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum Rss Feed

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Rss Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted Sep 11, 2009

Tips for Managing Complex Queries

By Rob Gravelle

When I first took on the role of reports designer for our systems, my first instinct was to try fetching all of the required fields in one huge SQL statement. I soon learned that there are better and easier ways to structure a query that will minimize the complexity while raising our confidence in the data returned. I’ll be sharing a few techniques here that will help simplify the writing of complex queries.

Part 1: Database Design

Structuring the database for easy querying should always be one of your primary design goals. Unfortunately, if your IT shop is anything like mine, data extraction seems to be the last thing on anybody’s mind when the database is being designed. Rather, all the decisions are based on how to best normalize the data structures and facilitate saving the data from systems that interface with the database. Under those circumstances, it’s only after the database design has long been finalized and the tables have been populated with data that you realize the price that you are now paying for leaving data extraction until the last task. Consequently, query designers have to perform back flips to present the data in a meaningful way.

Here are a few database design suggestions for making querying a little easier:

Identify Every Entity by a Unique Numeric Code

Examine the following table designs and notice that each has a numeric key code as the first column. These codes act as a unique identifier for each row (distinct entity) in the table. While not as intuitive to humans as text, numbers are a lot quicker for the database to index and link related values together:

Employee Table
emp_cd		last_name	first_name	phone_number	

Dependent Table
dependent_cd	last_name	first_name	date_of_birth

Employee Dependent Relationship Table
emp_cd		relationship_type_code	dependent_cd

Relationship Type Table
relationship_type_cd	english_desc	english_short_desc

To define a relationship for the Employee and Dependent tables, we would need two additional ones: the Relationship table and one containing the Relationship Types. You could store the relationship information in either or both tables, but that may lead to problems. First, if you only include relationship in one of the tables, you won’t be able to get back to the other one! To illustrate, if you were to add an emp_cd column to the Dependent table, you would be easily able to retrieve the guardians of the dependents, but you would be at a disadvantage to locate the dependents of a particular guardian as it would not contain any information about related dependants. The second issue is that including the foreign key in the table itself can cause a lot of duplication since the only different field would be the foreign key. When you consider that the Employee table might contain a dozen fields or more, that could quickly lead to Gigabytes of wasted storage space, not to mention an overburdened server! For this reason, it’s preferable to create a separate table containing only code values that define the relationship.

The Relationship Type table below is what is known as a code or lookup table. This is where common (shared ) information is stored. You could think of this table as a kind of data dictionary. Some of the advantages to using code tables include:

  • allowing us to use numeric code values in the relationship table
  • promoting easy maintainability
  • providing a centralized place to store common data

Here are the same four tables with some sample data:

Employee Table
emp_cd			last_name	first_name	phone_number
1			Smith		Roger		555-1234
2			Blattel		Mary		555-8877
3			Mercer		Steven		555-9980

Employee Dependent Relationship Table
emp_cd			relationship_type_code		dependent_cd
3			1				1
3			1				2
1			3				3

Dependent Table
dependent_cd		last_name	first_name	date_of_birth
1			Mercer		Joey		Feb 12, 1989
2			Mercer		Cindy		Aug 4, 1987
3			Smith		Amy		Dec 1, 1985

Relationship Type Table
relationship_type_cd	english_desc	english_short_desc
1			father		dad
2			mother		mom
3			grand-father	grandpa
4			grand-mother	grandma

Consider Including Text Keys in Addition to Codes

After all that talk about using numeric codes, I’m now going to encourage you to use text descriptor keys in your queries! While numeric codes improve database retrieval performance, they are equally detrimental to SQL development, as they hamper code readability. Take the following query:

select  english_desc
from    vcode_table_data
where   code_table_name  =   'routing_point'
and     code_value_name  =   'routing_point_usa_port_of_entry'

Even without knowing anything about the database contents, we can easily ascertain that the above statement is retrieving the description for the USA Port of Entry from the Routing Point code table.

The descriptor fields described here should not be confused with data fields, which contain values. Never use a data field to link tables. Clients are notorious for changing the names of things and adding alternate descriptions. In the Relationship Type table above, the description fields look harmless enough. It may seem almost pointless to add another field that duplicates the “father” and “mother” values. But were you to bend the rules and use the english_desc field in your query to join tables instead, you can be assured that the following week, a client will decide to capitalize the descriptions and do so without your knowledge. Then, before you know it, another client will complain that reports for a high level politician are failing! Perhaps by the vivid details, you can tell that I’ve been there before. It bears repeating that you should never use description fields to link tables, and by “should” I mean “don’t”!

Part 2: Query Syntax

Use Table Aliases

The use of table aliases in queries is a hotly debated topic among database administrators, but most agree that they do make complex queries more readable. Which would you prefer to read: dozens of "TT."s or "dbo.tblTrabeculectomyTechniques"? I thought so!

Beyond the readability issue, there are times that you must use table aliases. The most common is where multiple referenced tables contain the same column:

FROM inventory 	inv, 
	part 		p
WHERE 	inv.prod_id 	= 183 AND    
	inv.prod_id 	= p.prod_id;

Another, less common, use for table aliases are powerful for complex queries that need to refer to the same table twice. A reason to access a table twice would be to find customers who share the same surname:

SELECT	c1.cust_id, 
FROM	customer c1, 
	customer c2
WHERE 	c1.surname  = c2.surname AND 
	c1.cust_id != c2.cust_id;

The last line is important and makes sure that we’re not pointing back to the same person.

Avoid Table Scans by Properly Utilizing Index Fields

Indexes are used to find rows with specific column values quickly. Without an index, MySQL has to scan every row in the table to find the relevant rows. If a table has 1,000 rows, using an index is at least 100 times faster than reading sequentially.

To properly utilize an indexed column, don't include it within an expression that refers to only part of the column value. Instead, use the entire column for comparison. For example, one way to select rows containing date values from the year 1986 and up is as follows:

SELECT * FROM t WHERE YEAR(d) >= 1986;

In this case, the value of YEAR(d) must be evaluated for every row in the table, so the index cannot be used. Instead, write the query like this:

SELECT * FROM t WHERE YEAR(d) >= '1986-01-01';

In the rewritten expression, the indexed column stands by itself on one side of the comparison and MySQL can apply the index to optimize the query.

I hope that the tips presented here today will steer you in the right direction. This list is by no means exhaustive. In fact, it’s only really the tip of the iceberg. We’ll be returning to this subject at a later date, to supplement this list.

» See All Articles by Columnist Rob Gravelle

MySQL Archives

Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM