T-SQL Best Practices
June 29, 2009
Application performance is driven by a number of different factors. One of those items that typically affect performance is how long it takes SQL Server to process your T-SQL statements. Sometimes the database design and the complicated query requirements hinder the speed at which your T-SQL statements can run. Other times the way the T-SQL statements are coded cause SQL Server to perform extra work to resolve a query. By rewriting the statements differently, you can help the SQL Server engine optimize your queries and improve performance. This article will be the first in a series of articles where I will be discussing T-SQL coding best practices. These practices will help you understand how to write your queries to optimize SQL Server resources and improve performance.
Explicitly Name Columns in Your SELECT Statements
How many times have you coded a statement similar to this?
SELECT * FROM MyTable;
Using the asterisk (*) nomenclature tells the database engine that you want to return all columns from the table or tables identified in the FROM clause. It is not a good practice to do this even if you want all the columns returned to your application. It is better to explicitly name the columns within your table or tables like this:
SELECT ID, Description, DateModified FROM MyTable;
Using explicit names of columns in your SELECT statements within your code has a number of advantages. First, SQL Server is only returning the data your application needs, and not a bunch of additional data that your application will not use. By returning only the data you need you are optimizing the amount of work SQL Server needs to do to gather all the columns of information you require. Also by not using the asterisk (*) nomenclature you are also minimizing the amount of network traffic (number of bytes) required to send the data associated with your SELECT statement to your application.
Additionally by explicitly naming your columns, you are insulating your application from potential failures related to some database schema change that might happen to any table you reference in your SELECT statement. If you were to use the asterick (*) nomenclature and someone was to add a new column to a table, your application would start receiving data for this additional column of data, even without changing your application code. If your application were expecting only a specific number of columns to be returned, then it would fail as soon as someone added an additional column to one of your referenced tables. Therefore, by explicitly naming columns in your SELECT statement your application will always get the same number of columns returned, even if someone adds a new column to any one of the tables referenced in your SELECT statement.
Identify Column Names in Your INSERT statements
Just like the above best practice, you should explicitly identify the column names for the data you are inserting with an INSERT statement. Dont code your INSERT statements like this:
INSERT INTO MyTable VALUES ('A','B','C');
When you use this coding style, SQL Server requires that only three columns be defined in MyTable, and the value A would go in the first column, B in the second column and C in the last column. If someone adds a new column to MyTable your application code will break with the following error:
Msg 213, Level 16, State 1, Line 1 Column name or number of supplied values does not match table definition.
Therefore, instead of using the above coding style for your INSERT statements you should code them like this:
INSERT INTO MyTable(FirstCode, SecondCode, ThirdCode) VALUES ('A','B','C');
By doing this when someone adds a new column named FourthCode to MyTable the above INSERT statement will continue to work, provided the FourthCode column was create with a DEFAULT value or allows NULLS.
Speed Up Your Searches by Prefixing Wild Card References
Appropriate use of wild card reference can improve the performance of your queries. Say you want to search the AdventureWorks.Person.Contact table for all the LastNames ending in sen. For a moment, lets assume that you have also built an index on the LastName column. If you code your search like so:
SELECT Distinct LastName FROM Person.Contact WHERE LastName LIKE '%sen'
The code uses the wild card percent (%) character to match zero to many characters followed by the string sen in the LastName field. This will cause SQL Server to perform an index scan operation looking for all the names that end in sen to resolve this query. This makes sense because until the entire table is read (scanned) SQL Server cant guarantee that it has found all the records where the LastName ends in sen.
In addition, if you were searching for LastNames that where exactly six characters long and ended in sen you could code your wild card search like this:
SELECT Distinct LastName FROM Person.Contact WHERE LastName LIKE '___sen'
Here I have used the underbar (_) wild card character. This wild card character is used to match a single character. This coding example is similar to the prior example, and uses an index scan operation to resolve it. Once again, the SQL Engine knows it has to scan the complete index before it knows it has found all the six character names that end in sen in the Person.Contact table.
SQL Server can return your results faster if it doesnt have to read the entire index using a scan operation. SQL Server is smart enough to know when you place a prefix of some kind in front of your percent (%) and/or underbar (_) wild card characters that it can use an index seek operation to resolve the wild card search criteria. Here is an example of a statement that will return all the LastNames that start with A and end in sen:
SELECT Distinct LastName FROM Person.Contact WHERE LastName LIKE 'A%sen'
By putting the A character in front of the percent (%) sign in the search criteria SQL Server is now able to tell that it can use an index seek operation to resolve this query. This make sense because once SQL Server has gotten to the end of the A for last name , it knows there are no more last names that start A so it can stop processing.
Not all wildcard characters need to be prefixed in order to make SQL Server use an index seek operation to resolve a query. Here is an example of where I can use a range wildcard expression and still have SQL Server resolve this query using an index seek operation:
SELECT Distinct LastName FROM Person.Contact WHERE LastName LIKE '[A-M]%sen'
In this T-SQL statement, Im looking for all the LastNames that start with anything between an A through M and end with sen. Other wild card syntax that identifies a specific set of characters can also invoke an index scan operation to resolve a wild card search criteria.
Only Use DISTINCT If You Need It
Placing the DISTINCT clause on your SELECT statement takes the results of your query and removes the duplicates. It takes SQL Server an extra SORT operation to sort the data so it can identify and remove the duplicates. Therefore, if you already know that results will not contain duplicates than dont put the DISTINCT clause on your T-SQL statement. By putting the DISTINCT clause on your query, you are telling SQL Server to perform the sort and un-duplication process. This is extra work for SQL Server and provides no value when your result set only contains unique sets of records in the first place.
Only Use UNION If You Need It
The UNION clause removes duplicate records between the two sets it is union-ing together. Just like in the prior example, in this example the UNION operator requires a SORT Operation so SQL Server can remove any duplicates. If you know that neither set contains duplicate sets of records then the extra effort that a UNION requires to make SQL Server execute a sort and un-duplication process is extra work that SQL Server doesnt need to do. Therefore, when you need to use the UNION operator to concatenate two record sets together, where both sets as a whole contain a unique set of data, it is better to use the UNION ALL operator. The UNION ALL operator does not remove duplicates therefore it uses less SQL Server backend processing to perform the union operation because no sort/dedup process has to be executed. Less work for SQL Server means your union operation will be performed quicker using the UNION ALL operator.
Faster Code by Using Best Practices
There are many reasons to follow best practices. Some best practices help your application code not break when schema changes are made, while other best practices improve the performance of your queries, by minimizing CPU, I/O and network bandwidth. This set of best practices covered some simple things you can do to minimize the resources used by SQL Server when processing your SQL Server statements.