Application performance is driven by a number of different
factors. One of those items that typically affect performance is how long it
takes SQL Server to process your T-SQL statements. Sometimes the database
design and the complicated query requirements hinder the speed at which your
T-SQL statements can run. Other times the way the T-SQL statements are coded
cause SQL Server to perform extra work to resolve a query. By rewriting the
statements differently, you can help the SQL Server engine optimize your queries and improve performance. This article
will be the first in a series of articles where I will be discussing T-SQL
coding best practices. These practices will help you understand how to write
your queries to optimize SQL Server resources and improve performance.
Explicitly Name Columns in Your SELECT Statements
How many times have you coded a statement similar to this?
SELECT * FROM MyTable;
Using the asterisk (*) nomenclature tells the database engine that you
want to return all columns from the table or tables identified in the FROM
clause. It is not a good practice to do this even if you want all the columns
returned to your application. It is better to explicitly name the columns
within your table or tables like this:
SELECT ID, Description, DateModified FROM MyTable;
Using explicit names of columns in your SELECT statements
within your code has a number of advantages. First, SQL Server is only
returning the data your application needs, and not a bunch of additional data
that your application will not use. By returning only the data you need you
are optimizing the amount of work SQL Server needs to do to gather all the
columns of information you require. Also by not using the asterisk (*)
nomenclature you are also minimizing the amount of network traffic (number of
bytes) required to send the data associated with your SELECT statement to your
Additionally by explicitly naming your columns, you are
insulating your application from potential failures related to some database
schema change that might happen to any table you reference in your SELECT
statement. If you were to use the asterick (*) nomenclature and someone was to
add a new column to a table, your application would start receiving data for
this additional column of data, even without changing your application code.
If your application were expecting only a specific number of columns to be
returned, then it would fail as soon as someone added an additional column to
one of your referenced tables. Therefore, by explicitly naming columns in your
SELECT statement your application will always get the same number of columns
returned, even if someone adds a new column to any one of the tables referenced
in your SELECT statement.
Identify Column Names in Your INSERT statements
Just like the above best practice, you should explicitly
identify the column names for the data you are inserting with an INSERT
statement. Dont code your INSERT statements like this:
INSERT INTO MyTable VALUES ('A','B','C');
When you use this coding style, SQL Server requires that
only three columns be defined in MyTable, and the value A would go in the
first column, B in the second column and C in the last column. If someone
adds a new column to MyTable your application code will break with the
Msg 213, Level 16, State 1, Line 1
Column name or number of supplied values does not match table definition.
Therefore, instead of using the above coding style
for your INSERT statements you should code them like this:
INSERT INTO MyTable(FirstCode, SecondCode, ThirdCode) VALUES ('A','B','C');
By doing this when someone adds a new column named FourthCode
to MyTable the above INSERT statement will continue to work, provided the FourthCode
column was create with a DEFAULT value or allows NULLS.
Speed Up Your Searches by Prefixing Wild Card References
Appropriate use of wild card reference can improve the
performance of your queries. Say you want to search the AdventureWorks.Person.Contact
table for all the LastNames ending in sen. For a moment, lets assume that
you have also built an index on the LastName column. If you code your search
SELECT Distinct LastName
WHERE LastName LIKE '%sen'
The code uses the wild card percent (%) character to match
zero to many characters followed by the string sen in the LastName field.
This will cause SQL Server to perform an index scan operation looking for all
the names that end in sen to resolve this query. This makes sense because
until the entire table is read (scanned) SQL Server cant guarantee that it has
found all the records where the LastName ends in sen.
In addition, if you were searching for LastNames that where
exactly six characters long and ended in sen you could code your wild card
search like this:
SELECT Distinct LastName
WHERE LastName LIKE '___sen'
Here I have used the underbar (_) wild card character. This
wild card character is used to match a single character. This coding example
is similar to the prior example, and uses an index scan operation to resolve
it. Once again, the SQL Engine knows it has to scan the complete index before
it knows it has found all the six character names that end in sen in the Person.Contact
SQL Server can return your results faster if it doesnt have
to read the entire index using a scan operation. SQL Server is smart enough to
know when you place a prefix of some kind in front of your percent (%) and/or underbar
(_) wild card characters that it can use an index seek operation to resolve the
wild card search criteria. Here is an example of a statement that will return
all the LastNames that start with A and end in sen:
SELECT Distinct LastName
WHERE LastName LIKE 'A%sen'
By putting the A character in front of the percent (%)
sign in the search criteria SQL Server is now able to tell that it can use an
index seek operation to resolve this query. This make sense because once SQL
Server has gotten to the end of the A for last name , it knows there are no
more last names that start A so it can stop processing.
Not all wildcard characters need to be prefixed in order to
make SQL Server use an index seek operation to resolve a query. Here is an
example of where I can use a range wildcard expression and still have SQL
Server resolve this query using an index seek operation:
SELECT Distinct LastName
WHERE LastName LIKE '[A-M]%sen'
In this T-SQL statement, Im looking for all the LastNames
that start with anything between an A through M and end with sen. Other
wild card syntax that identifies a specific set of characters can also invoke an
index scan operation to resolve a wild card search criteria.
Only Use DISTINCT If You Need It
Placing the DISTINCT clause on your SELECT statement takes
the results of your query and removes the duplicates. It takes SQL Server an
extra SORT operation to sort the data so it can identify and remove the
duplicates. Therefore, if you already know that results will not contain
duplicates than dont put the DISTINCT clause on your T-SQL statement. By
putting the DISTINCT clause on your query, you are telling SQL Server to
perform the sort and un-duplication process. This is extra work for SQL Server
and provides no value when your result set only contains unique sets of records
in the first place.
Only Use UNION If You Need It
The UNION clause removes duplicate records between the two
sets it is union-ing together. Just like in the prior example, in this
example the UNION operator requires a SORT Operation so SQL Server can remove
any duplicates. If you know that neither set contains duplicate sets of
records then the extra effort that a UNION requires to make SQL Server execute
a sort and un-duplication process is extra work that SQL Server doesnt need to
do. Therefore, when you need to use the UNION operator to concatenate two
record sets together, where both sets as a whole contain a unique set of data,
it is better to use the UNION ALL operator. The UNION ALL operator does not
remove duplicates therefore it uses less SQL Server backend processing to
perform the union operation because no sort/dedup process has to be executed.
Less work for SQL Server means your union operation will be performed quicker
using the UNION ALL operator.
Faster Code by Using Best Practices
There are many reasons to follow best practices. Some best
practices help your application code not break when schema changes are made,
while other best practices improve the performance of your queries, by
minimizing CPU, I/O and network bandwidth. This set of best practices covered
some simple things you can do to minimize the resources used by SQL Server when
processing your SQL Server statements.
See All Articles by Columnist Gregory A. Larsen