An important checklist for a client server (VB - SQL Server) Project.
January 17, 1999
Ten nine eight seven only five days left for the launch of your death march project. When the time allocated for the project is in hours and then programmers are dragged, crying and kicking in to extra features and extended functionality, they have very less time to look for the speed limit, and they often cross orange lights while zooming towards the dead lines. If these points kept in prespective, it may save you some speeding tickets from the methodology police of the project.
Once the application is tested and about to go on line with real users, few of the things should be checked to make your application more robust, fast and sleek. In the development and testing phase you were more concerned with the applications basic working functionality and the bugs and debugging, that you have hardly any time to think about the other jazz and frills of the system. Check these nuts and bolts; if they are tightening properly your basic car can perform as a soupe up Porsche.
The first and foremost important thing for the horsepower and speed for your application is Indexing. An improper indexing is like driving your car with your hand breaks on; it can not be emphasized more. You have to pay close attention for the Indexing, Clustered or non-clustered Indexing. It is important to understand the difference between the two.
When a Clustered Index is created on a table, the data in the table is physically sorted in clustered Index order. SQL server allows only one clustered index per table because there is only one way physically to sort the data in the table.
A nonclustered index is a separate index structure independent of the physical sort order of the data. You can think of a nonclustered Index like an index on the back of a book. The key words are not located in any sorted order through out the book, actually they are spread quite randomly all over the book. To find a specific word, you look up the keyword in the index that contains all the key words in an alphabetically order. For the word you are looking for, you find a page number.
Indexes can help you speed up the data retrieval. However indexes has an adverse effect on the update performance because the indexes need to be maintained by SQL server to keep the proper sorted order. Before you get down to tuning the engine, you should know few of the subtle intricacies of the engine.
In Clustered Index, during inserts, the row must be inserted into the sorted order and at the appropriate location to maintain the sort order. This is an overhead on the system for insert operation.
Because clustered indexes maintain a sort order and insert data into the appropriate location with in a table, clustered indexes make use of free space in page throughout the table, resulting in less wasted space. They can help improve performance for certain types of queries, such as range retrieval and queries with order by. They take less space than non-clustered indexes defined on the same column. They provide a faster access path to the data than a similarly created non- clustered index, unless the non-clustered index can cover the query. Clustered indexes are updated less often that non- clustered index. Every time you perform some operation on a data row, you had to update the non-clustered indexes. However clustered indexes needed to be updated only if delete or modify the key field.
Most often clustered indexes are assign to the primary key of the table. This is ok if the primary key is the main access path for the that table, but other good candidates for the clustered indexes are, Range searches, columns containing number of duplicate values, columns frequently referenced in an order by query. Column other than primary key referenced in join clauses. In most applications, the primary key column on a table is almost always retrieve in a single row lookup. For single row lookups, a non-clustered index usually cost you only one more I/O more than a clustered index. The difference which you will never notice. Clustered indexes can improve performance for range retrieval because the clustered index can be used to set the bounds of a search, even if the query involves a large percentage of the rows in the table. Because the data is in sorted order. The same concept holds true for indexes on columns with a large number of duplicates. With a clustered index, the duplicate values are grouped together, minimize the number of pages that would need to be searched. Another good candidate for a clustered index is a column used frequently in queries for sorting the result set.
Also, when defining non-clustered indexes, you typically want to define indexes on columns with a low number of duplicates (that is, with low-density values) so that they can be used effectively by the optimizer. A high number of duplicates on a non-clustered index can often make it more expensive (in term of I/O) to process the query using the non-clustered index than a table scan.
If you have a search clause on the table that will be satisfied by a non-clustered index, but are ordering by the clustered index column, SQL server will need to use a worktable to sort the result. This is because the data in this case is being retrieved via non clustered index order, not clustered index order. Therefore rows arent sorted in the order specified by the order by clause when retrieved.
With or without a clustered index on a table, every time you insert, delete or update a row causing the row to move with in the table, all non clustered indexes on the table need to be updated to reflect the new row location. With a large number of indexes on a table, the overhead during data modification may become excessive.
The most important aspect to improving SQL server performance is proper indexing.
Now these facts can take you back to the indexes design board. You would like to reorganize you indexing. For e.g. Suppose if you have a form in which you have to see all the items from a inventory of millions items on some particular quality, or grade, it will be a good idea to create a clustered index on that field as the retrieval be really fast.
Another important thing to remember, take it with some salt. Of course indexes are taxing for the update and insert, but whom you would you like to wait an obscure data entry operator, whose second name no bodies knows or your boss, or the CEO of the company, who uses your system to check some reports.
You need not be good engineer to understand the proper oiling is very important for the functioning of your engine. Update statistics is like getting new oil for your car every after 2000 Mils. Run this command often, It will keep your index statistics uptodate.
If youve updated statistics for a table, and want the query optimizer to reexamine the new statistics and generate new query plans for all the stored procedure that reference that table, use the sp_recompile stored procedure e.g.
You can also use sp_recompile when an index has been added to a table and you want the optimizer to consider the new index and generate a new query plan for the stored procedure.
A good understanding of the SQL server query optimizer is needed to know how it uses indexes and index statistics to develop query plan.
You should have a good feel for a good SARG (Search Arguments). The general goal is to match a SARG with an index to avoid a table scan. A search argument is defined as a where clause comparing a column to a constant.
Valid operator for a SARG are any one of =,>,<,>= and <=. The inequality operator (!= or <> ) isnt a valid operator for a SARG. If you have an inequality operator, the optimizer ignores that as a search statement because it cant be used to match a value against an index. If you have a search clause containing an inequality operator, try to rewrite it as a SARG so that it can be recognized as a search argument by the query optimizer, for example, consider the following query:
Select title from titles where price != 0
Can be written as
Select title from titles where price > 0
Both will return the same result, but the second version has valid SARG that the optimizer will recognize and consider for matching with an index to satisfy the query. You have to keep an eye out for invalid SARGs. Look for the word reformatting in query execution plan.
For a big report writing a big query with many join and using many table can be more time consuming than filling the report and getting the other values from other table on the fly. Query written in development phase looks quite fast but in production can go flat.
On the first execution, the optimizer generates a query plan for all select statements in the stored procedure, regardless of the conditional branching. The query plan for each select statement is based upon the passed-in parameters.
Having distinctly different queries in the stored procedure could result in the wrong query plan being generated for one of the select statements, as its query plan might be based upon inappropriate values for that query.
A better approach would be to break the different select statements into two separate stored procedures and execute the appropriate stored procedure.
Always use servercursors that allow an application to fetch a small subset or block of rows from an arbitrarily large result set. But if all the results from a given SELECT statement are going to be used in your application, a server cursor is always going to be more expensive than a default result set. . If a particular task requests a single row by primary key, use a default result set. If another task requires an unpredictably large or updateable result set, use a server cursor and fetch rows in reasonably sized blocks (for example, one screen of rows at a time).
Commit transactional changes as soon as possible, Committing changes as quickly as possible means that locks are released as quickly as possible. Because SQL Server uses page locks for most operations, releasing locks quickly minimizes their impact on other users who may need access to rows on the same data pages. Locking mechanism is significantly changed in SQL 7, but still this advice will hold well.
Minimize network roundtrips to the server during a transaction. This "fat server" implementation is like adding 1000 horse power in your car, it will yield outstanding performance. Move more of the code from client to Server as stored Procedure.
Check constraints for ensuring that column values are valid. DEFAULT and NOT NULL constraints avoid the complexities caused by missing column values. Normalizing the data Removing the redundancies from the data.
Dont join more than 4 or 5 tables better still; break up the query in multiple queries.
The distinct keyword causes a worktable to be created in tempdb for sorting and removing of duplicate row, even if the unique index is present on the column. Avoid using distinct as much as possible, it do make a difference on the performance.
If you want to check the existence of a row in a table without needing to know the exact count, use the if exist statement, rather than count (*) . The if exist statement stop the processing as soon as it meet the matching row where as count (*) continues processing to look for all the matching rows.
Use if update (columnname) in stored procedures to save unnecessary processing in the table triggers. Your update trigger will fire every time, but you may want to do certain thing only if certain column has changed, so save all the processing time by testing the particular column change early in the trigger.
I am sure most of the programmers are already familiar with these points, but unfortunatily they can be easily overlooked. You got to check these points before you will begin your journey to best from better. Enjoy!