Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Feb 5, 2001

Join Strategies in SQL Server 7

By Sergey Vartanyan

Introduction
Nested loops joins
Merge joins
Hash joins

Introduction

SQL Server 7.0 uses the new join techniques to find the best available plan for your queries. As a SQL Server DBA or programmer, you should know these new join techniques to choose the appropriate join operation, but in most cases, you can rely on SQL Server query optimizer decisions.

SQL Server 7.0 supports three types of join operations:

  • Nested-Loop joins
  • Merge joins
  • Hash joins

In this article, I want to tell you about these join operations, and when SQL Server query optimizer will use them.

Nested loops joins

When it is used, then one table will be selected as outer table and the other as inner table. SQL Server will scan the outer table row by row and for each row in the outer table, it will scan the inner table, looking for matching rows.

Nested-Loop join is used, when one of the tables is small and the other table has an index on the column that joins the tables.

There are three variants of the Nested-Loop join, according to SQL Server Books Online:

  • Naive Nested-loop join
  • Index Nested-loop join
  • Temporary index nested-loop join

Naive Nested-loop join is a search that scans an entire table or index.

Index Nested-loop join is a search that performs lookups in an index to fetch rows.

Temporary index nested-loop join is a search that uses temporary index (index was created by query optimizer and was destroyed when the query was completed).

Because the query optimizer usually selects the best execution plan for a given select statement, you can rely on SQL Server query optimizer decisions, but you can set the desirable join type by yourself.

This is the example to enforce Nested-Loop join:

USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
  ON authors.au_id = titleauthor.au_id OPTION (LOOP JOIN)
GO

Merge joins

This is the most effective method to join the tables. The merge join will be used, if both inputs are sorted on the merge columns. The best case, if the tables have a clustered index on the column that joins these tables.

For example, if you join Table1 table (contains n1 rows) with Table2 table (contains n2 rows), then in the worse case SQL Server query optimizer will scan n1xn2 rows to return a result set (for each row from the outer table the inner table will be completely scanned).

In the best case, if both tables have a clustered index on the column that joins the tables, and there is one-to-many relationship between these tables, then Merge join will be used, and SQL Server query optimizer will scan only n1 + n2 rows to return a result set.

This is the description of the algorithm of the Merge join (Courtesy of Alexander Chigrik SQL Server 7.0: Merge Joins)

while (not Table1.eof) and (not Table2.eof) do
  begin
    while Table2.Table1_id > Table1.Table1_id do Table1.MoveToNextRecord();
    value = Table1.Table1_id;
    while Table2.Table1_id < value do Table2.MoveToNextRecord();
    RID = Table1.RowID();
    while Table2.Table1_id = value do
      begin
        while Table1.Table1_id = value do
          begin
            < SELECT Table1.Table1_id, Table2.Table1_id >
            Table1.MoveToNextRecord();
          end
        Table1.MoveTo(RID);
        Table2.MoveToNextRecord();
      end
  end

Because the query optimizer usually selects the best execution plan for a given select statement, you can rely on SQL Server query optimizer decisions, but you can set the desirable join type by yourself.

This is the example to enforce Merge join:

USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
  ON authors.au_id = titleauthor.au_id OPTION (MERGE JOIN)
GO

Hash joins

The Hash join is used in a worst situation: when there are no adequate indexes on the join columns, and for large, unsorted inputs. Hash join is made in two phases: build and probe, and has two inputs: the build input and the probe input.

The smaller table will be the build input, the other table will be the probe input. The column that joins the tables is called hash key. On the build phase, hash table will be created by scanning each value in the build input and applying the hashing algorithm to the key.

The hash table consists of linked lists called hash buckets. The result of using a hash function on a hash key is called hash value.

On the probe phase, the query processor scans each row from the probe input and computes the same hash value on the hash key to find any matches in the corresponding hash bucket.

Hash join is most efficient, when one of the tables is significantly differ in size than another one, and there are no adequate indexes on the join columns.

This is the example to enforce Hash join:

USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
  ON authors.au_id = titleauthor.au_id OPTION (HASH JOIN)
GO


MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM