Introduction
SQL Server 7.0 uses the new join techniques to find the best available
plan for your queries. As a SQL Server DBA or programmer, you should
know these new join techniques to choose the appropriate join operation,
but in most cases, you can rely on SQL Server query optimizer decisions.
SQL Server 7.0 supports three types of join operations:
- Nested-Loop joins
- Merge joins
- Hash joins
In this article, I want to tell you about these join operations,
and when SQL Server query optimizer will use them.
Nested loops joins
When it is used, then one table will be selected as outer table and
the other as inner table. SQL Server will scan the outer table row
by row and for each row in the outer table, it will scan the inner
table, looking for matching rows.
Nested-Loop join is used, when one of the tables is small and the
other table has an index on the column that joins the tables.
There are three variants of the Nested-Loop join, according to
SQL Server Books Online:
- Naive Nested-loop join
- Index Nested-loop join
- Temporary index nested-loop join
Naive Nested-loop join is a search that scans an entire table or index.
Index Nested-loop join is a search that performs lookups in an index
to fetch rows.
Temporary index nested-loop join is a search that uses temporary index
(index was created by query optimizer and was destroyed when the query
was completed).
Because the query optimizer usually selects the best execution plan
for a given select statement, you can rely on SQL Server query optimizer
decisions, but you can set the desirable join type by yourself.
This is the example to enforce Nested-Loop join:
USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
ON authors.au_id = titleauthor.au_id OPTION (LOOP JOIN)
GO
Merge joins
This is the most effective method to join the tables. The merge join
will be used, if both inputs are sorted on the merge columns.
The best case, if the tables have a clustered index on the column
that joins these tables.
For example, if you join Table1 table (contains n1 rows) with Table2
table (contains n2 rows), then in the worse case SQL Server query
optimizer will scan n1xn2 rows to return a result set (for each row
from the outer table the inner table will be completely scanned).
In the best case, if both tables have a clustered index on the column
that joins the tables, and there is one-to-many relationship between
these tables, then Merge join will be used, and SQL Server query
optimizer will scan only n1 + n2 rows to return a result set.
This is the description of the algorithm of the Merge join
(Courtesy of Alexander Chigrik
SQL Server 7.0: Merge Joins)
while (not Table1.eof) and (not Table2.eof) do
begin
while Table2.Table1_id > Table1.Table1_id do Table1.MoveToNextRecord();
value = Table1.Table1_id;
while Table2.Table1_id < value do Table2.MoveToNextRecord();
RID = Table1.RowID();
while Table2.Table1_id = value do
begin
while Table1.Table1_id = value do
begin
< SELECT Table1.Table1_id, Table2.Table1_id >
Table1.MoveToNextRecord();
end
Table1.MoveTo(RID);
Table2.MoveToNextRecord();
end
end
Because the query optimizer usually selects the best execution plan
for a given select statement, you can rely on SQL Server query optimizer
decisions, but you can set the desirable join type by yourself.
This is the example to enforce Merge join:
USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
ON authors.au_id = titleauthor.au_id OPTION (MERGE JOIN)
GO
Hash joins
The Hash join is used in a worst situation: when there are no adequate
indexes on the join columns, and for large, unsorted inputs. Hash join
is made in two phases: build and probe, and has two inputs: the build
input and the probe input.
The smaller table will be the build input, the other table will be
the probe input. The column that joins the tables is called hash key.
On the build phase, hash table will be created by scanning each value
in the build input and applying the hashing algorithm to the key.
The hash table consists of linked lists called hash buckets. The result
of using a hash function on a hash key is called hash value.
On the probe phase, the query processor scans each row from the probe
input and computes the same hash value on the hash key to find any
matches in the corresponding hash bucket.
Hash join is most efficient, when one of the tables is significantly
differ in size than another one, and there are no adequate indexes
on the join columns.
This is the example to enforce Hash join:
USE pubs
GO
SELECT au_fname, title_id FROM authors JOIN titleauthor
ON authors.au_id = titleauthor.au_id OPTION (HASH JOIN)
GO