SQL Server 7.0 supports three types of join operations:
Nested-Loop joins
Merge joins
Hash joins
In this article I want to tell you about Merge joins and when SQL Server
will choose this kind of join operation.
This is a most effective method to join the tables. The merge join will
be used, if both inputs are sorted on the merge columns. The best way,
if the tables have a clustered index on the column that joins the tables.
If you have two tables (Table1 and Table2), where n1 – the count of rows
in the Table1 table, and n2 – the count of rows in the Table2 table, and
there is no index on the column that joins this tables, then in the worse
case (with Nested-Loop join) SQL Server will scan n1xn2 rows to return a
results set (for each row from the outer table the inner table will be
completely scanned).
In the best case, if Table1 table has a clustered index on the column that
joins the tables, and Table2 table has a clustered index on the column that
joins the tables, and there is one-to-many relationship between Table1 and
Table2, then Merge join will be used, and SQL Server will scan n1 + n2 rows
to return a results set.
This is the example:
if object_id(‘dbo.Table1’) is not null drop table Table1
GO
CREATE TABLE Table1 (Table1_id int primary key CLUSTERED, name char(10))
GO
if object_id(‘dbo.Table2’) is not null drop table Table2
GO
CREATE TABLE Table2 (
Table2_id int primary key NONCLUSTERED,
Table1_id int,
name char(10))
GO
CREATE CLUSTERED INDEX indTable2 ON Table2 (Table1_id)
GO
DECLARE @i int
SELECT @i = 1
WHILE @i < 1000
BEGIN
INSERT INTO Table1 VALUES (@i, LTRIM(str(@i)))
SELECT @i = @i + 1
END
GO
DECLARE @i int
SELECT @i = 1
WHILE @i < 1000
BEGIN
INSERT INTO Table2 VALUES (@i, @i, LTRIM(str(@i)))
SELECT @i = @i + 1
END
GO
SET SHOWPLAN_TEXT ON
GO
SELECT a.Table1_id, b.Table1_id FROM Table1 a INNER JOIN Table2 b
ON a.Table1_id = b.Table1_id
GO
SET SHOWPLAN_TEXT OFF
GO
|
These are the results:
StmtText
——————————————————————————————————————-
|–Merge Join(Inner Join, MERGE:([a].[Table1_id])=([b].[Table1_id]), RESIDUAL:([a].[Table1_id]=[b].[Table1_id]))
|–Clustered Index Scan(OBJECT:([Test].[dbo].[Table1].[PK__Table1__1F4E99FE] AS [a]), ORDERED)
|–Clustered Index Scan(OBJECT:([Test].[dbo].[Table2].[indTable2] AS [b]), ORDERED)
|
This is a algorithm of the Merge join (description of its work in general
case, for many-to-many relationship):
while (not Table1.eof) and (not Table2.eof) do
begin
while Table2.Table1_id > Table1.Table1_id do Table1.MoveToNextRecord();
value = Table1.Table1_id;
while Table2.Table1_id < value do Table2.MoveToNextRecord();
RID = Table1.RowID();
while Table2.Table1_id = value do
begin
while Table1.Table1_id = value do
begin
< SELECT Table1.Table1_id, Table2.Table1_id >
Table1.MoveToNextRecord();
end
Table1.MoveTo(RID);
Table2.MoveToNextRecord();
end
end
|
NOTE. If the joined tables are small (contain only one data page for
example), and at least one of the joined tables have index on the
column that joins the tables, then SQL Server will use Nested-Loop
join instead of Merge join or Hash join (usually).
Because the query optimizer usually selects the best execution plan
for a given select statement, it is not necessary to change the kind
of join, but sometimes it can be useful. You can enforce the desirable
join type with OPTION clause.
This is the example to enforce Merge join:
USE pubs
GO
SET SHOWPLAN_TEXT ON
GO
SELECT a.au_id FROM authors a JOIN titleauthor b
ON a.au_id = b.au_id OPTION (MERGE JOIN)
GO
SET SHOWPLAN_TEXT OFF
GO
|
These are the results:
StmtText
————————————————————————————————
SELECT a.au_id FROM authors a JOIN titleauthor b
ON a.au_id = b.au_id OPTION (MERGE JOIN)
(1 row(s) affected)
StmtText
—————————————————————————————————
|–Merge Join(Inner Join, MERGE:([a].[au_id])=([b].[au_id]), RESIDUAL:([a].[au_id]=[b].[au_id]))
|–Clustered Index Scan(OBJECT:([pubs].[dbo].[authors].[UPKCL_auidind] AS [a]), ORDERED)
|–Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[auidind] AS [b]), ORDERED)
(3 row(s) affected)
|
»
See All Articles by Columnist Alexander Chigrik