SQL Server 7.0: Merge Joins


SQL Server 7.0 supports three types of join operations:

  • Nested-Loop joins

  • Merge joins

  • Hash joins
  • In this article I want to tell you about Merge joins and when SQL Server
    will choose this kind of join operation.

    This is a most effective method to join the tables. The merge join will
    be used, if both inputs are sorted on the merge columns. The best way,
    if the tables have a clustered index on the column that joins the tables.

    If you have two tables (Table1 and Table2), where n1 – the count of rows
    in the Table1 table, and n2 – the count of rows in the Table2 table, and
    there is no index on the column that joins this tables, then in the worse
    case (with Nested-Loop join) SQL Server will scan n1xn2 rows to return a
    results set (for each row from the outer table the inner table will be
    completely scanned).

    In the best case, if Table1 table has a clustered index on the column that
    joins the tables, and Table2 table has a clustered index on the column that
    joins the tables, and there is one-to-many relationship between Table1 and
    Table2, then Merge join will be used, and SQL Server will scan n1 + n2 rows
    to return a results set.

    This is the example:


    if object_id(‘dbo.Table1’) is not null drop table Table1
    GO
    CREATE TABLE Table1 (Table1_id int primary key CLUSTERED, name char(10))
    GO
    if object_id(‘dbo.Table2’) is not null drop table Table2
    GO
    CREATE TABLE Table2 (
    Table2_id int primary key NONCLUSTERED,
    Table1_id int,
    name char(10))
    GO
    CREATE CLUSTERED INDEX indTable2 ON Table2 (Table1_id)
    GO

    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000 BEGIN INSERT INTO Table1 VALUES (@i, LTRIM(str(@i))) SELECT @i = @i + 1 END GO DECLARE @i int SELECT @i = 1 WHILE @i < 1000 BEGIN INSERT INTO Table2 VALUES (@i, @i, LTRIM(str(@i))) SELECT @i = @i + 1 END GO SET SHOWPLAN_TEXT ON GO SELECT a.Table1_id, b.Table1_id FROM Table1 a INNER JOIN Table2 b ON a.Table1_id = b.Table1_id GO SET SHOWPLAN_TEXT OFF GO


    These are the results:


    StmtText
    ——————————————————————————————————————-
    |–Merge Join(Inner Join, MERGE:([a].[Table1_id])=([b].[Table1_id]), RESIDUAL:([a].[Table1_id]=[b].[Table1_id]))
    |–Clustered Index Scan(OBJECT:([Test].[dbo].[Table1].[PK__Table1__1F4E99FE] AS [a]), ORDERED)
    |–Clustered Index Scan(OBJECT:([Test].[dbo].[Table2].[indTable2] AS [b]), ORDERED)

    This is a algorithm of the Merge join (description of its work in general
    case, for many-to-many relationship):


    while (not Table1.eof) and (not Table2.eof) do
    begin
    while Table2.Table1_id > Table1.Table1_id do Table1.MoveToNextRecord();
    value = Table1.Table1_id;
    while Table2.Table1_id < value do Table2.MoveToNextRecord(); RID = Table1.RowID(); while Table2.Table1_id = value do begin while Table1.Table1_id = value do begin < SELECT Table1.Table1_id, Table2.Table1_id >
    Table1.MoveToNextRecord();
    end
    Table1.MoveTo(RID);
    Table2.MoveToNextRecord();
    end
    end

    NOTE. If the joined tables are small (contain only one data page for
    example), and at least one of the joined tables have index on the
    column that joins the tables, then SQL Server will use Nested-Loop
    join instead of Merge join or Hash join (usually).

    Because the query optimizer usually selects the best execution plan
    for a given select statement, it is not necessary to change the kind
    of join, but sometimes it can be useful. You can enforce the desirable
    join type with OPTION clause.

    This is the example to enforce Merge join:


    USE pubs
    GO
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.au_id FROM authors a JOIN titleauthor b
    ON a.au_id = b.au_id OPTION (MERGE JOIN)
    GO
    SET SHOWPLAN_TEXT OFF
    GO

    These are the results:


    StmtText
    ————————————————————————————————
    SELECT a.au_id FROM authors a JOIN titleauthor b
    ON a.au_id = b.au_id OPTION (MERGE JOIN)

    (1 row(s) affected)

    StmtText
    —————————————————————————————————
    |–Merge Join(Inner Join, MERGE:([a].[au_id])=([b].[au_id]), RESIDUAL:([a].[au_id]=[b].[au_id]))
    |–Clustered Index Scan(OBJECT:([pubs].[dbo].[authors].[UPKCL_auidind] AS [a]), ORDERED)
    |–Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[auidind] AS [b]), ORDERED)

    (3 row(s) affected)



    »


    See All Articles by Columnist
    Alexander Chigrik

    Alexander Chigrik
    I am the owner of MSSQLCity.Com - a site dedicated to providing useful information for IT professionals using Microsoft SQL Server. This site contains SQL Server Articles, FAQs, Scripts, Tips and Test Exams.
    Previous articleMoving a Database
    Next articleSimple SQL: Pt. 2

    Latest Articles