SQL Server 7.0: Merge Joins

August 31, 2000


SQL Server 7.0 supports three types of join operations:

  • Nested-Loop joins
  • Merge joins
  • Hash joins
  • In this article I want to tell you about Merge joins and when SQL Server will choose this kind of join operation.

    This is a most effective method to join the tables. The merge join will be used, if both inputs are sorted on the merge columns. The best way, if the tables have a clustered index on the column that joins the tables.

    If you have two tables (Table1 and Table2), where n1 - the count of rows in the Table1 table, and n2 - the count of rows in the Table2 table, and there is no index on the column that joins this tables, then in the worse case (with Nested-Loop join) SQL Server will scan n1xn2 rows to return a results set (for each row from the outer table the inner table will be completely scanned).

    In the best case, if Table1 table has a clustered index on the column that joins the tables, and Table2 table has a clustered index on the column that joins the tables, and there is one-to-many relationship between Table1 and Table2, then Merge join will be used, and SQL Server will scan n1 + n2 rows to return a results set.

    This is the example:

    if object_id('dbo.Table1') is not null drop table Table1
    GO
    CREATE TABLE Table1 (Table1_id int primary key CLUSTERED, name char(10))
    GO
    if object_id('dbo.Table2') is not null drop table Table2
    GO
    CREATE TABLE Table2 (
      Table2_id int primary key NONCLUSTERED, 
      Table1_id int,
      name char(10))
    GO
    CREATE CLUSTERED INDEX indTable2 ON Table2 (Table1_id)
    GO
    
    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000
      BEGIN
        INSERT INTO Table1 VALUES (@i, LTRIM(str(@i)))
        SELECT @i = @i + 1
      END
    GO
    
    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000
      BEGIN
        INSERT INTO Table2 VALUES (@i, @i, LTRIM(str(@i)))
        SELECT @i = @i + 1
      END
    GO
    
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.Table1_id, b.Table1_id FROM Table1 a INNER JOIN Table2 b
      ON a.Table1_id = b.Table1_id
    GO
    SET SHOWPLAN_TEXT OFF
    GO
    

    These are the results:

    StmtText
    -------------------------------------------------------------------------------------------------------------------
      |--Merge Join(Inner Join, MERGE:([a].[Table1_id])=([b].[Table1_id]), RESIDUAL:([a].[Table1_id]=[b].[Table1_id]))
           |--Clustered Index Scan(OBJECT:([Test].[dbo].[Table1].[PK__Table1__1F4E99FE] AS [a]), ORDERED)
           |--Clustered Index Scan(OBJECT:([Test].[dbo].[Table2].[indTable2] AS [b]), ORDERED)
    

    This is a algorithm of the Merge join (description of its work in general case, for many-to-many relationship):

    while (not Table1.eof) and (not Table2.eof) do
      begin
        while Table2.Table1_id > Table1.Table1_id do Table1.MoveToNextRecord();
        value = Table1.Table1_id;
        while Table2.Table1_id < value do Table2.MoveToNextRecord();
        RID = Table1.RowID();
        while Table2.Table1_id = value do
          begin
            while Table1.Table1_id = value do
              begin
                < SELECT Table1.Table1_id, Table2.Table1_id > 
                Table1.MoveToNextRecord();
              end
            Table1.MoveTo(RID);
            Table2.MoveToNextRecord();
          end
      end
    

    NOTE. If the joined tables are small (contain only one data page for example), and at least one of the joined tables have index on the column that joins the tables, then SQL Server will use Nested-Loop join instead of Merge join or Hash join (usually).

    Because the query optimizer usually selects the best execution plan for a given select statement, it is not necessary to change the kind of join, but sometimes it can be useful. You can enforce the desirable join type with OPTION clause.

    This is the example to enforce Merge join:

    USE pubs
    GO
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.au_id FROM authors a JOIN titleauthor b
       ON a.au_id = b.au_id OPTION (MERGE JOIN)
    GO
    SET SHOWPLAN_TEXT OFF
    GO
    

    These are the results:

    StmtText
    ------------------------------------------------------------------------------------------------
    SELECT a.au_id FROM authors a JOIN titleauthor b
       ON a.au_id = b.au_id OPTION (MERGE JOIN)
    
    (1 row(s) affected)
    
    StmtText
    ---------------------------------------------------------------------------------------------------
      |--Merge Join(Inner Join, MERGE:([a].[au_id])=([b].[au_id]), RESIDUAL:([a].[au_id]=[b].[au_id]))
           |--Clustered Index Scan(OBJECT:([pubs].[dbo].[authors].[UPKCL_auidind] AS [a]), ORDERED)
           |--Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[auidind] AS [b]), ORDERED)
    
    (3 row(s) affected)
    


    » See All Articles by Columnist Alexander Chigrik









    The Network for Technology Professionals

    Search:

    About Internet.com

    Legal Notices, Licensing, Permissions, Privacy Policy.
    Advertise | Newsletters | E-mail Offers