SQL Server 7.0: Hash Joins


SQL Server 7.0 supports three types of join operations:

  • Nested-Loop joins

  • Merge joins

  • Hash joins
  • In this article I want to tell you about Hash joins, what kinds
    of Hash joins exist, and when SQL Server will choose this kind
    of join operation.

    The Hash join will be used, if there are no adequate indexes on
    the join columns. This is a worst situation. In that case, hash
    table will be created. Hash join is most efficient when one of
    the tables is significantly differ in size than another one.

    The query optimizer make a Hash join in two phases: build and
    probe. So Hash join has two inputs: the build input and the probe
    input.

    On the build phase, hash table will be created by scanning each
    value in the build input and applying the hashing algorithm to
    the key.

    Let me to describe Hash join on the example with two tables.

    Look at this example:


    if object_id(‘dbo.Table1’) is not null drop table Table1
    GO
    CREATE TABLE Table1 (id int, Name char(10))
    GO
    if object_id(‘dbo.Table2’) is not null drop table Table2
    GO
    CREATE TABLE Table1 (id int, Name char(20))
    GO

    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000 BEGIN INSERT INTO Table1 VALUES (@i, LTRIM(str(@i))) SELECT @i = @i + 1 END GO DECLARE @i int SELECT @i = 1 WHILE @i < 1000 BEGIN INSERT INTO Table2 VALUES (@i, LTRIM(str(@i))) SELECT @i = @i + 1 END GO SET SHOWPLAN_TEXT ON GO SELECT a.Name FROM Table1 a INNER JOIN Table2 b ON a.Name = b.Name GO SET SHOWPLAN_TEXT OFF GO


    These are the results:


    StmtText
    ———————————————————————————————-
    |–Hash Match(Inner Join, HASH:([a].[Name])=([b].[Name]),
    RESIDUAL:([a].[Name]=[b].[Name]))
    |–Table Scan(OBJECT:([pubs].[dbo].[Table1] AS [a]))
    |–Table Scan(OBJECT:([pubs].[dbo].[Table2] AS [b]))

    The smaller table will be build input, the other – probe input.
    Field Name (column that joins the tables) is called hash key.
    The hash table consists of linked lists called hash buckets.
    The result of using a hash function on a hash key is called
    hash value. Hash value and RID (row identifier) will be placed
    into hash table.

    Hash value must be smaller than hash key. So, query processor
    economies on the size of the hash table. The real example of
    hashing is notebook. You can open notebook on the appropriate
    letter and scan all surnames on this letter to find necessary
    ones. So, notebook is the example of hash table, and pages on
    the appropriate letter is the example of hash bucket.

    During the probe phase, the entire probe input is scanned,
    and for each probe row computes the same hash value on the
    hash key to find any matches in the corresponding hash bucket.

    There are two main kinds of Hash join:

  • In-memory Hash join

  • Grace Hash join
  • In-memory Hash Join will be used if entire build input can be
    placed into memory.

    Grace Hash join will be used if your server have not enough
    memory to hold the entire build input. In that case, query
    processor proceeds Hash Join in several steps (hash table
    will be divided into multiple partitions and relevant
    partition will be loaded as need).

    Because the query optimizer usually selects the best execution plan
    for a given select statement, it is not necessary to change the kind
    of join, but sometimes it can be useful. You can enforce the desirable
    join type with OPTION clause.

    This is the example to enforce Hash join:


    USE pubs
    GO
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.au_id FROM authors a JOIN titleauthor b
    ON a.au_id = b.au_id OPTION (HASH JOIN)
    GO
    SET SHOWPLAN_TEXT OFF
    GO

    These are the results:


    StmtText
    ———————————————————————————————–
    SELECT a.au_id FROM authors a JOIN titleauthor b
    ON a.au_id = b.au_id OPTION (HASH JOIN)

    (1 row(s) affected)

    StmtText
    ————————————————————————————————–
    |–Hash Match(Inner Join, HASH:([a].[au_id])=([b].[au_id]),
    RESIDUAL:([a].[au_id]=[b].[au_id]))
    |–Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind] AS [a]))
    |–Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[titleidind] AS
    [b]))

    (3 row(s) affected)



    »


    See All Articles by Columnist
    Alexander Chigrik

    Alexander Chigrik
    Alexander Chigrik
    I am the owner of MSSQLCity.Com - a site dedicated to providing useful information for IT professionals using Microsoft SQL Server. This site contains SQL Server Articles, FAQs, Scripts, Tips and Test Exams.

    Latest Articles