SqlCredit, Part 18: Exploring the Performance of SQL 2005’s OUTPUT Clause

Download the files for this article.

This is Part 18 of a series. If you have not read Part 17, you can find it here. That article is a good introduction to SQL Server 2005's OUTPUT clause.

What We’re Testing

Part 17 introduced SQL Server 2005's OUTPUT clause. Here, we will compare the performance of several ways to do the same job. The scenario: you have a pre-loaded staging table, and you need to load both a destination table and a change-tracking “history” table. If there are n records in the staging table, you will load n records into the destination table and n records into the history table.

We will test multiple ways to accomplish this, two of which use the OUTPUT clause.

The Tables

For this test, we’ve made two main changes to the basic design from Part 17: we added more columns (there are now 15 bit columns), and we changed every table except the OUTPUT table variable (@insertedSet) into a regular table. We also created two versions (“Base” and “Trig”) of the destination table so that the trigger added to DestinationTableTrig will not affect DestinationTableBase. The history table likewise comes in “Base” and “Trig” versions, which allows a foreign key from each history table to its matching destination table.

  CREATE TABLE dbo.StagingTable (
    ColID   int  NOT NULL    IDENTITY,
    X16384  bit  NOT NULL,
    ...
    X00001  bit  NOT NULL,
    
    CONSTRAINT PK_StagingTable
    PRIMARY KEY CLUSTERED (ColID)
  )
  
  CREATE TABLE dbo.DestinationTable[Base|Trig] (
    ColID   int  NOT NULL    IDENTITY,
    X16384  bit  NOT NULL,
    ...
    X00001  bit  NOT NULL,
    ComputedNum AS (
      (X16384 * 16384) +
      ... +
      (X00001 *     1)
    ),
    CreateDate  datetime          NOT NULL
      DEFAULT GETDATE(),
    ColGUID     uniqueidentifier  NOT NULL
      DEFAULT NEWSEQUENTIALID(),
    
    CONSTRAINT PK_DestinationTable[Base|Trig]
    PRIMARY KEY CLUSTERED (ColID)
  )
  
  CREATE TABLE dbo.HistoryTable[Base|Trig] (
    HistoryID          int  NOT NULL
      IDENTITY(-1, -1),
    SourceID           int  NOT NULL,
    ComputedNum        int  NOT NULL,
    SourceCreateDate   datetime  NOT NULL,
    SourceGUID         uniqueidentifier  NOT NULL,
    HistoryCreateDate  datetime  NOT NULL
      DEFAULT GETDATE(),
    
    CONSTRAINT PK_HistoryTable[Base|Trig]
    PRIMARY KEY CLUSTERED (SourceCreateDate, HistoryID)
  )
  GO
  ALTER TABLE dbo.HistoryTable[Base|Trig] ADD CONSTRAINT ...
  FOREIGN KEY(SourceID)
  REFERENCES dbo.DestinationTable[Base|Trig] (ColID)
  GO
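
Filled in for the “Base” pair, the foreign key would look something like this (the constraint name is just illustrative):

  ALTER TABLE dbo.HistoryTableBase
  ADD CONSTRAINT FK_HistoryTableBase_DestinationTableBase
  FOREIGN KEY (SourceID)
  REFERENCES dbo.DestinationTableBase (ColID)
  GO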

The Tests

Set-up: Populating the Staging Table

Before starting any of the tests, we create the database, tables, trigger, and stored procedures, and then populate the staging table with n records.

Test 1a: Set-Based INSERTs Using OUTPUT (no TRAN)

For the first test, we use code based on Part 17’s OUTPUT code. Here is the basic code:

  INSERT INTO dbo.DestinationTableBase ...
  OUTPUT ... INTO <tableVariable>
  SELECT ...
  FROM dbo.StagingTable
  
  INSERT INTO dbo.HistoryTableBase ...
  SELECT ...
  FROM <tableVariable>

Each statement runs in its own implicit (autocommit) transaction: the first INSERT loads the whole set of records from the staging table, and the second writes the history records from the OUTPUT table variable.
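
Filled out a bit more, the pattern looks roughly like this. This is only a sketch: the 15 bit columns are abbreviated to two, and the column lists assume the table definitions above.

  DECLARE @insertedSet TABLE (
    ColID       int               NOT NULL,
    ComputedNum int               NOT NULL,
    CreateDate  datetime          NOT NULL,
    ColGUID     uniqueidentifier  NOT NULL
  )
  
  -- Load the destination table, capturing the generated values
  INSERT INTO dbo.DestinationTableBase (X16384, /* ...all 15... */ X00001)
  OUTPUT Inserted.ColID, Inserted.ComputedNum,
         Inserted.CreateDate, Inserted.ColGUID
    INTO @insertedSet
  SELECT X16384, /* ...all 15... */ X00001
  FROM dbo.StagingTable
  
  -- Write the history records from the captured OUTPUT rows
  INSERT INTO dbo.HistoryTableBase
    (SourceID, ComputedNum, SourceCreateDate, SourceGUID)
  SELECT ColID, ComputedNum, CreateDate, ColGUID
  FROM @insertedSet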

Test 1b: Set-Based INSERTs Using OUTPUT (TRAN)

Test 1a but with a BEGIN TRAN/COMMIT wrapping the two INSERT statements.
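
The shape is simply:

  BEGIN TRAN
  
  INSERT INTO dbo.DestinationTableBase ...
  OUTPUT ... INTO <tableVariable>
  SELECT ...
  FROM dbo.StagingTable
  
  INSERT INTO dbo.HistoryTableBase ...
  SELECT ...
  FROM <tableVariable>
  
  COMMIT TRAN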

Test 2: Set-Based INSERTs Using an INSERT Trigger

For the second test, we add an INSERT/UPDATE trigger to the destination table (DestinationTableTrig). This simple trigger inserts into the history table based on the data inserted into the destination table. The guts of the trigger look like this:

  INSERT INTO HistoryTableTrig ...
  SELECT ...
  FROM Inserted
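
Spelled out as a complete trigger, it would look something like this (the trigger name and the SET NOCOUNT ON are my own additions; the column lists assume the table definitions above):

  CREATE TRIGGER TR_DestinationTableTrig
  ON dbo.DestinationTableTrig
  AFTER INSERT, UPDATE
  AS
  BEGIN
    SET NOCOUNT ON
    
    -- Copy the new destination rows into the history table
    INSERT INTO dbo.HistoryTableTrig
      (SourceID, ComputedNum, SourceCreateDate, SourceGUID)
    SELECT ColID, ComputedNum, CreateDate, ColGUID
    FROM Inserted
  END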

The INSERT code is then quite simple.

  INSERT INTO dbo.DestinationTableTrig ...
  SELECT ...
  FROM dbo.StagingTable

Test 3: Iterative INSERTs Using Straight SQL

For the third test, we’re using very 100-level SQL. A real beginner would probably bring up BOL and struggle through running this in a cursor, but using a counter and a WHILE loop is so much less painful. Here’s the basic plan:

  -- @ct is assumed to be initialized to 1 (ColID is an IDENTITY starting at 1)
  WHILE @ct <= <StagingTable's recordCount> BEGIN
    INSERT INTO dbo.DestinationTableBase ...
    SELECT ...
    FROM dbo.StagingTable
    WHERE ColID = @ct
    
    SELECT @thisID = SCOPE_IDENTITY()
    
    INSERT INTO dbo.HistoryTableBase ...
    SELECT ...
    FROM dbo.DestinationTableBase
    WHERE ColID = @thisID
    
    SET @ct = @ct + 1
  END

Test 4: Iterative INSERTs Using an INSERT Trigger

The fourth test uses the trigger but iterates through the table one record at a time. The INSERT code is again fairly simple.

  WHILE @ct <= <StagingTable's recordCount> BEGIN
    INSERT INTO dbo.DestinationTableTrig ...
    SELECT ...
    FROM dbo.StagingTable
    WHERE ColID = @ct
    
    SET @ct = @ct + 1
  END

Test 5a: Set-Based INSERTs Using OUTPUT in Stored Procedure (no TRAN)

The fifth test uses code similar to Test 1a’s, but inside a stored procedure.

Test 5b: Set-Based INSERTs Using OUTPUT in Stored Procedure (TRAN)

Test 5a but with a BEGIN TRAN/COMMIT wrapping the two INSERT statements inside the stored procedure.
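
For reference, the Test 5b procedure is shaped something like this (the procedure name is illustrative; the elided column lists match Test 1’s code):

  CREATE PROCEDURE dbo.InsertDestinationAndHistory
  AS
  BEGIN
    SET NOCOUNT ON
    
    DECLARE @insertedSet TABLE (...)
    
    BEGIN TRAN
    
    INSERT INTO dbo.DestinationTableBase ...
    OUTPUT ... INTO @insertedSet
    SELECT ...
    FROM dbo.StagingTable
    
    INSERT INTO dbo.HistoryTableBase ...
    SELECT ...
    FROM @insertedSet
    
    COMMIT TRAN
  END
  GO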

The Results

The tests were run with differing record counts in the staging table. The time to run each test was recorded.

 

 

All elapsed times are in milliseconds.

  Test                                     Tran   512 records   32,768 records   1,048,576 records
  ---------------------------------------  ----   -----------   --------------   -----------------
  1a (Set-Based INSERTs, OUTPUT)           N           16             820             35,953
  1b                                       Y           16             793             36,306
  2 (Set-Based INSERTs, Trigger)           I*          30             593             19,923
  3 (Iterative INSERTs, Straight SQL)      N          190          12,426            376,663
  4 (Iterative INSERTs, Trigger)           I          140           8,016            244,296
  5a (Set-Based INSERTs, OUTPUT, in SP)    N           16           1,056             29,370
  5b                                       Y           13             730             27,830
* N: Multiple statements without an explicit transaction. Y: Multiple statements with an explicit transaction. I: Single statement with an implicit transaction.

Note that I ran the whole test script twice each time and recorded only the second set of timings.

Clear the Table Variable?

One unexpected behavior I ran into during this testing is illustrated by the following simple example.

  DECLARE @t1         TABLE (ColID tinyint IDENTITY, Col1 varchar(11))
  DECLARE @t1inserted TABLE (ColID tinyint,          Col1 varchar(11))
  
  -- Insert 'Hello World' once
  INSERT INTO @t1
  OUTPUT Inserted.ColID, Inserted.Col1 INTO @t1inserted
  VALUES ('Hello World')
  
  -- Clear the "base" table
  DELETE FROM @t1
  
  -- Insert 'Hello World' once
  INSERT INTO @t1
  OUTPUT Inserted.ColID, Inserted.Col1 INTO @t1inserted
  VALUES ('Hello World')
  
  SELECT * FROM @t1
  SELECT * FROM @t1inserted

Contents of @t1:

  ColID Col1
  ----- -----------
  2     Hello World

Contents of @t1inserted (the table variable):

  ColID Col1
  ----- -----------
  1     Hello World
  2     Hello World

Notice that the table variable retains its records from one call to the next. It makes perfect sense once you think about it, and it is the correct behavior, but you have to be aware of it when using the OUTPUT clause. You may need to clear the table variable at times to make sure it is empty before doing the next INSERT INTO … OUTPUT … INTO <tableVariable> call. I had to do that between calls in the test script.
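
In the example above, that clearing step is a one-liner between the two INSERTs:

  -- Empty the OUTPUT target before reusing it
  DELETE FROM @t1inserted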

The code for this simple test is in the attached HelloWorld_InsertMultiple.sql file.

Conclusion

The OUTPUT clause is very useful for pulling “automatic” values out of an INSERT statement (or UPDATE). What I was looking at here was how it performs compared to other options.

To no one’s surprise, the set-based solutions are far superior to the record-at-a-time solutions. As more and more records are involved, the record-at-a-time solutions get even worse.

It is important to see that wrapping the two INSERTs (the base records and the history records) in an explicit transaction is generally faster, whether coded directly or inside a stored procedure. It is also important to note that none of these solutions is as fast as the set-based trigger.

Many SQL Server developers believe that triggers are generally evil, and I tend to fall into that camp. There are two places where I think triggers are appropriate. One is the creation of history records, which is the exact case we have here. The other is for directly populating tables that are used to populate a data warehouse.

So, after building and optimizing the OUTPUT-based solutions, the best-performing option for this particular scenario is a trigger. Whether you think triggers are evil or not, this one performs well.

I may have missed some optimization here. Do you see something that would make the OUTPUT clause work as fast as the trigger? If so, please drop a comment in the forum.

Download the files for this article.

» See All Articles by Columnist Rob Garrison

Rob Garrison
Rob Garrison is a database architect at Fiserv in Portland, Oregon specializing in OLTP design and development. He has over twenty years of IT experience, with over ten years in database development and architecture.
