SQL Server 2005 Integration Services – Part 38 – Pivot Transformation

Extensive ETL (Extraction, Transformation, and Loading) capabilities that
SQL Server 2005 Integration Services are based on, deliver such essential
functionality as the combination and cleanup of data originating from
heterogeneous sources or scheduling and coordination of activities that
frequently take place beyond the boundaries of database management systems.
While a substantial number of these features help with traditional database
administration tasks, there are also a few intended primarily for assisting
with data analysis. In this article, we will cover Pivot transformation, which
is one of the more popular choices in this category.

Pivot operation (just like its T-SQL equivalent) modifies the way in which a
recordset is presented; typically by rotating row data into columns, (SSIS also
offers Unpivot transformation, which reverses this process). Even though these
changes do not introduce any new data, they tend to enhance the ability to
analyze existing content by simplifying comparisons and uncovering less
apparent trends. In order to help you understand this concept, we will present
an example illustrating pivot operation. As our data source, we will use sample
spreadsheets available on the Microsoft Web site in the form of Excel
2002 Sample: PivotTable Reports
which you need to extract to an arbitrary
folder by running the downloadable Report.exe. The target location will host
SampleSalespersonReports.xls (which we will manipulate throughout the course of
this article), SampleProductReports.xls, SampleOrderReports.xls, and
SampleCustomerReports.xls Excel workbooks. Even though they were designed with
Excel Pivot Table functionality in mind, we will be able to leverage them for
the purpose of our demonstration.

The spreadsheet serving as our data source (‘Source Data’ in the
SampleSalespersonReports.xls) contains inventory of orders handled between July
2003 and May 2005 by nine salespeople located in the USA and the UK. Our
intention is to convert it into a recordset that would allow us to easily
determine the total value of orders for each salesperson during each year. More
specifically, we want the outcome to consist of five columns – Salesperson,
Country, 2003 Orders Amount, 2004 Orders Amount, and 2005 Orders Amount. Since
the values stored in the last three need to be calculated by adding individual
order amounts on per salesperson and per year basis, we will use the Derived
Column and Aggregate transformations for this purpose. Once the summarized data
is available (still in the original format), we will reorganize it by applying
Pivot. The final result will be saved in a spreadsheet by using the Excel
Destination Data Flow component.

To accomplish this, start by initiating a new project of Integration
Services type in the Business Intelligence Development Studio. Add to the newly
created project a Data Flow task (by dragging its icon from the Toolbox onto
Designer interface) and double-click on it to switch to its tabbed area. Create
Excel Source (listed under Data Flow sources in the Toolbox) and display its
Editor (by selecting the Edit… entry from its context sensitive menu). Within
the Editor window, click on the New… command button to provide parameters for
Connection Manager, pointing to SampleSalespersonReports.xls. Ensure that the
"Table or view" option appears in the "Data access mode"
listbox and pick ‘Source Data$’ as Name of the Excel sheet. Switch to the Column
section within the Editor window and mark Country, Salesperson, Order Date, and
Order Amount in the Available External Columns listing. Once you complete these
steps, click on the OK button to close the Editor window.

Next, drag Derived Column transform from the Toolbox and connect the output
of our Excel Source with its input. Launch its Editor window and define a new
derived column named Order Year, calculated using the YEAR([Order Date]) expression (set its
data type to two byte unsigned integer). Accept the changes by clicking on the
OK button. Once back to the Data Task tab, add Aggregate transform to the Data
Flow task area and drag the green arrow originating from the Derived Column to
its input. On the Aggregation tab of its Editor window, select Country,
Salesperson, Order Year and Order Amount in the Available Input Columns box and
ensure that the first three are listed with "Group by" operation and
the last one has "Sum" applied to it. Close the Editor window and
return to the Data Flow task tab.

Next transformation that needs to be included in our package is Pivot. Once
you have dropped it onto the Data Flow area from the Toolbox, connect the output
of Aggregate to its input and select Edit or Show Advanced Editor –
interestingly both present you with the same Advanced Editor for Pivot window.
Once there, review the Component Properties tab and switch to the Input Columns
tab. Ensure that all available input columns (Salesperson, Order Amount,
Country, and Order Year) are selected and switch to the Input and Output
Properties. This is where the majority of configuration takes place.

As mentioned before, our goal is to display the outcome in the specific
format, with three extra columns (2003 Orders Amount, 2004 Orders Amount, and
2005 Orders Amount), in addition to the two original ones – Salesperson and
Country. With the assistance of Derived Column and Aggregate, we have so far
managed to create a recordset with Salesperson, Country, Order Year, and Order
Amount fields, which contains the total amount of orders for a specific
salesperson in a given year, giving us 27 rows (9 salespeople times 3 years) –
down from 799 rows in the SampleSalespersonReports.xls spreadsheet. At this
point, we want to rearrange records in such way that instead of Order Year and
Order Amount columns, we will have three columns, one per each year covered by
our sales inventory (giving us a table with 9 rows and 5 columns – with a
single row for each salesperson) listing the amount of sales for an individual
salesperson in that year. According to pivot nomenclature, Salesperson and
Country function as SetKeys (values in these input columns identify records
that need to be grouped together in the same output row), Order Year serves the
role of the PivotKey (column which values are used to determine additional
columns in the resulting recordset), and Order Amount contains PivotedValues
(which are copied to the new columns created by pivot). Keep in mind that
entries in SetKey and PivotKey columns have to be unique on the per-row basis
(which is the case, since the data has been aggregated prior to applying the
pivot).

Continue our configuration by expanding the Pivot Default Input node, which
lists all input columns. For each, you need to define its role in the pivot
process, by setting the PivotUsage custom property, which can take on one of
the following values:

  • 0 – indicates that content of the column is simply copied to the
    output,

  • 1 – designates column participating in KeySet (this value should
    be assigned for Salesperson and Country),

  • 2 – identifies the PivotKey column (Order Year in our case),

  • 3 – intended for PivotedValues (Order Amount).

For all input columns, take a note of the values of their LineageID
property, since you will need to know them to proceed with the next step. Once
completed, switch to the Pivot Default Output node and create the following output
columns:

  • Salesperson – set its SourceColumn custom property to match the LineageID
    parameter of the Salesperson input column,

  • Country – set its SourceColumn custom property to match the LineageID
    parameter of the Country input column,

  • 2003 Orders – set its SourceColumn custom property to match the LineageID
    parameter of the Order Amount input column and its PivotKeyValue to the number
    2003 (needs to be equal to "2003" integer value in Order Year
    column),

  • 2004 Orders – set its SourceColumn custom property to match the LineageID
    parameter of the Order Amount input column and its PivotKeyValue to the number
    2004 (needs to be equal to "2004" integer value in Order Year
    column),

  • 2005 Orders – set its SourceColumn custom property to match the LineageID
    parameter of the Order Amount input column and its PivotKeyValue to the number
    2005 (needs to be equal to "2005" integer value in Order Year
    column).

Confirm your choices by clicking on the OK button and return to the Data
Flow tab area. To capture the results, create an Excel Destination, connect its
input with the output of the Pivot transformation and specify the target
spreadsheet by assigning appropriate values in its Excel Connection Manager.
The outcome should contain five columns and nine rows, listing aggregate order
values for each salesperson in each of the three years covered by the
SampleSalespersonReports.xls.

It is important to remember that correct output requires that SetKeys
entries containing identical values appear in adjacent input rows. In our
example, this was handled by the Aggregate component (grouping all records by
salesperson), however in cases where this operation is not needed, make sure
you introduce Sort transformation prior to performing pivot. Otherwise, you will
end up with a separate row for each non-adjacent value in the SortKey column
(and NULLs entries in some of pivoted columns for this row). For example, if
three rows for a given salesperson were not grouped together in our Pivot input
data, we would end up with three output rows sharing the same SetKey value
(i.e. for the total of 11 rows in the output recordset). One of them would
contain total sales in the 2003 Orders column as well as two NULLs under 2004
and 2005 Orders, while the remaining two would have a single value in the 2004
Orders and 2005 Orders columns, respectively (and NULLs in the other two
columns).

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles