SSIS Script Task and Microsoft Office Automation


While using Microsoft Office Excel to store a variety of semi-structured data tends to be more economical and convenient than SQL Server, it lacks data mining and data warehousing capabilities. This article explores importing spreadsheets onto SQL Server and taking advantage of Office Automation through managed code running inside a Script Task.

Despite the technological superiority of
the SQL Server product line, Microsoft Office Excel remains a
widely popular application used to store a variety of semi-structured data.
While this approach tends to be (at least in a short term) more economical and
convenient, it lacks mining and warehousing capabilities provided by database
management systems. These shortcomings can be remediated by importing
spreadsheets into a relational database. SQL Server 2008 Integration Services simplifies the
task of collating their content into the appropriate format, making them
suitable for transformation and loading. In this article, we will explore one
of several methods that provide this functionality, which takes advantage of Office Automation through managed
code running inside a Script
Task.

The features we are interested in are
exposed by Microsoft.Office.Interop.Excel namespace, incorporated
into the Microsoft
Office
installation and made available via the Office Primary Interop
Assemblies
download. They provide the ability to interact with Excel (as well as
other Microsoft
Office
applications) from within Visual Studio Tools for Applications. Since this
technique is intended primarily for client-side programming, you need to be
aware of its limitations (especially in regard to supportability) as well as
additional caveats that need to be taken into account when employing it to
implement server-based COM code running in
a security context of a non-interactive user (for example as a scheduled task
or a SQL Server
Agent
job). As the Microsoft
Technet
article, Considerations for
server-side Automation of Office
explains in detail, this
involves potentially unstable, insecure, and inconsistent behavior, related to
dependencies on such factors as user identity, interactivity with the desktop,
reentrancy, and scalability.

Keeping these warnings in mind and
heeding the consequences associated with them, let’s examine what type of
actions can be carried out by leveraging types and members of the Microsoft.Office.Interop.Excel
namespace. To illustrate these capabilities, we will present a generic
scenario, in which we enumerate Excel workbooks residing in an arbitrarily
chosen location, then identify their worksheets and named ranges for each of
them, and finally display the content of their individual cells. The code will
be implemented using Visual
Studio Tools for Applications accessible via SSIS Script Task.

To start, launch Business Intelligence
Development Studio
and create a new project based on the Integration Services template. Drag the Script Task icon from the Toolbox and drop it on
the Designer interface.
Display its Editor dialog box,
designate Visual
Basic .NET 2008
as the ScriptLanguage, and confirm
your choice by clicking on the OK
command button. Use the Variables window to
define a new variable (we will call it FolderName) of String type and set its value to the name of a
folder where the Excel spreadsheets
reside. Next, activate the Script Task Editor dialog box (accessible via the Edit… option in its
context sensitive menu), add the User::FolderName to the ReadOnlyVariables entry of the Script section, and
click on the Edit
Script…
button to display Visual
Studio Tools for Applications.

In order to facilitate interaction with
the Excel object model,
we need to add references to the relevant library (which you can identify by
referring to the Microsoft
Knowledge Base
article 219151
). To
accomplish this, select the Properties
entry from the context-sensitive menu of the project displayed in the Solution Explorer window within
the Visual
Studio Tools for Applications Tools interface. Switch to the References section, click
on the Add… command button,
display content of COM tab, highlight
the appropriate object library entry (which, in our example, happens to be Microsoft Excel 12.0 Object
Library,
corresponding to Excel
2007),
and confirm your choice by clicking on the OK command button. Once you return to the ScriptMain.vb window, add Imports
Microsoft.Office.Interop.Excel and Imports System.IO (used for file system operations)
statements to its (General)(Declarations) section.
Finally, copy the following code as the content of Public Sub Main():

Public Sub Main()
 
Dts.TaskResult = ScriptResults.Success
 
Dim strFile As String
Dim objExcelApp As New Microsoft.Office.Interop.Excel.Application
Dim objExcelWbk As Microsoft.Office.Interop.Excel.Workbook
Dim objExcelSheet As Microsoft.Office.Interop.Excel.Worksheet
Dim objExcelCell As Microsoft.Office.Interop.Excel.Range
Dim intSheet As Integer
Dim intRow As Integer
Dim intCol As Integer
 
Try
 
    Dim strFolder = Dts.Variables("FolderName").Value.ToString
    Dim colFiles As String() = Directory.GetFiles(strFolder, "*.xlsx")
 
    For Each strFile In colFiles
        MessageBox.Show(strFile)
        objExcelWbk = objExcelApp.Workbooks.Open(strFile)
        MessageBox.Show(objExcelWbk.Worksheets.Count)
        For intSheet = 1 To objExcelWbk.Worksheets.Count
            objExcelSheet = objExcelWbk.Worksheets(intSheet)
            MessageBox.Show(objExcelSheet.Name.ToString)
            For intRow = 1 To objExcelSheet.UsedRange.Rows.Count
               For intCol = 1 To objExcelSheet.UsedRange.Columns.Count
                   objExcelCell = objExcelSheet.UsedRange.Cells(intRow, intCol)
                   If Not (objExcelCell.Value = Nothing) Then
                       MessageBox.Show(objExcelCell.Value.ToString, "Content of cell " + _
                       objExcelWbk.Name.ToString + "." + objExcelSheet.Name.ToString + _
                       "." + intRow.ToString + ":" + intCol.ToString, MessageBoxButtons.OK)
                   End If
               Next
            Next
        Next
        objExcelWbk.Close(False)
    Next
    objExcelSheet = Nothing
    objExcelWbk = Nothing
 
Catch ex As Exception
 
    Dts.TaskResult = ScriptResults.Failure
    MessageBox.Show(ex.Message.ToString(), "Exception", MessageBoxButtons.OK)
 
Finally
 
    objExcelApp.Workbooks.Close()
    objExcelApp.Quit()
 
End Try
 
End Sub

Our script enumerates all files with the
extension .xlsx residing in the
folder, whose name is obtained by referencing the FolderName SSIS variable. For
each of them, we drill down through Workbooks and Worksheets collections (accessible via Microsoft.Office.Interop.Excel.Application namespace)
using their Open method, down to
the individual cell level (taking advantage of the Microsoft.Office.Interop.Excel.Range object, its Rows and Columns properties, as
well as their respective collections).

»


See All Articles by Columnist

Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles