SQL Server 2005 Integration Services – Part 41 – XML Task

The importance of XML-based technologies is steadily increasing and this
trend is reflected by a variety of new features in this category introduced in SQL
Server 2005 Integration Services. One of the most prominent ones is the XML
Control Flow task, which provides assistance with processing XML-formatted
documents. In this installment of our series, we will describe its properties
and functionality. Note that in order to better understand each of the topics
presented here, you should familiarize yourself with basic XML-related
concepts. You can find their overview on Microsoft XML Development
Center Web site
(we also have discussed some
of them
in the context of SQL Server 2000).

In short, XML (an acronym for the term eXtensible Markup Language) serves as
an implementation-independent markup language, that can be easily employed in
cross-platform data exchanges. Among the most distinguishable characteristics
of XML-formatted documents is their tree-based, self-documenting structure,
consisting of multi-level nodes, representing primary syntactical XML
components, such as elements, attributes, and values (with the header
consisting of an optional XML declaration, which lists the XML version and
character encoding type, followed by XML namespace specifications and
stylesheet processing instructions). Formal definition of these components
(including permitted data types) is accomplished through an associated XML
schema. The most popular choice in this category is currently XML Schema
Definition (XSD), which left behind earlier contenders, such as the oldest and
widely supported (due to its inclusion in XML 1.0 standard), but lacking
support for a number of newer features (such as XML namespaces or custom data
types), Document Type Definition (DTD) or XML Data Reduced (XDR) schema (introduced
by Microsoft in their early implementations of Microsoft XML Core Services).

Due to importance and omnipresent nature of XML, its support has been
improved in SQL Server 2005 Integration Services (comparing with Data
Transformation Services in earlier versions of SQL Server). Even though
implementation of the XML Data Flow Destination is painfully absent (you have
to emulate it through scripting), you can take advantage of built-in the XML
Data Flow source, as well as versatile XML Control Flow task. To get familiar
with the latter, launch Business Intelligence Development Studio and initiate a
new Integration Services project. Drag XML Task from the Toolbox and drop it on
the Control Flow tab of the Designer interface. Launch its Editor window by selecting
the Edit… item from its context sensitive menu. The General area is divided
into several sections, although their exact content depends largely on the
value assigned to the OperationType property. Choosing one of its six possible
options (which you can view in the drop-down list in the Input section and
which we will be describing in more detail next) affects the range of the remaining
properties that need to be configured in order to deliver desired
functionality. All of the operations require two operands (where the first one
is referred to as Source and contains an original XML file against which
operations are carried out) with independently configurable types (in the majority
of cases, both of them can be either entered directly in the XML Task Editor
interface or stored in a file or a variable). We will start with two of them
(and cover the remaining four in the next article of this series):

  • Validate – its purpose is to verify syntactical
    correctness of an XML document by comparing its content against the XML Schema
    Definition (XSD) or Document Type Definition (DTD), depending on the value
    assigned to the ValidationType property (XSD or DTD). This comes in handy in
    situations where you want to load output of XML Data Flow Source into a data
    store (such as a database table), but you are not entirely certain that it
    follows the required specifications. For the sake of simplicity, we will test
    it using Direct input for both the source and the second operand (which, in
    this case, is the schema definition), based on the examples included in A Quick
    Guide to XML Schema
    on the MSDN Web site (which gives you a fairly
    comprehensive introduction to DTD and XSD schemas). We will start with the
    ones listed at the beginning of the section "Elements and Attributes".
    Our sample XML document will look as follows:

    <tns:employee xmlns_tns="http://example.org/employee/"
      tns_id="555-12-3434">
      <tns:name>Monica</tns:name>
      <tns:hiredate>1997-12-02</tns:hiredate>
      <tns:salary>42000.00</tns:salary>
    </tns:employee>
    

    Copy its content directly into the
    Source text box (using Source Editor) in the Input section of the XML Task
    Editor window, with SourceType set to Direct input. In the same window, set the
    ValidationType to XSD, assign Direct input value to the SecondOperandType
    property, and copy the following entries (representing XSD schema of the sample
    XML document) into the Second Operand text box:

    <xsd:schema xmlns_xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://example.org/employee/">
      <xsd:element name="employee"/>
      <xsd:element name="name"/>
      <xsd:element name="hiredate"/>
      <xsd:element name="salary"/>
      <xsd:attribute name="id"/>
    </xsd:schema>
    

    Once you have the Source and SecondOperand
    properties configured, ensure that FailOnValidationFail is assigned True value
    (in order to detect whether validation has been successful). Execute the
    package, confirming whether our XML document conforms with the schema defined
    for it (which should be the case providing you copied their content correctly).

    Similarly, you can validate an equivalent
    (practically identical, with the exception of the DOCTYPE declaration) XML
    document that was written according to DTD specifications:

    <!DOCTYPE employee SYSTEM "employee.dtd">
    <tns:employee xmlns_tns="http://example.org/employee/"
      tns_id="555-12-3434">
      <tns:name>Monica</tns:name>
      <tns:hiredate>1997-12-02</tns:hiredate>
      <tns:salary>42000.00</tns:salary>
    </tns:employee>
    

    Enter its content into the Source
    textbox in the Input section of the XML Task Editor window. Next change the ValidationType
    property value to DTD and note that this will alter a number of available
    entries. In particular, you will no longer have the option to specify the
    second operand and its type (this will be accompanied by a validation warning
    stating that DTD Validation will rely on the DTD file defined in the DOCTYPE
    line in the XML document). To address this issue and provide DTD content (as
    the second operand for validation), copy the following DTD structure:

    <!-- employee.dtd -->
    <!ELEMENT employee (name, hiredate, salary)>
    <!ATTLIST employee
              id CDATA #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT hiredate (#PCDATA)>
    <!ELEMENT salary (#PCDATA)>
    

    Create a text file (its location
    and name are arbitrary).  Point Flat File Connection Manager to it, with
    the name matching the one assigned to the DTD file in the DOCTYPE declaration
    of the XML document (i.e. "employee.dtd" in our case). This is
    accomplished by right-clicking on the tabbed area of Connection Managers and
    selecting the New Flat File Connection… entry. In the resulting dialog box,
    name the newly created connection manager as indicated (employee.dtd) and point
    to the DTD file using the Browse… button next to the File name label. Once
    you confirm that appropriate entries appear in the Columns section, close the
    Editor window and execute the package to verify that our XML document conforms
    to the DTD specifications.

  • XSLT (an acronym derived from the term eXtensible Stylesheet
    Language Transformation) – utilizes an XSLT stylesheet in order to transform a
    source XML document into a new format. This is performed routinely in order to
    change its visual appearance by adding HTML-specific tags (effectively producing
    an HTML web page) or modify its content by adding, removing or reordering its
    elements. The format of XSLT documents follow XML specifications, typically contain
    a number XSL processing statements (that result in appropriate HTML tags being
    added into the generated output), and employ XPath-based mechanisms in order to
    identify a portion of the XML document to which each tag should be applied (or
    which simply should be copied to the output without any changes).

    XPath (an abbreviation for the term XML Path
    Language) is a notation used for locating specific components within XML
    documents. By describing these components as nodes of a tree, XPath provides a
    way to traverse its structure between any of them. The root of this tree is
    designated by the "/" symbol. To reach a desired element (from the
    root or any other element of the same tree), you need to specify its path,
    consisting of all elements encountered on the way to it. While elements are
    referenced simply using their name, attributes are distinguished by the
    "@" prefix. Paths can be absolute (starting from the root – with the
    "/" as its first character) or relative (from one non-root element to
    another). XPath also includes constructs, that are common to programming
    languages, such as loops or conditional statements that allow you to expand and
    customize its capabilities. For more information regarding this topic
    (including some straightforward examples), refer to one of
    our earlier articles
    .

    We will review operations of XSLT functionality
    within the SSIS XML Task by taking advantage of an example posted in the Hello, World!
    (XSLT)
    article on the MSDN Web site (it is a well established tradition to
    use a very simple "Hello World" program as a way of helping coding
    neophytes shed their fears when entering the treacherous, yet extremely boring
    world of programming). The XML document that we will process has the following
    format:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="hello.xsl"?>
    <hello-world>   
       <greeter>An XSLT Programmer</greeter>   
       <greeting>Hello, World!</greeting>
    </hello-world>
    

    Return to the SSIS project we
    created earlier and bring up the XML Task Editor interface. Select the XSLT
    option as the value of the OperationType property in the Input section of the
    General node. Once this is completed, ensure that SourceType is set to Direct
    input and paste the content of the XML document listed above to the Source
    entry (via Source Editor dialog box). Next, choose Direct input as the value of
    the SecondOperandType and paste the following XSLT stylesheet as its content:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns_xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/hello-world">
        <HTML>
          <HEAD>
            <TITLE></TITLE>
          </HEAD>
          <BODY>
            <H1>
              <xsl:value-of select="greeting"/>
            </H1>
            <xsl:apply-templates select="greeter"/>
          </BODY>
        </HTML>
      </xsl:template>
      <xsl:template match="greeter">
        <DIV>from <I><xsl:value-of select="."/></I></DIV>
      </xsl:template>
    </xsl:stylesheet>
    

    Finally, in the Output section,
    under the OperationResult subnode, select File Connection as DestinationType
    and create a new File Connection Manager pointing to an HTML file (e.g.
    XMLTaskOutput.htm) and assign it to the Destination property (with DestinationType
    set to File Connection). Close the XML Task Editor window and execute the package.
    This should populate the destination HTML file with the following content:

    <HTML>
      <HEAD>
        <META http-equiv="Content-Type" content="text/html; charset=utf-8">
        <TITLE>
        </TITLE>
      </HEAD>
      <BODY>
        <H1>Hello, World!</H1>
        <DIV>from <I>An XSLT Programmer</I></DIV>
      </BODY>
    </HTML>
    

When viewed via Internet Explorer,
this displays the familiar "Hello, World!" message in bold font,
followed by "from An XSLT Programmer" line below.

In our next article, we will continue our review of the remaining types of
SSIS Control Flow XML Task operations.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles