SQL Server 2005 Integration Services – Part 43 – XML Task continued

In this article of our series dedicated to SQL Server 2005
Integration Services, we are continuing coverage of features provided by XML
Control Flow task. So far, we have discussed validate,
XSLT
, and XPATH
options (configured by assigning appropriate values to OperationType property
in the XML Task Editor window). The remaining features of SSIS XML Control Flow
task, which we will describe here, are merge, diff, and patch (as mentioned
earlier, we will include elementary information about each of these XML
operations necessary for basic understanding of their characteristics and
functionality, but we encourage readers to explore each of these topics in more
detail):

  • Merge – imagine a scenario, in which you want to collect
    inventory information by extracting content of individual records from an
    external source and appending each (in one-by-one fashion) to an existing XML
    file. If the input is formatted according to XML syntax (for example, when
    processing it with the XML Source Data Flow component), then choosing the Merge
    option as XML Task OperationType property will yield the required results. In
    order to familiarize yourself with details of this setup, launch Business
    Intelligence Development Studio, initiate a new project of Integration Services
    type, and, in its Control Flow Designer interface, create an instance of XML
    Task. Display its editor window (by selecting Edit… from its context
    sensitive menu) and set the OperationType property to Merge. Note that this
    selection changes the number of parameters you need to configure, adding to
    Source and SecondOperand items (representing, respectively, an existing XML
    document and an XML fragment to be merged) another one, which takes the form of
    XPath string and indicates where the merge operation should take place (for
    more information on the XPath, refer to the previous article of our
    series
    ). In our example, we will use sample data from the Microsoft Knowledge Base
    article 308066
    , with direct input option for each of the property values
    (for the sake of simplicity). Our Source will consists of:


    <?xml version=”1.0″?>
    <catalog>
    <book id=”bk101″>
    <author>Gambardella, Matthew</author>
    <title>XML Developer’s Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    </book>
    <book id=”bk102″>
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    </book>
    </catalog>

    with the SecondOperand equal to
    (note that you need to exclude the <?xml
    version="1.0"?>
    declaration):


    <book id=”bk106″>
    <author>Randall, Cynthia</author>
    <title>Lover Birds</title>
    <genre>Romance</genre>
    <price>4.95</price>
    </book>

    and XPathStringSource property
    assigned /catalog value. Set
    SaveOperationResult to True (so you can examine the outcome) and create a new
    File Connection Manager, pointing to a flat file (with arbitrarily chosen
    location and name) via the OperationResult property listbox. Once you execute
    the task, the file you designated in this manner should contain the following
    output:


    <?xml version=”1.0″?>
    <catalog>
    <book id=”bk101″>
    <author>Gambardella, Matthew</author>
    <title>XML Developer’s Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    </book>
    <book id=”bk102″>
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    </book>
    <book id=”bk106″>
    <author>Randall, Cynthia</author>
    <title>Lover Birds</title>
    <genre>Romance</genre>
    <price>4.95</price>
    </book>
    </catalog>

    We could use a similar approach if one of the
    entries in our original XML document was missing. For example, if the price of
    the book by Matthew Gambardella did not exist in the source, we could set the XPathStringSource
    property to /catalog/book[@id="bk101"]
    and the SecondOperand to <price>44.95</price>.

  • Diff – determines differences between two XML documents and
    records them in a file, called diffgram, formatted according to syntactical
    rules of the XML Diffgram standard (which we described briefly in one of our
    earlier articles Using
    DiffGrams for XML data modifications
    ). The resulting diffgram can be used
    to display differences in an easily readable format (in case you want to simply
    evaluate their scope) or to convert one of the documents into the other (this
    procedure, called patching, is implemented by the last of XML Task operation
    types that is described in the next section). To examine its operations, launch
    another Integration Services project in the Business Intelligence Development
    Studio and add an XML Task to its Control Flow Designer area. From the task’s
    context sensitive menu, select the Edit option to display its editor window.
    Choosing Diff as the value of the OperationType property will alter the
    appearance by adding the Diff Options section. Its content will allows you to
    manage following parameters:

    • DiffAlgorithm – takes on one of three possible values: Precise,
      Fast, and Auto – with the first one resulting in an in-depth, but potentially
      slow (when dealing with larger XML trees) comparison algorithm, the second
      performing more superficial, but faster analysis, and the third one choosing
      automatically betwen the first two based on the size of the XML document and a
      number of estimated changes.

    • DiffOptions – determines which items should be taken into
      consideration during a comparison between two XML documents, based on the boolean
      value assigned to properties such as IgnoreProcessingInstructions, IgnoreOrderOfChildElements,
      IgnoreXMLDeclarations, IgnoreNamespaces, IgnoreDTD, IgnoreWhiteSpaces, IgnoreComments,
      and IgnorePrefixes.

    • FailOnDifference – triggers a task failure if two XML documents
      being compared are different.

    • SaveDiffGram – dictates whether the outcome of the comparison
      should be saved.

    • DiffGramSave – with the SaveDiffGram property set to True, DiffGramSave
      gives you ability to define a File Connection or a variable where the resulting
      DiffGram will be stored.

    We will leave the DiffAlgorithm and DiffOptions
    properties at their default values (you might need to adjust them according to
    your requirements), set FailOnDifference to False, SaveDiffGram to True, and
    create a File Connection pointing to a text file (with arbitrary target
    location and name) that will contain the outcome of the comparison. As our
    Source and the SecondOperand (both of Direct input type) we will use two
    following sample files included in the documentation of The XML Diff and
    Patch GUI Tool
    , which you can leverage to gain better understanding of XML Diffgrams
    and Patching (note that its installation requires .NET Framework version
    1.1.4322):


    <?xml version=”1.0″?>
    <b>
    <a>Some text 1</a>
    <b>Some text 2</b>
    <c>Some text 3</c>
    <d>
    Another text
    <foo/>
    </d>
    <x firstAttr=”value1″ secondAttr=”value2″/>
    <y>
    <!–Any comments?–>
    <z id=”10″>Just another text</z>
    </y>
    </b>

    and


    <?xml version=”1.0″?>
    <b>
    <a>Some text 1</a>
    <b>Some text 2</b>
    <c>Some text 3</c>
    <e>Some text 4</e>
    <f>Some text 5</f>
    <d>Changed text</d>
    <x firstAttr=”changed attribute value” newAttr=”new value”/>
    <p>
    <q>
    <y>
    <!–Any comments?–>
    <z id=”10″>Just another text</z>
    </y>
    </q>
    </p>
    </b>

    Task execution will produce a diffgram in the
    following form (while its content looks rather cryptic, you will be able to
    appreciate its significance after reading through the next section discussing the
    XML patching operation):


    <?xml version=”1.0″ encoding=”utf-16″?>
    <xd:xmldiff version=”1.0″ srcDocHash=”5708212576896487287″ options=”None” fragments=”no”
    xmlns_xd=”http://www.microsoft.com/xmldiff”>
    <xd:node match=”2″>
    <xd:node match=”3″ />
    <xd:add>
    <e>Some text 4</e>
    <f>Some text 5</f>
    </xd:add>
    <xd:node match=”4″>
    <xd:change match=”1″>Changed text</xd:change>
    <xd:remove match=”2″ />
    </xd:node>
    <xd:node match=”5″>
    <xd:remove match=”@secondAttr” />
    <xd:add type=”2″ name=”newAttr”>new value</xd:add>
    <xd:change match=”@firstAttr”>changed attribute value</xd:change>
    </xd:node>
    <xd:remove match=”6″ opid=”1″ />
    <xd:add type=”1″ name=”p”>
    <xd:add type=”1″ name=”q”>
    <xd:add match=”/2/6″ opid=”1″ />
    </xd:add>
    </xd:add>
    </xd:node>
    <xd:descriptor opid=”1″ type=”move” />
    </xd:xmldiff>

  • Patch – applies an XML diffgram to an XML document (referenced by
    the values of the Source and SecondOperand properties in the XML Task Editor
    window), producing an XML document that was used to generate the diffgram (in
    other words, reverses the process implemented by the Diff operation). Patching
    is frequently used in scenarios that involve reproducing changes applied to the
    source document (simplifying deploying multiple, identical replicas of the
    modified XML document in a distributed environment). This means that applying the
    diffgram created by the Diff operation in the previous exercise to the first
    XML document (the Source) should result in output identical to the second (and
    vice versa – the second document can be used to produce the first). To
    demonstrate this, create a new project of Integration Services type in SQL
    Server 2005 Business Intelligence Development Studio and add an XML Task to its
    Control Flow area. In its Editor window, set OperationType to Patch, copy to
    its Source (with SourceType set to Direct input) the same XML document that was
    used in our Diff example, and configure a File Connection as the SecondOperand
    pointing to the file where diffgram from the previous exercise is stored.
    Ensure that value of SaveOperationResult is True and define a new File
    Connection as the destination for the task output. Once you execute the
    package, you should find there an XML document that looks the same as the other
    one (listed as the SecondOperand) appearing in the previous section.

With the description of three OperationTypes options listed
above, we conclude our coverage of the SQL Server 2005 Integration Services XML
Control Flow task.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles