SQL Server 2005 Integration Services - Part 43 - XML Task continued

February 26, 2007

In this article of our series dedicated to SQL Server 2005 Integration Services, we are continuing coverage of features provided by XML Control Flow task. So far, we have discussed validate, XSLT, and XPATH options (configured by assigning appropriate values to OperationType property in the XML Task Editor window). The remaining features of SSIS XML Control Flow task, which we will describe here, are merge, diff, and patch (as mentioned earlier, we will include elementary information about each of these XML operations necessary for basic understanding of their characteristics and functionality, but we encourage readers to explore each of these topics in more detail):

  • Merge - imagine a scenario, in which you want to collect inventory information by extracting content of individual records from an external source and appending each (in one-by-one fashion) to an existing XML file. If the input is formatted according to XML syntax (for example, when processing it with the XML Source Data Flow component), then choosing the Merge option as XML Task OperationType property will yield the required results. In order to familiarize yourself with details of this setup, launch Business Intelligence Development Studio, initiate a new project of Integration Services type, and, in its Control Flow Designer interface, create an instance of XML Task. Display its editor window (by selecting Edit... from its context sensitive menu) and set the OperationType property to Merge. Note that this selection changes the number of parameters you need to configure, adding to Source and SecondOperand items (representing, respectively, an existing XML document and an XML fragment to be merged) another one, which takes the form of XPath string and indicates where the merge operation should take place (for more information on the XPath, refer to the previous article of our series). In our example, we will use sample data from the Microsoft Knowledge Base article 308066, with direct input option for each of the property values (for the sake of simplicity). Our Source will consists of:

    <?xml version="1.0"?>
    <catalog>
       <book id="bk101">
          <author>Gambardella, Matthew</author>
          <title>XML Developer's Guide</title>
          <genre>Computer</genre>
          <price>44.95</price>
       </book>
       <book id="bk102">
          <author>Ralls, Kim</author>
          <title>Midnight Rain</title>
          <genre>Fantasy</genre>
          <price>5.95</price>
       </book>
    </catalog>
    

    with the SecondOperand equal to (note that you need to exclude the <?xml version="1.0"?> declaration):

    <book id="bk106">
          <author>Randall, Cynthia</author>
          <title>Lover Birds</title>
          <genre>Romance</genre>
          <price>4.95</price>
    </book>
    

    and XPathStringSource property assigned /catalog value. Set SaveOperationResult to True (so you can examine the outcome) and create a new File Connection Manager, pointing to a flat file (with arbitrarily chosen location and name) via the OperationResult property listbox. Once you execute the task, the file you designated in this manner should contain the following output:

    <?xml version="1.0"?>
    <catalog>
      <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
      </book>
      <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
      </book>
      <book id="bk106">
        <author>Randall, Cynthia</author>
        <title>Lover Birds</title>
        <genre>Romance</genre>
        <price>4.95</price>
      </book>
    </catalog>
    

    We could use a similar approach if one of the entries in our original XML document was missing. For example, if the price of the book by Matthew Gambardella did not exist in the source, we could set the XPathStringSource property to /catalog/book[@id="bk101"] and the SecondOperand to <price>44.95</price>.

  • Diff - determines differences between two XML documents and records them in a file, called diffgram, formatted according to syntactical rules of the XML Diffgram standard (which we described briefly in one of our earlier articles Using DiffGrams for XML data modifications). The resulting diffgram can be used to display differences in an easily readable format (in case you want to simply evaluate their scope) or to convert one of the documents into the other (this procedure, called patching, is implemented by the last of XML Task operation types that is described in the next section). To examine its operations, launch another Integration Services project in the Business Intelligence Development Studio and add an XML Task to its Control Flow Designer area. From the task's context sensitive menu, select the Edit option to display its editor window. Choosing Diff as the value of the OperationType property will alter the appearance by adding the Diff Options section. Its content will allows you to manage following parameters:

    • DiffAlgorithm - takes on one of three possible values: Precise, Fast, and Auto - with the first one resulting in an in-depth, but potentially slow (when dealing with larger XML trees) comparison algorithm, the second performing more superficial, but faster analysis, and the third one choosing automatically betwen the first two based on the size of the XML document and a number of estimated changes.

    • DiffOptions - determines which items should be taken into consideration during a comparison between two XML documents, based on the boolean value assigned to properties such as IgnoreProcessingInstructions, IgnoreOrderOfChildElements, IgnoreXMLDeclarations, IgnoreNamespaces, IgnoreDTD, IgnoreWhiteSpaces, IgnoreComments, and IgnorePrefixes.

    • FailOnDifference - triggers a task failure if two XML documents being compared are different.

    • SaveDiffGram - dictates whether the outcome of the comparison should be saved.

    • DiffGramSave - with the SaveDiffGram property set to True, DiffGramSave gives you ability to define a File Connection or a variable where the resulting DiffGram will be stored.

    We will leave the DiffAlgorithm and DiffOptions properties at their default values (you might need to adjust them according to your requirements), set FailOnDifference to False, SaveDiffGram to True, and create a File Connection pointing to a text file (with arbitrary target location and name) that will contain the outcome of the comparison. As our Source and the SecondOperand (both of Direct input type) we will use two following sample files included in the documentation of The XML Diff and Patch GUI Tool, which you can leverage to gain better understanding of XML Diffgrams and Patching (note that its installation requires .NET Framework version 1.1.4322):

    <?xml version="1.0"?>
    <b>
      <a>Some text 1</a>
      <b>Some text 2</b>
      <c>Some text 3</c>
      <d>
        Another text
        <foo/>
      </d>
      <x firstAttr="value1" secondAttr="value2"/>
      <y>
        <!--Any comments?-->
        <z id="10">Just another text</z>
      </y>
    </b>
    

    and

    <?xml version="1.0"?>
    <b>
      <a>Some text 1</a>
      <b>Some text 2</b>
      <c>Some text 3</c>
      <e>Some text 4</e>
      <f>Some text 5</f>
      <d>Changed text</d>
      <x firstAttr="changed attribute value" newAttr="new value"/>
      <p>
        <q>
          <y>
            <!--Any comments?-->
            <z id="10">Just another text</z>
          </y>
        </q>
      </p>
    </b>
    

    Task execution will produce a diffgram in the following form (while its content looks rather cryptic, you will be able to appreciate its significance after reading through the next section discussing the XML patching operation):

    <?xml version="1.0" encoding="utf-16"?>
    <xd:xmldiff version="1.0" srcDocHash="5708212576896487287" options="None" fragments="no"
    xmlns:xd="http://www.microsoft.com/xmldiff">
    <xd:node match="2">
    <xd:node match="3" />
     <xd:add>
      <e>Some text 4</e>
      <f>Some text 5</f>
     </xd:add>
    <xd:node match="4">
     <xd:change match="1">Changed text</xd:change>
     <xd:remove match="2" />
    </xd:node>
    <xd:node match="5">
     <xd:remove match="@secondAttr" />
     <xd:add type="2" name="newAttr">new value</xd:add>
     <xd:change match="@firstAttr">changed attribute value</xd:change>
    </xd:node>
    <xd:remove match="6" opid="1" />
    <xd:add type="1" name="p">
     <xd:add type="1" name="q">
      <xd:add match="/2/6" opid="1" />
     </xd:add>
    </xd:add>
    </xd:node>
    <xd:descriptor opid="1" type="move" />
    </xd:xmldiff>
    
  • Patch - applies an XML diffgram to an XML document (referenced by the values of the Source and SecondOperand properties in the XML Task Editor window), producing an XML document that was used to generate the diffgram (in other words, reverses the process implemented by the Diff operation). Patching is frequently used in scenarios that involve reproducing changes applied to the source document (simplifying deploying multiple, identical replicas of the modified XML document in a distributed environment). This means that applying the diffgram created by the Diff operation in the previous exercise to the first XML document (the Source) should result in output identical to the second (and vice versa - the second document can be used to produce the first). To demonstrate this, create a new project of Integration Services type in SQL Server 2005 Business Intelligence Development Studio and add an XML Task to its Control Flow area. In its Editor window, set OperationType to Patch, copy to its Source (with SourceType set to Direct input) the same XML document that was used in our Diff example, and configure a File Connection as the SecondOperand pointing to the file where diffgram from the previous exercise is stored. Ensure that value of SaveOperationResult is True and define a new File Connection as the destination for the task output. Once you execute the package, you should find there an XML document that looks the same as the other one (listed as the SecondOperand) appearing in the previous section.

With the description of three OperationTypes options listed above, we conclude our coverage of the SQL Server 2005 Integration Services XML Control Flow task.

» See All Articles by Columnist Marcin Policht








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers