In this article of our series dedicated to SQL Server 2005
Integration Services, we are continuing coverage of features provided by XML
Control Flow task. So far, we have discussed validate,
XSLT, and XPATH
options (configured by assigning appropriate values to OperationType property
in the XML Task Editor window). The remaining features of SSIS XML Control Flow
task, which we will describe here, are merge, diff, and patch (as mentioned
earlier, we will include elementary information about each of these XML
operations necessary for basic understanding of their characteristics and
functionality, but we encourage readers to explore each of these topics in more
detail):
-
Merge – imagine a scenario, in which you want to collect
inventory information by extracting content of individual records from an
external source and appending each (in one-by-one fashion) to an existing XML
file. If the input is formatted according to XML syntax (for example, when
processing it with the XML Source Data Flow component), then choosing the Merge
option as XML Task OperationType property will yield the required results. In
order to familiarize yourself with details of this setup, launch Business
Intelligence Development Studio, initiate a new project of Integration Services
type, and, in its Control Flow Designer interface, create an instance of XML
Task. Display its editor window (by selecting Edit… from its context
sensitive menu) and set the OperationType property to Merge. Note that this
selection changes the number of parameters you need to configure, adding to
Source and SecondOperand items (representing, respectively, an existing XML
document and an XML fragment to be merged) another one, which takes the form of
XPath string and indicates where the merge operation should take place (for
more information on the XPath, refer to the previous article of our
series). In our example, we will use sample data from the Microsoft Knowledge Base
article 308066, with direct input option for each of the property values
(for the sake of simplicity). Our Source will consists of:
<?xml version=”1.0″?>
<catalog>
<book id=”bk101″>
<author>Gambardella, Matthew</author>
<title>XML Developer’s Guide</title>
<genre>Computer</genre>
<price>44.95</price>
</book>
<book id=”bk102″>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
</book>
</catalog>
with the SecondOperand equal to
(note that you need to exclude the<?xml
declaration):
version="1.0"?>
<book id=”bk106″>
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
</book>
and XPathStringSource property
assigned/catalog
value. Set
SaveOperationResult to True (so you can examine the outcome) and create a new
File Connection Manager, pointing to a flat file (with arbitrarily chosen
location and name) via the OperationResult property listbox. Once you execute
the task, the file you designated in this manner should contain the following
output:
<?xml version=”1.0″?>
<catalog>
<book id=”bk101″>
<author>Gambardella, Matthew</author>
<title>XML Developer’s Guide</title>
<genre>Computer</genre>
<price>44.95</price>
</book>
<book id=”bk102″>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
</book>
<book id=”bk106″>
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
</book>
</catalog>
We could use a similar approach if one of the
entries in our original XML document was missing. For example, if the price of
the book by Matthew Gambardella did not exist in the source, we could set the XPathStringSource
property to/catalog/book[@id="bk101"]
and the SecondOperand to<price>44.95</price>
. -
Diff – determines differences between two XML documents and
records them in a file, called diffgram, formatted according to syntactical
rules of the XML Diffgram standard (which we described briefly in one of our
earlier articles Using
DiffGrams for XML data modifications). The resulting diffgram can be used
to display differences in an easily readable format (in case you want to simply
evaluate their scope) or to convert one of the documents into the other (this
procedure, called patching, is implemented by the last of XML Task operation
types that is described in the next section). To examine its operations, launch
another Integration Services project in the Business Intelligence Development
Studio and add an XML Task to its Control Flow Designer area. From the task’s
context sensitive menu, select the Edit option to display its editor window.
Choosing Diff as the value of the OperationType property will alter the
appearance by adding the Diff Options section. Its content will allows you to
manage following parameters:-
DiffAlgorithm – takes on one of three possible values: Precise,
Fast, and Auto – with the first one resulting in an in-depth, but potentially
slow (when dealing with larger XML trees) comparison algorithm, the second
performing more superficial, but faster analysis, and the third one choosing
automatically betwen the first two based on the size of the XML document and a
number of estimated changes. -
DiffOptions – determines which items should be taken into
consideration during a comparison between two XML documents, based on the boolean
value assigned to properties such as IgnoreProcessingInstructions, IgnoreOrderOfChildElements,
IgnoreXMLDeclarations, IgnoreNamespaces, IgnoreDTD, IgnoreWhiteSpaces, IgnoreComments,
and IgnorePrefixes. -
FailOnDifference – triggers a task failure if two XML documents
being compared are different. -
SaveDiffGram – dictates whether the outcome of the comparison
should be saved. -
DiffGramSave – with the SaveDiffGram property set to True, DiffGramSave
gives you ability to define a File Connection or a variable where the resulting
DiffGram will be stored.
We will leave the DiffAlgorithm and DiffOptions
properties at their default values (you might need to adjust them according to
your requirements), set FailOnDifference to False, SaveDiffGram to True, and
create a File Connection pointing to a text file (with arbitrary target
location and name) that will contain the outcome of the comparison. As our
Source and the SecondOperand (both of Direct input type) we will use two
following sample files included in the documentation of The XML Diff and
Patch GUI Tool, which you can leverage to gain better understanding of XML Diffgrams
and Patching (note that its installation requires .NET Framework version
1.1.4322):
<?xml version=”1.0″?>
<b>
<a>Some text 1</a>
<b>Some text 2</b>
<c>Some text 3</c>
<d>
Another text
<foo/>
</d>
<x firstAttr=”value1″ secondAttr=”value2″/>
<y>
<!–Any comments?–>
<z id=”10″>Just another text</z>
</y>
</b>
and
<?xml version=”1.0″?>
<b>
<a>Some text 1</a>
<b>Some text 2</b>
<c>Some text 3</c>
<e>Some text 4</e>
<f>Some text 5</f>
<d>Changed text</d>
<x firstAttr=”changed attribute value” newAttr=”new value”/>
<p>
<q>
<y>
<!–Any comments?–>
<z id=”10″>Just another text</z>
</y>
</q>
</p>
</b>
Task execution will produce a diffgram in the
following form (while its content looks rather cryptic, you will be able to
appreciate its significance after reading through the next section discussing the
XML patching operation):
<?xml version=”1.0″ encoding=”utf-16″?>
<xd:xmldiff version=”1.0″ srcDocHash=”5708212576896487287″ options=”None” fragments=”no”
xmlns_xd=”http://www.microsoft.com/xmldiff”>
<xd:node match=”2″>
<xd:node match=”3″ />
<xd:add>
<e>Some text 4</e>
<f>Some text 5</f>
</xd:add>
<xd:node match=”4″>
<xd:change match=”1″>Changed text</xd:change>
<xd:remove match=”2″ />
</xd:node>
<xd:node match=”5″>
<xd:remove match=”@secondAttr” />
<xd:add type=”2″ name=”newAttr”>new value</xd:add>
<xd:change match=”@firstAttr”>changed attribute value</xd:change>
</xd:node>
<xd:remove match=”6″ opid=”1″ />
<xd:add type=”1″ name=”p”>
<xd:add type=”1″ name=”q”>
<xd:add match=”/2/6″ opid=”1″ />
</xd:add>
</xd:add>
</xd:node>
<xd:descriptor opid=”1″ type=”move” />
</xd:xmldiff>
-
DiffAlgorithm – takes on one of three possible values: Precise,
-
Patch – applies an XML diffgram to an XML document (referenced by
the values of the Source and SecondOperand properties in the XML Task Editor
window), producing an XML document that was used to generate the diffgram (in
other words, reverses the process implemented by the Diff operation). Patching
is frequently used in scenarios that involve reproducing changes applied to the
source document (simplifying deploying multiple, identical replicas of the
modified XML document in a distributed environment). This means that applying the
diffgram created by the Diff operation in the previous exercise to the first
XML document (the Source) should result in output identical to the second (and
vice versa – the second document can be used to produce the first). To
demonstrate this, create a new project of Integration Services type in SQL
Server 2005 Business Intelligence Development Studio and add an XML Task to its
Control Flow area. In its Editor window, set OperationType to Patch, copy to
its Source (with SourceType set to Direct input) the same XML document that was
used in our Diff example, and configure a File Connection as the SecondOperand
pointing to the file where diffgram from the previous exercise is stored.
Ensure that value of SaveOperationResult is True and define a new File
Connection as the destination for the task output. Once you execute the
package, you should find there an XML document that looks the same as the other
one (listed as the SecondOperand) appearing in the previous section.
With the description of three OperationTypes options listed
above, we conclude our coverage of the SQL Server 2005 Integration Services XML
Control Flow task.