Merge – imagine a scenario, in which you want to collect
inventory information by extracting content of individual records from an
external source and appending each (in one-by-one fashion) to an existing XML
file. If the input is formatted according to XML syntax (for example, when
processing it with the XML Source Data Flow component), then choosing the Merge
option as XML Task OperationType property will yield the required results. In
order to familiarize yourself with details of this setup, launch Business
Intelligence Development Studio, initiate a new project of Integration Services
type, and, in its Control Flow Designer interface, create an instance of XML
Task. Display its editor window (by selecting Edit… from its context
sensitive menu) and set the OperationType property to Merge. Note that this
selection changes the number of parameters you need to configure, adding to
Source and SecondOperand items (representing, respectively, an existing XML
document and an XML fragment to be merged) another one, which takes the form of
XPath string and indicates where the merge operation should take place (for
more information on the XPath, refer to the previous article of our
series). In our example, we will use sample data from the Microsoft Knowledge Base
article 308066, with direct input option for each of the property values
(for the sake of simplicity). Our Source will consists of:
<?xml version=”1.0″?>
<catalog>
<book id=”bk101″>
<author>Gambardella, Matthew</author>
<title>XML Developer’s Guide</title>
<genre>Computer</genre>
<price>44.95</price>
</book>
<book id=”bk102″>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
</book>
</catalog>
with the SecondOperand equal to
(note that you need to exclude the <?xml
version="1.0"?> declaration):
<book id=”bk106″>
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
</book>
and XPathStringSource property
assigned /catalog value. Set
SaveOperationResult to True (so you can examine the outcome) and create a new
File Connection Manager, pointing to a flat file (with arbitrarily chosen
location and name) via the OperationResult property listbox. Once you execute
the task, the file you designated in this manner should contain the following
output:
<?xml version=”1.0″?>
<catalog>
<book id=”bk101″>
<author>Gambardella, Matthew</author>
<title>XML Developer’s Guide</title>
<genre>Computer</genre>
<price>44.95</price>
</book>
<book id=”bk102″>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
</book>
<book id=”bk106″>
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
</book>
</catalog>
We could use a similar approach if one of the
entries in our original XML document was missing. For example, if the price of
the book by Matthew Gambardella did not exist in the source, we could set the XPathStringSource
property to /catalog/book[@id="bk101"]
and the SecondOperand to <price>44.95</price>.
Diff – determines differences between two XML documents and
records them in a file, called diffgram, formatted according to syntactical
rules of the XML Diffgram standard (which we described briefly in one of our
earlier articles Using
DiffGrams for XML data modifications). The resulting diffgram can be used
to display differences in an easily readable format (in case you want to simply
evaluate their scope) or to convert one of the documents into the other (this
procedure, called patching, is implemented by the last of XML Task operation
types that is described in the next section). To examine its operations, launch
another Integration Services project in the Business Intelligence Development
Studio and add an XML Task to its Control Flow Designer area. From the task’s
context sensitive menu, select the Edit option to display its editor window.
Choosing Diff as the value of the OperationType property will alter the
appearance by adding the Diff Options section. Its content will allows you to
manage following parameters:
-
DiffAlgorithm – takes on one of three possible values: Precise,
Fast, and Auto – with the first one resulting in an in-depth, but potentially
slow (when dealing with larger XML trees) comparison algorithm, the second
performing more superficial, but faster analysis, and the third one choosing
automatically betwen the first two based on the size of the XML document and a
number of estimated changes.
-
DiffOptions – determines which items should be taken into
consideration during a comparison between two XML documents, based on the boolean
value assigned to properties such as IgnoreProcessingInstructions, IgnoreOrderOfChildElements,
IgnoreXMLDeclarations, IgnoreNamespaces, IgnoreDTD, IgnoreWhiteSpaces, IgnoreComments,
and IgnorePrefixes.
-
FailOnDifference – triggers a task failure if two XML documents
being compared are different.
-
SaveDiffGram – dictates whether the outcome of the comparison
should be saved.
-
DiffGramSave – with the SaveDiffGram property set to True, DiffGramSave
gives you ability to define a File Connection or a variable where the resulting
DiffGram will be stored.
We will leave the DiffAlgorithm and DiffOptions
properties at their default values (you might need to adjust them according to
your requirements), set FailOnDifference to False, SaveDiffGram to True, and
create a File Connection pointing to a text file (with arbitrary target
location and name) that will contain the outcome of the comparison. As our
Source and the SecondOperand (both of Direct input type) we will use two
following sample files included in the documentation of The XML Diff and
Patch GUI Tool, which you can leverage to gain better understanding of XML Diffgrams
and Patching (note that its installation requires .NET Framework version
1.1.4322):
<?xml version=”1.0″?>
<b>
<a>Some text 1</a>
<b>Some text 2</b>
<c>Some text 3</c>
<d>
Another text
<foo/>
</d>
<x firstAttr=”value1″ secondAttr=”value2″/>
<y>
<!–Any comments?–>
<z id=”10″>Just another text</z>
</y>
</b>
and
<?xml version=”1.0″?>
<b>
<a>Some text 1</a>
<b>Some text 2</b>
<c>Some text 3</c>
<e>Some text 4</e>
<f>Some text 5</f>
<d>Changed text</d>
<x firstAttr=”changed attribute value” newAttr=”new value”/>
<p>
<q>
<y>
<!–Any comments?–>
<z id=”10″>Just another text</z>
</y>
</q>
</p>
</b>
Task execution will produce a diffgram in the
following form (while its content looks rather cryptic, you will be able to
appreciate its significance after reading through the next section discussing the
XML patching operation):
<?xml version=”1.0″ encoding=”utf-16″?>
<xd:xmldiff version=”1.0″ srcDocHash=”5708212576896487287″ options=”None” fragments=”no”
xmlns_xd=”http://www.microsoft.com/xmldiff”>
<xd:node match=”2″>
<xd:node match=”3″ />
<xd:add>
<e>Some text 4</e>
<f>Some text 5</f>
</xd:add>
<xd:node match=”4″>
<xd:change match=”1″>Changed text</xd:change>
<xd:remove match=”2″ />
</xd:node>
<xd:node match=”5″>
<xd:remove match=”@secondAttr” />
<xd:add type=”2″ name=”newAttr”>new value</xd:add>
<xd:change match=”@firstAttr”>changed attribute value</xd:change>
</xd:node>
<xd:remove match=”6″ opid=”1″ />
<xd:add type=”1″ name=”p”>
<xd:add type=”1″ name=”q”>
<xd:add match=”/2/6″ opid=”1″ />
</xd:add>
</xd:add>
</xd:node>
<xd:descriptor opid=”1″ type=”move” />
</xd:xmldiff>