The importance of XML-based technologies is steadily increasing and this
trend is reflected by a variety of new features in this category introduced in SQL
Server 2005 Integration Services. One of the most prominent ones is the XML
Control Flow task, which provides assistance with processing XML-formatted
documents. In this installment of our series, we will describe its properties
and functionality. Note that in order to better understand each of the topics
presented here, you should familiarize yourself with basic XML-related
concepts. You can find their overview on Microsoft XML Development
Center Web site (we also have discussed some
of them in the context of SQL Server 2000).
In short, XML (an acronym for the term eXtensible Markup Language) serves as
an implementation-independent markup language, that can be easily employed in
cross-platform data exchanges. Among the most distinguishable characteristics
of XML-formatted documents is their tree-based, self-documenting structure,
consisting of multi-level nodes, representing primary syntactical XML
components, such as elements, attributes, and values (with the header
consisting of an optional XML declaration, which lists the XML version and
character encoding type, followed by XML namespace specifications and
stylesheet processing instructions). Formal definition of these components
(including permitted data types) is accomplished through an associated XML
schema. The most popular choice in this category is currently XML Schema
Definition (XSD), which left behind earlier contenders, such as the oldest and
widely supported (due to its inclusion in XML 1.0 standard), but lacking
support for a number of newer features (such as XML namespaces or custom data
types), Document Type Definition (DTD) or XML Data Reduced (XDR) schema (introduced
by Microsoft in their early implementations of Microsoft XML Core Services).
Due to importance and omnipresent nature of XML, its support has been
improved in SQL Server 2005 Integration Services (comparing with Data
Transformation Services in earlier versions of SQL Server). Even though
implementation of the XML Data Flow Destination is painfully absent (you have
to emulate it through scripting), you can take advantage of built-in the XML
Data Flow source, as well as versatile XML Control Flow task. To get familiar
with the latter, launch Business Intelligence Development Studio and initiate a
new Integration Services project. Drag XML Task from the Toolbox and drop it on
the Control Flow tab of the Designer interface. Launch its Editor window by selecting
the Edit… item from its context sensitive menu. The General area is divided
into several sections, although their exact content depends largely on the
value assigned to the OperationType property. Choosing one of its six possible
options (which you can view in the drop-down list in the Input section and
which we will be describing in more detail next) affects the range of the remaining
properties that need to be configured in order to deliver desired
functionality. All of the operations require two operands (where the first one
is referred to as Source and contains an original XML file against which
operations are carried out) with independently configurable types (in the majority
of cases, both of them can be either entered directly in the XML Task Editor
interface or stored in a file or a variable). We will start with two of them
(and cover the remaining four in the next article of this series):
-
Validate – its purpose is to verify syntactical
correctness of an XML document by comparing its content against the XML Schema
Definition (XSD) or Document Type Definition (DTD), depending on the value
assigned to the ValidationType property (XSD or DTD). This comes in handy in
situations where you want to load output of XML Data Flow Source into a data
store (such as a database table), but you are not entirely certain that it
follows the required specifications. For the sake of simplicity, we will test
it using Direct input for both the source and the second operand (which, in
this case, is the schema definition), based on the examples included in A Quick
Guide to XML Schema on the MSDN Web site (which gives you a fairly
comprehensive introduction to DTD and XSD schemas). We will start with the
ones listed at the beginning of the section "Elements and Attributes".
Our sample XML document will look as follows:<tns:employee xmlns_tns="http://example.org/employee/" tns_id="555-12-3434"> <tns:name>Monica</tns:name> <tns:hiredate>1997-12-02</tns:hiredate> <tns:salary>42000.00</tns:salary> </tns:employee>
Copy its content directly into the
Source text box (using Source Editor) in the Input section of the XML Task
Editor window, with SourceType set to Direct input. In the same window, set the
ValidationType to XSD, assign Direct input value to the SecondOperandType
property, and copy the following entries (representing XSD schema of the sample
XML document) into the Second Operand text box:<xsd:schema xmlns_xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://example.org/employee/"> <xsd:element name="employee"/> <xsd:element name="name"/> <xsd:element name="hiredate"/> <xsd:element name="salary"/> <xsd:attribute name="id"/> </xsd:schema>
Once you have the Source and SecondOperand
properties configured, ensure that FailOnValidationFail is assigned True value
(in order to detect whether validation has been successful). Execute the
package, confirming whether our XML document conforms with the schema defined
for it (which should be the case providing you copied their content correctly).Similarly, you can validate an equivalent
(practically identical, with the exception of the DOCTYPE declaration) XML
document that was written according to DTD specifications:<!DOCTYPE employee SYSTEM "employee.dtd"> <tns:employee xmlns_tns="http://example.org/employee/" tns_id="555-12-3434"> <tns:name>Monica</tns:name> <tns:hiredate>1997-12-02</tns:hiredate> <tns:salary>42000.00</tns:salary> </tns:employee>
Enter its content into the Source
textbox in the Input section of the XML Task Editor window. Next change the ValidationType
property value to DTD and note that this will alter a number of available
entries. In particular, you will no longer have the option to specify the
second operand and its type (this will be accompanied by a validation warning
stating that DTD Validation will rely on the DTD file defined in the DOCTYPE
line in the XML document). To address this issue and provide DTD content (as
the second operand for validation), copy the following DTD structure:<!-- employee.dtd --> <!ELEMENT employee (name, hiredate, salary)> <!ATTLIST employee id CDATA #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT hiredate (#PCDATA)> <!ELEMENT salary (#PCDATA)>
Create a text file (its location
and name are arbitrary). Point Flat File Connection Manager to it, with
the name matching the one assigned to the DTD file in the DOCTYPE declaration
of the XML document (i.e. "employee.dtd" in our case). This is
accomplished by right-clicking on the tabbed area of Connection Managers and
selecting the New Flat File Connection… entry. In the resulting dialog box,
name the newly created connection manager as indicated (employee.dtd) and point
to the DTD file using the Browse… button next to the File name label. Once
you confirm that appropriate entries appear in the Columns section, close the
Editor window and execute the package to verify that our XML document conforms
to the DTD specifications. -
XSLT (an acronym derived from the term eXtensible Stylesheet
Language Transformation) – utilizes an XSLT stylesheet in order to transform a
source XML document into a new format. This is performed routinely in order to
change its visual appearance by adding HTML-specific tags (effectively producing
an HTML web page) or modify its content by adding, removing or reordering its
elements. The format of XSLT documents follow XML specifications, typically contain
a number XSL processing statements (that result in appropriate HTML tags being
added into the generated output), and employ XPath-based mechanisms in order to
identify a portion of the XML document to which each tag should be applied (or
which simply should be copied to the output without any changes).XPath (an abbreviation for the term XML Path
Language) is a notation used for locating specific components within XML
documents. By describing these components as nodes of a tree, XPath provides a
way to traverse its structure between any of them. The root of this tree is
designated by the "/" symbol. To reach a desired element (from the
root or any other element of the same tree), you need to specify its path,
consisting of all elements encountered on the way to it. While elements are
referenced simply using their name, attributes are distinguished by the
"@" prefix. Paths can be absolute (starting from the root – with the
"/" as its first character) or relative (from one non-root element to
another). XPath also includes constructs, that are common to programming
languages, such as loops or conditional statements that allow you to expand and
customize its capabilities. For more information regarding this topic
(including some straightforward examples), refer to one of
our earlier articles.We will review operations of XSLT functionality
within the SSIS XML Task by taking advantage of an example posted in the Hello, World!
(XSLT) article on the MSDN Web site (it is a well established tradition to
use a very simple "Hello World" program as a way of helping coding
neophytes shed their fears when entering the treacherous, yet extremely boring
world of programming). The XML document that we will process has the following
format:<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="hello.xsl"?> <hello-world> <greeter>An XSLT Programmer</greeter> <greeting>Hello, World!</greeting> </hello-world>
Return to the SSIS project we
created earlier and bring up the XML Task Editor interface. Select the XSLT
option as the value of the OperationType property in the Input section of the
General node. Once this is completed, ensure that SourceType is set to Direct
input and paste the content of the XML document listed above to the Source
entry (via Source Editor dialog box). Next, choose Direct input as the value of
the SecondOperandType and paste the following XSLT stylesheet as its content:<?xml version="1.0"?> <xsl:stylesheet xmlns_xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match="/hello-world"> <HTML> <HEAD> <TITLE></TITLE> </HEAD> <BODY> <H1> <xsl:value-of select="greeting"/> </H1> <xsl:apply-templates select="greeter"/> </BODY> </HTML> </xsl:template> <xsl:template match="greeter"> <DIV>from <I><xsl:value-of select="."/></I></DIV> </xsl:template> </xsl:stylesheet>
Finally, in the Output section,
under the OperationResult subnode, select File Connection as DestinationType
and create a new File Connection Manager pointing to an HTML file (e.g.
XMLTaskOutput.htm) and assign it to the Destination property (with DestinationType
set to File Connection). Close the XML Task Editor window and execute the package.
This should populate the destination HTML file with the following content:<HTML> <HEAD> <META http-equiv="Content-Type" content="text/html; charset=utf-8"> <TITLE> </TITLE> </HEAD> <BODY> <H1>Hello, World!</H1> <DIV>from <I>An XSLT Programmer</I></DIV> </BODY> </HTML>
When viewed via Internet Explorer,
this displays the familiar "Hello, World!" message in bold font,
followed by "from An XSLT Programmer" line below.
In our next article, we will continue our review of the remaining types of
SSIS Control Flow XML Task operations.