XML has become a common form of representing and exchanging data in today’s information age. SQL Server introduced XML-centric capabilities in SQL Server 2000. That functionality has been expanded in later releases. One aspect of working with XML is creating XML from relational data, which is accomplished utilizing the FOR XML clause in SQL Server.
Overview
FOR XML is a clause that
can be appended to the end of a standard SELECT statement to cause the output to
be represented as xml. The result is a Unicode string containing elements and
attributes as determined by the mode specified in the clause. The four modes
are RAW, AUTO, EXPLICIT, and PATH. The following is the Books
Online syntax for the FOR XML Clause.
[ FOR { BROWSE | <XML> } ] <XML> ::= XML { { RAW [ ('ElementName') ] | AUTO } [ <CommonDirectives> [ , { XMLDATA | XMLSCHEMA [ ('TargetNameSpaceURI') ]} ] [ , ELEMENTS [ XSINIL | ABSENT ] ] | EXPLICIT [ <CommonDirectives> [ , XMLDATA ] ] | PATH [ ('ElementName') ] [ <CommonDirectives> [ , ELEMENTS [ XSINIL | ABSENT ] ] ] } <CommonDirectives> ::= [ , BINARY BASE64 ] [ , TYPE ] [ , ROOT [ ('RootName') ] ]
This syntax shows each of
the four modes and the options available with those modes. Let’s have a look at
each of the modes in turn.
FOR XML RAW
The most basic and
straight forward mode for creating XML with FOR XML is RAW. RAW spits out an
element named "row" for each row of the result set from the SELECT
statement. Each column of the row is created as an attribute of the row
element.
To show you what this
would look like, we are going to use the AdventureWorks sample database. Below
we have a standard SELECT statement for retrieving employee address information
from this database. At the bottom of the statement, you’ll notice the FOR XML
clause and that it is using RAW mode. This results in a Unicode string that
contains a couple hundred row elements. I’m going to show you just the first
four because that’s enough to make the point.
SELECT e.EmployeeID, FirstName, LastName, AddressLine1, City, sp.Name as 'StateOrProvince', PostalCode FROM HumanResources.Employee e(NOLOCK) JOIN Person.Contact c (NOLOCK) ON e.ContactID = c.ContactID JOIN HumanResources.EmployeeAddress ea(NOLOCK) ON e.EmployeeID = ea.EmployeeID JOIN Person.Address a (NOLOCK) ON ea.AddressID = a.AddressID JOIN Person.StateProvince sp (NOLOCK) ON a.StateProvinceID = sp.StateProvinceID FOR XML RAW Results: <row EmployeeID="1" FirstName="Guy" LastName="Gilbert" AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="Washington" PostalCode="98272" /> <row EmployeeID="2" FirstName="Kevin" LastName="Brown" AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="Washington" PostalCode="98201" /> <row EmployeeID="3" FirstName="Roberto" LastName="Tamburello" AddressLine1="2137 Birchwood Dr" City="Redmond" StateOrProvince="Washington" PostalCode="98052" /> <row EmployeeID="4" FirstName="Rob" LastName="Walters" AddressLine1="5678 Lakeview Blvd." City="Minneapolis" StateOrProvince="Minnesota" PostalCode="55402" />
Notice that each row of
the results of the SELECT statement is rather generically assigned to a row
element in the XML and each column of the result set is assigned to an
attribute of that row. This is a rather quick and dirty XML representation of
the data as it gives you no idea how that data is related hierarchically.
In the syntax of FOR XML,
RAW mode has a couple of options of note to give you slightly more control over
the styling of the resulting xml. Should you decide that you don’t like that
row element, you can supply a name for it in parentheses and quotes directly
following the keyword RAW. Additionally, you can have a schema generated for
your XML using the XMLDATA | XMLSCHEMA options. Not fond of attributes? Specify
the ELEMENTS option to have each column generate as a new child element of the
row. Note that if any column of your result set has a null value and you wish
to have an element for it anyway, you will need to specify XSINIL following the
ELEMENTS option.
Each of the modes also
includes the common directives of BINARY BASE64, TYPE, and ROOT. BINARY BASE 64
is used to direct SQL Server to render binary columns of the result set as
base-64 encoding. This is implicit for some of the modes, but mandatory for RAW
mode if you are retrieving binary data. TYPE instructs SQL Server to return the
XML as an xml data type instead of a string. Finally, ROOT is used to specify a
root node and optional name.
Here’s an example of a few
of the directives and options. This is the same select from above only with the
option to rename the row element, the directive to supply a named root element,
and the ELEMENTS option to specify each non-null column to be rendered as a
child element of the row. I’ve adjusted the results to just show 2 of the 290
resulting rows.
SELECT e.EmployeeID, FirstName, LastName, AddressLine1, City, sp.Name AS 'StateOrProvince', PostalCode FROM HumanResources.Employee e(NOLOCK) JOIN Person.Contact c (NOLOCK) ON e.ContactID = c.ContactID JOIN HumanResources.EmployeeAddress ea(NOLOCK) ON e.EmployeeID = ea.EmployeeID JOIN Person.Address a (NOLOCK) ON ea.AddressID = a.AddressID JOIN Person.StateProvince sp (NOLOCK) ON a.StateProvinceID = sp.StateProvinceID FOR XML RAW ('Employee'), ROOT ('Employees'), ELEMENTS
Result :
<Employees> <Employee> <EmployeeID>1</EmployeeID> <FirstName>Guy</FirstName> <LastName>Gilbert</LastName> <AddressLine1>7726 Driftwood Drive</AddressLine1> <City>Monroe</City> <StateOrProvince>Washington</StateOrProvince> <PostalCode>98272</PostalCode> </Employee> <Employee> <EmployeeID>2</EmployeeID> <FirstName>Kevin</FirstName> <LastName>Brown</LastName> <AddressLine1>7883 Missing Canyon Court</AddressLine1> <City>Everett</City> <StateOrProvince>Washington</StateOrProvince> <PostalCode>98201</PostalCode> </Employee> </Employees>
FOR XML AUTO
The biggest difference
between RAW and AUTO mode is that AUTO mode provides for a level of
hierarchical meaning to the style of the XML. In other words, instead of having
columns in the result set all as attributes of the row or even subelements all
on the same level, columns from different tables become nested elements based
on the order of the columns in the select.
Below we take the same
example query from above and replace RAW with AUTO. The row element name cannot
be specified with AUTO (only with RAW and PATH), so that has been removed. I
also took off ELEMENTS even though it is valid here so it wouldn’t obscure the
differences in how AUTO renders the attributes and elements by default.
SELECT e.EmployeeID, FirstName, LastName, AddressLine1, City, sp.Name as 'StateOrProvince', PostalCode FROM HumanResources.Employee e(NOLOCK) JOIN Person.Contact c (NOLOCK) ON e.ContactID = c.ContactID JOIN HumanResources.EmployeeAddress ea(NOLOCK) ON e.EmployeeID = ea.EmployeeID JOIN Person.Address a (NOLOCK) ON ea.AddressID = a.AddressID JOIN Person.StateProvince sp (NOLOCK) ON a.StateProvinceID = sp.StateProvinceID FOR XML AUTO, ROOT ('Employees')
Result:
<Employees> <e EmployeeID="1"> <c FirstName="Guy" LastName="Gilbert"> <a AddressLine1="7726 Driftwood Drive" City="Monroe" PostalCode="98272"> <sp StateOrProvince="Washington" /> </a> </c> </e> <e EmployeeID="2"> <c FirstName="Kevin" LastName="Brown"> <a AddressLine1="7883 Missing Canyon Court" City="Everett" PostalCode="98201"> <sp StateOrProvince="Washington" /> </a> </c> </e> </Employees>
In the results (trimmed
down to just 2 rows), you can see that AUTO mode utilizes the table aliases to
name the elements and the columns of each table having a column in the select clause
is rendered as an attribute of that table-named or alias-named element. If you
add back in the ELEMENTS option, each column would be represented as a child
element of the table-named or alias-named element. Make note that the table
EmployeeAddress was not represented in the XML because it did not have a column
in the SELECT clause.
In this particular case,
the data was so normalized out that it makes the XML rather deep for what was
supposed to be a simple list of employee names and addresses. The next two
modes we discuss will provide much more control over the styling of the XML.
FOR XML EXPLICIT
For greater control over
the styling of the resulting XML, SQL developers have the option of using
EXPLICIT mode. This was the only option for granular control up until SQL
Server 2005. EXPLICIT mode is quite painful and wordy (I’m having flashbacks to
COBOL programming), but it does give you increased flexibility. In the next section,
we’ll talk about the new and better way to mold your XML using PATH mode.
In the example below we
are continuing to pull a list of employees and their addresses from the
AdventureWorks database, however, we are going to attempt to make a much more
readable and more shallow XML document than what we previously produced with
AUTO mode. Keep in mind that I’m truncating the results to just 2 of the rows
for the sake of brevity.
SELECT 1 AS Tag, NULL AS Parent, e.EmployeeID AS [Employee!1!EmployeeID], c.FirstName AS [Employee!1!FirstName], c.LastName AS [Employee!1!LastName], NULL AS [Address!2!AddressLine1], NULL AS [Address!2!AddressLine2], NULL AS [Address!2!City], NULL AS [Address!2!StateOrProvince], NULL AS [Address!2!PostalCode] FROM HumanResources.Employee e(NOLOCK) JOIN Person.Contact c (NOLOCK) ON e.ContactID = c.ContactID UNION ALL SELECT 2 AS Tag, 1 AS Parent, e.EmployeeID AS [Employee!1!EmployeeID], NULL AS [Employee!1!FirstName], NULL AS [Employee!1!LastName], AddressLine1 AS [Address!2!AddressLine1], AddressLine2 AS [Address!2!AddressLine2], City AS [Address!2!City], StateProvinceCode AS [Address!2!StateOrProvince], PostalCode AS [Address!2!PostalCode] FROM HumanResources.Employee e(NOLOCK) JOIN HumanResources.EmployeeAddress ea(NOLOCK) ON e.EmployeeID = ea.EmployeeID JOIN Person.Address a (NOLOCK) ON ea.AddressID = a.AddressID JOIN Person.StateProvince s (NOLOCK) ON a.StateProvinceID = s.StateProvinceID ORDER BY e.EmployeeID, Tag FOR XML EXPLICIT, ROOT('Employees')
Result:
<Employees> <Employee EmployeeID="1" FirstName="Guy" LastName="Gilbert"> <Address AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="WA " PostalCode="98272" /> </Employee> <Employee EmployeeID="2" FirstName="Kevin" LastName="Brown"> <Address AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="WA " PostalCode="98201" /> </Employee> </Employees>
EXPLICIT mode can be
tricky to get just right. For instance, take the ORDER BY clause off the above
query and you’ll get all the employee elements first and then all the addresses
nested under the last employee node. This is of course less than helpful.
When debugging your
SELECT, you might find it useful to run the SELECT without the FOR XML clause.
This presents you with the results in what is called a universal table. It
looks like this for the first two rows in the example above. Again, you can try
running this without the ORDER BY as well so you can see what it looks like
when it’s incorrectly nested.
Tag |
Parent |
Employee!1!EmployeeID |
Employee!1!FirstName |
Employee!1!LastName |
Address!2!AddressLine1 |
Address!2!AddressLine2 |
Address!2!City |
Address!2!StateOrProvince |
Address!2!PostalCode |
1 |
NULL |
1 |
Guy |
Gilbert |
NULL |
NULL |
NULL |
NULL |
NULL |
2 |
1 |
1 |
NULL |
NULL |
7726 Driftwood Drive |
NULL |
Monroe |
WA |
98272 |
1 |
NULL |
2 |
Kevin |
Brown |
NULL |
NULL |
NULL |
NULL |
NULL |
2 |
1 |
2 |
NULL |
NULL |
7883 Missing Canyon Court |
NULL |
Everett |
WA |
98201 |
FOR XML Path
The new and improved mode
for generating XML with much control over the styling is PATH mode. Instead of
writing UNIONs to accomplish the nesting in PATH mode, you are able to write a
simpler SELECT and supply XPath expressions for the column aliases to designate
hierarchy.
In the example below, we
create the employees list we’ve been looking for. We are able to create a root
node by specifying the ROOT directive after FOR XML PATH and also name the
"row" element Employee. Because the first three columns don’t have a
path specified, they become children of the row node. Notice that EmployeeID
becomes an attribute of that node because it has the @ sign in front of its
alias name. All the address related columns have "Address/" in front
of their alias. This directs SQL Server to return them as subelements of an
Address node, which will be created as a sibling to the columns above it. It is
important to note that you cannot declare an attribute following an element
column on the same level (referred to as a non-attribute-centric sibling).
SELECT Employee.EmployeeID AS '@EmployeeID', Contact.FirstName AS 'FirstName', Contact.LastName AS 'LastName', AddressLine1 AS 'Address/AddressLine1', AddressLine2 AS 'Address/AddressLine2', City AS 'Address/City', StateProvinceCode AS 'Address/StateOrProvince', PostalCode AS 'Address/PostalCode' FROM HumanResources.Employee Employee(NOLOCK) JOIN Person.Contact Contact (NOLOCK) ON Employee.ContactID = Contact.ContactID JOIN HumanResources.EmployeeAddress ea(NOLOCK) ON Employee.EmployeeID = ea.EmployeeID JOIN Person.Address Address (NOLOCK) ON ea.AddressID = Address.AddressID JOIN Person.StateProvince s (NOLOCK) ON Address.StateProvinceID = s.StateProvinceID FOR XML PATH ('Employee'), ROOT('Employees')
Result:
<Employees> <Employee EmployeeID="1"> <FirstName>Guy</FirstName> <LastName>Gilbert</LastName> <Address> <AddressLine1>7726 Driftwood Drive</AddressLine1> <City>Monroe</City> <StateOrProvince>WA </StateOrProvince> <PostalCode>98272</PostalCode> </Address> </Employee> <Employee EmployeeID="2"> <FirstName>Kevin</FirstName> <LastName>Brown</LastName> <Address> <AddressLine1>7883 Missing Canyon Court</AddressLine1> <City>Everett</City> <StateOrProvince>WA </StateOrProvince> <PostalCode>98201</PostalCode> </Address> </Employee> ... </Employees>
Conclusion
SQL Server provides the
FOR XML clause as a means to render XML from relational data. The four modes:
RAW, AUTO, EXPLICIT, and PATH each provide different levels of complexity and
flexibility in styling the XML depending on your needs. PATH is the newest of
these modes and a refreshingly easier syntax for customizing the XML output.
Additional Resources
MSDN – Constructing XML using FOR XML
MSDN – Using RAW Mode
MSDN – Using AUTO Mode
MSDN – Using EXPLICIT Mode
MSDN – Using PATH Mode
MSDN – Generating Elements for NULL Values Using the XSINIL Parameter
MSDN – TYPE Directive in FOR XML Queries