Creating XML in SQL Server


XML has become a common form of representing and exchanging data in today’s information age. SQL Server introduced XML-centric capabilities in SQL Server 2000. That functionality has been expanded in later releases. One aspect of working with XML is creating XML from relational data, which is accomplished utilizing the FOR XML clause in SQL Server.

Overview

FOR XML is a clause that
can be appended to the end of a standard SELECT statement to cause the output to
be represented as xml. The result is a Unicode string containing elements and
attributes as determined by the mode specified in the clause. The four modes
are RAW, AUTO, EXPLICIT, and PATH. The following is the Books
Online
syntax for the FOR XML Clause.

[ FOR { BROWSE | <XML> } ]
<XML> ::=
XML 
    { 
      { RAW [ ('ElementName') ] | AUTO } 
        [ 
           <CommonDirectives> 
           [ , { XMLDATA | XMLSCHEMA [ ('TargetNameSpaceURI') ]} ] 
           [ , ELEMENTS [ XSINIL | ABSENT ] 
        ]
      | EXPLICIT 
        [ 
           <CommonDirectives> 
           [ , XMLDATA ] 
        ]
      | PATH [ ('ElementName') ] 
        [ 
           <CommonDirectives> 
           [ , ELEMENTS [ XSINIL | ABSENT ] ]
        ]
     } 
 
 <CommonDirectives> ::= 
   [ , BINARY BASE64 ]
   [ , TYPE ]
   [ , ROOT [ ('RootName') ] ]

This syntax shows each of
the four modes and the options available with those modes. Let’s have a look at
each of the modes in turn.

FOR XML RAW

The most basic and
straight forward mode for creating XML with FOR XML is RAW. RAW spits out an
element named "row" for each row of the result set from the SELECT
statement. Each column of the row is created as an attribute of the row
element.

To show you what this
would look like, we are going to use the AdventureWorks sample database. Below
we have a standard SELECT statement for retrieving employee address information
from this database. At the bottom of the statement, you’ll notice the FOR XML
clause and that it is using RAW mode. This results in a Unicode string that
contains a couple hundred row elements. I’m going to show you just the first
four because that’s enough to make the point.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name as 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML RAW

Results:
<row EmployeeID="1" FirstName="Guy" LastName="Gilbert" AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="Washington" PostalCode="98272" />
<row EmployeeID="2" FirstName="Kevin" LastName="Brown" AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="Washington" PostalCode="98201" />
<row EmployeeID="3" FirstName="Roberto" LastName="Tamburello" AddressLine1="2137 Birchwood Dr" City="Redmond" StateOrProvince="Washington" PostalCode="98052" />
<row EmployeeID="4" FirstName="Rob" LastName="Walters" AddressLine1="5678 Lakeview Blvd." City="Minneapolis" StateOrProvince="Minnesota" PostalCode="55402" />

Notice that each row of
the results of the SELECT statement is rather generically assigned to a row
element in the XML and each column of the result set is assigned to an
attribute of that row. This is a rather quick and dirty XML representation of
the data as it gives you no idea how that data is related hierarchically.

In the syntax of FOR XML,
RAW mode has a couple of options of note to give you slightly more control over
the styling of the resulting xml. Should you decide that you don’t like that
row element, you can supply a name for it in parentheses and quotes directly
following the keyword RAW. Additionally, you can have a schema generated for
your XML using the XMLDATA | XMLSCHEMA options. Not fond of attributes? Specify
the ELEMENTS option to have each column generate as a new child element of the
row. Note that if any column of your result set has a null value and you wish
to have an element for it anyway, you will need to specify XSINIL following the
ELEMENTS option.

Each of the modes also
includes the common directives of BINARY BASE64, TYPE, and ROOT. BINARY BASE 64
is used to direct SQL Server to render binary columns of the result set as
base-64 encoding. This is implicit for some of the modes, but mandatory for RAW
mode if you are retrieving binary data. TYPE instructs SQL Server to return the
XML as an xml data type instead of a string. Finally, ROOT is used to specify a
root node and optional name.

Here’s an example of a few
of the directives and options. This is the same select from above only with the
option to rename the row element, the directive to supply a named root element,
and the ELEMENTS option to specify each non-null column to be rendered as a
child element of the row. I’ve adjusted the results to just show 2 of the 290
resulting rows.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name AS 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML RAW ('Employee'), ROOT ('Employees'), ELEMENTS 

Result :

<Employees>
  <Employee>
    <EmployeeID>1</EmployeeID>
    <FirstName>Guy</FirstName>
    <LastName>Gilbert</LastName>
    <AddressLine1>7726 Driftwood Drive</AddressLine1>
    <City>Monroe</City>
    <StateOrProvince>Washington</StateOrProvince>
    <PostalCode>98272</PostalCode>
  </Employee>
  <Employee>
    <EmployeeID>2</EmployeeID>
    <FirstName>Kevin</FirstName>
    <LastName>Brown</LastName>
    <AddressLine1>7883 Missing Canyon Court</AddressLine1>
    <City>Everett</City>
    <StateOrProvince>Washington</StateOrProvince>
    <PostalCode>98201</PostalCode>
  </Employee>
</Employees>

FOR XML AUTO

The biggest difference
between RAW and AUTO mode is that AUTO mode provides for a level of
hierarchical meaning to the style of the XML. In other words, instead of having
columns in the result set all as attributes of the row or even subelements all
on the same level, columns from different tables become nested elements based
on the order of the columns in the select.

Below we take the same
example query from above and replace RAW with AUTO. The row element name cannot
be specified with AUTO (only with RAW and PATH), so that has been removed. I
also took off ELEMENTS even though it is valid here so it wouldn’t obscure the
differences in how AUTO renders the attributes and elements by default.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name as 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML AUTO, ROOT ('Employees')

Result:

<Employees>
  <e EmployeeID="1">
    <c FirstName="Guy" LastName="Gilbert">
      <a AddressLine1="7726 Driftwood Drive" City="Monroe" PostalCode="98272">
        <sp StateOrProvince="Washington" />
      </a>
    </c>
  </e>
  <e EmployeeID="2">
    <c FirstName="Kevin" LastName="Brown">
      <a AddressLine1="7883 Missing Canyon Court" City="Everett" PostalCode="98201">
        <sp StateOrProvince="Washington" />
      </a>
    </c>
  </e>
</Employees>

In the results (trimmed
down to just 2 rows), you can see that AUTO mode utilizes the table aliases to
name the elements and the columns of each table having a column in the select clause
is rendered as an attribute of that table-named or alias-named element. If you
add back in the ELEMENTS option, each column would be represented as a child
element of the table-named or alias-named element. Make note that the table
EmployeeAddress was not represented in the XML because it did not have a column
in the SELECT clause.

In this particular case,
the data was so normalized out that it makes the XML rather deep for what was
supposed to be a simple list of employee names and addresses. The next two
modes we discuss will provide much more control over the styling of the XML.

FOR XML EXPLICIT

For greater control over
the styling of the resulting XML, SQL developers have the option of using
EXPLICIT mode. This was the only option for granular control up until SQL
Server 2005. EXPLICIT mode is quite painful and wordy (I’m having flashbacks to
COBOL programming), but it does give you increased flexibility. In the next section,
we’ll talk about the new and better way to mold your XML using PATH mode.

In the example below we
are continuing to pull a list of employees and their addresses from the
AdventureWorks database, however, we are going to attempt to make a much more
readable and more shallow XML document than what we previously produced with
AUTO mode. Keep in mind that I’m truncating the results to just 2 of the rows
for the sake of brevity.

SELECT 1            AS Tag,
       NULL         AS Parent,
       e.EmployeeID AS [Employee!1!EmployeeID],
       c.FirstName  AS [Employee!1!FirstName],
       c.LastName   AS [Employee!1!LastName],
       NULL         AS [Address!2!AddressLine1],
       NULL         AS [Address!2!AddressLine2],
       NULL         AS [Address!2!City],
       NULL         AS [Address!2!StateOrProvince],
       NULL         AS [Address!2!PostalCode]
 FROM  HumanResources.Employee e(NOLOCK)
 JOIN  Person.Contact c (NOLOCK)
   ON  e.ContactID = c.ContactID
UNION ALL	
SELECT 2            AS Tag,
       1            AS Parent,
       e.EmployeeID AS [Employee!1!EmployeeID],
       NULL         AS [Employee!1!FirstName],
       NULL         AS [Employee!1!LastName],
       AddressLine1 AS [Address!2!AddressLine1],
       AddressLine2 AS [Address!2!AddressLine2],
       City         AS [Address!2!City],
       StateProvinceCode AS [Address!2!StateOrProvince],
       PostalCode   AS [Address!2!PostalCode]
 FROM  HumanResources.Employee e(NOLOCK)
 JOIN  HumanResources.EmployeeAddress ea(NOLOCK)
   ON  e.EmployeeID = ea.EmployeeID
 JOIN  Person.Address a (NOLOCK)
   ON  ea.AddressID = a.AddressID  
 JOIN  Person.StateProvince s (NOLOCK)
   ON  a.StateProvinceID = s.StateProvinceID
ORDER BY e.EmployeeID, Tag
FOR XML EXPLICIT, ROOT('Employees')

Result:

<Employees>
  <Employee EmployeeID="1" FirstName="Guy" LastName="Gilbert">
    <Address AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="WA " PostalCode="98272" />
  </Employee>
  <Employee EmployeeID="2" FirstName="Kevin" LastName="Brown">
    <Address AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="WA " PostalCode="98201" />
  </Employee>
</Employees>

EXPLICIT mode can be
tricky to get just right. For instance, take the ORDER BY clause off the above
query and you’ll get all the employee elements first and then all the addresses
nested under the last employee node. This is of course less than helpful.

When debugging your
SELECT, you might find it useful to run the SELECT without the FOR XML clause.
This presents you with the results in what is called a universal table. It
looks like this for the first two rows in the example above. Again, you can try
running this without the ORDER BY as well so you can see what it looks like
when it’s incorrectly nested.

Tag

Parent

Employee!1!EmployeeID

Employee!1!FirstName

Employee!1!LastName

Address!2!AddressLine1

Address!2!AddressLine2

Address!2!City

Address!2!StateOrProvince

Address!2!PostalCode

1

NULL

1

Guy

Gilbert

NULL

NULL

NULL

NULL

NULL

2

1

1

NULL

NULL

7726 Driftwood Drive

NULL

Monroe

WA

98272

1

NULL

2

Kevin

Brown

NULL

NULL

NULL

NULL

NULL

2

1

2

NULL

NULL

7883 Missing Canyon Court

NULL

Everett

WA

98201

FOR XML Path

The new and improved mode
for generating XML with much control over the styling is PATH mode. Instead of
writing UNIONs to accomplish the nesting in PATH mode, you are able to write a
simpler SELECT and supply XPath expressions for the column aliases to designate
hierarchy.

In the example below, we
create the employees list we’ve been looking for. We are able to create a root
node by specifying the ROOT directive after FOR XML PATH and also name the
"row" element Employee. Because the first three columns don’t have a
path specified, they become children of the row node. Notice that EmployeeID
becomes an attribute of that node because it has the @ sign in front of its
alias name. All the address related columns have "Address/" in front
of their alias. This directs SQL Server to return them as subelements of an
Address node, which will be created as a sibling to the columns above it. It is
important to note that you cannot declare an attribute following an element
column on the same level (referred to as a non-attribute-centric sibling).

SELECT Employee.EmployeeID  AS '@EmployeeID',
       Contact.FirstName    AS 'FirstName',
       Contact.LastName     AS 'LastName',
       AddressLine1         AS 'Address/AddressLine1',
       AddressLine2         AS 'Address/AddressLine2',
       City                 AS 'Address/City',
       StateProvinceCode    AS 'Address/StateOrProvince',
       PostalCode           AS 'Address/PostalCode'
 FROM  HumanResources.Employee Employee(NOLOCK)
 JOIN  Person.Contact Contact (NOLOCK)
   ON  Employee.ContactID = Contact.ContactID
 JOIN  HumanResources.EmployeeAddress ea(NOLOCK)
   ON  Employee.EmployeeID = ea.EmployeeID
 JOIN  Person.Address Address (NOLOCK)
   ON  ea.AddressID = Address.AddressID  
 JOIN  Person.StateProvince s (NOLOCK)
   ON  Address.StateProvinceID = s.StateProvinceID
FOR XML PATH ('Employee'), ROOT('Employees')

Result:

<Employees>
  <Employee EmployeeID="1">
    <FirstName>Guy</FirstName>
    <LastName>Gilbert</LastName>
    <Address>
      <AddressLine1>7726 Driftwood Drive</AddressLine1>
      <City>Monroe</City>
      <StateOrProvince>WA </StateOrProvince>
      <PostalCode>98272</PostalCode>
    </Address>
  </Employee>
  <Employee EmployeeID="2">
    <FirstName>Kevin</FirstName>
    <LastName>Brown</LastName>
    <Address>
      <AddressLine1>7883 Missing Canyon Court</AddressLine1>
      <City>Everett</City>
      <StateOrProvince>WA </StateOrProvince>
      <PostalCode>98201</PostalCode>
    </Address>
  </Employee>
... 
</Employees>

Conclusion

SQL Server provides the
FOR XML clause as a means to render XML from relational data. The four modes:
RAW, AUTO, EXPLICIT, and PATH each provide different levels of complexity and
flexibility in styling the XML depending on your needs. PATH is the newest of
these modes and a refreshingly easier syntax for customizing the XML output.

Additional Resources

MSDN – Constructing XML using FOR XML

MSDN – Using RAW Mode

MSDN – Using AUTO Mode

MSDN – Using EXPLICIT Mode

MSDN – Using PATH Mode

MSDN – Generating Elements for NULL Values Using the XSINIL Parameter

MSDN – TYPE Directive in FOR XML Queries

»


See All Articles by Columnist

Deanna Dicken

Deanna Dicken
Deanna Dicken
Deanna Dicken is an architect with approximately 20 years of IT experience. In that time she has worked on many large-scale, mission-critical applications and has been involved at all phases of the lifecycle. She has also contributed to three SQL Server MCSE books, co-authored Learn SQL in a Weekend, and tech edited many other titles. She lives on the outskirts of Indianapolis, Indiana with her husband, Curtis, and children, Kylee and Zach.

Latest Articles