Creating XML in SQL Server

June 4, 2010

XML has become a common form of representing and exchanging data in today's information age. SQL Server introduced XML-centric capabilities in SQL Server 2000. That functionality has been expanded in later releases. One aspect of working with XML is creating XML from relational data, which is accomplished utilizing the FOR XML clause in SQL Server.

Overview

FOR XML is a clause that can be appended to the end of a standard SELECT statement to cause the output to be represented as xml. The result is a Unicode string containing elements and attributes as determined by the mode specified in the clause. The four modes are RAW, AUTO, EXPLICIT, and PATH. The following is the Books Online syntax for the FOR XML Clause.

[ FOR { BROWSE | <XML> } ]
<XML> ::=
XML 
    { 
      { RAW [ ('ElementName') ] | AUTO } 
        [ 
           <CommonDirectives> 
           [ , { XMLDATA | XMLSCHEMA [ ('TargetNameSpaceURI') ]} ] 
           [ , ELEMENTS [ XSINIL | ABSENT ] 
        ]
      | EXPLICIT 
        [ 
           <CommonDirectives> 
           [ , XMLDATA ] 
        ]
      | PATH [ ('ElementName') ] 
        [ 
           <CommonDirectives> 
           [ , ELEMENTS [ XSINIL | ABSENT ] ]
        ]
     } 
 
 <CommonDirectives> ::= 
   [ , BINARY BASE64 ]
   [ , TYPE ]
   [ , ROOT [ ('RootName') ] ]

This syntax shows each of the four modes and the options available with those modes. Let's have a look at each of the modes in turn.

FOR XML RAW

The most basic and straight forward mode for creating XML with FOR XML is RAW. RAW spits out an element named "row" for each row of the result set from the SELECT statement. Each column of the row is created as an attribute of the row element.

To show you what this would look like, we are going to use the AdventureWorks sample database. Below we have a standard SELECT statement for retrieving employee address information from this database. At the bottom of the statement, you'll notice the FOR XML clause and that it is using RAW mode. This results in a Unicode string that contains a couple hundred row elements. I'm going to show you just the first four because that's enough to make the point.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name as 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML RAW

Results:
<row EmployeeID="1" FirstName="Guy" LastName="Gilbert" AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="Washington" PostalCode="98272" />
<row EmployeeID="2" FirstName="Kevin" LastName="Brown" AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="Washington" PostalCode="98201" />
<row EmployeeID="3" FirstName="Roberto" LastName="Tamburello" AddressLine1="2137 Birchwood Dr" City="Redmond" StateOrProvince="Washington" PostalCode="98052" />
<row EmployeeID="4" FirstName="Rob" LastName="Walters" AddressLine1="5678 Lakeview Blvd." City="Minneapolis" StateOrProvince="Minnesota" PostalCode="55402" />

Notice that each row of the results of the SELECT statement is rather generically assigned to a row element in the XML and each column of the result set is assigned to an attribute of that row. This is a rather quick and dirty XML representation of the data as it gives you no idea how that data is related hierarchically.

In the syntax of FOR XML, RAW mode has a couple of options of note to give you slightly more control over the styling of the resulting xml. Should you decide that you don't like that row element, you can supply a name for it in parentheses and quotes directly following the keyword RAW. Additionally, you can have a schema generated for your XML using the XMLDATA | XMLSCHEMA options. Not fond of attributes? Specify the ELEMENTS option to have each column generate as a new child element of the row. Note that if any column of your result set has a null value and you wish to have an element for it anyway, you will need to specify XSINIL following the ELEMENTS option.

Each of the modes also includes the common directives of BINARY BASE64, TYPE, and ROOT. BINARY BASE 64 is used to direct SQL Server to render binary columns of the result set as base-64 encoding. This is implicit for some of the modes, but mandatory for RAW mode if you are retrieving binary data. TYPE instructs SQL Server to return the XML as an xml data type instead of a string. Finally, ROOT is used to specify a root node and optional name.

Here's an example of a few of the directives and options. This is the same select from above only with the option to rename the row element, the directive to supply a named root element, and the ELEMENTS option to specify each non-null column to be rendered as a child element of the row. I've adjusted the results to just show 2 of the 290 resulting rows.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name AS 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML RAW ('Employee'), ROOT ('Employees'), ELEMENTS 

Result :

<Employees>
  <Employee>
    <EmployeeID>1</EmployeeID>
    <FirstName>Guy</FirstName>
    <LastName>Gilbert</LastName>
    <AddressLine1>7726 Driftwood Drive</AddressLine1>
    <City>Monroe</City>
    <StateOrProvince>Washington</StateOrProvince>
    <PostalCode>98272</PostalCode>
  </Employee>
  <Employee>
    <EmployeeID>2</EmployeeID>
    <FirstName>Kevin</FirstName>
    <LastName>Brown</LastName>
    <AddressLine1>7883 Missing Canyon Court</AddressLine1>
    <City>Everett</City>
    <StateOrProvince>Washington</StateOrProvince>
    <PostalCode>98201</PostalCode>
  </Employee>
</Employees>

FOR XML AUTO

The biggest difference between RAW and AUTO mode is that AUTO mode provides for a level of hierarchical meaning to the style of the XML. In other words, instead of having columns in the result set all as attributes of the row or even subelements all on the same level, columns from different tables become nested elements based on the order of the columns in the select.

Below we take the same example query from above and replace RAW with AUTO. The row element name cannot be specified with AUTO (only with RAW and PATH), so that has been removed. I also took off ELEMENTS even though it is valid here so it wouldn't obscure the differences in how AUTO renders the attributes and elements by default.

SELECT e.EmployeeID,
	 FirstName,
	 LastName,
	 AddressLine1,
	 City,
	 sp.Name as 'StateOrProvince',
	 PostalCode
  FROM HumanResources.Employee e(NOLOCK)
  JOIN Person.Contact c (NOLOCK)
    ON e.ContactID = c.ContactID
  JOIN HumanResources.EmployeeAddress ea(NOLOCK)
    ON e.EmployeeID = ea.EmployeeID
  JOIN Person.Address a (NOLOCK)
    ON ea.AddressID = a.AddressID
  JOIN Person.StateProvince sp (NOLOCK)
    ON a.StateProvinceID = sp.StateProvinceID
FOR XML AUTO, ROOT ('Employees')

Result:

<Employees>
  <e EmployeeID="1">
    <c FirstName="Guy" LastName="Gilbert">
      <a AddressLine1="7726 Driftwood Drive" City="Monroe" PostalCode="98272">
        <sp StateOrProvince="Washington" />
      </a>
    </c>
  </e>
  <e EmployeeID="2">
    <c FirstName="Kevin" LastName="Brown">
      <a AddressLine1="7883 Missing Canyon Court" City="Everett" PostalCode="98201">
        <sp StateOrProvince="Washington" />
      </a>
    </c>
  </e>
</Employees>

In the results (trimmed down to just 2 rows), you can see that AUTO mode utilizes the table aliases to name the elements and the columns of each table having a column in the select clause is rendered as an attribute of that table-named or alias-named element. If you add back in the ELEMENTS option, each column would be represented as a child element of the table-named or alias-named element. Make note that the table EmployeeAddress was not represented in the XML because it did not have a column in the SELECT clause.

In this particular case, the data was so normalized out that it makes the XML rather deep for what was supposed to be a simple list of employee names and addresses. The next two modes we discuss will provide much more control over the styling of the XML.

FOR XML EXPLICIT

For greater control over the styling of the resulting XML, SQL developers have the option of using EXPLICIT mode. This was the only option for granular control up until SQL Server 2005. EXPLICIT mode is quite painful and wordy (I'm having flashbacks to COBOL programming), but it does give you increased flexibility. In the next section, we'll talk about the new and better way to mold your XML using PATH mode.

In the example below we are continuing to pull a list of employees and their addresses from the AdventureWorks database, however, we are going to attempt to make a much more readable and more shallow XML document than what we previously produced with AUTO mode. Keep in mind that I'm truncating the results to just 2 of the rows for the sake of brevity.

SELECT 1            AS Tag,
       NULL         AS Parent,
       e.EmployeeID AS [Employee!1!EmployeeID],
       c.FirstName  AS [Employee!1!FirstName],
       c.LastName   AS [Employee!1!LastName],
       NULL         AS [Address!2!AddressLine1],
       NULL         AS [Address!2!AddressLine2],
       NULL         AS [Address!2!City],
       NULL         AS [Address!2!StateOrProvince],
       NULL         AS [Address!2!PostalCode]
 FROM  HumanResources.Employee e(NOLOCK)
 JOIN  Person.Contact c (NOLOCK)
   ON  e.ContactID = c.ContactID
UNION ALL	
SELECT 2            AS Tag,
       1            AS Parent,
       e.EmployeeID AS [Employee!1!EmployeeID],
       NULL         AS [Employee!1!FirstName],
       NULL         AS [Employee!1!LastName],
       AddressLine1 AS [Address!2!AddressLine1],
       AddressLine2 AS [Address!2!AddressLine2],
       City         AS [Address!2!City],
       StateProvinceCode AS [Address!2!StateOrProvince],
       PostalCode   AS [Address!2!PostalCode]
 FROM  HumanResources.Employee e(NOLOCK)
 JOIN  HumanResources.EmployeeAddress ea(NOLOCK)
   ON  e.EmployeeID = ea.EmployeeID
 JOIN  Person.Address a (NOLOCK)
   ON  ea.AddressID = a.AddressID  
 JOIN  Person.StateProvince s (NOLOCK)
   ON  a.StateProvinceID = s.StateProvinceID
ORDER BY e.EmployeeID, Tag
FOR XML EXPLICIT, ROOT('Employees')

Result:

<Employees>
  <Employee EmployeeID="1" FirstName="Guy" LastName="Gilbert">
    <Address AddressLine1="7726 Driftwood Drive" City="Monroe" StateOrProvince="WA " PostalCode="98272" />
  </Employee>
  <Employee EmployeeID="2" FirstName="Kevin" LastName="Brown">
    <Address AddressLine1="7883 Missing Canyon Court" City="Everett" StateOrProvince="WA " PostalCode="98201" />
  </Employee>
</Employees>

EXPLICIT mode can be tricky to get just right. For instance, take the ORDER BY clause off the above query and you'll get all the employee elements first and then all the addresses nested under the last employee node. This is of course less than helpful.

When debugging your SELECT, you might find it useful to run the SELECT without the FOR XML clause. This presents you with the results in what is called a universal table. It looks like this for the first two rows in the example above. Again, you can try running this without the ORDER BY as well so you can see what it looks like when it's incorrectly nested.

Tag

Parent

Employee!1!EmployeeID

Employee!1!FirstName

Employee!1!LastName

Address!2!AddressLine1

Address!2!AddressLine2

Address!2!City

Address!2!StateOrProvince

Address!2!PostalCode

1

NULL

1

Guy

Gilbert

NULL

NULL

NULL

NULL

NULL

2

1

1

NULL

NULL

7726 Driftwood Drive

NULL

Monroe

WA

98272

1

NULL

2

Kevin

Brown

NULL

NULL

NULL

NULL

NULL

2

1

2

NULL

NULL

7883 Missing Canyon Court

NULL

Everett

WA

98201

FOR XML Path

The new and improved mode for generating XML with much control over the styling is PATH mode. Instead of writing UNIONs to accomplish the nesting in PATH mode, you are able to write a simpler SELECT and supply XPath expressions for the column aliases to designate hierarchy.

In the example below, we create the employees list we've been looking for. We are able to create a root node by specifying the ROOT directive after FOR XML PATH and also name the "row" element Employee. Because the first three columns don't have a path specified, they become children of the row node. Notice that EmployeeID becomes an attribute of that node because it has the @ sign in front of its alias name. All the address related columns have "Address/" in front of their alias. This directs SQL Server to return them as subelements of an Address node, which will be created as a sibling to the columns above it. It is important to note that you cannot declare an attribute following an element column on the same level (referred to as a non-attribute-centric sibling).

SELECT Employee.EmployeeID  AS '@EmployeeID',
       Contact.FirstName    AS 'FirstName',
       Contact.LastName     AS 'LastName',
       AddressLine1         AS 'Address/AddressLine1',
       AddressLine2         AS 'Address/AddressLine2',
       City                 AS 'Address/City',
       StateProvinceCode    AS 'Address/StateOrProvince',
       PostalCode           AS 'Address/PostalCode'
 FROM  HumanResources.Employee Employee(NOLOCK)
 JOIN  Person.Contact Contact (NOLOCK)
   ON  Employee.ContactID = Contact.ContactID
 JOIN  HumanResources.EmployeeAddress ea(NOLOCK)
   ON  Employee.EmployeeID = ea.EmployeeID
 JOIN  Person.Address Address (NOLOCK)
   ON  ea.AddressID = Address.AddressID  
 JOIN  Person.StateProvince s (NOLOCK)
   ON  Address.StateProvinceID = s.StateProvinceID
FOR XML PATH ('Employee'), ROOT('Employees')

Result:

<Employees>
  <Employee EmployeeID="1">
    <FirstName>Guy</FirstName>
    <LastName>Gilbert</LastName>
    <Address>
      <AddressLine1>7726 Driftwood Drive</AddressLine1>
      <City>Monroe</City>
      <StateOrProvince>WA </StateOrProvince>
      <PostalCode>98272</PostalCode>
    </Address>
  </Employee>
  <Employee EmployeeID="2">
    <FirstName>Kevin</FirstName>
    <LastName>Brown</LastName>
    <Address>
      <AddressLine1>7883 Missing Canyon Court</AddressLine1>
      <City>Everett</City>
      <StateOrProvince>WA </StateOrProvince>
      <PostalCode>98201</PostalCode>
    </Address>
  </Employee>
... 
</Employees>

Conclusion

SQL Server provides the FOR XML clause as a means to render XML from relational data. The four modes: RAW, AUTO, EXPLICIT, and PATH each provide different levels of complexity and flexibility in styling the XML depending on your needs. PATH is the newest of these modes and a refreshingly easier syntax for customizing the XML output.

Additional Resources

MSDN - Constructing XML using FOR XML
MSDN - Using RAW Mode
MSDN - Using AUTO Mode
MSDN - Using EXPLICIT Mode
MSDN - Using PATH Mode
MSDN - Generating Elements for NULL Values Using the XSINIL Parameter
MSDN - TYPE Directive in FOR XML Queries

» See All Articles by Columnist Deanna Dicken








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers