XPath Support for Working with XML Data in MySQL

During the past few weeks, we’ve been getting better acquainted with
XML support in MySQL. In the last article, we
expanded the discussion to include a third-party tool called MySqlDump. We used it to produce an XML document that we queried using a stored procedure. Today, we’ll be finishing
off the series by talking about how to manipulate XML data using the XPath,
which is a language for navigating and querying XML documents.

The UpdateXML() Function

While we’ve spent considerable time going over the ExtractValue()
function, we haven’t yet taken a good look at MySQL’s other XML function,
UpdateXML(). That’s because we were focusing on importing data from an XML
document into a MySQL table. UpdateXML() is a slightly more sophisticated function that is used to match and
replace a portion of an XML fragment with different XML markup. You can see
how such a function would be usefull for massaging the data without converting back
and forth between XML and table structures.

ExtractValue() takes two string arguments, a fragment of XML markup and an XPath
expression:

ExtractValue(xml_frag, xpath_expr)

It returns the first text node which is a child of the element(s)
matched by the XPath expression.

Here’s an example:

Say you wanted to change “<state><city/></state> to “<state><county><city/></county></state>“ and save the result into a variable. Here’s how you would do it using
the UpdateXML() function:


mysql> SELECT @new_xml_node:=UpdateXML
(‘<state><city/></state>’,
-> ‘//city’,
-> ‘<county><city/></county>’)
-> AS xml_node;
+—————————————–+
| xml_node |
+—————————————–+
| <state><county><city/></county></state> |
+—————————————–+
1 row in set (0.03 sec)

mysql> SELECT @new_xml_node;
+—————————————–+
| @new_xml_node |
+—————————————–+
| <state><county><city/></county></state> |
+—————————————–+
1 row in set (0.00 sec)

If no matching text node is found for the expression, the original XML
string is returned:


mysql> SELECT @new_xml_node:=UpdateXML
(‘<state><city/></state>’,
-> ‘//dummy’,
-> ‘<county><city/></county>’)
-> AS xml_node;
+—————————+
| xml_node |
+—————————+
| <state><city/></state> |
+—————————+
1 row in set (0.03 sec)

In the event that multiple matches are found, the content of the first
child text node of each matching element is returned in the order matched:


mysql> SELECT @new_xml_node:=UpdateXML(‘<state><city/></state><state><city/></state><state><city/></state>’,
-> ‘//city,
-> ‘<county><city/></county>’)
-> AS xml_node;

+——————————————————————–+
| xml_node |
+——————————————————————–+
| <state><city/></state><state><city/></state><state><city/></state> |
+——————————————————————–+
1 row in set (0.00 sec)

Because the UpdateXML() function makes no distinction between a match
on an empty element and no match at all, there needs to be a way of differenciating between the two situations where either no matching
element was found in the xml_frag or such an element was found but
contained no child text nodes. The way to do it, is to test the result of
ExtractValue() using the XPath count() function:


mysql> SELECT ExtractValue(‘<state><city/></state><state><city/></state><state><city/></state>’,
-> ‘count(//city)’) AS xml_node;
+———-+
| xml_node |
+———-+
| 3 |
+———-+
1 row in set (0.00 sec)

mysql> SELECT ExtractValue(‘<state><city/></state><state><city/></state><state><city/></state>’,
-> ‘count(//county)’) AS xml_node;
+———-+
| xml_node |
+———-+
| 0 |
+———-+
1 row in set (0.00 sec)

Error handling

For both ExtractValue() and UpdateXML(),
the XPath locator used must be valid and the XML to be searched must be well-formed.
If the locator is invalid, an error is generated:


mysql> SELECT @new_xml_node:=UpdateXML(‘<state><city/></state>’,
-> ‘//city/”state’, ‘<county><city/></county>’) AS xml_node;
ERROR 1105 (HY000): XPATH syntax error: ‘”state’

If the XML to be searched is not well-formed, then NULL is returned, and a warning is issued:


mysql> SELECT @new_xml_node:=UpdateXML(‘<state><city></state>’, ‘//city’,
-> ‘<county><city/></county>’) AS xml_node;
+———-+
| xml_node |
+———-+
| NULL |
+———-+
1 row in set, 1 warning (0.01 sec)

You can display the warning by issuing the show warnings command:


mysql> show warnings;
+———+——+————————————————————————————-+
| Level | Code | Message |
+———+——+————————————————————————————-+
| Warning | 1525 | Incorrect XML value: ‘parse error at line 1 pos 21: ‘</state>’ unexpected (‘</city>’ wanted)’ |
+———+——+————————————————————————————-+
1 row in set (0.01 sec)

The replacement XML passed as the third argument to UpdateXML() is not checked for
well-formedness at all:


mysql> SELECT @new_xml_node:=UpdateXML(‘<state><city/></state>’, ‘//city’,
-> ‘<<county><city/></county>>’) AS xml_node;
+——————————————-+
| xml_node |
+——————————————-+
| <state><<county><city/></county>></state> |
+——————————————-+
1 row in set (0.00 sec)

Using Variables in XPath Expressions

As of MySQL 5.1.20, you can supply the XPath locator argument in
variables. This gives you the flexability to pass them to stored functions and
procs. Variables can be weakly or strongly checked, depending on the syntax
used.

Weakly Checked Variables

A variable using the syntax [email protected]variable_name is not
checked for type or for whether or not it has previously been assigned a value.
Hence, MySQL won’t report
any warnings or errors if a variable has the wrong type or is undefined
.
For example, if you misspell a variable as [email protected] instead of [email protected], and the misspelled variable has not been assigned a value,
MySQL will assign a value of “none”, or NULL.

The following example should help clarify. In it, we create a variable
called @xml_cities. It contains three states with a city in the text node. Next, we
create two variables to store some XPath information. The first SELECT query
shows the value of the @state_index_first
variable along with the
ExtractValue() function. Since the
@state_index_first variable contains a value of 1, the //state[@state_index_first]
expression retrieves the first city found: “Chicago”.

The
second SELECT query retrieves the value of the
@state_index_second variable, along with the results of the ExtractValue(). It works as expected
and returns "Ocean City".

The
final SELECT statement is flawed because there is no @state_index_third variable. Although the MySQL client
doesn’t complain at all and the ExtractValue() function runs, no value is
returned. This could lead us to believe that there are no more cities, when there
clearly is!


mysql> SET @xml_cities = ‘<state value=”Illinois”>Chicago</state>
-> <state value=”New Jersey”>Ocean City</state>
-> <state value=”Maryland”>Baltimore</state>’;
Query OK, 0 rows affected (0.21 sec)

mysql> SET @state_index_first = 1, @state_index_second = 2;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @state_index_first, ExtractValue(@xml_cities,
-> ‘//state[[email protected]_index_first]’) AS ‘First City’;
+——————–+————+
| @state_index_first | First City |
+——————–+————+
| 1 | Chicago |
+——————–+————+
1 row in set (0.00 sec)

mysql> SELECT @state_index_second, ExtractValue(@xml_cities,
-> ‘//state[[email protected]_index_second]’) AS ‘Second City’;
+———————+————-+
| @state_index_second | Second City |
+———————+————-+
| 2 | Ocean City |
+———————+————-+
1 row in set (0.00 sec)

mysql> SELECT @state_index_third, ExtractValue(@xml_cities,
-> ‘//state[[email protected]_index_third]’) AS ‘Third City’;
+——————–+————+
| @state_index_third | Third City |
+——————–+————+
| NULL | |
+——————–+————+
1 row in set (0.00 sec)

Strongly Checked Variables

For the reason outlined above, it’s best to use strongly checked
variables. This shouldn’t come as a great surprise, as strong variable
checking has long been standard for computer programming langugages. To use the
$variable_name syntax,
you have to declare the variable inside a stored procedure. Such a variable is
local to the stored procedure in which it is defined, and it is checked for
both type and value.

The following stored proc accepts an xml_data string such as
the one that we created earlier and adds the results of ExtractValue() to a
temporary table. We can then SELECT all the cities at once:


DELIMITER |
CREATE PROCEDURE get_cities(IN ‘xml_data’ varchar(255))
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE city_name VARCHAR(100);
CREATE TEMPORARY TABLE cities (
match_number INT UNSIGNED NOT NULL DEFAULT 0,
name VARCHAR(100) NULL );
SET city_name = ExtractValue(xml_data, ‘//state[$i]’);
WHILE city_name != “” DO
INSERT INTO cities
SELECT i, city_name;
SET i = i+1;
SET city_name = ExtractValue(xml_data, ‘//state[$i]’);
END WHILE;
SELECT * FROM cities;
DROP TABLE cities;
END |
DELIMITER ;

CALL get_cities(@xml_cities);

+——————–+————+
| match_number | name |
+——————–+————+
| 1 | Chicago |
+——————–+————+
| 2 | Ocean City |
+——————–+————+
1 row in set (0.01 sec)

That
wraps up our look at XML support in MySQL. While it is somewhat limited at
this point, every new version is brining in more XML-related functionality. There
are also many third party vendors providing tools to help fill in the gaps. In
the future, we can hope to see support for newer, less verbose, data formatting
markups, such as JSON.

»


See All Articles by Columnist
Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Latest Articles