Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted Nov 23, 2009

XPath Support for Working with XML Data in MySQL

By Rob Gravelle

During the past few weeks, we’ve been getting better acquainted with XML support in MySQL. In the last article, we expanded the discussion to include a third-party tool called MySqlDump. We used it to produce an XML document that we queried using a stored procedure. Today, we’ll be finishing off the series by talking about how to manipulate XML data using the XPath, which is a language for navigating and querying XML documents.

The UpdateXML() Function

While we’ve spent considerable time going over the ExtractValue() function, we haven’t yet taken a good look at MySQL’s other XML function, UpdateXML(). That’s because we were focusing on importing data from an XML document into a MySQL table. UpdateXML() is a slightly more sophisticated function that is used to match and replace a portion of an XML fragment with different XML markup. You can see how such a function would be usefull for massaging the data without converting back and forth between XML and table structures.

ExtractValue() takes two string arguments, a fragment of XML markup and an XPath expression:

ExtractValue(xml_frag, xpath_expr)

It returns the first text node which is a child of the element(s) matched by the XPath expression.

Here’s an example:

Say you wanted to change “<state><city/></state> to “<state><county><city/></county></state>“ and save the result into a variable. Here’s how you would do it using the UpdateXML() function:

mysql> SELECT @new_xml_node:=UpdateXML
		('<state><city/></state>', 
    ->    '//city', 
    ->    '<county><city/></county>')
    -> AS xml_node;
+-----------------------------------------+
| xml_node                                |
+-----------------------------------------+
| <state><county><city/></county></state> |
+-----------------------------------------+
1 row in set (0.03 sec)

mysql> SELECT @new_xml_node;
+-----------------------------------------+
| @new_xml_node                           |
+-----------------------------------------+
| <state><county><city/></county></state> |
+-----------------------------------------+
1 row in set (0.00 sec)

If no matching text node is found for the expression, the original XML string is returned:

mysql> SELECT @new_xml_node:=UpdateXML
		('<state><city/></state>', 
    ->    '//dummy', 
    ->    '<county><city/></county>')
    -> AS xml_node;
+---------------------------+
| xml_node                  |
+---------------------------+
| <state><city/></state>    |
+---------------------------+
1 row in set (0.03 sec)

In the event that multiple matches are found, the content of the first child text node of each matching element is returned in the order matched:

mysql> SELECT @new_xml_node:=UpdateXML('<state><city/></state><state><city/></state><state><city/></state>', 
    ->    '//city, 
    ->    '<county><city/></county>')
    -> AS xml_node;

+--------------------------------------------------------------------+
| xml_node                                                           |
+--------------------------------------------------------------------+
| <state><city/></state><state><city/></state><state><city/></state> |
+--------------------------------------------------------------------+
1 row in set (0.00 sec)

Because the UpdateXML() function makes no distinction between a match on an empty element and no match at all, there needs to be a way of differenciating between the two situations where either no matching element was found in the xml_frag or such an element was found but contained no child text nodes. The way to do it, is to test the result of ExtractValue() using the XPath count() function:

mysql> SELECT ExtractValue('<state><city/></state><state><city/></state><state><city/></state>',
    ->    'count(//city)') AS xml_node;
+----------+
| xml_node |
+----------+
| 3        |
+----------+
1 row in set (0.00 sec)

mysql> SELECT ExtractValue('<state><city/></state><state><city/></state><state><city/></state>',
    ->    'count(//county)') AS xml_node;
+----------+
| xml_node |
+----------+
| 0        |
+----------+
1 row in set (0.00 sec)

Error handling

For both ExtractValue() and UpdateXML(), the XPath locator used must be valid and the XML to be searched must be well-formed. If the locator is invalid, an error is generated:

mysql> SELECT @new_xml_node:=UpdateXML('<state><city/></state>',
    ->    '//city/"state', '<county><city/></county>') AS xml_node;
ERROR 1105 (HY000): XPATH syntax error: '"state'

If the XML to be searched is not well-formed, then NULL is returned, and a warning is issued:

mysql> SELECT @new_xml_node:=UpdateXML('<state><city></state>', '//city',
    ->    '<county><city/></county>') AS xml_node;
+----------+
| xml_node |
+----------+
| NULL     |
+----------+
1 row in set, 1 warning (0.01 sec)

You can display the warning by issuing the show warnings command:

mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                             |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1525 | Incorrect XML value: 'parse error at line 1 pos 21: '</state>' unexpected ('</city>' wanted)' |
+---------+------+-------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

The replacement XML passed as the third argument to UpdateXML() is not checked for well-formedness at all:

mysql> SELECT @new_xml_node:=UpdateXML('<state><city/></state>', '//city',
    ->    '<<county><city/></county>>') AS xml_node;
+-------------------------------------------+
| xml_node                                  |
+-------------------------------------------+
| <state><<county><city/></county>></state> |
+-------------------------------------------+
1 row in set (0.00 sec)

Using Variables in XPath Expressions

As of MySQL 5.1.20, you can supply the XPath locator argument in variables. This gives you the flexability to pass them to stored functions and procs. Variables can be weakly or strongly checked, depending on the syntax used.

Weakly Checked Variables

A variable using the syntax $@variable_name is not checked for type or for whether or not it has previously been assigned a value. Hence, MySQL won’t report any warnings or errors if a variable has the wrong type or is undefined. For example, if you misspell a variable as $@conty instead of $@county, and the misspelled variable has not been assigned a value, MySQL will assign a value of “none”, or NULL.

The following example should help clarify. In it, we create a variable called @xml_cities. It contains three states with a city in the text node. Next, we create two variables to store some XPath information. The first SELECT query shows the value of the @state_index_first variable along with the ExtractValue() function. Since the @state_index_first variable contains a value of 1, the //state[@state_index_first] expression retrieves the first city found: “Chicago”.

The second SELECT query retrieves the value of the @state_index_second variable, along with the results of the ExtractValue(). It works as expected and returns "Ocean City".

The final SELECT statement is flawed because there is no @state_index_third variable. Although the MySQL client doesn’t complain at all and the ExtractValue() function runs, no value is returned. This could lead us to believe that there are no more cities, when there clearly is!

mysql> SET @xml_cities = '<state value="Illinois">Chicago</state>
    ->    <state value="New Jersey">Ocean City</state>
    ->    <state value="Maryland">Baltimore</state>';
Query OK, 0 rows affected (0.21 sec)

mysql> SET @state_index_first = 1, @state_index_second = 2;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @state_index_first, ExtractValue(@xml_cities,
    ->    '//state[$@state_index_first]') AS 'First City';
+--------------------+------------+
| @state_index_first | First City |
+--------------------+------------+
|                  1 | Chicago    |
+--------------------+------------+
1 row in set (0.00 sec)

mysql> SELECT @state_index_second, ExtractValue(@xml_cities,
    ->    '//state[$@state_index_second]') AS 'Second City';
+---------------------+-------------+
| @state_index_second | Second City |
+---------------------+-------------+
|                   2 | Ocean City  |
+---------------------+-------------+
1 row in set (0.00 sec)

mysql> SELECT @state_index_third, ExtractValue(@xml_cities,
    ->    '//state[$@state_index_third]') AS 'Third City';
+--------------------+------------+
| @state_index_third | Third City |
+--------------------+------------+
| NULL               |            |
+--------------------+------------+
1 row in set (0.00 sec)

Strongly Checked Variables

For the reason outlined above, it’s best to use strongly checked variables. This shouldn’t come as a great surprise, as strong variable checking has long been standard for computer programming langugages. To use the $variable_name syntax, you have to declare the variable inside a stored procedure. Such a variable is local to the stored procedure in which it is defined, and it is checked for both type and value.

The following stored proc accepts an xml_data string such as the one that we created earlier and adds the results of ExtractValue() to a temporary table. We can then SELECT all the cities at once:

DELIMITER |
CREATE PROCEDURE get_cities(IN 'xml_data' varchar(255)) 
BEGIN   
    DECLARE i INT DEFAULT 1;   
    DECLARE city_name VARCHAR(100);      
    CREATE TEMPORARY TABLE cities (    
        match_number INT UNSIGNED NOT NULL DEFAULT 0,    
        name VARCHAR(100) NULL  );   
    SET city_name = ExtractValue(xml_data, '//state[$i]');     
    WHILE city_name != "" DO     
        INSERT INTO cities       
        	SELECT i, city_name;     
        SET i = i+1;     
        SET city_name = ExtractValue(xml_data, '//state[$i]');   
    END WHILE;     
    SELECT * FROM cities;   
    DROP TABLE cities;
END |
DELIMITER ;

CALL get_cities(@xml_cities);

+--------------------+------------+
| match_number       | name       |
+--------------------+------------+
| 1                  | Chicago    |
+--------------------+------------+
| 2                  | Ocean City |
+--------------------+------------+
1 row in set (0.01 sec)

That wraps up our look at XML support in MySQL. While it is somewhat limited at this point, every new version is brining in more XML-related functionality. There are also many third party vendors providing tools to help fill in the gaps. In the future, we can hope to see support for newer, less verbose, data formatting markups, such as JSON.

» See All Articles by Columnist Rob Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM