Ten Problems with XQuery and the SQL/XML Standard
February 17, 2010
XQuery and the SQL/XML standard both address XML processing, and the design and development of each has influenced the capabilities of the other. SQL/XML was designed to match the capabilities of XQuery as closely as possible, and XQuery was designed to support not only XML but also relational processing. The two therefore constrain each other, which may hold back their capabilities by stunting their natural, separate directions of growth. Some of my top ten items below stem from this situation and raise topics I will cover further in future articles here at Database Journal.
1) Previous Hierarchical Data Processing History Forgotten or Ignored
Looking at XML hierarchical processing today, no one would suspect that four decades ago there was hierarchical processing technology in commercial operation that far exceeded what is available for hierarchical structured data today. Full hierarchical query processing was commonplace then, and it has since been forgotten.
The history of database technology explains this oversight. Four decades ago, hierarchical databases were popular. After relational databases were introduced, hierarchical databases with their fixed data dependence were eventually replaced by relational databases with their flexible structures and data independence. Now that XML has made hierarchical structures popular again, the advantages of full hierarchical structures and navigationless access have been forgotten, and little has been preserved in documentation because it was a pre-internet technology. Today's relational joining has been naturally mapped onto hierarchical processing as single path processing, ignoring the lessons learned from the full hierarchical processing used successfully in the past. Had this technology remained generally available, it might have positively affected how far SQL/XML processing has progressed.
2) SQL/XML Vendor Solutions are Proprietary and Incompatible
Because no definitive solution for integrating relational and XML data had been established, each SQL/XML vendor has developed and branded its own solution for integrating XML into SQL. One approach they share is the lowest common denominator: flattening hierarchical XML data so that it can be integrated with flat relational data. This method can introduce data loss and inaccuracy in XML results. The real problem I see is that these proprietary solutions are incompatible with each other. This is good for the SQL vendor but bad for the customer, who is locked into a specific SQL vendor. These problems have prevented the SQL/XML industry from reaching its full potential.
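One way flattening can produce inaccurate results is sketched below. It is a minimal illustration using Python's built-in sqlite3 module, not any vendor's SQL/XML implementation; the `dept`, `emp` and `proj` tables and their contents are hypothetical. A department with two sibling branches (employees and projects) is flattened into one row set, so every employee row is repeated once per project and an aggregate over the flat rows is inflated.

```python
import sqlite3

# Hypothetical data: one department with two sibling branches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
    CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, title TEXT);
    INSERT INTO dept VALUES (1, 'Research');
    INSERT INTO emp  VALUES (10, 1, 100), (11, 1, 200);
    INSERT INTO proj VALUES (20, 1, 'Alpha'), (21, 1, 'Beta'), (22, 1, 'Gamma');
""")

# Flattening: both branches are joined into a single flat row set,
# which pairs every employee with every project of the department.
flat_rows = conn.execute("""
    SELECT d.name, e.emp_id, e.salary, p.title
    FROM dept d
    JOIN emp  e ON e.dept_id = d.dept_id
    JOIN proj p ON p.dept_id = d.dept_id
""").fetchall()

# Summing salaries over the flat rows counts each salary three times.
flat_total = sum(row[2] for row in flat_rows)
true_total = conn.execute("SELECT SUM(salary) FROM emp").fetchone()[0]

print(len(flat_rows), flat_total, true_total)  # 6 900 300
```

The two employees and three projects become six flat rows, and the salary total triples unless the query author compensates by hand.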
3) XQuery Does Not Automatically Support Hierarchical Processing
Why doesn't XQuery support hierarchical processing automatically, or at least better? After all, XQuery was created to support XML, and XML is hierarchical. The reason is a political decision made early in XQuery's design to help migrate SQL users over to XQuery. Those influential in both SQL and XQuery product design felt that SQL was past its prime. They decided that SQL users needed to start moving to a better replacement for future growth, and that XQuery could satisfy that objective. This was accomplished by having XQuery support relational processing as well as hierarchical processing, thus supplying a migration path from SQL to XQuery. This is why the relational inner join is the default join of XQuery.
Having XQuery support both hierarchical and relational processing actually weakens XQuery. The inner join operation is incompatible with hierarchical structures because it destroys them and prevents full hierarchical processing from being automatic and fully supported. This is the tradeoff for having a migration path from SQL to XQuery. In hindsight, XQuery should have supported only hierarchical processing, which would have left the door open to significantly increasing its hierarchical processing capabilities.
4) SQL Supports Full Hierarchical Processing, But it is Not Being Used in SQL/XML
Full hierarchical processing is possible in SQL-92 by using the LEFT outer join to model and define the operation of hierarchical structures. It is curious that this has not been used in the standardized SQL/XML integration. One obstacle is that the SQL/XML standard would then support hierarchical processing better than XQuery could, which is not good for XQuery's reputation as the leading XML processor. Another is that the SQL vendors already have their proprietary XML solutions, and this improved solution, while natural and complete, is too disruptive and costly to retrofit until customers start demanding a better, seamless solution.
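The role of the LEFT outer join can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumed data, not the full technique: the three-level `dept`, `emp`, `dependent` hierarchy is hypothetical. The same query is run once with LEFT JOIN and once with an inner JOIN to show which parents survive.

```python
import sqlite3

# Hypothetical three-level hierarchy: dept -> emp -> dependent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT);
    CREATE TABLE dependent (dep_id INTEGER PRIMARY KEY, emp_id INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp VALUES (10, 1, 'Ann'), (11, 1, 'Bob');
    INSERT INTO dependent VALUES (100, 10, 'Cy');
""")

# One query template; only the join keyword differs between the two runs.
QUERY = """
    SELECT d.name, e.name, c.name
    FROM dept d
    {join} emp e ON e.dept_id = d.dept_id
    {join} dependent c ON c.emp_id = e.emp_id
    ORDER BY d.dept_id, e.emp_id, c.dep_id
"""

# LEFT JOIN keeps every parent, even one with no children below it.
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
# An inner JOIN drops any parent that lacks a child at some level.
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()

print(left_rows)
# [('Research', 'Ann', 'Cy'), ('Research', 'Bob', None), ('Sales', None, None)]
print(inner_rows)
# [('Research', 'Ann', 'Cy')]
```

With the outer join, Sales (no employees) and Bob (no dependents) remain in the result, so the parent-over-child shape of the structure is preserved; the inner join returns only the one fully populated path.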
5) SQL/XML Uses Navigation and XML Centric Functions Making it Not User Friendly
XQuery requires user navigation to process complex semistructured XML documents. Because XQuery and the SQL/XML standard are so close in operation, XML integration was added to SQL by also using user navigation and XML centric functions. There are a number of problems with this approach. SQL is known as a navigationless language, so user navigation goes directly against SQL's normal processing. In addition, XML centric functions were introduced into SQL's simple, elegant syntax and semantics. This means that besides navigation, the SQL user needs to know XML and even the SQL vendor's particular proprietary solution. Coding these additions requires a high level of technical XML knowledge. With these additions and problems, SQL is no longer a simple, friendly language when used for XML integration.
6) There is No XML Standard for Hierarchical Processing
There is no W3C or ANSI standard, or even a best practices document, on how to specify correct hierarchical processing. XQuery and the SQL/XML standard rely on the user to specify the hierarchical processing logic needed to get a valid, meaningful result. Hierarchical structures and their processing follow very precise principles, which must be observed for XML results to be correct.
Nothing in XQuery or the SQL/XML standard checks that the user-specified processing follows hierarchical principles and so assures a correct result. XQuery and SQL/XML are not aware of the structure being processed, an awareness that would assist correct hierarchical processing. Such structure-aware processing is not possible because the structure being processed may not even be hierarchical, even though the output is usually structured XML. This is worrisome too.
7) XQuery and SQL/XML Processors Try to Solve Both Sides of XML Processing
Structured XML processing and development is being avoided today because of the XML navigation and procedural coding it requires. This is part of a larger problem: XML processor product designers have not realized that there are two different problems that need to be solved differently. XML was invented to support semistructured data, not structured data. Semistructured data can accommodate natural unstructured data by tagging it. This can produce an ambiguous hierarchical structure with duplicate node names, which calls for fuzzy hierarchical processing driven by user navigation.
After XML was designed for semistructured data processing, it was applied to the fixed, structured data of database processing such as SQL's. These fixed XML hierarchical structures are unambiguous and require exact results. They can and should be processed nonprocedurally and navigationlessly. The problem today is that XML vendor solutions do not differentiate between these two types of XML processing and default to fuzzy semistructured processing, which is not appropriate for the exact processing of structured XML data. It would make good sense to have SQL's XML processing take on this natural structured XML hierarchical processing while XQuery specializes in semistructured processing.
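The ambiguity caused by duplicate node names in semistructured documents can be illustrated with a small sketch using Python's standard xml.etree.ElementTree module. The document and its element names are hypothetical. The name `title` appears at three levels, so asking for it by name alone is ambiguous, and the user has to navigate to say which one is meant.

```python
import xml.etree.ElementTree as ET

# Hypothetical semistructured document: "title" occurs at three levels.
doc = ET.fromstring("""
<book>
  <title>Query Languages</title>
  <chapter>
    <title>Hierarchies</title>
    <section><title>Paths</title></section>
  </chapter>
</book>
""")

# Asking for the name alone matches every level, in document order.
all_titles = [t.text for t in doc.iter("title")]

# Explicit navigation is needed to pick out one meaning of "title".
book_title = doc.findtext("title")
chapter_titles = [t.text for t in doc.findall("chapter/title")]

print(all_titles)      # ['Query Languages', 'Hierarchies', 'Paths']
print(book_title)      # Query Languages
print(chapter_titles)  # ['Hierarchies']
```

In a fixed structure where each node name is unique, the name by itself would identify the data, which is the distinction between the two kinds of XML processing drawn above.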
8) XML Processing Today is Limited to Linear Processing
I do not understand why the XML data processing industry and its vendors have not taken full advantage of the additional information potential in XML's full hierarchical structure. Today, hierarchical structures are reused only at a linear level, by sharing a portion of an existing path to create another path. This linear use is limited, and it stems from a relational join mindset still stuck in a linear mode. The significantly more powerful nonlinear use comes from multipath queries derived from the many different combinations of paths in a structure. This allows any possible query involving multiple paths to be specified within the structure, significantly increasing the number of queries possible.
An even greater advantage of nonlinear hierarchical structures lies in multipath semantic processing. This automatically uses the currently unused hierarchical semantics that naturally exist between all the pathways used in a query. An example is selecting data from one pathway based on data values from a different pathway. Each multipath query has its own unique semantics, based on the naturally occurring semantics used to solve the query. This significantly and dynamically increases the value of all the data in the structure, and with the navigationless access described previously, nontechnical users can query it without having to know the data structure. This user friendly full multipath processing is possible today, whenever it is finally recognized.
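Selecting from one pathway based on values in another can be sketched as follows, using Python's built-in sqlite3 module. This is only an illustration under assumed data (the `dept`, `emp` and `proj` tables and the helper name are hypothetical), not a description of any particular product: employees sit on one path under a department and projects on a sibling path, and the shared department is what relates them.

```python
import sqlite3

# Hypothetical structure: dept is the parent of two sibling paths, emp and proj.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT);
    CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER,
                       title TEXT, status TEXT);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
    INSERT INTO proj VALUES (20, 1, 'Alpha', 'active'),
                            (21, 1, 'Gamma', 'active'),
                            (22, 2, 'Beta', 'closed');
""")

def emps_in_depts_with_project_status(conn, status):
    """Select from the emp path, filtered by a value on the proj path."""
    rows = conn.execute("""
        SELECT DISTINCT e.name
        FROM dept d
        LEFT JOIN emp  e ON e.dept_id = d.dept_id
        LEFT JOIN proj p ON p.dept_id = d.dept_id
        WHERE p.status = ?
        ORDER BY e.name
    """, (status,)).fetchall()
    return [row[0] for row in rows]

print(emps_in_depts_with_project_status(conn, "active"))  # ['Ann', 'Bob']
print(emps_in_depts_with_project_status(conn, "closed"))  # ['Cy']
```

The query never states how an employee relates to a project; the relationship comes from their common parent in the structure. DISTINCT is needed here because the two active projects would otherwise repeat each Research employee.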
9) XQuery Does Not Support SQL's Powerful SELECT Operator
Every XQuery book I have looked at mentions how closely XQuery parallels SQL processing and says that it contains SQL's SELECT, FROM and WHERE operations, so that SQL users should feel comfortable with XQuery. The WHERE clause in XQuery is basically identical to its use in SQL, and the FROM clause in SQL is similar to the input capability of the XQuery LET and FOR operations. But the powerful SQL SELECT operation is missing in XQuery.
The SELECT list in SQL specifies which data items in the query are to be output. There is no one isolated specification of output data items in XQuery. In XQuery, each output item has to be specifically controlled in the processing logic. This processing logic usually consists of multiple nesting of hierarchical looping logic, so the placement of each data item that will be output has to be performed carefully and possibly separately for each data item. So adding or removing an output item in XQuery takes manual programming and testing. In some cases adding an output data item will require programming an additional hierarchical nested loop level.
With SQL, adding or removing an output item is just a matter of adding or removing an item in the SELECT list, which can be done dynamically at view invocation and requires no additional procedural program logic or knowledge of the structure. With SQL's automatic and truly nonprocedural processing, only the processing necessary to produce the data items specified in the SELECT list is performed. XQuery has no automatic equivalent, or any similar general operation, that can dynamically accommodate the simple ad hoc adding and removing of output data items and that also automatically drives the different processing required from query to query.
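The SQL half of this comparison can be sketched with Python's built-in sqlite3 module. The tables and the `dept_emp` view are hypothetical, and the sketch covers only the SQL behavior described above, not an XQuery equivalent: the view fixes the structure once, and each invocation changes nothing but the SELECT list.

```python
import sqlite3

# Hypothetical two-level structure exposed through a single view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, emp_name TEXT);
    INSERT INTO dept VALUES (1, 'Research');
    INSERT INTO emp VALUES (10, 1, 'Ann'), (11, 1, 'Bob');
    CREATE VIEW dept_emp AS
        SELECT d.dept_name, e.emp_name
        FROM dept d LEFT JOIN emp e ON e.dept_id = d.dept_id;
""")

# First invocation: one output item in the SELECT list.
names_only = conn.execute(
    "SELECT emp_name FROM dept_emp ORDER BY emp_name"
).fetchall()

# Second invocation: an output item is added by editing the SELECT list only.
# No join logic or loop structure is touched.
with_dept = conn.execute(
    "SELECT dept_name, emp_name FROM dept_emp ORDER BY emp_name"
).fetchall()

print(names_only)  # [('Ann',), ('Bob',)]
print(with_dept)   # [('Research', 'Ann'), ('Research', 'Bob')]
```

The output specification lives in one place, which is the property the SELECT list provides.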
10) XQuery is Not a Query Language and is Not Declarative
A short time ago, I wrote a paper entitled "XPath Navigation Holding Back the SQL/XML Database Industry." I was criticized by XML product developers for claiming XPath was procedural. These XML professionals believe that XPath navigation is nonprocedural and declarative, even though the navigation logic has to be specified in the correct navigational order of the elements accessed. It was explained to me that XPath could internally change the processing order, which made it nonprocedural. To me, this is optimization and does not change procedural processing into nonprocedural processing.
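The dependence of a path expression on element order can be seen in a small sketch using the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The document is hypothetical, and the sketch does not settle whether such a path counts as procedural; it only shows that the user must know the structure to write it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: company -> dept -> emp -> name.
doc = ET.fromstring(
    "<company><dept><emp><name>Ann</name></emp></dept></company>"
)

# Steps listed in the order the structure nests them: the element is found.
correct = [n.text for n in doc.findall("dept/emp/name")]

# The same element names with two steps swapped: nothing matches.
wrong_order = [n.text for n in doc.findall("emp/dept/name")]

# A descendant search avoids spelling out the steps, at the cost of
# matching every "name" element wherever it occurs.
descendant = [n.text for n in doc.findall(".//name")]

print(correct)      # ['Ann']
print(wrong_order)  # []
print(descendant)   # ['Ann']
```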
The XQuery developers also say their product is nonprocedural and declarative. I do not know how this is possible with XQuery, which requires the user to code complex hierarchical looping logic. If the user has to specify the looping logic, this raises the question: Where is the query in XQuery? Database queries are not expected to specify the processing logic; they are nonprocedural and declarative, stating only what data is desired. Again, it seems that if a product can optimize its specified logic automatically, product developers consider it nonprocedural and declarative.
My point of view is the user's point of view. If the user has to specify the procedural logic involved in communicating the desired effect, and has to know the structure in order to specify the elements in correct database order, it certainly does not seem nonprocedural to me. I believe that most users would feel the same way. I guess what it comes down to is whether your point of view is that of the user or that of the XML product developers. What worries me is this: if XML product developers believe their products are truly nonprocedural and declarative today, will they ever consider making their products more automatic and truly nonprocedural for nontechnical users, or are they content to keep relying on procedural user navigation when it can be fully automated?
One aspect of programming not generally realized is that more complex problems can be solved by automatic nonprocedural processing than by procedural code. For example, multipath processing is too complex and time consuming to be performed procedurally.
It is always easy to criticize a product and say it could be better in this way or that way, suggest what ifs, and describe past events as if one had been there, as I have done in this article. Except I have been there. I have been designing and implementing database query products from the hierarchical database era through the relational database era, and now I am working with hierarchical structures again using XML. So what I have written in this article is what I know first hand, and these are the problems I see in the database XML industry. The solutions I describe are working and can be tested online and interactively at www.adatinc.com/demo.html.
Relational database technology was almost an immediate success because it was mathematically accurate and had a single relational language, SQL. When Object Oriented processing and databases were introduced, the technology was good, but it was not focused on a specific language. This allowed software vendors to go off in different implementation directions with their products, and they faded into the background along with object technology. This is where I see the database XML industry today, as I have indicated in this article. I have also indicated how I think XML processing could be improved in a way that is more natural and uses the inherent capabilities and principles of hierarchical processing, as SQL did for relational processing.
I also see that fuzzy semistructured processing can be performed automatically, using nonprocedural navigationless processing, by stepping back and looking at the processing of the entire query.