XQuery and SQL/XML standard are processors for XML. SQL/XML
was designed to try to match the capabilities of XQuery as closely as possible
and XQuery was designed not only to support XML, but also to support relational
processing. Read on to learn why this may have a negative influence on their
capabilities.
XQuery and SQL/XML standard are processors for XML. The
design and development of these two XML processors also influence the
capabilities that each other have. SQL/XML was designed to try to match the
capabilities of XQuery as closely as possible and XQuery was designed not only
to support XML, but also to support relational processing. Therefore, these two
products do affect each other, which may negatively influence their
capabilities by retarding their natural and separate growth pattern and
directions. Some of my top ten items below are influenced by this situation and
contain topics I will cover further in future articles here at Database Journal.
1) Previous Hierarchical Data Processing History Forgotten or Ignored
Looking at XML hierarchical processing today, no one would
ever suspect that four decades ago there was hierarchical processing technology
in commercial operation that far exceeds hierarchical structured data
processing today. This technology involved commonplace full hierarchical query
processing capabilities that have been forgotten today.
The history of database technology can explain this
technology oversight. Four decades ago, hierarchical databases were popular.
After relational databases were introduced, hierarchical databases with their
fixed data dependence were eventually replaced by relational databases with
their flexible structures and data independence. Now with hierarchical
structures popular again with XML, the advantages of full hierarchical
structures and navigationless access have been forgotten and little has been preserved
in documentation because it was a pre-internet technology. Today’s relational
joining has been naturally mapped into hierarchical processing as single path
processing, ignoring lessons learned about full hierarchical processing used
successfully in the past. If this technology had still been generally available
today, it may have more positively affected how far SQL/XML processing has
progressed today.
2) SQL/XML Vendor Solutions are Proprietary and Incompatible
Because no previous definitive relational and XML data
integration solution has been discovered, all of the SQL/XML vendors have each developed
and branded their own solution to the integration of XML in SQL. One common
approach they share is the lowest common denominator approach of flattening XML
hierarchical data to integrate with flat relational data. This method can
introduce data loss and inaccuracy in XML results. The real problem I see is
that their proprietary solutions have made them incompatible with each other.
This is good for the SQL vendor, but bad for the customer because it locks them
into a specific SQL vendor. These problems have prevented the SQL/XML industry
from reaching its full potential.
3) XQuery Does Not Automatically Support Hierarchical Processing
Why doesn’t XQuery support hierarchical processing
automatically or at least better? After all, XQuery was created to support XML
and XML is hierarchical. This is because a political decision was made early in
XQuery design to help migrate SQL users over to XQuery. It was felt by those
influential in both the SQL and XQuery product design fields that SQL was past its
prime. They decided that SQL users needed to start moving on to a better
replacement for future growth and XQuery could satisfy that objective. This was
accomplished by having XQuery support relational processing as well as
hierarchical processing thus supplying a migration path from SQL to XQuery.
This is why the inner join relational operation is the default join of XQuery.
Having XQuery support hierarchical and relation processing
actually weakens XQuery. The inner join operation is incompatible with
hierarchical structures and their processing because it destroys hierarchical
structures and prevents full hierarchical processing from being automatic and
fully supported. This is the tradeoff for having a migration path from SQL to
XQuery. In hindsight, XQuery should have only supported hierarchical processing
which would have left the door open to significantly increase its hierarchical
processing capabilities.
4) SQL Supports Full Hierarchical Processing, But it is Not Being Used in SQL/XML
Full hierarchical processing is possible in SQL-92 by using
the LEFT outer join to model and define the operation of hierarchical structures.
It is very curious why it has not been used in SQL/XML standardized
integration. One problem preventing this is that the SQL/XML standard would
support hierarchical processing better than XQuery could, which is not good for
XQuery’s reputation as the leading XML processor. Another problem is that the
SQL vendors already have their XML proprietary solutions and this new improved solution
while natural and complete is too disruptive and costly to retrofit until
customer’s start demanding a better and seamless solution.
5) SQL/XML Uses Navigation and XML Centric Functions Making it Not User Friendly
XQuery requires user navigation to processes complex XML
semistructured documents. Because XQuery and the SQL/XML standard are so close
in operation, XML integration was added to SQL by also using user navigation
and XML centric functions. There are a number of problems with this approach.
SQL is known as a navigationless language, this goes directly against SQL’s
normal navigationless processing. In addition to this, XML centric functions
were introduced into the simple elegant SQL syntax and semantics. This also
means that besides navigation, the SQL user needs to know XML and even the SQL
vendor’s particular proprietary solution. The coding of these additions
requires a high level of XML technical knowledge usage. With these additions
and problems, SQL is no longer a simple friendly language when used for XML
integration.
6) There is No XML Standard for Hierarchical Processing
There is no W3 or ANSI standard
or even a best practices document on how to specify correct hierarchical
processing. XQuery and the SQL/XML standard rely on the user to specify the
hierarchical processing logic to get a valid meaningful result. Hierarchical
structures and their processing have very precise principles which need to be
followed for XML results to be correct.
There is no control in XQuery
and the SQL/XML standard to make sure that the user specified processing
follows hierarchical principles to assure the correct result. XQuery and
SQL/XML are not aware of the structure being processed, which would help assist
with correct hierarchical processing. This structure-aware processing is not
possible because the structure being processed may not even be hierarchical
even though the result output is usually structured XML. This is worrisome too.
7) XQuery and SQL/XML Processors Try to Solve Both Sides of XML Processing
Structured XML processing and development is being avoided
today because of its required XML navigation and its procedural coding. This is
part of a larger problem today because XML processor product designers have not
realized that there are two different problems that need solving differently.
XML was invented to support semistructured data and not structured data.
Semistructured data can accommodate natural unstructured data by tagging it.
This can produce an ambiguous hierarchical structure with duplicate node names,
which requires fuzzy hierarchical processing requiring user navigation.
After XML was designed for semistructured data processing,
it was applied to fixed structured data database processing such as used in SQL.
These fixed XML hierarchical structures are unambiguous requiring exact
results. These unambiguous XML hierarchical structures can and should be
nonprocedurally and navigationlessly processed. The problem today is that XML
vendor solutions do not differentiate between these two types of XML processing
and default to fuzzy semistructured processing which is not appropriate for the
exact processing of structured XML data processing. It would make good sense to
have SQL’s XML processing take on this natural structured XML hierarchical
processing while having XQuery specialize in semistructured processing.
8) XML Processing Today is Limited to Linear Processing
I do not understand why the XML
data processing industry and vendors have not taken full advantage of the
powerful additional information potential in XML’s full hierarchical structure.
Today, reuse of hierarchical structures is at a linear level by sharing a
portion of an existing path to create another path. This linear use is limited,
caused by a relational join mindset still stuck in a linear mode. But, the
significantly more powerful nonlinear use is from creating more powerful multipath
queries derived from the unlimited number of different combinations of paths in
a structure. This allows any possible query involving multiple paths to be
specified within the structure, significantly increasing the number of queries
possible.
An even more powerful advantage of
nonlinear hierarchical structures is in the powerful multipath semantic
processing of the structure. This automatically utilizes the current unutilized
hierarchical semantics naturally existing between all the pathways used in queries.
For example, selecting data from one pathway based on data values from a
different pathway. Each different multipath query has its own unique semantics
based on the naturally occurring semantics used to solve the query. This
significantly and dynamically increases the data value of all the data in the
structure and with the navigationless access described previously, it can be
queried without nontechnical users having to know the data structure. This user
friendly full multipath processing is possible today or when it is finally
recognized.
9) XQuery Does Not Support SQL’s Powerful SELECT Operator
Every XQuery book I have looked
at always mentions how XQuery very closely parallels SQL processing and contains
SQL’s SELECT, FROM and WHERE operations so that SQL users should feel
comfortable with XQuery. The WHERE clause in SQL is basically identical to its
use in XQuery and the FROM clause in SQL is similar to the input capability
used in the XQuery LET and FOR operations. But the powerful SQL SELECT
operation is missing in XQuery.
The SELECT list in SQL
specifies which data items in the query are to be output. There is no one
isolated specification of output data items in XQuery. In XQuery, each output
item has to be specifically controlled in the processing logic. This processing
logic usually consists of multiple nesting of hierarchical looping logic, so
the placement of each data item that will be output has to be performed
carefully and possibly separately for each data item. So adding or removing an
output item in XQuery takes manual programming and testing. In some cases
adding an output data item will require programming an additional hierarchical
nested loop level.
With SQL, adding or
removing an output item is just the simple process of removing or adding an
output item in the SELECT list which can be done dynamically at view invocation
and requires no additional procedural program logic or knowledge of the
structure. With SQL’s automatic and true nonprocedural processing, it performs
only the processing necessary to produce the result for the data items that are
specified in the SELECT list. XQuery has no automatic equivalent or similar general
operation that can dynamically accommodate the simple ad hoc adding and
removing of output data items that also automatically drives the different required
processing from query to query.
10) XQuery is Not a Query Language and is Not Declarative
A short time ago, I wrote a paper entitled XPath Navigation Holding
Back the SQL/XML Database Industry. I was criticized by XML product
developers for claiming XPath was procedural. These XML professionals believe that
XPath navigation is nonprocedural and declarative even though the navigation
logic has to be correctly specified in the procedural navigational order of the
elements accessed. It was explained to me that XPath could internally change
the processing order, which made it nonprocedural. To me, this is optimization
and does not change procedural processing into nonprocedural processing.
The XQuery developers also say their product is
nonprocedural and declarative. I do not know how this is possible with XQuery,
which requires the user to code complex hierarchical processing looping logic. If
the user has to specify the looping logic, this brings up the question: Where
is the query in XQuery? Database queries are not expected to specify the
processing logic, they are nonprocedural and declarative, requesting what data
is desired. Again, it seems that if a product can optimize its specified logic
automatically, it is considered nonprocedural and declarative by product
developers.
My point of view is from the user’s point of view. If the
user has to specify the procedural logic involved in communicating the desired
effect and has to know the structure in order to specify the elements in
correct database order, it certainly does not seem nonprocedural to me. I believe
that most users would feel the same way. I guess what it comes down to, is
whether your point of view is from the user or the XML product developers. What
worries me is if XML product developers believe their products are truly nonprocedural
and declarative today, will they ever consider making their products more
automatic and truly nonprocedural for nontechnical users or are they content in
continually using user procedural navigation when it can be fully automated?
One aspect of programming not generally realized is that more
complex problems can be solved by automatic nonprocedural processing than for procedural
code. For example, multipath processing is too complex and time consuming to be
performed procedurally.
Conclusion
It is always easy to criticize a product and say it could be
better in this way or that way, suggest what ifs, and describe past events as
if they had been there as I have done in this article. Except I have been
there. I have been designing and implementing database query products from the
hierarchical database era through the relational database era and now I am working
with hierarchical structures again using XML. So I have written in this article
what I know first hand; and these were the problems I see with the database XML
industry. The solutions I describe are working and can be tested online and
interactively at www.adatinc.com/demo.html.
With the advent of relational databases, it was almost an
immediate success because it is mathematically accurate and having a single
relational SQL language. When Object Oriented processing and databases were
introduced, the technology was good, but not focused on a specific language.
This allowed software vendors to go off in different implementation directions
for their products and they faded into the background with object technology.
This is where I see the database XML industry today as I indicated in this article. I also
indicated how I thought XML processing could be improved in a way that was more
natural and utilized the inherent capabilities and principles in hierarchical
processing as SQL did for relational processing.
I also see that semistructured fuzzy processing can be
processed automatically utilizing nonprocedural navigationless processing by
stepping back and looking at the processing of the entire query.
Additional Resources
OTN – Getting into SQL/XML
MSDN – XML Best Practices for Microsoft SQL Server 2005
SQL Summit
Navigationless XML