Michael M. David reviews the top 10 new capabilities presented by ANSI SQL Transparent XML Hierarchical Processing, including Inherent hierarchical processing in SQL, Multipath Lowest Common Ancestor (LCA) processing occurring naturally in SQL, New hierarchical data modeling capabilities, and the solving of how these new capabilities work naturally, which lead to how they can be controlled.
I have been writing on the topic of ANSI SQL Transparent XML
Hierarchical Processing for Database Journal for almost a year now. This
topic has presented a number of new and powerful capabilities that I will
review in this Top 10 article. This includes a number of discoveries in SQL
operation that were recently made that support new capabilities described in my
articles and recapped in this article. These SQL discoveries include Inherent
hierarchical processing in SQL, Multipath Lowest Common Ancestor (LCA)
processing occurring naturally in SQL, New hierarchical data modeling
capabilities, and the solving of how these new capabilities work naturally,
which lead to how they can be controlled. The order of the capabilities made
possible by these SQL discoveries are described in this article in a building
block order because the capabilities build upon each other.
1: SQL Transparent Hierarchical Processing
ANSI SQL Transparent Hierarchical Processing is naturally supported in
SQL-92 by the SQL LEFT Outer Join operation. The LEFT Outer Join syntax
performs full multipath data modeling and its associated semantics supports the
hierarchical processing when processed directly by the SQL hierarchical
processor. What makes this work reliably is the new Outer Join ON clause used
at each join point to replace the WHERE clause single join usage insuring the
SQL request is unambiguous. This natural hierarchical processing finally solves
the relational/hierarchical integration problem with no data loss while
preserving and utilizing the naturally existing hierarchical semantics
automatically. This is used to seamlessly support SQL: hierarchical data
filtering; navigationless processing; and automatic structured XML output. This
is described in more detail in the article ANSI SQL Hierarchical Data Processing Basics.
2: Inherent LCA Multipath Processing
The initial computerized database technology four decades ago operated on
hierarchical data structures and had query languages that supported full
multipath navigationless queries. Multipath queries dynamically increase the
customer’s data value. This multipath hierarchical processing automatically
utilizes the natural hierarchical semantics across pathways. This dynamically
increases the data value by utilizing the additional semantic data and number
of pathways used in deriving the multipath hierarchical result.
There is one more piece of logic that is required for multipath hierarchical
processing. It introduced a richer level of semantics and meaning that must be
taken into consideration to keep the query meaningful for processing multipath
hierarchical structures. This logic uses Lowest Common Ancestor (LCA) logic to
keep the hierarchical processing meaningful across pathways. During processing,
whenever two points on different pathways in the structure require being
associated in anyway, the LCA determines the shortest connection between the
two points to keep the hierarchical result meaningful. This is a form of
hierarchical proximity. This multipath LCA processing occurs naturally in SQL
but is not performed by XQuery. This capability is described in more detail in
the article The Power Behind SQL’s Inherent Multipath LCA
Hierarchical Processing.
3: Structure-Aware Processing
A number of new and advanced capabilities are made possible for SQL
hierarchical processing by the capability to automatically and dynamically determine
the hierarchical structure being processed. This is known as structure-aware
processing. This is possible through interpreting the semantics in the
hierarchical data modeling LEFT Outer Join string driving the query. The first
and primary capability possible from this structure-aware processing that is
required for other advanced capabilities is hierarchical optimization. Also
tightly involved with these hierarchical optimization capabilities is SQL’s
powerful variable SELECT list.
3A: Hierarchical Optimization
SQL hierarchical optimization is controlled by SQL’s variable SELECT list.
It dynamically specifies the output result. This enables the output result to
change from use to use for the same view. The output infers the required processing,
which dynamically controls the hierarchical optimization. This optimization
determines which nodes do not require access and rewrites the expanded LEFT
Outer Join string eliminating unnecessary LEFT Outer Join entries. This
automatically tailors the associated processing required to produce the desired
result. This is easy for SQL to support because it is an interpretive language
driven by the optimized LEFT Outer Join string. XQuery does not support a
variable SELECT list type of operation so it does not automatically support
this dynamic data access and processing optimization.
3B: Navigationless Access
Having structure knowledge of the hierarchical structure being accessed and
the fact that SQL structured data and not semistructured data is being
accessed, enables the database access to be performed automatically and
navigationlessly. Structured and semistructured data require different
processing. A choice has to be made between supporting accurate, unambiguous,
and navigationless SQL structured processing or semistructured fuzzy processing
that requires navigational processing. This navigationless access is also known
as schema-free access. Most if not all XML and SQL/XML processors perform
semistructured navigational processing. Our solution only processes SQL
structured data in order to produce correct hierarchical structured results
consistently and automatically.
3C: Global Views
The hierarchical optimization applied automatically to views enables any
view to be a global view. Global views can contain very large structures with
no overhead. This means that only the current required data in the view is
accessed and no additional overhead occurs because of any additional unneeded
data in the view. This means that fewer views are necessary and since no structure
navigation is necessary, query use by non-technical users becomes even simpler
and more powerful with seamless Global Views.
3D: Dynamically Formatted Output
Dynamically formatted output is controlled by the SQL variable SELECT list.
This means that the output data for the same view can change from use to use.
Another important feature of the automatic output is that it is dynamically
aggregated by the automatic use of node promotion and node collection to slice
out the unreferenced nodes and keep the pathways seamlessly connected. In SQL,
this is called projection and is controlled by the SELECT list operator. The
Global View capability also makes the dynamically formatted output easier to
specify.
XQuery does not support structure-aware processing because it can freely use
any type of structure. This flexibility comes with a tradeoff that hierarchical
structures cannot be monitored or utilized fully. This Structure-aware
processing and its capabilities are described in more detail in the article: SQL’s Optimized Hierarchical Data Processing Driven
by its Data Structure.
4: Dynamic Structure Generation
There are five ways that data structures can be generated. Four are covered
in this Section; transformations are covered later in Section 6.
Transformations are already existing structures that are modified at execution.
This makes them different from standard structure generation and they contain
their own multiple choices of different transformation capabilities.
4A: Basic Data Modeling
Basic data modeling is accomplished using the LEFT Outer Join to specify the
structure a node at a time building the structure top (root) down. This is
usually specified in a view and can be reused. This is used to define
relational hierarchically modeled structures and to define physical structures
like XML. With the newer SQL ON clause used at each join point any structure
can be modeled more easily and unambiguously than trying to use a single WHERE
clause for the entire structure. In addition, the ON and WHERE clause work
differently. The WHERE clause filters the entire structure a row at a time,
while the ON clause only affects its area of the structure and a given portion
of the row. This is the same as hierarchical data preservation, which is
necessary for correct hierarchical processing.
4B: Joining Hierarchical Data Structures
Joining hierarchical data structures is accomplished using the LEFT Outer
Join operation. These hierarchical joins can consist of logical relational
and/or physical XML structure views. This is because they both use the same
LEFT Outer Join operation and can dynamically support heterogeneous
hierarchical joins seamlessly to create a virtual hierarchical view. Standard
hierarchical joining is applied from anywhere in the upper structure to the
root of the lower structure. Linking anywhere under the root of the lower
structure has presented hierarchical semantic problems for interpreting the
appropriate resulting hierarchical semantics or structure. Section 5 does
present a solution for linking below the root of the lower level structure.
4C: Dynamically Structured Output
Dynamically structured output is controlled by the SQL variable SELECT list.
This means that the output structure for the same view can change from use to
use controlled by the variable SELECT list. Another important feature of the
automatic structured output is that it is dynamically smoothed by the automatic
hierarchical node promotion and node collection (same as relational projection)
to seamlessly splice out the missing nodes and data while keeping the pathways
connected. The Global View capability also makes the dynamic structured output
much easier to specify. This description is basically the same as what was
described for dynamically formatted output in Section 3D but is listed here to
complete the list for dynamic structure generation because it can be used for
this purpose too.
4D: Data Driven Variable Structures
Data driven variable structure creation is controlled by the newer ON
clause. The ON clause is not limited to specifying join criteria. It can also
specify data filtering criteria. This allows the specifying of view structures
and the applying of data filtering criteria on the ON clause to control whether
the hierarchical view join is performed or not depending on the data value in
the current data occurrence. The data value can be located up the active structure
path or down the structure anywhere in the reachable lower structure view being
joined. This ability to test data that is in the lower structure being joined
is very powerful; it is actually a look-ahead capability described further in
Section 5. These data driven variable structure processing capabilities
described in more detail in the article: Dynamic Data Driven Variable Hierarchical Structures
in SQL.
5: Extended Hierarchical Modeling for Data Mashups
While experimenting with our ANSI SQL transparent hierarchical XML
processor, we determined that it was possible to join two structures and have
the lower level linked to anywhere below the root node. By letting the ANSI SQL
hierarchical processor perform its natural multipath hierarchical joining
allowing linking below the lower level structure we determined that the result
always made semantic sense for the query. This makes sense since SQL
hierarchical processing is automatic, correct, and has no limitations. To apply
this capability to our SQL hierarchical processor, which operates utilizing its
structure-aware capability, we had to derive the resulting combined
hierarchical structure.
We determined the resulting structure when linking below the lower level
root structure was always as if the lower level structure’s root was linked to.
This is because the lower level structure is fully materialized before joining.
This happens because of the natural view nesting controlled by the ending ON
clause signals that the right view can now be joined to the left view as in:
SELECT * FROM ABC LEFT OUTER JOIN XYZ ON A.a=X.x. This is very easy to
process and naturally supports a look-ahead capability. This linking below the
lower level root allows for a very powerful data mashup capability with no
limitations for hierarchical joining. For more information on linking below the
lower level structure’s root and why it works see the article: Extending Hierarchical Data Modeling Demonstrated in
SQL.
6: Data Structure Transforms and Data Virtualization
While researching data transformations with our ANSI SQL transparent XML
hierarchical processor it was determined that there were two basic types of
data transformations, which we classified. The first is Restructure transformations,
which utilize data relationships in the data. It takes the structure apart and
reassembles it utilizing different relationships in the data. The other
transformation is Reshaping, which operates by using only the natural semantics
of the data structure, which can allow any-to-any structure transformations.
These two transformations have different uses for changing the structure versus
changing the semantics, but can combine their different capabilities. The
Reshaping transform is also useful because it can always be used when there are
no data relationships available.
Data transformations normally operate on a single structure while data
virtualization collects fragments from multiple structures. In order to support
data virtualization operations, using a transformation that utilizes and
combines fragments from multiple structures is used. Restructure
transformations can utilize their natural multiple structure access ability to
support data virtualization. These transformations utilize the natural
capability of relational rowsets containing variable hierarchical structures.
These relational rowsets also have a conceptual hierarchical fragment
manipulation capability needed to perform flexible transformations. For more
information on data structure transforms and virtualization see the article: Hierarchical Data Structure Virtualization in SQL.
7: Enhanced Multipath Processing Capabilities
As indicated in this article, most relational operations are transparently
and seamlessly elevated to hierarchical operations. But some operations like
data ordering require a different approach that more naturally takes advantage
of multipath hierarchical processing that does not automatically fit into the
existing SQL hierarchical processing. For example, the current Order By
operation is linear. In order for it to be extended seamlessly to multipath
hierarchical processing, it needed to support multiple path ordering. This is
where data ordering can be independently occurring on separate pathways. This
becomes very useful and powerful. This was supported in our ANSI SQL
transparent XML hierarchical processor prototype with the current Order By syntax,
which was semantically applied differently to support multiple path ordering.
This type of operation will be extended to the prototype’s data summaries and
aggregates also. A hierarchical multipath Order BY operation is shown and explained
in the article: Extending SQL’s Inherent Hierarchical Processing
Operation.
8: Automatic Data Renormalization
Automatic data renormalization requires structure-aware processing. It also
deserves its own Top 10 entry because of its uniqueness and value. Operations
like the relational Cartesian product and inverting 1-to-M relationships to
M-to-1 and back to 1-to-M, require a process to remove the introduced data
replications to produce the correct results. This is called data renormalization.
Renormalization is a difficult operation to perform. Some processors like
XQuery offer functions that can be used to perform renormalization when
required. In this case, the user must be aware that this processing is
necessary and specify it correctly. My company’s ANSI SQL transparent XML
hierarchical processor prototype supports automatic renormalization, which is a
tremendous help because it is hard to keep track of replicated data. Automatic
renormalization requires knowledge of the data structure and additional
information added automatically to the input data. This data also can include
information used to preserve the natural data order of XML data input.
Renormalization is covered in more detail in the article: Extending SQL’s Inherent Hierarchical Processing
Operation.
9: Hierarchical Distributed Processing
The hierarchical oriented LEFT Outer Join can model the most complex
heterogeneous virtual data structures. These structures can be easily optimized
removing unnecessary pathways and fragments that are not necessary to the
processing of the active query. This is possible because the SQL-92 Outer Join
has a very recombinant capability. Hierarchical views ABC and XYZ can be easily
joined hierarchically in the SQL statement: SELECT * FROM ABC LEFT OUTER
JOIN XYZ ON A.a=X.x . This same SQL join statement will work automatically
and seamlessly in a distributed computing environment. This is because the ABC
and XYZ hierarchical views can also be automatically distributed to different
SQL processors where they will naturally perform hierarchical processing. Their
separate hierarchical results will be routed back and hierarchically joined
together at the originating SQL processor containing the above LEFT Outer Join
statement. This flexible hierarchical data modeling is shown in the article: Extended Hierarchical Data Modeling Demonstrated in
SQL.
10: Future Growth Capabilities
Two powerful future capabilities that offer unlimited power and growth
opportunities down the road are the replacement of the relational engine with a
true hierarchical engine, and the support of automatic parallel processing for
multicore processing.
10A: Hierarchical Engine Usage
It is possible to transparently swap out the relational engine operating
hierarchically for a more powerful, efficient, and flexible hierarchical
engine. Operating truly hierarchically this offers the most efficient method
and avoids the relational problems of expensive joins and replicated data
explosions.
10B: Parallel Processing and Multicore Support
Hierarchical processing of hierarchical structures offers the opportunity to
fully parallel process hierarchical queries and support multicore processors.
Hierarchical data structures with their separate pathways can be naturally
parallel processed to a very significant extent. Their hierarchical structure
naturally supplies a roadmap to enable fully automatic parallel processing.
This is described in the article: Automatic Full Parallel Processing of Hierarchical
SQL Queries.
Conclusion
The new hierarchical processing presented in this article is a new level of
hierarchical processing driven by ANSI SQL. The now proven fact that SQL can
support hierarchical processing means there is an overlap of the common
hierarchical and relational processing. This point of common hierarchical and
relational processing solves all current relational/hierarchical integration
problems. It enables advanced concepts like joining hierarchical structures;
mashups of hierarchical structures; transforms of hierarchical structures with
any-to-any reshaping; and generating virtualized data structures. It also
introduces new hierarchical concepts such as multipath ordering and performing
summaries along multiple pathways. To test the capabilities described in this
article, use our online interactive ANSI SQL Transparent XML Hierarchical
Processor prototype at: www.adatinc.com/demo.html
Related Articles
Navigationless Database XML: Hierarchical Data Processing
Performing Hierarchical Restructuring Using ANSI SQL
Automatic Full Parallel Processing of Hierarchical SQL Queries
Creating Hierarchical Data Structure Mashups