A lot has been written about the "NoSQL" alternative as an alternative to SQL and other large database systems for a lighter, faster, cheaper alternative for the cloud. Most of these light weight database processors ignore data processing principles and lack the ability to turn data into information. This SQL trade-off may not be necessary.
There has been a lot written
about the NoSQL alternative in the press as an alternative to SQL and other
large database systems for a lighter, faster, cheaper, easier to add new data
types, and larger memory capacity alternative for the cloud. Most of these
light weight database processors do what they claim but at a cost of ignoring
data processing principles and lacking the ability to turn data into
information. This SQL trade-off may not be necessary. The solution is to take advantage
of ANSI SQLs powerful
inherent full hierarchical processing capability as described previously in
this series of articles on SQL hierarchical processing.
Hierarchical Structure Natural Benefits
structures are very unique not only in how well they organize data, but also in
how they naturally capture more meaning than is stored with the data. Even more
impressive is their ability to dynamically process this natural goldmine of
meaning in unlimited ways that further increases the value of the stored data.
With the increased use of hierarchical XML data structures today, there is an
incredible amount of unused data value potential available.
Hierarchical structures are easily built and expanded naturally
over time. Suppose we start building a hierarchical structure by creating and
populating node A over node B. Then at some point we add node C also directly
under node A and also start populating it. This is an example of reuse by
reusing the already existing node A and its data. We also know what nodes node
A has control over and how nodes B and C are related to each other. With the
addition of each new node, the data value will increase in a nonlinear fashion.
At this point, this multipath nonlinear hierarchical structure looks like the
one below in Figure 1.
How SQL is Made to Perform Hierarchically
Figure 1 shows an example of the
LEFT Outer Join defining a hierarchical structure with its syntax and also specifies
its hierarchical processing using its associated semantics (i.e. preserving A
over B and C). This is possible when the processing is limited to only SQL
hierarchical join operations. The LEFT Outer Join and its less used reversed RIGHT
Outer Join are the only join operations that perform hierarchically, so these
are the only join operations that can be used for hierarchical processing. INNER
joins and FULL joins are not allowed. By limiting processing to only
hierarchical joins, hierarchical operational principles come to the foreground
and totally control the processing. This is an example of less is more. This
is similar to structured processing where the GOTO statement is not allowed, enabling
more correct code and error detection with provable correct processing. With
hierarchical processing, this restriction allows for valid hierarchical assumptions,
which supported by hierarchical principles, can create new capabilities such as
performing hierarchical operations and optimization.
Physical Hierarchical Structures Do Not Require Joining in SQL
1 above showed an example of how a hierarchical structure is defined and
modeled in ANSI SQL using the LEFT Outer Join. The fully formed LEFT Outer Join's
syntax and semantics together also specifies the hierarchical data structure's
metadata. For SQL data, the LEFT Outer Join operation is executed by the SQL
processor to build the physical hierarchical structure from the logical relational
hierarchical structure. But with physical hierarchical structures such as XML, the
LEFT Outer Join execution can be eliminated because the hierarchical structure
is already in one piece. This allows it to be converted and placed directly into
the relational rowset without performing any join operations, but the
structured metadata in its defining LEFT Outer Join is still used for its SQL
Hierarchical Optimization Can Produce Other Capabilities
hierarchical optimizations dynamically tailor each query when executed so that
only the necessary nodes in the view are accessed based on the active query. Besides
limiting data access, this greatly reduces relational data explosions. This
makes any large view a global view with no overhead so that fewer views or even
a single view can be utilized. This means that users do not have to know what
view to use or have to be familiar with the structure. This makes SQL
processing more user friendly. These powerful hierarchical capabilities are
performed transparently by using SQLs SELECT, FROM, WHERE and LEFT JOIN operations.
This means that nontechnical users can easily specify powerful hierarchical
queries using a simple standard ANSI SQL hierarchical processing subset.
Easier More powerful Queries
applications can be designed or modeled hierarchically and can benefit from
principled hierarchical operations. We have shown that as a hierarchical
structure naturally grows larger in structure by adding additional nodes and
pathways over time, its inherent data value automatically increases nonlinearly
by the natural hierarchical data reuse. This also significantly increases the
number of possible new queries, further increasing the value of the data. In
addition, the naturally occurring data semantics existing between the
referenced pathways are used to process more powerful and accurate multipath
queries dynamically. This creates more meaning and value from the data than is
stored. This enables these multipath queries to answer these internally complex
queries automatically without requiring user navigation for hierarchical data,
even for XML.
Replacing the Relational Engine with a Hierarchical Engine
processing automatically supports full hierarchical processing, taking
advantage of all the powerful capabilities and principles that come with the
overlap of hierarchical and relational processing principles. Putting an ANSI
SQL relational interface over a hierarchical processing engine to replace the
relational engine should not be that difficult. This is possible because
hierarchical processing is a subset of relational processing. This enables SQL's
hierarchical processing powerful and principled subset of relational processing
to be replaced with a simpler memory friendly powerful hierarchical engine that
can still fulfill the NoSQL capabilities mentioned above. Hierarchical engines
do not replicate data which avoids relational data explosions. They always
reflect the actual data exactly. In addition, performing joins of hierarchical
structure views is performed internally with a simple link operation instead of
expensive relational joins.
Automatic Parallel Processing
structures' natural parallel processing capabilities may be very useful in a
cloud environment by utilizing parallel processing's dynamic memory utilization
and the cloud's ability to service the request on demand and pay as you go
data structures are naturally parallelizable. By examining the hierarchical
data structure with its nodes (or processing tasks) in Figure 2 below, it can
be seen how it naturally maps directly to the parallel pathways. The parallel
paths are: A/B/C; C/D; C/E; A/F/G; G/H; and G/I. In addition, their multiple
data occurrences at each node can be parallel processed too. This is a much
higher than normal level of parallelization possible.
like C and G, which produce multiple pathways as shown in Figure 2 are natural sync
points for database processing. This means that all of their children and their
multiple occurrences must be processed under the same parent node data
occurrence before moving on. This is because moving to the next parent data node
occurrence before all its children are processed can lose database positioning
for the current parent's physical database node occurrence. For example, if
node D completes before node E, node C must wait for node E to complete.
possible problem to full parallel processing of SQL's hierarchical queries is
the Final Query Processing after the Database Processing completes, shown in
Figure 2. But it can be pipelined with the Database Processing. This means that
a new query can start processing while the Final Processing of the previous
query is operating. This helps with the overall throughput of the queries.
hierarchical data processing can be highly processed in parallel automatically,
this makes it a good candidate for cloud processing with its unpredictable
dynamic memory utilization and increased throughput. For additional information
on parallel processing of hierarchical data see: http://www.devx.com/SpecialReports/Article/40939/1954.
article has shown that cutting back on database memory usage and the storage
footprint size does not have to result in giving up data processing principles,
the ability to turn data into information, ease of use, and the support of
See All Articles by Columnist
Michael M. David