Hierarchical Data Structure Transformation in SQL


The two basic types of data structure transformation terms, Restructuring and Reshaping, are used interchangeably with data structure transformations. This article categorizes and defines these two types of data structure transformations and uses SQL to demonstrate the transformations, using SQL’s natural hierarchical data structure processing capability.

I have found that there are two basic types of data
structure transformations. I have also noticed that the transformation terms Restructuring
and Reshaping are used interchangeably with data structure transformations.
These two terms bring to mind different types of transformations I have associated
with the two basic transformations. In this article, I will categorize and
define these two types of data structure transformations. I will be using SQL
to demonstrate these types of data structure transformations using SQL’s
natural hierarchical data structure processing capability.

Data structures are built to a great extent by utilizing the
related data values in the data to make the linkages that can bring the data
together in different ways. Restructuring is performed by using these data relationships
in the data as linkages in order to change the structure to model these new
relationships. This is done by taking the structure apart as needed and applying
new data relationships not currently used or reusing them differently. This
introduces new meaning and semantics to the resulting structure. In addition,
these data relationships can involve comparisons and formulations to make the
new relationships linkages.

Changing the topic of Restructuring to Reshaping, reshaping
brings to mind a molding process by shifting pieces of the structure around as
in molding a piece of clay. This means that that there are no limitations to what
the resulting structure can be, allowing any structure to be transformed into
any other structure producing any-to-any structure transformations. The trick
with performing this type of transform is that the semantics of the structure
must drive how the transform is performed in order to preserve the basic semantics
that are applied to the creation of the new structure. This means the naturally
implied data relationships from the physically or logically juxtaposition of
the data in the structure flows with the structure as it is molded. This
produces the desired new structure while taking into consideration the current
semantics represented by the data.

Restructuring is used when new semantics are needed and data
relationships are available to support this. The resulting structure is not a
concern. Reshaping is used when a specific result structure is necessary as in
hierarchical data modeling. The resulting semantics are not a primary concern
but are expected to be hierarchically derived from the source structure.
Restructuring requires that the necessary data relationships are available
while Reshaping does not require any external or internal requirements so it is
always available to use.

Restructuring Example

The following SQL Restructure operation in Figure 1 slices
out the Invoice node shown below, and makes it the new root in the resulting
structure. The data relationship used in making this new structure may be the
same or a previously unused data relationship. If this data structure was
retrieved from a contiguous data structure like XML there may not have been any
physical data relationships to use. But on the other hand, once any type of
data structure (physical or logical, contiguous or linked) is retrieved into a
relational rowset it can be freely taken apart. Reassembling will depend on the
data relationships that can be made.

Restructuring

The Alias feature of SQL beside its renaming use allows
making multiple, separately named copies of a structure in the rowset, which can
be used in taking apart the structure above and hierarchically reassembling it.
The separately identified data from the two named copies of the identified structure
in the FROM statement above is identified by the prefixes of X and Y associated
with their created data copies in the FROM clause. The LEFT Outer join in
Figure 1 models the restructured data as Y over X since Y is preserved when X
is not. This is why InvID is identified in the SELECT statement with a Y prefix
and EmpID, DndID and AddrID are identified with an X prefix in the SELECT
statement since InvID is desired at a higher hierarchical level than the others.
The higher level Y.InvID is used in the SELECT List and is used in the ON
clause to reflect its proper position in the new hierarchy.

Restructuring Example Additional Level

This is the same as the previous Restructuring example
except in addition to moving the Invoice to the top of the new structure, the
Addr node is repositioned under the Dpnd node. This is done by adding another
level to the structure by using another LEFT Join to attach the Addr node under
the Dpnd node.

Restructuring adding extra level

Notice
in the above SQL in Figure 2 compared to the previous SQL in Figure 1 that the
additionally added LEFT Join creates another copy of the structure to reference
using the prefix of W. The AddrId field in the SELECT list now uses the W
prefix so it can be referenced in the new LEFT join. This LEFT join is used to
move the Addr node under the Dpnd node.

Reshaping

The following examples will demonstrate a number of data structure
reshaping examples that cover linear to linear, linear to nonlinear, nonlinear
to linear and nonlinear to nonlinear structure transforms. The previous
restructuring used physical data relationships in the data to perform the
transform. This can not be used in reshaping and a technique similar to the
restructuring is used. It uses the same processing of using multiple copies of
the structure, but the coordination between the copies of the structures is
different in that the relationship of the structures is now made between the same
named node types of the two structure copies that are being moved in the order
that they are being moved.

Inverting a 1 to M Linear Three Level Structure

This is a linear structure inversion using nodes Cust over
Emp over Dpnd (a series of 1 to M relationships) in Figure 3 below. This will
demonstrate that the alignment joining starts at the bottom of the structures,
and drives the inversion upward at each node as the selected inverted node (in
bold) from each level is selected: SELECT first: X.Dpnd, second: Y.Emp, third:
Z.Dpnd. Only these three nodes represented by their data are selected once each
for output so they are squeezed together through natural node promotion to keep
their structure which is naturally inverted and now has M to 1 relationships.

Linear inversion

Linear to Nonlinear Reshaping Logic

Linear structures can be reshaped into nonlinear structures.
In this example in Figure 4 below, the linear structure Cust over Emp over Dpnd
can be used to generate the structure Emp directly over the siblings Dpnd and
Cust. Two copies of the input structure are necessary and are joined by their
common Emp node since it is the starting node to building the new structure.
Only two copies of the structure are necessary because the first copy can be
used to move two nodes, Emp over Dpnd, which are already related as required.
This places Emp over the Dpnd and Cust siblings because Emp is linked to Cust.
This creates the nonlinear structure desired and these same node values (in the
SELECT list) will be selected to create the nonlinear structure desired. By
placing Emp over Cust, Cust is naturally and correctly replicated.

The previous linear examples and this nonlinear example have
not lost the semantics of the input structure in the new structure because the
semantics have been kept the same or inverted. This means the nodes have
basically kept attached to the same nodes as shown below in the derived result
structure. Emp over Dpnd remains the same while Emp over Cust has been
inverted.

Linear to Nonlinear Reshaping Logic

Nonlinear to Linear Reshaping

Nonlinear structures as input can also be used to build
linear and nonlinear structures. In fact, nonlinear structures offer more
flexibility in how they are utilized because their multiple paths offer more
opportunity to find the correct reshaping being sought. This means less input
copies need to be used.

The nonlinear input structure in Figure 5 below can be
duplicated to create a linear structure reshaping of itself using the SQL below.
Since we are starting with the Inv node as the root, this will be the first
matching link. Cust becomes available in the second level structure, which is
valid since it is related to Inv the related link point. Emp can also be
selected from the same lower structure copy since it is already located under the
Cust node.

Nonlinear to linear reshaping example

Nonlinear to Nonlinear Reshaping

This nonlinear to nonlinear example, in figure 6 below, is
very similar to the previous nonlinear to linear SQL example in Figure 5, where
the linear structure Inv over Cust over Emp was produced easily. The current
example below demonstrates that a nonlinear structure can transform into a
different nonlinear structure. This example copies the previous example but
places Emp not under Cust but under Inv making Cust and Emp siblings for this
structure. This requires a third copy of the input structure also matched to
Inv because that is where Emp is being attached to. The Emp node is accessed
indirectly from Inv up to Cust then back down to Emp, a powerful related
semantic reshape. Basically, the third structure and additional join in this
example was necessary to move Emp from under Cust to under the Inv node.

Nonlinear to nonlinear reshaping example

Polymorphic Reshaping

A polymorphic reshaping does not rely on the structure of
the input structure. This means that the input structure could be in any shape.
The reshaping capability shown in Figure 3 and 6 do support polymorphic
reshaping because only one node is moved per join. This means any shape
structure with the same nodes can be transformed into the desired resulting
structure. For example, the different input structure in Figure 7 would produce
the same results from the SQL in Figure 6. The examples in Figure 4 and 5 are
not polymorphic because it relies on a specific shape of the input structure by
moving two related nodes at one time. The choice of using reduced steps or a polymorphic
solution is up to the user. The examples in Figures 4 and 5 could have been
written to be performed in three steps moving a single node at a time making
them polymorphic when it is useful.

Different input structure

Conclusion

The SQL Reshaping and Restructuring shown do take a little
bit of procedural programming. But a high level nonprocedural transform
language could be designed for reshaping based only on the desired structure
and the SQL can be generated or internally performed.

It may have occurred to you that the need to create multiple
copies of the structure in the examples, which could be costly in memory use. Most
SQL processors optimize the multiple copies of the structure to use one
physical copy.

The different operations of Reshaping and Restructuring can
be combined as needed. The transformation examples shown are described in more
detail and can be tested at http://www.adatinc.com/demo.

»


See All Articles by Columnist

Michael M. David

Michael M. David
Michael M. David
Michael M. David is the founder of Advanced Data Access Technologies, Inc. Previously, he was the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. Before that he was a staff scientist for Teradata and designed high level multi-featured SQL utilities. He has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational and hierarchical data. From this experience, he authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject.

Latest Articles