dcsimg

Just SQL Part VII - Hierarchical Queries

October 5, 2005

Whether they are called hierarchical structures, trees, or self-referencing tables they often pose quite a challenge to traverse in any simple manner. But Oracle has a solution.

It is not a question as to whether or not hierarchical models are good or bad. We will all come across them no matter where we go. However, querying these structures has often baffled many practitioners. Not just in understanding the model behind them but in querying them properly. Before we begin with the SQL required to query hierarchical table structures it is always good to understand what they are.

A sample hierarchical structure is a family tree. In Figure 1, I have given the famous Disney Duckdom family tree. The obvious first name and last name are given but then a unique ID is given for each of the family members. The parent ID corresponds to either the male or the sole surviving parent. So, if you ever wanted to know who Donald Duck's parents were and who his grandfather was you can easily just go to this table. If you follow up the tree from Donald Duck, you would see that Donald's dad was Quackmore Duck and his only living grandparent would be Grandma Duck. So where do the McDucks fit into this? Well it was actually Hortense McDuck who married Quackmore Duck, which is not shown in this table. Therefore, Donald Duck's mother was Hortense McDuck, the daughter of Scotty McDuck. An interesting note is that even though Huey, Dewey, and Louie often call Scrooge McDuck 'Uncle,' Scrooge is really Donald Duck's uncle through the marriage of Hortense and Quackmore. Quite an interesting family tree when you consider the brokenness and single parents evolved. We won't even get into where Daisy Duck fits in but it is interesting she has the same last name as Donald but they aren't married. A slightly different way to look at this is given in Figure 2.

Figure 1.
Duck Family Tree


Figure 2
Duck Family Tree

Therefore, if we created our DUCK_TREE table and inserted these values we would have the following after a simple select statement.

SQL7gt; select * from duck_tree order by family_id;
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
Scotty          McDuck            100
Matilda         McDuck            101        100
Scrooge         McDuck            102        100
Hortense        McDuck            103        100
Grandma         Duck              200
Quackmore       Duck              201        200
Daphne          Duck              202        200
Thelma          Duck              203        201
Donald          Duck              204        201
Huey            Duck              205        203
Dewey           Duck              206        203
Louie           Duck              207        203

What we want to do is query this table in such a manner to traverse and present the rows in an order that is in line with the true family tree without having to know the relationship numbers between the columns FAMILY_ID and PARENT_ID. With simple SQL, we can easily answer questions like who are the direct children of a certain parent. If we wanted to know the children of Quackmore Duck, we could write a simple SQL statement such as this.

SQL> select b.*
       from duck_tree a, duck_tree b
      where a.first_name = 'Quackmore'
        and a.family_id = b.parent_id
      order by b.family_id;
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
Thelma          Duck              203        201
Donald          Duck              204        201

This is not that difficult but the question now becomes how do we find the grandchildren of Quackmore Duck? This query is much harder to construct and requires us to create another level of the previous query such as the following.

SQL> select c.*
       from (select b.*
               from duck_tree a, duck_tree b
              where a.first_name = 'Quackmore'
                and a.family_id = b.parent_id) b,
            duck_tree c
      where b.family_id = c.parent_id
      order by c.family_id;
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
Huey            Duck              205        203
Dewey           Duck              206        203
Louie           Duck              207        203

As you can see, this can start to get quite deep and difficult to traverse much deeper into great-grandchildren. Moreover, we have not even started talking about presenting the parent's names along with their children within this query. Very difficult to say the least.

Have no fear; Oracle has come up with some hierarchical query operators that allow us to traverse these trees with quite simplistic SQL. All you need to do is supply Oracle with a hierarchical query clause that tells where the root of the tree is and a relationship between a parent and child node.

Suppose we want to select again the full family tree under Quackmore Duck.

We begin with:

SQL>  select * from duck_tree

Add a CONNECT BY clause to relate the children nodes to the parent nodes in our hierarchy.

     CONNECT BY PRIOR family_id = parent_id

Then we need to tell Oracle where in the tree our root node is

       START WITH first_name = 'Quackmore'

The SQL looks like this with output.

SQL> select *
       from duck_tree
    CONNECT BY PRIOR family_id = parent_id
      START WITH first_name = 'Quackmore';
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
Quackmore       Duck              201        200
Thelma          Duck              203        201
Huey            Duck              205        203
Dewey           Duck              206        203
Louie           Duck              207        203
Donald          Duck              204        201

The PRIOR operator is very important in this SQL statement. The PRIOR operator can actually be on either side of the equality but has quite a different impact on the result set. Typically, the PRIOR operator goes on the child and we give the root node through the START WITH clause. However, we could actually put the PRIOR operator on the parent side and give a child node in the START WITH clause. This allows us to answer questions like who are the parents and grand parents of Donald Duck. Here is the SQL and we can see Donald and his relationship up to the top root node on the 'Duck' side of this family tree.

SQL> select *
       from duck_tree
      connect by family_id = prior parent_id
       start with first_name = 'Donald';
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
Donald          Duck              204        201
Quackmore       Duck              201        200
Grandma         Duck              200

To give a nice hierarchical looking result set you can use the LEVEL pseudocolumn. This pseudocolumn returns 1 for the root node, 2 for a child, 3 for the grandchildren, and so forth. Add the function LPAD to indent the result set and you get the following nice looking results.

SQL> select LPAD(' ', 2*level-1)||first_name first_name,
            last_name, family_id, parent_id
       from duck_tree
      connect by prior family_id = parent_id
       start with first_name = 'Quackmore';
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID
--------------- ---------- ---------- ----------
 Quackmore      Duck              201        200
   Thelma       Duck              203        201
     Huey       Duck              205        203
     Dewey      Duck              206        203
     Louie      Duck              207        203
   Donald       Duck              204        201

Very similar to the LEVEL pseudocolumn is the SYS_CONNECT_BY_PATH pseudocolumn. This shows the full path between parent and child.

SQL> select first_name, last_name, family_id, parent_id,
            SYS_CONNECT_BY_PATH(first_name, '/') "Path"
       from duck_tree
      connect by prior family_id = parent_id
       start with first_name = 'Quackmore';
FIRST_NAME      LAST_NAME   FAMILY_ID  PARENT_ID Path
--------------- ---------- ---------- ---------- -----------------------
Quackmore       Duck              201        200 /Quackmore
Thelma          Duck              203        201 /Quackmore/Thelma
Huey            Duck              205        203 /Quackmore/Thelma/Huey
Dewey           Duck              206        203 /Quackmore/Thelma/Dewey
Louie           Duck              207        203 /Quackmore/Thelma/Louie
Donald          Duck              204        201 /Quackmore/Donald

Traversing a hierarchical tree could be a very complex and tedious process if it were not for Oracle's internal operators and pseudocolumns. With the use of the CONNECT BY PRIOR and START WITH operators, complex queries are once again simple queries. So, the next time you (and I) see a self referencing table, we will not need to shy away.

» See All Articles by Columnist James Koopmann








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers