Trees and Hierarchies in Oracle

You may think you know everything there is to know about the Scott schema, but part of its overall usefulness lies in how versatile it is with respect to its ability to illustrate many of Oracle’s and SQL’s features. The illustrative point examined in this article concerns trees and hierarchies. An excellent book by Joe Celko (Joe Celko’s Trees and Hierarchies in SQL for Smarties) shows several means of arranging hierarchical data. This article combines the familiar emp table and some of the examples shown in Celko’s book.

The standard nested hierarchy

A quick search on the Internet yields numerous sites showing you how to get a nested hierarchy of the emp table. Another typical example of this hierarchy is that of parent-child foreign key relationships. Shown below is an example of the typical hierarchy based on the emp table, but with an extra feature (the mgr_key). This example is available at the askTom Web site (you’ll have to add the mgr_lookup function, and the code for that is at the site).

select /*+ first_rows */
rpad( '*', (level-1)*2, '*' ) || 
   ename ename, empno, mgr_key
from emp
start with mgr_lookup(mgr_key) = -1
connect by prior empno = mgr_lookup(mgr_key);
ENAME                EMPNO MGR_KEY
--------------- ---------- --------
KING                  7839 mgr_
**JONES               7566 mgr_7839
****SCOTT             7788 mgr_7566
******ADAMS           7876 mgr_7788
****FORD              7902 mgr_7566
******SMITH           7369 mgr_7902
**BLAKE               7698 mgr_7839
****ALLEN             7499 mgr_7698
****WARD              7521 mgr_7698
****MARTIN            7654 mgr_7698
****TURNER            7844 mgr_7698
****JAMES             7900 mgr_7698
**CLARK               7782 mgr_7839
****MILLER            7934 mgr_7782

Let’s put this into an organization chart (or wire diagram).

This chart is courtesy of Visio’s Organization Chart Wizard. The wizard connects to a data source (a database in this case), queries the table you select, and determines the hierarchical relationship of the table’s elements. Aside from presenting an easier means of interpreting the hierarchical relationships within the emp table, the chart leads us into the first example of how to record the relationships by using a nested set model.

The “emp” family

The algorithm for creating a nested set model is pretty simple. Start at the top, enter the next number in the lower left corner of the element’s box, then move down a level if there is one, or across the element, and enter the next number. If there are no lower elements, traverse across that element and enter the next number, then traverse across to the next sibling, and repeat the numbering. Continue until you return to the top element of the current branch. Traverse across to the next branch and repeat. Return to the top-level element when there are no more lower level branches to traverse to and enter the final/last number of the sequence.

Putting an algorithm into words may make it seem confusing, but when applying the steps, the implementation is usually much easier, and even more so when you have an example to go by. We start with King and enter a 1 in the lower left corner. Move down to Clark and enter a 2. Since Clark has a lower level/child element, move down to Miller and enter a 3. Move across Miller and enter a 4, returning to Clark with a 5. Traverse across to Blake and enter a 6, and so on from there.

Now that we have an ordered pair identification scheme (e.g., Martin is [9,10]), we can create a new emp table as shown below.

CREATE TABLE EMP_ORG
(EMPNO NUMBER NOT NULL,
LEFT NUMBER NOT NULL,
RIGHT NUMBER NOT NULL);
INSERT INTO EMP_ORG VALUES (7839,1,28);
INSERT INTO EMP_ORG VALUES (7782,2,5);
INSERT INTO EMP_ORG VALUES (7934,3,4);
INSERT INTO EMP_ORG VALUES (7698,6,17);
INSERT INTO EMP_ORG VALUES (7844,7,8);
INSERT INTO EMP_ORG VALUES (7654,9,10);
INSERT INTO EMP_ORG VALUES (7521,11,12);
INSERT INTO EMP_ORG VALUES (7900,13,14);
INSERT INTO EMP_ORG VALUES (7499,15,16);
INSERT INTO EMP_ORG VALUES (7566,18,27);
INSERT INTO EMP_ORG VALUES (7788,19,22);
INSERT INTO EMP_ORG VALUES (7876,20,21);
INSERT INTO EMP_ORG VALUES (7902,23,26);
INSERT INTO EMP_ORG VALUES (7369,24,25);
SQL> select * from emp_org;
     EMPNO       LEFT      RIGHT
---------- ---------- ----------
      7839          1         28
      7782          2          5
      7934          3          4
      7698          6         17
      7844          7          8
      7654          9         10
      7521         11         12
      7900         13         14
      7499         15         16
      7566         18         27
      7788         19         22
      7876         20         21
      7902         23         26
      7369         24         25
14 rows selected.

Following along with an example in the book, the next step is to create a view.

CREATE OR REPLACE VIEW EMP_ORG_VIEW 
(EMPNO, LEVEL_0, LEVEL_1, LEVEL_2)
AS
SELECT A.EMPNO,
  CASE WHEN COUNT(C.EMPNO) = 2
       THEN B.EMPNO
       ELSE NULL END AS LEVEL_0,
  CASE WHEN COUNT(C.EMPNO) = 3
       THEN B.EMPNO
       ELSE NULL END AS LEVEL_1,
  CASE WHEN COUNT(C.EMPNO) = 4
       THEN B.EMPNO
       ELSE NULL END AS LEVEL_2
FROM EMP_ORG A,
     EMP_ORG B,
     EMP_ORG C
WHERE
   A.LEFT BETWEEN B.LEFT AND B.RIGHT
AND
   C.LEFT BETWEEN B.LEFT AND B.RIGHT
AND
   A.LEFT BETWEEN C.LEFT AND C.RIGHT
GROUP BY A.EMPNO, B.EMPNO;       

The final step is to write the query which shows the hierarchical structure of the organization.

SELECT EMPNO,
       MAX(LEVEL_0) "LVL 0",
       MAX(LEVEL_1) "LVL 1",
       MAX(LEVEL_2) "LVL 2"
FROM EMP_ORG_VIEW
GROUP BY EMPNO;
     EMPNO LVL 0 LVL 1 LVL 2
     ----- ----- ----- -----
      7844  7698  7839
      7839
      7782  7839
      7698  7839
      7521  7698  7839
      7902  7566  7839
      7788  7566  7839
      7654  7698  7839
      7934  7782  7839
      7566  7839
      7499  7698  7839
      7876  7788  7566  7839
      7900  7698  7839
      7369  7902  7566  7839
14 rows selected.

As an alternative, you can replace empno with ename (make changes as appropriate for the data type and references), and generate a name display.

ENAME        LVL 0   LVL 1   LVL 2
------------ ------- ------- -------
ALLEN        BLAKE   KING
JONES        KING
FORD         JONES   KING
MILLER       CLARK   KING
CLARK        KING
SMITH        FORD    JONES   KING
WARD         BLAKE   KING
TURNER       BLAKE   KING
MARTIN       BLAKE   KING
ADAMS        SCOTT   JONES   KING
SCOTT        JONES   KING
BLAKE        KING
JAMES        BLAKE   KING
KING

Either way (by empno or ename), the results reflect the same organizational structure, but the ordered representation is somewhat hard to follow. Unfortunately, that is not the worst problem with a nested set. The number one weakness of this model is its inflexibility. What happens if a leaf node/element (Smith) leaves the organization? The structure survives, but what if you need to add a new element? For example, how do you add a new level under Ward?

The “connect by prior” version is indifferent to changes like that. Simply add the new employee, assign a manager (Ward), and the query output reflects the change. In the nested set model, every position after the insertion requires modification. Another real world example of a structure-breaking change is that of adding a new branch/department. Not only is there the reordering issue, but we have now introduced another problem – updating the structure of the view (adding a new level). Clearly, this type of model, although useful for academic purposes, is a maintenance nightmare. In fact, rigid models such as this are analogous to the rigid structure of hierarchical databases before the days of the relational model.

Build in some flexibility

As another example, who says the numbers you use in the ordered pair have to be consecutive? What matters is the relative ordering of the numbers (increasing in value as you walk the chart, but not necessarily increasing by one). Let’s take the ename version and induce a range (multiply each position value by 10).

CREATE TABLE ENAME_BY10_ORG
(ENAME VARCHAR2(12) NOT NULL,
LEFT NUMBER NOT NULL,
RIGHT NUMBER NOT NULL);
INSERT INTO ENAME_BY10_ORG VALUES ('KING',10,280);
INSERT INTO ENAME_BY10_ORG VALUES ('CLARK',20,50);
INSERT INTO ENAME_BY10_ORG VALUES ('MILLER',30,40);
INSERT INTO ENAME_BY10_ORG VALUES ('BLAKE',60,170);
INSERT INTO ENAME_BY10_ORG VALUES ('TURNER',70,80);
INSERT INTO ENAME_BY10_ORG VALUES ('MARTIN',90,100);
INSERT INTO ENAME_BY10_ORG VALUES ('WARD',110,120);
INSERT INTO ENAME_BY10_ORG VALUES ('JAMES',130,140);
INSERT INTO ENAME_BY10_ORG VALUES ('ALLEN',150,160);
INSERT INTO ENAME_BY10_ORG VALUES ('JONES',180,270);
INSERT INTO ENAME_BY10_ORG VALUES ('SCOTT',190,220);
INSERT INTO ENAME_BY10_ORG VALUES ('ADAMS',200,210);
INSERT INTO ENAME_BY10_ORG VALUES ('FORD',230,260);
INSERT INTO ENAME_BY10_ORG VALUES ('SMITH',240,250);

Let’s add someone between Martin and Ward.

INSERT INTO ENAME_BY10_ORG VALUES ('COLE',101,102);

You can see that Cole picks up the same hierarchy as her siblings.

So far, so good. Now let’s add someone between Martin and Cole. Oops, we didn’t plan far enough ahead in the insertion/addition scheme because there isn’t any room (numerically speaking, using whole numbers) between 100 (the end of Martin) and 101 (the beginning of Cole). Without having to resort to real numbers (which includes values such as 100.2, 100.74, etc.), one way to preserve the use of whole numbers is to make the range wide enough, and then pack in additions close to the boundary, but not too close so as to leave room for future additions.

With this modification to the nested set model, adding new nodes within existing levels is quite easy due to the increased range of numbers, but we are still faced with the maintenance issue regarding the addition of new levels. You can always create the view with more levels than currently being used, so that helps to minimize the maintenance problem.

Coming back to Oracle

What do connect by prior, start with, and level mean in this query?

select rpad( '*', (level-1)*2, '*' ) || table_name table_name
from temp_constraints
start with fkey_constraint is null
connect by prior pkey_constraint = r_constraint_name;

The SQL Reference guide defines these items as such:

  • START WITH specifies the root row(s) of the hierarchy.
  • CONNECT BY specifies the relationship between parent rows and child rows of the hierarchy. In a hierarchical query, one expression in condition must be qualified with the PRIOR operator to refer to the parent row.
  • LEVEL (pseudocolumn) returns 1 for a root row, 2 for a child of a root, and so on for each row returned by a hierarchical query

“Connect by” is not a SQL-92 ANSI standard, and other versions of SQL use different constructs to perform the same function. You can begin to appreciate the power and usefulness of this feature, especially when faced with having to query hierarchical relationships using nested sets or other set models.

In Closing

One of the hidden tools or algorithms behind hierarchical queries is recursion. The best way to learn to recursion is to first learn recursion. That’s an old joke, but recursion helps drive complex queries, and queries involving hierarchies and trees are certainly among the most complex. Fortunately for us mere mortals, masters of the trade such as Joe Celko and Tom Kyte have reduced the seemingly complex into more understandable chunks of information. Understanding and have some adeptness at using SQL is one of the key requirements for becoming a better DBA.

» See All Articles by Columnist Steve Callan

Steve Callan
Steve Callan
Steve is an Oracle DBA (OCP 8i and 9i)/developer working in Denver. His Oracle experience also includes Forms and Reports, Oracle9iAS and Oracle9iDS.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles