You may think you know everything there is to know about the Scott schema, but part of its overall usefulness lies in how versatile it is with respect to its ability to illustrate many of Oracle’s and SQL’s features. The illustrative point examined in this article concerns trees and hierarchies. An excellent book by Joe Celko (Joe Celko’s Trees and Hierarchies in SQL for Smarties) shows several means of arranging hierarchical data. This article combines the familiar emp table and some of the examples shown in Celko’s book.
The standard nested hierarchy
A quick search on the Internet yields numerous sites showing you how to get a nested hierarchy of the emp table. Another typical example of this hierarchy is that of parent-child foreign key relationships. Shown below is an example of the typical hierarchy based on the emp table, but with an extra feature (the mgr_key). This example is available at the askTom Web site (you’ll have to add the mgr_lookup function, and the code for that is at the site).
select /*+ first_rows */ rpad( '*', (level-1)*2, '*' ) || ename ename, empno, mgr_key from emp start with mgr_lookup(mgr_key) = -1 connect by prior empno = mgr_lookup(mgr_key); ENAME EMPNO MGR_KEY --------------- ---------- -------- KING 7839 mgr_ **JONES 7566 mgr_7839 ****SCOTT 7788 mgr_7566 ******ADAMS 7876 mgr_7788 ****FORD 7902 mgr_7566 ******SMITH 7369 mgr_7902 **BLAKE 7698 mgr_7839 ****ALLEN 7499 mgr_7698 ****WARD 7521 mgr_7698 ****MARTIN 7654 mgr_7698 ****TURNER 7844 mgr_7698 ****JAMES 7900 mgr_7698 **CLARK 7782 mgr_7839 ****MILLER 7934 mgr_7782
Let’s put this into an organization chart (or wire diagram).
This chart is courtesy of Visio’s Organization Chart Wizard. The wizard connects to a data source (a database in this case), queries the table you select, and determines the hierarchical relationship of the table’s elements. Aside from presenting an easier means of interpreting the hierarchical relationships within the emp table, the chart leads us into the first example of how to record the relationships by using a nested set model.
The “emp” family
The algorithm for creating a nested set model is pretty simple. Start at the top, enter the next number in the lower left corner of the element’s box, then move down a level if there is one, or across the element, and enter the next number. If there are no lower elements, traverse across that element and enter the next number, then traverse across to the next sibling, and repeat the numbering. Continue until you return to the top element of the current branch. Traverse across to the next branch and repeat. Return to the top-level element when there are no more lower level branches to traverse to and enter the final/last number of the sequence.
Putting an algorithm into words may make it seem confusing, but when applying the steps, the implementation is usually much easier, and even more so when you have an example to go by. We start with King and enter a 1 in the lower left corner. Move down to Clark and enter a 2. Since Clark has a lower level/child element, move down to Miller and enter a 3. Move across Miller and enter a 4, returning to Clark with a 5. Traverse across to Blake and enter a 6, and so on from there.
Now that we have an ordered pair identification scheme (e.g., Martin is [9,10]), we can create a new emp table as shown below.
CREATE TABLE EMP_ORG (EMPNO NUMBER NOT NULL, LEFT NUMBER NOT NULL, RIGHT NUMBER NOT NULL); INSERT INTO EMP_ORG VALUES (7839,1,28); INSERT INTO EMP_ORG VALUES (7782,2,5); INSERT INTO EMP_ORG VALUES (7934,3,4); INSERT INTO EMP_ORG VALUES (7698,6,17); INSERT INTO EMP_ORG VALUES (7844,7,8); INSERT INTO EMP_ORG VALUES (7654,9,10); INSERT INTO EMP_ORG VALUES (7521,11,12); INSERT INTO EMP_ORG VALUES (7900,13,14); INSERT INTO EMP_ORG VALUES (7499,15,16); INSERT INTO EMP_ORG VALUES (7566,18,27); INSERT INTO EMP_ORG VALUES (7788,19,22); INSERT INTO EMP_ORG VALUES (7876,20,21); INSERT INTO EMP_ORG VALUES (7902,23,26); INSERT INTO EMP_ORG VALUES (7369,24,25); SQL> select * from emp_org; EMPNO LEFT RIGHT ---------- ---------- ---------- 7839 1 28 7782 2 5 7934 3 4 7698 6 17 7844 7 8 7654 9 10 7521 11 12 7900 13 14 7499 15 16 7566 18 27 7788 19 22 7876 20 21 7902 23 26 7369 24 25 14 rows selected.
Following along with an example in the book, the next step is to create a view.
CREATE OR REPLACE VIEW EMP_ORG_VIEW (EMPNO, LEVEL_0, LEVEL_1, LEVEL_2) AS SELECT A.EMPNO, CASE WHEN COUNT(C.EMPNO) = 2 THEN B.EMPNO ELSE NULL END AS LEVEL_0, CASE WHEN COUNT(C.EMPNO) = 3 THEN B.EMPNO ELSE NULL END AS LEVEL_1, CASE WHEN COUNT(C.EMPNO) = 4 THEN B.EMPNO ELSE NULL END AS LEVEL_2 FROM EMP_ORG A, EMP_ORG B, EMP_ORG C WHERE A.LEFT BETWEEN B.LEFT AND B.RIGHT AND C.LEFT BETWEEN B.LEFT AND B.RIGHT AND A.LEFT BETWEEN C.LEFT AND C.RIGHT GROUP BY A.EMPNO, B.EMPNO;
The final step is to write the query which shows the hierarchical structure of the organization.
SELECT EMPNO, MAX(LEVEL_0) "LVL 0", MAX(LEVEL_1) "LVL 1", MAX(LEVEL_2) "LVL 2" FROM EMP_ORG_VIEW GROUP BY EMPNO; EMPNO LVL 0 LVL 1 LVL 2 ----- ----- ----- ----- 7844 7698 7839 7839 7782 7839 7698 7839 7521 7698 7839 7902 7566 7839 7788 7566 7839 7654 7698 7839 7934 7782 7839 7566 7839 7499 7698 7839 7876 7788 7566 7839 7900 7698 7839 7369 7902 7566 7839 14 rows selected.
As an alternative, you can replace empno with ename (make changes as appropriate for the data type and references), and generate a name display.
ENAME LVL 0 LVL 1 LVL 2 ------------ ------- ------- ------- ALLEN BLAKE KING JONES KING FORD JONES KING MILLER CLARK KING CLARK KING SMITH FORD JONES KING WARD BLAKE KING TURNER BLAKE KING MARTIN BLAKE KING ADAMS SCOTT JONES KING SCOTT JONES KING BLAKE KING JAMES BLAKE KING KING
Either way (by empno or ename), the results reflect the same organizational structure, but the ordered representation is somewhat hard to follow. Unfortunately, that is not the worst problem with a nested set. The number one weakness of this model is its inflexibility. What happens if a leaf node/element (Smith) leaves the organization? The structure survives, but what if you need to add a new element? For example, how do you add a new level under Ward?
The “connect by prior” version is indifferent to changes like that. Simply add the new employee, assign a manager (Ward), and the query output reflects the change. In the nested set model, every position after the insertion requires modification. Another real world example of a structure-breaking change is that of adding a new branch/department. Not only is there the reordering issue, but we have now introduced another problem – updating the structure of the view (adding a new level). Clearly, this type of model, although useful for academic purposes, is a maintenance nightmare. In fact, rigid models such as this are analogous to the rigid structure of hierarchical databases before the days of the relational model.
Build in some flexibility
As another example, who says the numbers you use in the ordered pair have to be consecutive? What matters is the relative ordering of the numbers (increasing in value as you walk the chart, but not necessarily increasing by one). Let’s take the ename version and induce a range (multiply each position value by 10).
CREATE TABLE ENAME_BY10_ORG (ENAME VARCHAR2(12) NOT NULL, LEFT NUMBER NOT NULL, RIGHT NUMBER NOT NULL); INSERT INTO ENAME_BY10_ORG VALUES ('KING',10,280); INSERT INTO ENAME_BY10_ORG VALUES ('CLARK',20,50); INSERT INTO ENAME_BY10_ORG VALUES ('MILLER',30,40); INSERT INTO ENAME_BY10_ORG VALUES ('BLAKE',60,170); INSERT INTO ENAME_BY10_ORG VALUES ('TURNER',70,80); INSERT INTO ENAME_BY10_ORG VALUES ('MARTIN',90,100); INSERT INTO ENAME_BY10_ORG VALUES ('WARD',110,120); INSERT INTO ENAME_BY10_ORG VALUES ('JAMES',130,140); INSERT INTO ENAME_BY10_ORG VALUES ('ALLEN',150,160); INSERT INTO ENAME_BY10_ORG VALUES ('JONES',180,270); INSERT INTO ENAME_BY10_ORG VALUES ('SCOTT',190,220); INSERT INTO ENAME_BY10_ORG VALUES ('ADAMS',200,210); INSERT INTO ENAME_BY10_ORG VALUES ('FORD',230,260); INSERT INTO ENAME_BY10_ORG VALUES ('SMITH',240,250);
Let’s add someone between Martin and Ward.
INSERT INTO ENAME_BY10_ORG VALUES ('COLE',101,102);
You can see that Cole picks up the same hierarchy as her siblings.
So far, so good. Now let’s add someone between Martin and Cole. Oops, we didn’t plan far enough ahead in the insertion/addition scheme because there isn’t any room (numerically speaking, using whole numbers) between 100 (the end of Martin) and 101 (the beginning of Cole). Without having to resort to real numbers (which includes values such as 100.2, 100.74, etc.), one way to preserve the use of whole numbers is to make the range wide enough, and then pack in additions close to the boundary, but not too close so as to leave room for future additions.
With this modification to the nested set model, adding new nodes within existing levels is quite easy due to the increased range of numbers, but we are still faced with the maintenance issue regarding the addition of new levels. You can always create the view with more levels than currently being used, so that helps to minimize the maintenance problem.
Coming back to Oracle
What do connect by prior, start with, and level mean in this query?
select rpad( '*', (level-1)*2, '*' ) || table_name table_name from temp_constraints start with fkey_constraint is null connect by prior pkey_constraint = r_constraint_name;
The SQL Reference guide defines these items as such:
- START WITH specifies the root row(s) of the hierarchy.
- CONNECT BY specifies the relationship between parent rows and child rows of the hierarchy. In a hierarchical query, one expression in condition must be qualified with the PRIOR operator to refer to the parent row.
- LEVEL (pseudocolumn) returns 1 for a root row, 2 for a child of a root, and so on for each row returned by a hierarchical query
“Connect by” is not a SQL-92 ANSI standard, and other versions of SQL use different constructs to perform the same function. You can begin to appreciate the power and usefulness of this feature, especially when faced with having to query hierarchical relationships using nested sets or other set models.
In Closing
One of the hidden tools or algorithms behind hierarchical queries is recursion. The best way to learn to recursion is to first learn recursion. That’s an old joke, but recursion helps drive complex queries, and queries involving hierarchies and trees are certainly among the most complex. Fortunately for us mere mortals, masters of the trade such as Joe Celko and Tom Kyte have reduced the seemingly complex into more understandable chunks of information. Understanding and have some adeptness at using SQL is one of the key requirements for becoming a better DBA.