Data Densification, Demystified: Oracle 10g SQL Enhancements, Part 2

Synopsis. Oracle 10g has extended the already robust features of Structured Query Language (SQL) with a plethora of intriguing new capabilities. This article – the second in this series – reviews the new capabilities Oracle 10g provides that, in concert with analytical functions, permit the creation of “densified” output with a few simple queries.


The previous article in this series reviewed several new SQL features in Oracle 10g, including improved query and access methods for nested tables, upgrades to the MERGE command, and hierarchical query enhancements. Impressive as these new capabilities are, Oracle 10g’s new data modeling features allow any Oracle developer or DBA to produce sophisticated, complex calculations using just the database engine, thus eliminating the need to massage data output via a third-party tool. Many of the new Oracle 10g query features depend upon the analytical functions established in Oracle 9i, so I will first delve into those features before discussing partitioned outer joins and data densification.


Analytical Functions: A Brief Introduction


I have been using SQL since 1985, and in my humble but biased opinion, Oracle’s extensions to standard ANSI SQL are superior to those of its competitors. Oracle 9i added a complete suite of analytical functions to provide some impressive reporting and calculation capabilities.


Just about every major system I have worked on has at least one historical table, i.e. a table that stores multiple historical entries to track changes to a particular data domain over time. A good example of an historical table is the JOB_HISTORY table in the HR demo schema, which contains multiple rows for each employee detailing that employee’s job assignments over time. A typical reporting requirement usually required against historical tables is the retrieval of either the least or most recent entry- for example, the least recent job assignment for a specific employee based on the employee’s start date in that position.


If I were using another DBMS, I would probably have little choice but to construct the following SQL query and utilize a subquery to find the minimum starting date, and then use the result from that subquery to isolate the remainder of the employee’s job history information:

SQL> TTITLE ‘Earliest Employee Job History Entry (Traditional Method)’
SQL> COL employee_id FORMAT 9999 HEADING ‘EmpID’
SQL> COL full_name FORMAT A24 HEADING ‘Employee Name’
SQL> COL job_id FORMAT A10 HEADING ‘JobID’
SQL> COL start_date FORMAT A10 HEADING ‘Start Date’
SQL> COL end_date FORMAT A10 HEADING ‘End Date’
SQL>
SQL> SELECT
2 E.employee_id
3 ,E.last_name || ‘, ‘ || E.first_name full_name
4 ,JH.job_id
5 ,TO_CHAR(JH.start_date, ‘mm-dd-yyyy’) start_date
6 ,TO_CHAR(JH.end_date, ‘mm-dd-yyyy’) end_date
7 FROM
8 hr.employees E
9 ,hr.job_history JH
10 WHERE E.employee_id = JH.employee_id
11 AND JH.start_date = (
12 SELECT MIN(start_date)
13 FROM hr.job_history
14 WHERE employee_id = 114)
15 AND JH.employee_id = 114;
Earliest Employee Job History Entry (Traditional Method)
EmpID Employee Name JobID Start Date End Date
—– ———————— ———- ———- ———-
114 Raphaely, Den ST_CLERK 03-24-1998 12-31-1999

As of Oracle 9i, however, I have an alternative to this method: I can use an Oracle analytical function to return just the first entry in the list of job history entries for the specific employee. Briefly, here is how analytical functions perform their tasks:


Execution Order. A SQL statement that uses an analytical function first executes all joins and then processes all WHERE, GROUP BY, and HAVING statements. Oracle then hands off the result set to the analytical function for calculation and processing, and then finally any query-level ORDER BY statements are processed.


Partitioning of the Result Set. If the analytical function’s interior query has any GROUP BY statements, Oracle will logically divide the result set into partitions (not to be confused with partitioned tables or indexes) based on the columns specified. A single partition can be as large as all the rows in the result set, or the partitions can be relatively tiny, depending on the chosen columns. An analytical function can operate against aggregated values in each result set partition.


The Current Row and “Windowing.” As the analytical function processes each row in the result set partition, that row is marked as the current row. The current row becomes a reference point for any windowing operations. Windowing operations occur whenever an analytical function defines a sliding window to limit the selection of rows within a result set partition. This means that analytical processing can be limited to a range of rows based on a displacement of a specific number of rows from the current row in the result set partition (e.g., plus or minus 3 rows from the current row). Analytical processing can also be based on specified inclusion criteria.


Analytical functions are typically used heavily in decision-support and data warehousing reporting because they are extremely powerful for calculating rankings and percentiles within a result set, as well as calculating moving and cumulative aggregations based on a displacement from a specific row in the result set. It is also possible to calculate lagging and leading values within a partition, or first and last values within a partition. Here is an example of how to return the same results using Oracle 9i’s FIRST_VALUE analytical function:

SQL> TTITLE ‘Earliest Employee Job History Entry (Analytic Function)’
SQL> COL eid FORMAT 9999 HEADING ‘EmpID’
SQL> COL name FORMAT A24 HEADING ‘Employee Name’
SQL> COL jobid FORMAT A10 HEADING ‘JobID’
SQL> COL begin_date FORMAT A10 HEADING ‘Start Date’
SQL> COL stop_date FORMAT A10 HEADING ‘End Date’
SQL>
SQL> SELECT
2 eid
3 ,NAME
4 ,jobid
5 ,begin_date
6 ,stop_date
7 FROM (
8 SELECT
9 JH.employee_id eid
10 ,E.last_name || ‘, ‘ || E.first_name NAME
11 ,JH.job_id jobid
12 ,JH.START_DATE begin_date
13 ,JH.end_date stop_date
14 ,FIRST_VALUE(JH.start_date) OVER (
15 ORDER BY JH.employee_id, JH.start_date
16 ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
17 ) recent_history
18 FROM
19 hr.employees E
20 ,hr.job_history JH
21 WHERE E.employee_id = JH.employee_id
22 AND JH.employee_id IN (114)
23 )
24 WHERE begin_date = recent_history
25 ORDER BY eid;
Earliest Employee Job History Entry (Analytic Function)
EmpID Employee Name JobID Start Date End Date
—– ———————— ———- ———- ———-
114 Raphaely, Den ST_CLERK 03/24/1998 12/31/1999

In this example, note that the FIRST_VALUE analytical function indicates the retrieval of a single result set partition, sorted within Employee ID and Start Date based on its ORDER BY clause. The ROWS directive tells Oracle to use all entries in each result set partition to determine which entry has the first (and therefore earliest) start date in each partition.


Listing 2.1 shows the DML statements I issued to add rows into the original JOB_HISTORY table to demonstrate these queries and the original queries themselves. An expanded query that shows how the PARTITION BY clause could be used within an analytical query to return the first job history entry for each employee is shown in Listing 2.2.

Jim Czuprynski
Jim Czuprynski
Jim Czuprynski has accumulated over 30 years of experience during his information technology career. He has filled diverse roles at several Fortune 1000 companies in those three decades - mainframe programmer, applications developer, business analyst, and project manager - before becoming an Oracle database administrator in 2001. He currently holds OCP certification for Oracle 9i, 10g and 11g. Jim teaches the core Oracle University database administration courses on behalf of Oracle and its Education Partners throughout the United States and Canada, instructing several hundred Oracle DBAs since 2005. He was selected as Oracle Education Partner Instructor of the Year in 2009. Jim resides in Bartlett, Illinois, USA with his wife Ruth, whose career as a project manager and software quality assurance manager for a multinational insurance company makes for interesting marital discussions. He enjoys cross-country skiing, biking, bird watching, and writing about his life experiences in the field of information technology.

Latest Articles