Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Aug 26, 2005

Data Densification, Demystified: Oracle 10g SQL Enhancements, Part 2 - Page 2

By Jim Czuprynski

Oracle 10g Enhancements to Analytical Functions

One morning I arrived early at my client site to find a developer running multiple SQL queries, then feverishly transferring the resulting output into a Microsoft Excel spreadsheet. When I asked her the reason for her frantic activity, she told me she needed to perform some complex financial calculations on the results, but did not know how to perform them from within Oracle.

I took an informal poll of the developers I supported, and I found out that many of them were still creating queries to extract raw data from tables and/or views, but then dumping that output to comma-delimited files and then into a spreadsheet so that the raw data could be manipulated via Excel's analytical functions. Some typical requirements these developers were pursuing included these scenarios:

  • A Sales account executive asked for a subset of revenue data to create a series of graphs for her accounts. The graph needed to break out sales within each state in the Midwest and across different products and annual periods. The account executive also specified that any missing values for states or time periods in which no sales took place for a particular product could be filled in with meaningful default values.
  • A user in Marketing needed a report that broke down sales within time periods, product types, and regions, including percentages of the total sales calculated at different control breaks.
  • An Accounting department user needed a report that calculated the eventual value of a company's outstanding cash balance if particular assets were invested at different rates of interest, or over different time periods.

The good news is that Oracle 10g has beefed up analytical function processing so that all these scenarios can be handled by simply running an appropriate query directly against the database. I will cover the first scenario in the following sections, and I will tackle the last two scenarios in my next article when I cover the MODEL clause in depth.

Using Partitioned Outer Joins To "Densify" Data

Oracle 10g offers a new type of join, a partitioned outer join, that not only lets me join disparate sets of data, but also allows me to fill in "gaps" in the result sets because some data might be missing. For example, even though sales of a particular product may exist in most of the states in the Midwest, it is entirely possible that no sales exist in a particular calendar year for that product because of the time period in which it was introduced.

Oracle calls the concept of filling in missing data with partitioned outer joins data densification. To illustrate, I will create a new view, SH.CALENDAR_YEARS, that contains all existing calendar years in the SH.TIMES time periods table and adds in all current calendar years up to and including 2005. I will also create a new view named SH.SALES_MIDWEST_ONLY that will gather only a small subset of sales data from the sales history table, SALES. (I am using a small subset only to delineate the results of data densification; in real life, of course, I would use all the available sales data.)

Next, I will construct a query that uses a partitioned outer join to link together sales data from the SH.SALES_MIDWEST_ONLY view with a larger subset of Middle Western U.S. states to illustrate how data densification can be implemented:

SQL> TTITLE 'Partitioned Outer Join Example #1'
SQL> COL prod            FORMAT A32              HEADING 'Product'
SQL> COL state           FORMAT A4               HEADING 'State'
SQL> COL year            FORMAT 9999             HEADING 'JobID'
SQL> COL tot_sales       FORMAT 999,999,999.99   HEADING 'Total Sales'
SQL>
SQL> SELECT
  2       SLS.prod
  3      ,RGNS.state
  4      ,SLS.year
  5      ,NVL(SLS.sales,0) tot_sales
  6  FROM
  7      (SELECT
  8           MWO.state
  9          ,MWO.prod
 10          ,MWO.year
 11          ,SUM(MWO.sale) sales
 12        FROM
 13            sh.sales_midwest_only MWO
 14       GROUP BY
 15           MWO.state
 16          ,MWO.prod
 17          ,MWO.YEAR
 18      ) SLS
 19      PARTITION BY (SLS.prod)
 20      RIGHT OUTER JOIN
 21      (SELECT
 22          DISTINCT cust_state_province state
 23         FROM sh.customers
 24        WHERE cust_state_province IN ('IA', 'IL','IN','MI','OH','WI')
 25      ) RGNS
 26      ON (RGNS.state = SLS.state)
 27   ORDER BY SLS.prod, SLS.state, SLS.year;
                       Partitioned Outer Join Example #1
Product                          Stat JobID     Total Sales
-------------------------------- ---- ----- ---------------
And 2 Crosscourt Tee Kids        IL    1998          826.00
And 2 Crosscourt Tee Kids        IL    1999        1,187.90
And 2 Crosscourt Tee Kids        IL    2000        2,619.40
And 2 Crosscourt Tee Kids        IN    1998          266.00
And 2 Crosscourt Tee Kids        IN    2000        2,086.00
And 2 Crosscourt Tee Kids        MI    1999           53.20
And 2 Crosscourt Tee Kids        MI    2000        1,209.60
And 2 Crosscourt Tee Kids        WI    1998          686.00
And 2 Crosscourt Tee Kids        WI    1999          504.00
And 2 Crosscourt Tee Kids        WI    2000          378.00
And 2 Crosscourt Tee Kids        IA                     .00
And 2 Crosscourt Tee Kids        OH                     .00
Coin Pocket Twill Cargo Trousers MI    1999          234.00
Coin Pocket Twill Cargo Trousers MI    2000           78.00
Coin Pocket Twill Cargo Trousers WI    1999          722.15
Coin Pocket Twill Cargo Trousers WI    2000          390.00
Coin Pocket Twill Cargo Trousers IA                     .00
Coin Pocket Twill Cargo Trousers IL                     .00
Coin Pocket Twill Cargo Trousers OH                     .00
Coin Pocket Twill Cargo Trousers IN                     .00
Gurfield& Murks Pleated Trousers IL    1998        9,100.00
Gurfield& Murks Pleated Trousers IL    1999       13,825.00
Gurfield& Murks Pleated Trousers IL    2000        2,100.00
Gurfield& Murks Pleated Trousers IN    1999        7,525.00
Gurfield& Murks Pleated Trousers MI    1998       26,040.00
Gurfield& Murks Pleated Trousers MI    1999       12,600.00
Gurfield& Murks Pleated Trousers MI    2000       19,425.00
Gurfield& Murks Pleated Trousers WI    1998          175.00
Gurfield& Murks Pleated Trousers IA                     .00
Gurfield& Murks Pleated Trousers OH                     .00
Kahala Pleated Chino Short       IL    1998          504.00
Kahala Pleated Chino Short       IL    1999        3,738.00
Kahala Pleated Chino Short       IA                     .00
Kahala Pleated Chino Short       IN                     .00
Kahala Pleated Chino Short       MI                     .00
Kahala Pleated Chino Short       OH                     .00
Kahala Pleated Chino Short       WI                     .00
37 rows selected.

Listing 2.3 shows the creation of both of these views, the initial query and this complete result set.

N-Dimensional Data Densification

The previous query does produce zeroed totals for those remaining states (i.e. Ohio and Iowa) which are present in the second subquery but not in the first subquery. That is better than nothing, but unfortunately, data is only present for the years 1999, 2000 and 2001 (the only time periods present in the SALES table's data). However, there is nothing to stop me from employing multiple partitioned outer joins to populate more than one data dimension.

Here is an example of how to accomplish this via one of my favorite, elegant SQL features that Oracle provides -- the WITH clause -- to specify the three component subqueries before invoking the multiple partitioned outer joins. The resulting output of this query will contain a fully "densified" result set:

SQL> TTITLE 'Partitioned Outer Join Densifying Across Two Dimensions'
SQL> COL prod            FORMAT A32              HEADING 'Product'
SQL> COL state           FORMAT A4               HEADING 'State'
SQL> COL year            FORMAT 9999             HEADING 'JobID'
SQL> COL tot_sales       FORMAT 999,999,999.99   HEADING 'Total Sales'
SQL> 
SQL> WITH
  2      -- Gather all Midwest-only sales data
  3      Q1 AS (
  4          SELECT
  5               prod
  6              ,state
  7              ,year
  8              ,SUM(sale) sale
  9            FROM
 10               sh.sales_midwest_only
 11           GROUP BY state, prod, year
 12      ),
 13      -- Gather a list of distinct states
 14      Q2 AS (
 15          SELECT DISTINCT cust_state_province state
 16         FROM sh.customers
 17        WHERE cust_state_province IN ('IA', 'IL','IN','MI','OH','WI')
 18      ),
 19      -- Gather a list of distinct calendar years
 20      Q3 AS (
 21          SELECT calendar_year year
 22            FROM sh.calendar_years
 23      )
 24  SELECT
 25       Q4.prod
 26      ,Q4.state
 27      ,Q3.year
 28      ,NVL(Q4.sale, 0) tot_sales
 29    FROM
 30      (SELECT
 31           Q1.prod
 32          ,Q2.state
 33          ,Q1.year
 34          ,Q1.sale
 35        FROM Q1
 36          PARTITION BY (prod)
 37          RIGHT OUTER JOIN Q2
 38          ON (Q1.state = Q2.state)
 39      ) Q4
 40          PARTITION BY (prod,state)
 41          RIGHT OUTER JOIN Q3
 42          ON (Q4.YEAR = Q3.year)
 43   ORDER BY 1, 2, 3;

A fully documented version of this query as well as its complete result set is shown in Listing 2.4.

Conclusion

The new analytical function enhancements in Oracle 10g give developers and DBAs significantly more horsepower for advanced data modeling -- without the use of third-party software to perform these analyses. These new features make Structured Query Language within Oracle 10g even more robust and flexible. The next article in this series will concentrate on the new MODEL clause, an extremely powerful enhancement that Oracle 10g provides that significantly extends Oracle's robust analytical reporting capabilities.

References and Additional Reading

While there is no substitute for direct experience, reading the manual is not a bad idea, either. I have drawn upon the following Oracle 10g documentation for the deeper technical details of this article:

B10736-01 Oracle Database Data Warehousing Guide, Ch. 21

B10750-01 Oracle Database New Features Guide

B10759-01 SQL Reference

» See All Articles by Columnist Jim Czuprynski



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM