Oracle 10g Enhancements to Analytical Functions
One morning I arrived early at my client site to find a
developer running multiple SQL queries, then feverishly transferring the
resulting output into a Microsoft Excel spreadsheet. When I asked her the
reason for her frantic activity, she told me she needed to perform some complex
financial calculations on the results, but did not know how to perform them
from within Oracle.
I took an informal poll of the developers I supported, and I
found out that many of them were still creating queries to extract raw data
from tables and/or views, but then dumping that output to comma-delimited files
and then into a spreadsheet so that the raw data could be manipulated via
Excel's analytical functions. Some typical requirements these developers were
pursuing included these scenarios:
-
A Sales account executive asked for a subset of revenue data to
create a series of graphs for her accounts. The graph needed to break out sales
within each state in the Midwest and across different products and annual
periods. The account executive also specified that any missing values for
states or time periods in which no sales took place for a particular product
could be filled in with meaningful default values.
-
A user in Marketing needed a report that broke down sales within
time periods, product types, and regions, including percentages of the total
sales calculated at different control breaks.
-
An Accounting department user needed a report that calculated the
eventual value of a company's outstanding cash balance if particular
assets were invested at different rates of interest, or over different time
periods.
The good news is that Oracle 10g has beefed up analytical
function processing so that all these scenarios can be handled by simply
running an appropriate query directly against the database. I will cover the
first scenario in the following sections, and I will tackle the last two
scenarios in my next article when I cover the MODEL clause in depth.
Using Partitioned Outer Joins To "Densify" Data
Oracle 10g offers a new type of join, a partitioned outer
join, that not only lets me join disparate sets of data, but also allows me
to fill in "gaps" in the result sets because some data might be
missing. For example, even though sales of a particular product may exist in
most of the states in the Midwest, it is entirely possible that no sales exist
in a particular calendar year for that product because of the time period in
which it was introduced.
Oracle calls the concept of filling in missing data with
partitioned outer joins data densification. To illustrate, I will create
a new view, SH.CALENDAR_YEARS, that
contains all existing calendar years in the SH.TIMES time periods table and adds in all
current calendar years up to and including 2005. I will also create a new view
named SH.SALES_MIDWEST_ONLY
that will gather only a small subset of sales data from the sales history
table, SALES. (I am
using a small subset only to delineate the results of data densification; in
real life, of course, I would use all the available sales data.)
Next, I will construct a query that uses a partitioned outer
join to link together sales data from the SH.SALES_MIDWEST_ONLY view with a larger subset
of Middle Western U.S. states to illustrate how data densification can be
implemented:
SQL> TTITLE 'Partitioned Outer Join Example #1'
SQL> COL prod FORMAT A32 HEADING 'Product'
SQL> COL state FORMAT A4 HEADING 'State'
SQL> COL year FORMAT 9999 HEADING 'JobID'
SQL> COL tot_sales FORMAT 999,999,999.99 HEADING 'Total Sales'
SQL>
SQL> SELECT
2 SLS.prod
3 ,RGNS.state
4 ,SLS.year
5 ,NVL(SLS.sales,0) tot_sales
6 FROM
7 (SELECT
8 MWO.state
9 ,MWO.prod
10 ,MWO.year
11 ,SUM(MWO.sale) sales
12 FROM
13 sh.sales_midwest_only MWO
14 GROUP BY
15 MWO.state
16 ,MWO.prod
17 ,MWO.YEAR
18 ) SLS
19 PARTITION BY (SLS.prod)
20 RIGHT OUTER JOIN
21 (SELECT
22 DISTINCT cust_state_province state
23 FROM sh.customers
24 WHERE cust_state_province IN ('IA', 'IL','IN','MI','OH','WI')
25 ) RGNS
26 ON (RGNS.state = SLS.state)
27 ORDER BY SLS.prod, SLS.state, SLS.year;
Partitioned Outer Join Example #1
Product Stat JobID Total Sales
-------------------------------- ---- ----- ---------------
And 2 Crosscourt Tee Kids IL 1998 826.00
And 2 Crosscourt Tee Kids IL 1999 1,187.90
And 2 Crosscourt Tee Kids IL 2000 2,619.40
And 2 Crosscourt Tee Kids IN 1998 266.00
And 2 Crosscourt Tee Kids IN 2000 2,086.00
And 2 Crosscourt Tee Kids MI 1999 53.20
And 2 Crosscourt Tee Kids MI 2000 1,209.60
And 2 Crosscourt Tee Kids WI 1998 686.00
And 2 Crosscourt Tee Kids WI 1999 504.00
And 2 Crosscourt Tee Kids WI 2000 378.00
And 2 Crosscourt Tee Kids IA .00
And 2 Crosscourt Tee Kids OH .00
Coin Pocket Twill Cargo Trousers MI 1999 234.00
Coin Pocket Twill Cargo Trousers MI 2000 78.00
Coin Pocket Twill Cargo Trousers WI 1999 722.15
Coin Pocket Twill Cargo Trousers WI 2000 390.00
Coin Pocket Twill Cargo Trousers IA .00
Coin Pocket Twill Cargo Trousers IL .00
Coin Pocket Twill Cargo Trousers OH .00
Coin Pocket Twill Cargo Trousers IN .00
Gurfield& Murks Pleated Trousers IL 1998 9,100.00
Gurfield& Murks Pleated Trousers IL 1999 13,825.00
Gurfield& Murks Pleated Trousers IL 2000 2,100.00
Gurfield& Murks Pleated Trousers IN 1999 7,525.00
Gurfield& Murks Pleated Trousers MI 1998 26,040.00
Gurfield& Murks Pleated Trousers MI 1999 12,600.00
Gurfield& Murks Pleated Trousers MI 2000 19,425.00
Gurfield& Murks Pleated Trousers WI 1998 175.00
Gurfield& Murks Pleated Trousers IA .00
Gurfield& Murks Pleated Trousers OH .00
Kahala Pleated Chino Short IL 1998 504.00
Kahala Pleated Chino Short IL 1999 3,738.00
Kahala Pleated Chino Short IA .00
Kahala Pleated Chino Short IN .00
Kahala Pleated Chino Short MI .00
Kahala Pleated Chino Short OH .00
Kahala Pleated Chino Short WI .00
37 rows selected.
Listing 2.3
shows the creation of both of these views, the initial query and this complete
result set.
N-Dimensional Data Densification
The previous query does produce zeroed totals for those
remaining states (i.e. Ohio and Iowa) which are present in the second subquery
but not in the first subquery. That is better than nothing, but unfortunately,
data is only present for the years 1999, 2000 and 2001 (the only time periods
present in the SALES table's data). However, there is nothing to stop me from
employing multiple partitioned outer joins to populate more than one
data dimension.
Here is an example of how to accomplish this via one of my
favorite, elegant SQL features that Oracle provides -- the WITH clause
-- to specify the three component subqueries before invoking the multiple
partitioned outer joins. The resulting output of this query will contain a
fully "densified" result set:
SQL> TTITLE 'Partitioned Outer Join Densifying Across Two Dimensions'
SQL> COL prod FORMAT A32 HEADING 'Product'
SQL> COL state FORMAT A4 HEADING 'State'
SQL> COL year FORMAT 9999 HEADING 'JobID'
SQL> COL tot_sales FORMAT 999,999,999.99 HEADING 'Total Sales'
SQL>
SQL> WITH
2 -- Gather all Midwest-only sales data
3 Q1 AS (
4 SELECT
5 prod
6 ,state
7 ,year
8 ,SUM(sale) sale
9 FROM
10 sh.sales_midwest_only
11 GROUP BY state, prod, year
12 ),
13 -- Gather a list of distinct states
14 Q2 AS (
15 SELECT DISTINCT cust_state_province state
16 FROM sh.customers
17 WHERE cust_state_province IN ('IA', 'IL','IN','MI','OH','WI')
18 ),
19 -- Gather a list of distinct calendar years
20 Q3 AS (
21 SELECT calendar_year year
22 FROM sh.calendar_years
23 )
24 SELECT
25 Q4.prod
26 ,Q4.state
27 ,Q3.year
28 ,NVL(Q4.sale, 0) tot_sales
29 FROM
30 (SELECT
31 Q1.prod
32 ,Q2.state
33 ,Q1.year
34 ,Q1.sale
35 FROM Q1
36 PARTITION BY (prod)
37 RIGHT OUTER JOIN Q2
38 ON (Q1.state = Q2.state)
39 ) Q4
40 PARTITION BY (prod,state)
41 RIGHT OUTER JOIN Q3
42 ON (Q4.YEAR = Q3.year)
43 ORDER BY 1, 2, 3;
A fully documented version of this query as well as its
complete result set is shown in Listing 2.4.
Conclusion
The new analytical function enhancements in Oracle 10g
give developers and DBAs significantly more horsepower for advanced data
modeling -- without the use of third-party software to perform these
analyses. These new features make Structured Query Language within Oracle 10g even
more robust and flexible. The next article in this series will concentrate on
the new MODEL clause, an extremely powerful enhancement that Oracle 10g
provides that significantly extends Oracle's robust analytical reporting
capabilities.
References and Additional Reading
While there is no substitute for direct experience, reading
the manual is not a bad idea, either. I have drawn upon the following Oracle
10g documentation for the deeper technical details of this article:
B10736-01 Oracle Database
Data Warehousing Guide, Ch. 21
B10750-01 Oracle Database
New Features Guide
B10759-01 SQL Reference
»
See All Articles by Columnist Jim Czuprynski