Oracle provides quite an array of functions when it comes to

manipulating data via SQL. The Oracle9i SQL Reference guide (for release 2)

lists five categories of SQL functions, with each category containing one or

more functions within a category. The major functions are single-row,

aggregate, analytic, object reference and user defined.

The single-row function category contains the following

functions: number, character, datetime, conversion and miscellaneous single

row. What are single-row functions? The definition shown in the SQL Reference

guide states that single-row "functions return a single result row for

every row of a queried table or view. These functions can appear in select

lists, `WHERE`

clauses, `START`

`WITH`

and `CONNECT`

`BY`

clauses, and `HAVING`

clauses." Aggregate

functions are also quite powerful. These functions "return a single result

row based on groups of rows, rather than on single rows." In general,

single-row and aggregate functions complement one another.

Of particular interest for this new series are the functions

related to numbers. How does this relate to your job as a DBA? Answer: in

several ways. First, you may have to support a Decision Support group. Decision

support groups frequently use analytic functions and statistical methods to

support a business decision. It helps to understand (or at least recognize the

name) some of the analytic tools being used. Second, you may be working in a

data warehouse type of environment where YOU are the decision support guru.

Granted, your job may be to simply massage or extract the data for others to

analyze, but you are the SQL expert by virtue of being the DBA. If you are

asked to find the regression line for quantity produced versus sales, you will

make a better impression by not having that deer in the headlights look come

across your face.

A third reason has to do to adding value to your company by

saving your company some money. How does that work? Well, companies that need

to analyze data often purchase sophisticated tools such as SPSS and SAS. These

products are not cheap. What frequently happens is that users of these high-end

statistical packages wind up using very few features. Many of these features

are the same ones found in Excel. Already you are thinking about how to extract

data into a comma-separated value flat file for use in Excel. However, what if

you didn’t need to do that "export" and could do the same exact thing

within Oracle itself? You have already paid for Oracle, and likely have Excel

on every PC, but have the overhead of extracting the data and then having to do

whatever in Excel. How about skipping the Excel step and performing the data

analysis within Oracle? Note: by "Oracle," I mean the RDBMS product

and not anything having to do with Oracle Apps. We are talking about using the

database as a high-end calculator, so to speak.

One thing that goes hand-in-hand with analysis is some type

of pictorial representation of what is being analyzed. Although Oracle does

provide other tools for accomplishing this, that falls more into the Forms

& Reports developer realm, so we’ll say that is a possible solution and

leave it at that. So even if there is a heavy chart/graph/histogram/whatever

requirement to be met by using Excel (or something else), you can still use

Oracle as a backup to check the work performed by someone else using a

different tool.

Let’s look at the linear regression aggregate function as an

example of what Oracle can do as an analysis engine. If you hate math, the next

few paragraphs may be painful, but you’ll be able to follow along without

getting buried in the theorems and formulas.

What is linear regression? Let’s say you have an input, like

list price of some product your company produces. The lower the price, the more

your company sells of that product. Conversely, the higher the price, the fewer

your company sells. Over time, you plot selling price versus quantity sold. When

you look at the dots or plots on a piece of graph paper, perhaps the

arrangement of the dots tends to suggest drawing a straight line, which

generally comes close to connecting most of the dots. If the dots are pretty

close to the line, then you may have a strong relationship between the X and Y

(selling price and quantity produced). In fact, the line may be such a good fit

that you can come close to predicting the quantity sold if given the selling

price, and vice versa.

On the other hand, the points, when plotted, may look like a

hazy cloud on the graph paper – there is no distinct line or trend between the

X and Y values. Because there is no line which "fits" or connects the

points, there is a weak relationship between X and Y (a list price of 52 may be

just as likely to result in a quantity sold of 12, 17, or 18 – there is no or

very little predictive power knowing one thing or the other).

The general equation of a line is y = mx + b. In English,

that means "Y" is equal to the slope ("m") times "X"

plus a constant "b." That constant is actually called the y-intercept

(if the input of x = 0, then the output of y = b, or that is where the line

crosses the Y axis). The y-intercept may or may not have any real meaning. The

slope is important because it shows the rate of change between X and Y, and the

sign of the slope reflects the direction of the relationship (a positive slope

shows that as X increases, so does Y; a negative slope shows that as X

increases, Y decreases).

So, getting back to what Oracle can do for you, with respect

to linear regression: quite a bit. The linear regression function, REGR_"component,"

outputs nine items of interest typically used when evaluating data. A component

is selected by specifying the related REGR phrase and the two inputs – expr1

and expr2 – or the independent and dependent variables. I could have said the "X"

and the "Y," and I will later, but not for now.

To look at the example Oracle provides, you will need to

install the sample schema named SH (for sales history). The sample schemas are

referenced in the Oracle9i Sample Schemas guide (http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96539/toc.htm).

If you installed the sample schemas during the Oracle9i installation, you will

also need to unlock the SH user account (and give it a password you can

remember, like "SH"). If you have never seen a million-row table,

this is your opportunity. As the user SH, doing a "select count(*) from

sales" shows just over a million rows (1016271, to be exact).

The example shown in the SQL Reference guide has five channel_id’s,

but we will concentrate on the first one (identified as "C"). Here is

the select statement and its output as shown in a SQL*Plus session as the user

named SH:

SQL> SELECT

2 s.channel_id,

3 REGR_SLOPE(s.quantity_sold, p.prod_list_price) SLOPE ,

4 REGR_INTERCEPT(s.quantity_sold, p.prod_list_price) INTCPT ,

5 REGR_R2(s.quantity_sold, p.prod_list_price) RSQR ,

6 REGR_COUNT(s.quantity_sold, p.prod_list_price) COUNT ,

7 REGR_AVGX(s.quantity_sold, p.prod_list_price) AVGLISTP ,

8 REGR_AVGY(s.quantity_sold, p.prod_list_price) AVGQSOLD

9 FROM sales s, products p

10 WHERE s.prod_id=p.prod_id AND

11 p.prod_category=’Men’ AND

12 s.time_id=to_DATE(’10-OCT-2000′)

13 GROUP BY s.channel_id;C SLOPE INTCPT RSQR COUNT AVGLISTP AVGQSOLD

– ———- ———- ———- —— ———- ———-

C -.0683687 16.627808 .051342581 20 65.495 12.15

I .019710295 14.8113924 .001631488 46 51.4804348 15.826087

P -.01247359 12.854546 .017039788 30 81.87 11.8333333

S .006155886 13.9919243 .000898438 83 69.813253 14.4216867

T -.00411314 5.22717214 .008132242 27 82.2444444 4.88888889

Let’s add another AND to the WHERE clause to get just the

first row:

SQL> SELECT

2 s.channel_id,

3 REGR_SLOPE(s.quantity_sold, p.prod_list_price) SLOPE ,

4 REGR_INTERCEPT(s.quantity_sold, p.prod_list_price) INTCPT ,

5 REGR_R2(s.quantity_sold, p.prod_list_price) RSQR ,

6 REGR_COUNT(s.quantity_sold, p.prod_list_price) COUNT ,

7 REGR_AVGX(s.quantity_sold, p.prod_list_price) AVGLISTP ,

8 REGR_AVGY(s.quantity_sold, p.prod_list_price) AVGQSOLD

9 FROM sales s, products p

10 WHERE s.prod_id=p.prod_id AND

11 p.prod_category=’Men’ AND

12 s.time_id=to_DATE(’10-OCT-2000′)

13 AND s.channel_id = ‘C’

14 group by s.channel_id;C SLOPE INTCPT RSQR COUNT AVGLISTP AVGQSOLD

– ———- ———- ———- —— ———- ———-

C -.0683687 16.627808 .051342581 20 65.495 12.15

Overall, there were 206 rows involved, but we are only

interested in the 20 rows for "C."