Beginning with later versions of Oracle8, Oracle has
provided a means of generating random numbers. This built-in package, DBMS_RANDOM, is fairly simple to use, and can generate
random numbers which are generally good enough for the needs of most users. If
you need to generate a large amount of data without having to give a lot of
thought to how random the data is, then DBMS_RANDOM will suit your needs.
If you need to encrypt sensitive data, then you should use
Oracle9i’s DBMS_OBFUSCATION_TOOLKIT feature. Oracle tells users "Do not
use DBMS_RANDOM as it is unsuitable for cryptographic key generation." Is
there something wrong with DBMS_RANDOM? Aren’t the numbers returned random
enough? Don’t you get the same output when given the same input? The answers
are a combination of "yes" and "no."
By un-wrapping the package Oracle uses to create the random
number generator, we will learn quite a bit about how DBMS_RANDOM works and
what its limitations are. Before looking at the package and some examples of
how it can be used, the meaning of "random" needs to be clarified.
Technically speaking, generating random numbers by a known method removes the
potential for true randomness. When generated in this manner, the numbers can
be properly described as pseudo-random numbers. However, if the pseudo-random
numbers meet several conditions or tests (chiefly, the numbers being
independent and identically distributed, or "iid"), then they are
considered to be random. Ideally, the distribution of the numbers is uniform
over the interval of 0 to 1 (and inclusive of the endpoints).
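To make the idea of a "known method" concrete, here is a minimal sketch of a toy linear congruential generator in Python. It is not the algorithm behind DBMS_RANDOM; the function name lcg and the multiplier, increment and modulus (a commonly published set of constants) are assumptions chosen only for illustration. The point is that each number is computed from the previous one, so the seed fixes the whole sequence.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Toy linear congruential generator (illustration only, not Oracle's method).

    Returns `count` pseudo-random floats in [0, 1).
    """
    state = seed % m
    out = []
    for _ in range(count):
        # Each state is a fixed function of the previous state
        state = (a * state + c) % m
        out.append(state / m)
    return out


first = lcg(123456, 5)
second = lcg(123456, 5)
print(first == second)            # same seed, same sequence
print(lcg(654321, 5) == first)    # different seed, different sequence
```

Because the sequence is fully determined by the seed, anyone who knows the method and the seed can reproduce every value, which is why such numbers are called pseudo-random.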
Knowing the parameters of the distribution helps us in
evaluating how random the numbers are. Conversely, observing the numbers and
calculating the mean and variance helps identify the distribution. Given that
our random numbers are (ideally) uniformly distributed over [0,1], we know that
the mean should be 1/2 and the variance should turn out to be 1/12. There are
many other tests which can be performed against the generated numbers. Having a
mean of 1/2 and a variance of 1/12 are rough indicators of a good uniform distribution,
but the real tests are more concerned with uniformity and independence. Your random
numbers can have a mean of 1/2, for example, but not be uniformly distributed.
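For reference, the expected mean and variance follow directly from the definition of a Uniform(0,1) random variable X, whose density is 1 on the interval:

```latex
\begin{align*}
E[X]   &= \int_0^1 x \, dx = \frac{1}{2}, \\
E[X^2] &= \int_0^1 x^2 \, dx = \frac{1}{3}, \\
\operatorname{Var}(X) &= E[X^2] - \bigl(E[X]\bigr)^2
        = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \approx 0.08333 .
\end{align*}
```

Matching both values still does not prove uniformity. A generator that returns only 1/2 - sqrt(1/12) or 1/2 + sqrt(1/12) (about 0.2113 and 0.7887), each half of the time, also has a mean of 1/2 and a variance of 1/12, yet it produces just two distinct numbers.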
The following properties of a good random generator – fast,
portable, long enough cycle, replicable results and output being uniformly "iid"
– are present with Oracle. In fact, by using the same seed value used in the
following examples, you should be able to produce the same results. Oracle’s
SQL Reference Guide lists four arguments or procedures you can use with
DBMS_RANDOM: initialize, seed, random, and terminate. There are several points
missing in this documentation. First, the range of numbers is from -2^31
to +2^31, or +/- 2147483648. Second, the number of digits
may be as many as ten, not eight. Lastly, there are other undocumented
functions. One such function is DBMS_RANDOM.VALUE, and it will return the type
of value we are more interested in (a number between zero and one). The other
hidden functions you can use return normally distributed numbers and strings of
varying length and case.
Let’s look at some output from the DBMS_RANDOM package and
see how Oracle’s random number generator performs. We will use a 6-digit seed
number (123456) and start by generating 1,000 numbers, then increasing by a
factor of ten up to ten million. The table name is RAND and has columns named
LINE and RNO (for random number).
SQL> DECLARE
       v_rand NUMBER;
     BEGIN
       -- Seed the generator so the run can be repeated
       DBMS_RANDOM.INITIALIZE(123456);
       FOR i IN 1..1000 LOOP
         -- VALUE returns a number between zero and one
         v_rand := DBMS_RANDOM.VALUE;
         INSERT INTO rand VALUES (i, v_rand);
       END LOOP;
     END;
     /

PL/SQL procedure successfully completed.
Selecting the first 10 rows shows:
SQL> select * from rand
     where line < 11;

      LINE RNO
---------- ----------------------------------------
         1 0.9253168129811330987378779577193159262
         2 0.3703059867076638894717777425502136731
         3 0.8562787602662748879896983860530778367
         4 0.8747769791015347163677476210098089609
         5 0.8538887894283505001033221816233701639
         6 0.0139762421028966557398918466225500621
         7 0.6789827768885798969202524863427842743
         8 0.1219758197605125529485878115247706788
         9 0.6384861881298654042162612548721038633
        10 0.5060415527775185635522779058964300161

10 rows selected.
How did the average and variance "perform?"
SQL> select avg(rno), variance(rno)
     from rand;

  AVG(RNO) VARIANCE(RNO)
---------- -------------
.505209167    .081572912
The average and variance we would expect are .50000000 and
.08333333. Continuing on with the output from tables with 10,000 to 10,000,000
rows, we will see an improvement in those indicators:
# of Rows  | Average    | Variance   | Time to generate
1,000      | .505209167 | .081572912 | 00:00:00.01
10,000     | .502495652 | .082522109 | 00:00:00.05
100,000    | .498821863 | .083579021 | 00:00:06.01
1,000,000  | .500032274 | .083360802 | 00:01:01.05
10,000,000 | .500036405 | .083323331 | 00:12:33.02
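One rough yardstick for reading these averages is the standard error of the mean: for n independent Uniform(0,1) values it is sqrt((1/12)/n), so the sample average is expected to wander around .5 by about that much. The Python sketch below applies that formula to the averages copied from the table; the helper names standard_error_of_mean and z_score are ours, and the calculation assumes the values really are independent and uniform.

```python
import math

# (number of rows, observed average) copied from the table above
observed = [
    (1_000, 0.505209167),
    (10_000, 0.502495652),
    (100_000, 0.498821863),
    (1_000_000, 0.500032274),
    (10_000_000, 0.500036405),
]


def standard_error_of_mean(n):
    """Standard error of the average of n independent Uniform(0,1) values."""
    return math.sqrt((1 / 12) / n)


def z_score(n, avg):
    """Distance of the observed average from 0.5, in standard errors."""
    return (avg - 0.5) / standard_error_of_mean(n)


for n, avg in observed:
    print(f"{n:>10,} rows  se={standard_error_of_mean(n):.6f}  z={z_score(n, avg):+.2f}")
```

Every average in the table lies within two standard errors of .5, so small movements from one row count to the next are the kind of fluctuation we would expect from sampling alone.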
Up until a million rows, the average and variance both
tended to converge to (but not actually reach) their expected values. At the
ten million row mark, only the variance improved. Again, the mean and variance
are not the true tests of uniformity and independence. Other tests, which
include the following – frequency, runs, autocorrelation, gap and poker – could
be used to test uniformity and independence. For example, if the numbers were
uniformly distributed, we would expect to see the same count of numbers in
whatever intervals we were interested in.
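A frequency test of this kind can be sketched in a few lines of Python. The sketch splits [0,1] into ten equal intervals, counts the values in each, and computes a chi-square statistic against the equal counts we would expect. Python's random.Random is used here only as a stand-in source of numbers; it is not Oracle's generator, and to test DBMS_RANDOM you would feed in the RNO values from the RAND table instead. The function name chi_square_uniform and the choice of ten intervals are assumptions made for illustration.

```python
import random


def chi_square_uniform(values, bins=10):
    """Count values per equal-width interval of [0, 1] and return
    (counts, chi-square statistic against equal expected counts)."""
    counts = [0] * bins
    for v in values:
        # min() keeps a value of exactly 1.0 in the last interval
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return counts, stat


rng = random.Random(123456)  # stand-in generator, not DBMS_RANDOM
sample = [rng.random() for _ in range(100_000)]
counts, stat = chi_square_uniform(sample)
print(counts)
print(f"chi-square = {stat:.2f} on {len(counts) - 1} degrees of freedom")
# With 9 degrees of freedom, the 5% critical value is about 16.92;
# a statistic above that would cast doubt on uniformity.
```

A frequency test only checks uniformity. Independence needs one of the other tests, such as runs or autocorrelation.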
Using RANDOM instead of VALUE in the million row table
reflects the transformation of the Uniform(0,1) range of numbers to plus or
minus 2147483648. You can see the minimum and maximum numbers are close to
-2147483648 and +2147483648, respectively, and that there is very little
repetition of numbers. Out of a million generated numbers, 109 numbers were
duplicated (a rate around .01%).
SQL> select min(rno), max(rno), count(distinct(rno))
     from rand;

   MIN(RNO)   MAX(RNO) COUNT(DISTINCT(RNO))
----------- ---------- --------------------
-2147479960 2147480366               999891
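Is 109 duplicates out of a million a reasonable number? A quick check is to ask how many repeats we would expect if a million integers were drawn independently and uniformly from the full range. The Python sketch below assumes the range holds 2^32 possible values (the span from -2147483648 to +2147483648 quoted earlier); the function name expected_duplicates is ours.

```python
import math


def expected_duplicates(draws, range_size):
    """Expected number of draws that repeat an earlier value when `draws`
    integers are picked uniformly and independently from `range_size` values."""
    # Expected distinct values = N * (1 - (1 - 1/N)**n), computed stably
    expected_distinct = range_size * -math.expm1(draws * math.log1p(-1.0 / range_size))
    return draws - expected_distinct


draws = 1_000_000
range_size = 2 ** 32  # assumed number of possible values
print(round(expected_duplicates(draws, range_size), 1))
```

The result is a little over 116 expected duplicates, so the 109 observed (1,000,000 minus 999,891 distinct values) appears to be in line with what independent uniform draws would give.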
Looking at the scripts behind DBMS_RANDOM shows how the
numbers from DBMS_RANDOM.RANDOM are created. You can look at the scripts which
create this package, or view the text selected from all_source. Here is the
first part of the source:
SQL> select text from all_source where name = 'DBMS_RANDOM';

TEXT
-------------------------------------------------------------------------------
PACKAGE dbms_random AS

    ------------
    --  OVERVIEW
    --
    --  This package should be installed as SYS.  It generates a sequence of
    --  random 38-digit Oracle numbers.  The expected length of the sequence
    --  is about power(10,28), which is hopefully long enough.
    --
    --------
    --  USAGE
    --
    --  This is a random number generator.  Do not use for cryptography.
    --  For more options the cryptographic toolkit should be used.
    --
    --  By default, the package is initialized with the current user
    --  name, current time down to the second, and the current session.
    --
    --  If this package is seeded twice with the same seed, then accessed
    --  in the same way, it will produce the same results in both cases.