## Unwrapping Oracle's DBMS Packages: Understanding Oracle's Random Number Generator

April 22, 2004

Beginning with later versions of Oracle8, Oracle has provided a means of generating random numbers. This built-in package, DBMS_RANDOM, is fairly simple to use and generates random numbers that are good enough for the needs of most users. If you need to generate a large amount of data without giving much thought to how random the data is, then DBMS_RANDOM will suit your needs. If you need to encrypt sensitive data, then you should use Oracle9i's DBMS_OBFUSCATION_TOOLKIT feature. Oracle tells users "Do not use DBMS_RANDOM as it is unsuitable for cryptographic key generation."

Is there something wrong with DBMS_RANDOM? Aren't the numbers returned random enough? Don't you get the same output when given the same input? The answers are a combination of "yes" and "no." By unwrapping the package Oracle uses to create the random number generator, we will learn quite a bit about how DBMS_RANDOM works and what its limitations are.

Before looking at the package and some examples of how it can be used, the meaning of "random" needs to be clarified. Technically speaking, generating random numbers by a known method removes the potential for true randomness. Numbers generated in this manner are properly described as pseudo-random numbers. However, if the pseudo-random numbers pass several conditions or tests (chiefly, that the numbers are independent and identically distributed, or "iid"), then they are treated as random. Ideally, the distribution of the numbers is uniform over the interval from 0 to 1 (inclusive of the endpoints). Knowing the parameters of the distribution helps us evaluate how random the numbers are. Conversely, observing the numbers and calculating their mean and variance helps identify the distribution.
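As a reference point for the checks that follow, the mean and variance of a Uniform(0,1) variable can be worked out directly. The symbols here are introduced only for illustration: $X$ is the random variable and $f(x) = 1$ is its density on $[0,1]$.

$$
E[X] = \int_0^1 x\,dx = \frac{1}{2}, \qquad
E[X^2] = \int_0^1 x^2\,dx = \frac{1}{3}, \qquad
\operatorname{Var}(X) = E[X^2] - \left(E[X]\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \approx 0.0833
$$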
Given that our random numbers are (ideally) uniformly distributed over [0,1], we know that the mean should be 1/2 and the variance should turn out to be 1/12. There are many other tests which can be performed against the generated numbers. A mean of 1/2 and a variance of 1/12 are rough indicators of a good uniform distribution, but the real tests are more concerned with uniformity and independence. Your random numbers can have a mean of 1/2, for example, but not be uniformly distributed.

The properties of a good random number generator (fast, portable, a long enough cycle, replicable results, and output that is uniformly "iid") are present with Oracle. In fact, by using the same seed value used in the
following examples, you should be able to produce the same results. Oracle's
SQL Reference Guide lists four subprograms you can use with
DBMS_RANDOM: initialize, seed, random, and terminate. There are several points
missing in this documentation. First, the range of numbers is from (-)2

Let's look at some output from the DBMS_RANDOM package and see how Oracle's random number generator performs. We will use a 6-digit seed number (123456) and start by generating 1,000 numbers, then increase by a factor of ten up to ten million. The table name is RAND and has columns named LINE and RNO (for random number).

```sql
DECLARE
  v_rand number;
BEGIN
  -- Seed the generator so the run can be repeated
  DBMS_RANDOM.INITIALIZE (123456);
  FOR i IN 1..1000 LOOP
    -- VALUE returns a number between 0 and 1
    v_rand := DBMS_RANDOM.value;
    INSERT into rand values (i,v_rand);
  END LOOP;
END;
/
```

```
PL/SQL procedure successfully completed.
```

Selecting the first 10 rows shows:

```sql
select * from rand
where line < 11;
```

```
      LINE RNO
---------- ----------------------------------------
         1 0.9253168129811330987378779577193159262
         2 0.3703059867076638894717777425502136731
         3 0.8562787602662748879896983860530778367
         4 0.8747769791015347163677476210098089609
         5 0.8538887894283505001033221816233701639
         6 0.0139762421028966557398918466225500621
         7 0.6789827768885798969202524863427842743
         8 0.1219758197605125529485878115247706788
         9 0.6384861881298654042162612548721038633
        10 0.5060415527775185635522779058964300161

10 rows selected.
```

How did the average and variance "perform"?

```sql
select avg(rno), variance(rno)
from rand;
```

```
  AVG(RNO) VARIANCE(RNO)
---------- -------------
.505209167    .081572912
```

The average and variance we would expect are .50000000 and .08333333. Continuing with the output from tables of 10,000 to 10,000,000 rows, we see an improvement in those indicators.
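The mean and variance are only summary figures, so a coarse look at uniformity on the same table can be useful. The query below is a minimal sketch, assuming the RAND table still holds DBMS_RANDOM.VALUE output between 0 and 1; the column aliases `bucket` and `cnt` are illustrative names. It counts how many values fall into each of ten equal-width intervals.

```sql
-- Frequency check: rows per tenth of the unit interval.
-- Assumes RAND.RNO holds values between 0 and 1.
SELECT TRUNC(rno * 10) AS bucket,   -- 0 covers [0.0, 0.1), 9 covers [0.9, 1.0)
       COUNT(*)        AS cnt
FROM   rand
GROUP  BY TRUNC(rno * 10)
ORDER  BY bucket;
```

For a uniform generator each bucket should hold roughly one tenth of the rows (about 100 for the 1,000-row table), with the relative spread shrinking as the table grows.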
Up until a million rows, the average and variance both tended to converge to (but not actually reach) their expected values. At the ten million row mark, only the variance improved. Again, the mean and variance are not the true tests of uniformity and independence. Other tests (frequency, runs, autocorrelation, gap and poker, among others) could be used to test uniformity and independence. For example, if the numbers were uniformly distributed, we would expect to see about the same count of numbers in each equal-width interval we were interested in.

Using RANDOM instead of VALUE in the million row table reflects the transformation of the Uniform(0,1) range of numbers to plus or minus 2147483648. You can see that the minimum and maximum are close to 2147483648 in magnitude and that there is very little repetition of numbers. Out of a million generated numbers, 109 were duplicates (a rate of around .01%).

```sql
select min(rno), max(rno), count(distinct(rno))
from rand;
```

```
     MIN(RNO)   MAX(RNO) COUNT(DISTINCT(RNO))
------------- ---------- --------------------
  -2147479960 2147480366               999891
```

Looking at the scripts behind DBMS_RANDOM shows how the numbers from DBMS_RANDOM.RANDOM are created. You can look at the scripts which create this package, or view the text selected from all_source. Here is the first part of the source:

```sql
select text from all_source where name = 'DBMS_RANDOM';
```

```
TEXT
-------------------------------------------------------------------------------
PACKAGE dbms_random AS

    ------------
    --  OVERVIEW
    --
    --  This package should be installed as SYS.  It generates a sequence of
    --  random 38-digit Oracle numbers.  The expected length of the sequence
    --  is about power(10,28), which is hopefully long enough.
    --
    --------
    --  USAGE
    --
    --  This is a random number generator.  Do not use for cryptography.
    --  For more options the cryptographic toolkit should be used.
    --
    --  By default, the package is initialized with the current user
    --  name, current time down to the second, and the current session.
    --
    --  If this package is seeded twice with the same seed, then accessed
    --  in the same way, it will produce the same results in both cases.
```
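The last comment in the package header is easy to check for yourself. The block below is a minimal sketch, assuming a SQL*Plus session with server output enabled; the variable names `v_first` and `v_second` are illustrative, and the seed is the same 123456 used earlier. It seeds the generator twice with the same value and compares the first number drawn after each seeding.

```sql
SET SERVEROUTPUT ON

DECLARE
  v_first  NUMBER;
  v_second NUMBER;
BEGIN
  -- First seeding, then draw one value
  DBMS_RANDOM.SEED(123456);
  v_first := DBMS_RANDOM.VALUE;

  -- Re-seed with the same number and draw again in the same way
  DBMS_RANDOM.SEED(123456);
  v_second := DBMS_RANDOM.VALUE;

  IF v_first = v_second THEN
    DBMS_OUTPUT.PUT_LINE('Same seed, same value: ' || v_first);
  ELSE
    DBMS_OUTPUT.PUT_LINE('Values differ: ' || v_first || ' and ' || v_second);
  END IF;
END;
/
```

Note the "accessed in the same way" condition in the comment: the comparison is only meaningful when the same function is called the same number of times after each seeding.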