One of Oracle’s most
critical and important functions is how it keeps track of events. Even within a
single processor, “simple” database system, keeping track of events is just as
important as recording and maintaining the proper order of events in a
distributed system. Recording the “when” in “when did event X take place?” can
be problematic, so a process that is complete (in the mathematical
sense) is essential. In terms of an algorithm, does an algorithm result in the
same/consistent answer each and every time the same circumstances or conditions
exist?
Let’s take an astrological
event as an example to help frame what “when” means. Suppose you were born in
1970. Twenty years later you see a star go supernova. In one sense or frame of
reference, the event occurred in 1990. In another frame of reference, the
event, given that the star was 20 light-years away, took place the same year
our notional observer was born. Depending on the context, the event time stamps
of 1970 and 1990 are equally valid.
In an example closer to
home, how would you keep track of when database events (both data manipulation
and definition) take place? Within a subset of the database, using your schema and
transactions as an example, keeping track of your ordered set of operations is
simpler to maintain than when comparing or ordering your transactions with
respect to someone else’s. Your ordered set is only partially ordered. Clearly,
the ordering of events is essential for many reasons, perhaps the most
important being for recovery. Did your commit take place before mine, or did
mine occur before yours but was recorded later?
How do you order events?
If you were to describe the situation
of recording a series of ordered events, perhaps a title of “Time, Clocks and the Ordering of Events in a Distributed
System” would aptly describe the situation. One way to keep track of events
is to record their timestamps, but this can introduce another factor, that of
maintaining the accuracy and synchronization of distributed clocks. But what
type of clock are we referring to? No doubt, the clock you have in mind is a
physical clock.
What if the clock were
logical instead of physical? This is exactly the point raised by Dr. Leslie Lamport
in his 1978 paper titled “Time, Clocks and the Ordering of Events in a
Distributed System.” In such a system, a clock condition can be described as:
For
any events a, b, if a occurs before b, then C(a) < C(b).
The clock time of event a,
however the clock is defined, is before that of event b. In a computing system
with multiple processes, the clock condition is met when the two conditions
stated below hold.
C1.
If a and b are events in process Pi, and a comes before b, the Ci(a)
< Ci(b).
C2.
If a is the sending of a message by process Pi and b is the receipt
of that message by process Pj, then Ci(a) < Cj(b).
Now that we have some
conditions, the next item to consider is their implementation. One rule (applies
chiefly to condition one) is that process Pi increments Ci
between any two successive events. A second rule implements the sending of a
timestamp along with a message. Upon receipt of a message from another process,
the receiving process sets its time to be greater than or equal to its present
value and greater than the timestamp of the message.
Dr. Lamport’s paper presents
an algorithm which orders events totally. The context of “totally” is across
the entire system, not just partially ordered within one process out of many.
If our system of interest is a database with multiple users, all of whom are
vying for the same resource, then the algorithm must satisfy three conditions.
-
A process which has been
granted the resource must release it before it can be granted to another
process. -
Different requests for the
resource must be granted in the order in which they are made. -
If every process which is
granted the resource eventually releases it, then every request is eventually
granted.
The algorithm’s five steps
can be summarized as follows.
-
To request a resource, a process
sends a time stamped message to all other processes. -
A receiving process places the
“request resource” message in its own request queue and sends a time stamped acknowledgement
back to the sender. -
To release a resource, the owning
process sends a similar time stamped message (“releases resource”) to all other
processes. -
Upon receipt of a release
resource message, the “request resource” message is removed from the request
queue. -
A requesting process is granted
the resource when its time stamped request is ordered before any other requests
and when it has received an acknowledgement message time stamped later than the
initial request.
The relevance of Lamport’s algorithm to Oracle
By now the time stamping of
messages, use of clocks, and ordering of events must sound very much like “a
stamp that defines a committed version of a database at a point in time.” In
other words, the implementation of Dr. Lamport’s algorithm within Oracle is
what we know as an SCN generation scheme. A search for “Lamport SCN generation”
in Oracle’s documentation will show several references (primarily in RAC) to
this scheme, but none of them really defines what the scheme is or how it
works.
The astute reader may have
noticed I did not say this is “the” scheme (as in the one and only scheme).
Oracle uses several schemes in addition to the Lamport SCN generation scheme. Peek
inside your alert log and you will see something along the lines of what is
shown below.
Picked
broadcast on commit scheme to generate SCNs
Picked
latch-free SCN scheme 2
Picked
latch-free SCN scheme 3
MetaLink Note 260304.1 defines
what broadcast on commit is or does: “At every commit, the most recent local
SCN will be broadcasted. Read consistency is fully guaranteed for any kind of
workload, since every change in the local SCN is immediately propagated.” Aside
from this description, there is a general lack of public information about how
these schemes work.
In addition to the alert
log, a hidden parameter will also reveal the current SCN generation scheme. Is
the generation scheme modifiable? A query like the one below shows the answer.
SELECT X.KSPPINM NAME,
DECODE(BITAND(KSPPIFLG/256, 1), 1, ‘TRUE’, ‘FALSE’) SESMOD,
DECODE( BITAND(KSPPIFLG/65536, 3), 1, ‘IMMEDIATE’, 2, ‘DEFERRED’, 3, ‘IMMEDIATE’, ‘FALSE’ ) SYSMOD,
KSPPDESC DESCRIPTION
FROM SYS.X_$KSPPI X
WHERE X.INST_ID = USERENV(‘INSTANCE’)
AND TRANSLATE(KSPPINM,’_’,’#’) LIKE ‘#%’
AND X.KSPPINM like ‘%scn_scheme%’;
NAME SESMOD SYSMOD DESCRIPTION
———— ——- ———- ————
_scn_scheme FALSE FALSE SCN scheme
This query results in no
rows returned if using 10g Release 2 (the parameter is not present), which goes
to the familiar bromide of don’t count on being able to use hidden parameters.
In Closing
The time stamping and
ordering of events within Oracle is of prime importance and goes to the heart
of how transaction control and serialization are even possible. The total
ordering of disparate events, whether within a simple system or a distributed
one, is not just essential, but mandatory. Take the date of Dr. Lamport’s paper
into consideration and then recall when Oracle’s version 2 was released (no
version 1, long marketing spin story documented elsewhere). The tie-in is not
hard to miss. There was some very clever thought that went into the design of
Oracle, and at the same time, some very clever adaptation of other people’s
work, the algorithm stated in Dr. Lamport’s paper being one such example.
References:
Lamport, Leslie, “Time, Clocks
and the Ordering of Events in a Distributed System.”
Communications of the ACM 21, 7 (July 1978), 558-565.
Reprinted in several collections, including Distributed Computing: Concepts
and Implementations, McEntire et al., ed. IEEE Press, 1984.
See www.lamport.org for other related articles.