What we would like to do is to run the end-of-day report that
tells us how much money is in the bank. That is an extremely
simple query:
select sum(account_balance)
from accounts;
And, of course, in this example the answer is obvious: $1250.
However, what happens if we read row 1, and while we're reading
rows 2 and 3, an Automated Teller Machine (ATM) generates
transactions against this table, and moves $400 from account 123
to account 456? Our query counts $500 in row 4 and comes up with
the answer of $1650, doesn't it? Well, of course, this is to be
avoided, as it would be an error — at no time did this sum
of money exist in the account balance column. It is the way in
which Oracle avoids such occurrences, and how Oracle's methods
differ from every other database, that you need to understand.
In practically every other database, if you wanted to get a
'consistent' and 'correct' answer to this query, you would either
have to lock the whole table while the sum was calculated or you
would have to lock the rows as you read them. This would prevent
people from changing the answer as you are getting it. If you
lock the table up-front, you'll get the answer that was in the
database at the time the query began. If you lock the data as you
read it (commonly referred to as a shared read lock, which
prevents updates but not other readers from accessing the data),
you'll get the answer that was in the database at the point the
query finished. Both of these methods inhibit concurrency a great
deal. The table lock would prevent any updates from taking place
against the entire table for the duration of your query (for a
table of four rows, this would only be a very short period
— but for tables with hundred of thousands of rows, this
could be several minutes). The 'lock as you go' method would
prevent updates on data you have read and already processed and
could actually cause deadlocks between your query and other
updates.
Now, I said earlier that you would not be able to take full
advantage of Oracle if you did not understand the concept of
multi-versioning. Here is one reason why that is true. Oracle
uses multi-versioning to get the answer, as it existed at the
point in time the query began, and the query will take place
without locking a single thing (while our account transfer
transaction updates rows 1 and 4, these rows will be locked to
other writers — but not locked to other readers, such as
our SELECT SUM...query). In fact, Oracle doesn't have a 'shared
read' lock common in other databases — it does not need it.
Everything inhibiting concurrency that can be removed, has been
removed.
So, how does Oracle get the correct, consistent answer ($1250)
during a read without locking any data— in other words,
without decreasing concurrency? The secret lies in the
transactional mechanisms that Oracle uses. Whenever you modify
data, Oracle creates entries in two different locations. One
entry goes to the redo logs where Oracle stores enough
information to redo or 'roll forward' the transaction. For an
insert this would be the row inserted. For a delete, it is a
message to delete the row in file X, block Y, row slot Z. And so
on. The other entry is an undo entry, written to a rollback
segment. If your transaction fails and needs to be undone, Oracle
will read the 'before' image from the rollback segment and
restore the data. In addition to using this rollback segment data
to undo transactions, Oracle uses it to undo changes to blocks as
it is reading them — to restore the block to the point in
time your query began. This gives you the ability to read right
through a lock and to get consistent, correct answers without
locking any data yourself.
So, as far as our example is concerned, Oracle arrives at its
answer as follows:
At time T6, Oracle is effectively 'reading through' the lock
placed on row 4 by our transaction. This is how non-blocking
reads are implemented — Oracle only looks to see if the
data changed, it does not care if the data is currently locked
(which implies that it has changed). It will simply retrieve the
old value from the rollback segment and proceed onto the next
block of data.
This is another clear demonstration of multi-versioning —
there are multiple versions of the same piece of information, all
at different points in time, available in the database. Oracle is
able to make use of these 'snapshots' of data at different points
in time to provide us with read-consistent queries and non-
blocking queries.
This read-consistent view of data is always performed at the SQL
statement level, the results of any single SQL statement are
consistent with respect to the point in time they began. This
quality is what makes a statement like the following insert a
predictable set of data:
for x in (select * from t)
loop
insert into t values (x.username, x.user_id, x.created);
end loop;
The result of the SELECT * FROM T is preordained when the query
begins execution. The SELECT will not see any of the new data
generated by the INSERT. Imagine if it did — this statement
might be a never-ending loop. If, as the INSERT generated more
rows in CUSTOMER, the SELECT could 'see' those newly inserted
rows — the above piece of code would create some unknown
number of rows. If the table T started out with 10 rows, we might
end up with 20, 21, 23, or an infinite number of rows in T when
we finished. It would be totally unpredictable. This consistent
read is provided to all statements so that an INSERT such as the
following is predicable as well:
insert into t select * from t;
The INSERT statement will with be provided a read-consistent view
of T — it will not see the rows that it itself just
inserted, it will only insert the rows that existed at the time
the INSERT began. Many databases won't even permit recursive
statements such as the above due to the fact that they cannot
tell how many rows might actually be inserted.
So, if you are used to the way other databases work with respect
to query consistency and concurrency, or you have never had to
grapple with such concepts (no real database experience), you can
now see how understanding how this works will be important to
you. In order to maximize Oracle's potential, you need to
understand these issues as they pertain to Oracle — not how
they are implemented in other databases.