Concurrency control is one area where databases differentiate
themselves. It is an area that sets a database apart from a file
system and that sets databases apart from each other. As a
programmer, it is vital that your database application works
correctly under concurrent access conditions, and yet this is
something people fail to test time and time again. Techniques
that work well if everything happens consecutively do not work so
well when everyone does them simultaneously. If you don't have a
good grasp of how your particular database implements concurrency
control mechanisms, then you will:
- Corrupt the integrity of your data.
- Run slower than you should with a small number of users.
- Decrease your ability to scale to a large number of users.
Notice I don't say, 'you might...' or 'you run the risk of...'
but rather that invariably you will do these things. You will do
these things without even realizing it. Without correct
concurrency control, you will corrupt the integrity of your
database because something that works in isolation will not work
as you expect in a multi-user situation. You will run slower than
you should because you'll end up waiting for data. You'll lose
your ability to scale because of locking and contention issues.
As the queues to access a resource get longer, the wait gets
longer and longer. An analogy here would be a backup at a
tollbooth. If cars arrive in an orderly, predictable fashion, one
after the other, we never have a backup. If many cars arrive
simultaneously, queues start to form. Furthermore, the waiting
time does not increase in line with the number of cars at the
booth. After a certain point we are spending considerable
additional time 'managing' the people that are waiting in line,
as well as servicing them (in the database, we would talk about
context switching).
Concurrency issues are the hardest to track down — the
problem is similar to debugging a multi-threaded program. The
program may work fine in the controlled, artificial environment
of the debugger but crashes horribly in the 'real world'. For
example, under 'race conditions' you find that two threads can
end up modifying the same data structure simultaneously. These
kinds of bugs are terribly hard to track down and fix. If you
only test your application in isolation and then deploy it to
dozens of concurrent users, you are likely to be (painfully)
exposed to an undetected concurrency issue. Over the next two
sections, I'll relate two small examples of how the lack of
understanding concurrency control can ruin your data or inhibit
performance and scalability.
Implementing Locking
The database uses locks to ensure that, at most, one transaction
is modifying a given piece of data at any given time. Basically,
they are the mechanism that allows for concurrency —
without some locking model to prevent concurrent updates to the
same row, for example, multi-user access would not be possible in
a database. However, if overused or used improperly, locks can
actually inhibit concurrency. If you or the database itself locks
data unnecessarily, then fewer people will be able to
concurrently perform operations. Thus, understanding what locking
is and how it works in your database is vital if you are to
develop a scalable, correct application.
What is also vital is that you understand that each database
implements locking differently. Some have page-level locking,
others row level; some implementations escalate locks from row-
level to page-level, some do not; some use read locks, others do
not; some implement serializable transactions via locking and
others via read-consistent views of data (no locks). These small
differences can balloon into huge performance issues or downright
bugs in your application if you do not understand how they work.
The following points sum up Oracle's locking policy:
- Oracle locks data at the row level on modification only.
There is no lock escalation to a block or table level, ever.
- Oracle never locks data just to read it. There are no locks
placed on rows of data by simple reads.
- A writer of data does not block a reader of data. Let me
repeat — reads are not blocked by writes. This is
fundamentally different from almost every other database, where
reads are blocked by writes.
- A writer of data is blocked only when another writer of data
has already locked the row it was going after. A reader of data
never blocks a writer of data.
These facts must be taken into consideration when developing your
application and you must also realize that this policy is unique
to Oracle. A developer who does not understand how his or her
database handles concurrency will certainly encounter data
integrity issues (this is particularly common when a developer
moves from another database to Oracle, or vice versa, and
neglects to take the differing concurrency mechanisms into
account in their application.
One of the side-effects of Oracle's 'non-blocking' approach is
that if you actually want to ensure that no more than one user
has access to a row at once, then you, the developer, need to do
a little work yourself. Consider the following example. A
developer was demonstrating to me a resource-scheduling program
(for conference rooms, projectors, etc.) that he had just
developed and was in the process of deploying. The application
implemented a business rule to prevent the allocation of a
resource to more than one person, for any given period of time.
That is, the application contained code that specifically checked
that no other user had previously allocated the time slot (as
least the developer thought it did). This code queried the
schedules table and, if no rows existed that overlapped that time
slot, inserted the new row. So, the developer was basically
concerned with two tables:
create table resources ( resource_name varchar2(25) primary key, ... );
create table schedules( resource_name varchar2(25) references resources,
start_time date,
end_time date );
And, before making, say, a room reservation, the application would query:
select count(*)
from schedules
where resource_name = :room_name
and (start_time between :new_start_time and :new_end_time
rr
end_time between :new_start_time and :new_end_time)
It looked simple and bullet-proof (to the developer anyway); if
the count came back zero, the room was yours. If it came back
non-zero, you could not reserve it for that period. Once I knew
what his logic was, I set up a very simple test to show him the
error that would occur when the application went live. An error
that would be incredibly hard to track down and diagnose after
the fact — one would be convinced it must be a database
bug.