Oracle DBA Job Interview - Nailing a Fairly Common Question

While not that technical in nature, this one question gives you an opportunity to list reasons or possibilities of a technical nature, which in turn allow you to demonstrate your knowledge and understanding of how Oracle Database works, and perhaps in this case, how it may not be working.

Oracle DBA job interviews
seem to be of three types: almost completely non-technical, a smattering of
questions generally at the easy to cream puff level, and the truly nitty-gritty
down and dirty. You can almost never over-study if you encounter the last type,
and you can rest assured that anyone at any time can be asked a question that
produces a less than completely accurate answer. If you run into either the
first or second types, don’t be discouraged and feel like your studying was a
waste of time because chances are, you did learn something during the
preparation effort.

Regardless of the degree of
technical difficulty, one question frequently encountered during interviews,
predicated on a user scenario, is this: A user comes up to you (or calls you)
and informs you that application X is running slow. What do you do or what
would you look at? Application X, of course, is an application you have no
control over.

Although this question is not
that technical in nature, it does you give you an opportunity to list reasons
or possibilities of a technical nature, which in turn allow you to demonstrate
your knowledge and understanding of how Oracle works, and perhaps in this case,
how Oracle may not be working.

With some of the reasons
listed below, you can argue that you would already know (before the user does)
that a problem is taking place within Oracle. Maybe, maybe not. For a problem
that would otherwise generate alarms, alerts, bells and whistles, maybe you are
not at your desk, you are not the on-call DBA with the pager, you are working
from home, or for whatever reason, your monitoring/alert/notification system
has not reached out and touched you yet. And besides, it may not be an Oracle problem
to begin with.

Database is hung
Statistics
Indexes
Underlying structure change
Blocking
Intermediate connections/other systems involved
Application changes
Network issues
Tying it all together
In Closing

Database is hung

Encountering a full archived redo log file destination is probably the most well known cause of a hung
database, and the answer is likewise also well known: add more space, move
files, or change the destination. What happens when you are using more than one
log destination and you have specified a “min succeed” value, and further, one
of those destinations cannot be written to? Disk space usage issues aside, what
if that particular destination is simply not available? You would probably
encounter ORA-00272: error writing archive log string, and the corrective
action is to “Check that the output device is still available and correct any
device errors that may have occurred.”

What is another destination
susceptible to space issues? What happens when the Flash Recovery Area becomes
full? The FRA may not have reached its max size as a result of newly archived
redo logs, but rather, it may have become full due to increases in the size of
flashback logs and backups. Then again, it could also be due to archived redo
logs because as you may recall, ARCHIVE_LOG_DEST_10 defaults to the FRA.

Speaking of running out of
space, what happens if space runs out elsewhere? For a tablespace, using the
resumable timeout feature can create the appearance of a hung database as
Oracle waits for you to take care of the space problem.

So, with a short, practiced
answer, you can take the “common” archiver hung problem/resolution and turn it
into a demonstration regarding your knowledge of archiving, the Flash Recovery
Area, and the resumable mode feature. It may not have been the user who filled
up the tablespace, but some other session running a bulk load, so it may worth
mentioning that too (coordinate with the user to resume his operation after
you’ve added space).

Statistics

Instead of just saying “bad
statistics,” be more specific. What does bad mean? One example of bad has to do
with a grossly inaccurate row count. You could have inherited an old database,
poked around and saw that tables hadn’t been analyzed lately (or never), and
gathered statistics on them. At the time stats were being gathered, Table A
could have had a very low row count, so low in fact, that any statement against
it would cause the optimizer to use a full table scan.

You just caught the table at
a bad time, that is, it is normally much larger in size and simply hasn’t been
populated for today’s daily load or end of month processing. When the
user/application hits the table, Oracle goes for a full table scan against
something with hundreds of thousands of rows compared to the stats information
of maybe one hundred. Oops, that is your bad. A loss of statistics is also
another consideration (maybe another DBA did something to clear statistics, or
the table was recently recovered, etc.).

Indexes

Maybe something happened to
one or more indexes. For example, the user could have been running a query
against a partitioned table. Someone else may have truncated a partition
without keeping indexes usable, and boom, there goes your global index of
interest.

Underlying structure change

Since we’re on the topic of
partitions, what goes along those? Parallel query operations. Perhaps the
degree of parallelism changed on a table used by the application. The change
can go both ways: too much and not enough, but either way, a change in DOP can
cause a normally fast query to suffer a decrease in performance.

Blocking

Always a silent performance
killer, unless you go looking for it. In this area, mention the ways in which
blocking can be detected, give a simple example of how it can take place (maybe
the application does a select for update and another user hasn’t committed
yet), and for bonus points, contrast blocking with deadlocks. If you mention
blocking, be prepared to answer how it can be prevented or reduced.

Intermediate connections/other systems involved

Leave it to (bad) .NET
developers for this one: the application connects to a Windows server, and that
server then connects to Oracle. The application can connect to the server, but
for whatever reason, the server has lost contact with Oracle (typically some
service failed or failed to start after a reboot). The user is left with an
endlessly spinning hourglass or a fully-grown status bar on the application’s
interface, and no indication that there is a problem elsewhere. You try to find
the user’s session and it does not exist. Your knowledge of existing
systems/how things work within the company is critical here.

Related to this problem
category, perhaps the application uses multiple connections, maybe even
involving other database systems. Oracle may be working fine, but the
application’s connection to SQL Server has problems. Do you know how the
application works?

Application changes

The situation could be that
the application just underwent an upgrade or patch release, and the user is the
first to have found a bug or untested piece of the puzzle.

Network issues

The user could be on a
subnet where a huge demand upon its bandwidth has been initiated. For example,
a nightly backup failed, so system admins/backup operators decide to run the
backup during business hours. Normally, there is no conflict or bandwidth
contention during the day because the user load is so (relatively) low. But,
when combined with what takes place during a backup, the mix is not so good.

Tying it all together

Now that you’re armed with a
list of possibilities, how are you going to explore the issues? You might
mention the use of a combination of tools. Do you have canned scripts that can
query some of these issues? Which GUI tool would you mention? You could bring
up an example of how to detect item Y in Enterprise Manager (Database Control)
and how to do the same in Toad. Right away, that shows you have experience or
are at least knowledgeable about two different GUI tools. You could say how you
would use Toad to find the user’s session, look at the current SQL, see if
there is any IO taking place, look at current wait events, and perhaps turn on
tracing in a remote session (the user’s session).

In Closing

For a fairly common question
presented to you during a job interview, which at face value appears
simplistic, it is really an opportunity for you to shine by being able to
rattle off a fairly comprehensive list of possible suspects, followed by a
polished answer on how you would employ a methodology (this is what I’d look
for versus this is how I would look for it). Overall, it is worthwhile to have
a polished and practiced answer to this commonly asked question.

»

See All Articles by ColumnistSteve Callan

Oracle DBA Job Interview – Nailing a Fairly Common Question