Oracle DBA Job Interview - Nailing a Fairly Common Question
February 24, 2010
While not that technical in nature, this one question gives you an opportunity to list reasons or possibilities of a technical nature, which in turn allow you to demonstrate your knowledge and understanding of how Oracle Database works, and perhaps in this case, how it may not be working.
Oracle DBA job interviews seem to be of three types: almost completely non-technical, a smattering of questions generally at the easy to cream puff level, and the truly nitty-gritty down and dirty. You can almost never over-study if you encounter the last type, and you can rest assured that anyone at any time can be asked a question that produces a less than completely accurate answer. If you run into either the first or second types, dont be discouraged and feel like your studying was a waste of time because chances are, you did learn something during the preparation effort.
Regardless of the degree of technical difficulty, one question frequently encountered during interviews, predicated on a user scenario, is this: A user comes up to you (or calls you) and informs you that application X is running slow. What do you do or what would you look at? Application X, of course, is an application you have no control over.
Although this question is not that technical in nature, it does you give you an opportunity to list reasons or possibilities of a technical nature, which in turn allow you to demonstrate your knowledge and understanding of how Oracle works, and perhaps in this case, how Oracle may not be working.
With some of the reasons listed below, you can argue that you would already know (before the user does) that a problem is taking place within Oracle. Maybe, maybe not. For a problem that would otherwise generate alarms, alerts, bells and whistles, maybe you are not at your desk, you are not the on-call DBA with the pager, you are working from home, or for whatever reason, your monitoring/alert/notification system has not reached out and touched you yet. And besides, it may not be an Oracle problem to begin with.
Database is hung
Encountering a full archived redo log file destination is probably the most well known cause of a hung database, and the answer is likewise also well known: add more space, move files, or change the destination. What happens when you are using more than one log destination and you have specified a min succeed value, and further, one of those destinations cannot be written to? Disk space usage issues aside, what if that particular destination is simply not available? You would probably encounter ORA-00272: error writing archive log string, and the corrective action is to Check that the output device is still available and correct any device errors that may have occurred.
What is another destination susceptible to space issues? What happens when the Flash Recovery Area becomes full? The FRA may not have reached its max size as a result of newly archived redo logs, but rather, it may have become full due to increases in the size of flashback logs and backups. Then again, it could also be due to archived redo logs because as you may recall, ARCHIVE_LOG_DEST_10 defaults to the FRA.
Speaking of running out of space, what happens if space runs out elsewhere? For a tablespace, using the resumable timeout feature can create the appearance of a hung database as Oracle waits for you to take care of the space problem.
So, with a short, practiced answer, you can take the common archiver hung problem/resolution and turn it into a demonstration regarding your knowledge of archiving, the Flash Recovery Area, and the resumable mode feature. It may not have been the user who filled up the tablespace, but some other session running a bulk load, so it may worth mentioning that too (coordinate with the user to resume his operation after youve added space).
Instead of just saying bad statistics, be more specific. What does bad mean? One example of bad has to do with a grossly inaccurate row count. You could have inherited an old database, poked around and saw that tables hadnt been analyzed lately (or never), and gathered statistics on them. At the time stats were being gathered, Table A could have had a very low row count, so low in fact, that any statement against it would cause the optimizer to use a full table scan.
You just caught the table at a bad time, that is, it is normally much larger in size and simply hasnt been populated for todays daily load or end of month processing. When the user/application hits the table, Oracle goes for a full table scan against something with hundreds of thousands of rows compared to the stats information of maybe one hundred. Oops, that is your bad. A loss of statistics is also another consideration (maybe another DBA did something to clear statistics, or the table was recently recovered, etc.).
Maybe something happened to one or more indexes. For example, the user could have been running a query against a partitioned table. Someone else may have truncated a partition without keeping indexes usable, and boom, there goes your global index of interest.
Underlying structure change
Since were on the topic of partitions, what goes along those? Parallel query operations. Perhaps the degree of parallelism changed on a table used by the application. The change can go both ways: too much and not enough, but either way, a change in DOP can cause a normally fast query to suffer a decrease in performance.
Always a silent performance killer, unless you go looking for it. In this area, mention the ways in which blocking can be detected, give a simple example of how it can take place (maybe the application does a select for update and another user hasnt committed yet), and for bonus points, contrast blocking with deadlocks. If you mention blocking, be prepared to answer how it can be prevented or reduced.
Intermediate connections/other systems involved
Leave it to (bad) .NET developers for this one: the application connects to a Windows server, and that server then connects to Oracle. The application can connect to the server, but for whatever reason, the server has lost contact with Oracle (typically some service failed or failed to start after a reboot). The user is left with an endlessly spinning hourglass or a fully-grown status bar on the applications interface, and no indication that there is a problem elsewhere. You try to find the users session and it does not exist. Your knowledge of existing systems/how things work within the company is critical here.
Related to this problem category, perhaps the application uses multiple connections, maybe even involving other database systems. Oracle may be working fine, but the applications connection to SQL Server has problems. Do you know how the application works?
The situation could be that the application just underwent an upgrade or patch release, and the user is the first to have found a bug or untested piece of the puzzle.
The user could be on a subnet where a huge demand upon its bandwidth has been initiated. For example, a nightly backup failed, so system admins/backup operators decide to run the backup during business hours. Normally, there is no conflict or bandwidth contention during the day because the user load is so (relatively) low. But, when combined with what takes place during a backup, the mix is not so good.
Tying it all together
Now that youre armed with a list of possibilities, how are you going to explore the issues? You might mention the use of a combination of tools. Do you have canned scripts that can query some of these issues? Which GUI tool would you mention? You could bring up an example of how to detect item Y in Enterprise Manager (Database Control) and how to do the same in Toad. Right away, that shows you have experience or are at least knowledgeable about two different GUI tools. You could say how you would use Toad to find the users session, look at the current SQL, see if there is any IO taking place, look at current wait events, and perhaps turn on tracing in a remote session (the users session).
For a fairly common question presented to you during a job interview, which at face value appears simplistic, it is really an opportunity for you to shine by being able to rattle off a fairly comprehensive list of possible suspects, followed by a polished answer on how you would employ a methodology (this is what Id look for versus this is how I would look for it). Overall, it is worthwhile to have a polished and practiced answer to this commonly asked question.