As a DBA: Do you take things at face value? Do you try
things first? Why is it you do what you do?
It has been quite a while since I have made an installment
in my DBA Call to Action series. I never intended to write a series but I am
confronted with things, just as we all are, day to day that make me question
why I do things the way I do them; and sometimes, I question why my readers do
the things they do. I really do enjoy questions from my readers; everyone needs
help now and then, including me. However, sometimes I just wonder if we all
give ourselves enough credit to be able to find solutions on our own.
Problem solving is, in my opinion, high on the list of skills
that all DBAs should have. DBAs don’t necessarily need to know how to solve
everything but they should have the ability to define a problem, research
solutions, and ultimately take action. I am often confused by the approach
taken by many. Instead of working through those steps, they will:
1. Assume something is not working
2. Do a lazy search on the net for a solution and ask friends and family
3. Implement the solution
This scares the dickens out of me as there are definite
holes in this approach:
1. How do we validate that something is wrong or not working?
2. Consider the source. Not everything on the Net is true: much has never been
tested, much exists to sell you something, many authors have no knowledge of YOUR
problem, and the list goes on.
3. The result of a simple search will be implemented without question and without
any validation in test or QA, opening up a complete can of worms and inviting utter
disaster.
So, as many people before me have done, I’ll attempt to
explain the problem-solving cycle, with a bit of DBA clarity to boot. As
an example, I’ll be going through a recent question I received from a reader
about something I wrote.
Our Example: To get the use case going, I recently wrote
about how to configure Oracle to automatically start and stop on a Linux
system. I provided step by step instructions along with a start/stop script
called dbora, a common practice. In my script, I also called the dbstart script,
which every DBA should have at least heard of. Well, to get things going here,
a reader questioned why I did not put a call to start the Oracle listener
in my dbora script, as did a previous reader, I might add.
1. Know the issue inside and out. This is the proverbial “don’t jump before
looking”. Validate the reasons for your understanding of a problem or issue. So
often, I see engineers begin to solve a problem only because someone said there
was a problem. There is no process for validating the problem. We are often
incident driven, meaning some event happens and then we react, only to find out
later that it was a one-time occurrence.
In our Example: My readers
obviously questioned whether I should be putting a call to start the Oracle
listener in my dbora script. They even pointed to an external website, which I won’t
mention, to prove their point that I should be including this call to LSNRCTL.
Now the real question here is how well did these readers actually research and
investigate the perceived problem? I never did go off and read the article that
they pointed to, wait one second, let me go look now. Ok, I’m back. You’ll
never guess this one, maybe you would have, but right at the top of the script,
this article references an Oracle 9.2.0 database, AND at the top of the article
it talks about Oracle 10.2. Now, I haven’t read the whole article and, depending
on what they are trying to do, this 10.2 vs. 9.2 mix might be totally in line. However,
I do know that my article was written for Oracle 11.1.0.6, so things could be different.
This in itself should have triggered at least a slight concern for the readers.
Maybe it did, I’m just pointing out the version issue. Moreover, there are
other issues that should make you question a lot of what you find out on the
Net. Basically, ask yourself: does the Web content match my particular situation? Trust me,
things move too fast, and not much of what you find will be a 100% match.
2. Understand what you’ve got. This is all about knowing if you really have a problem.
This goes back to asking yourself how you determine a problem really exists.
Does your system behave a particular way all the time, some of the time, or
just when you hear it from some user? If you don’t have anything in place to
tell you when something is wrong, how do you know when something IS wrong?
Clock-time, user-opinion, and hot C-level breath do not necessarily mean
something is wrong.
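To make “having something in place” concrete, here is a minimal sketch of comparing a new measurement against a recorded baseline. Everything in it is an assumption for illustration: the function name `is_unusual` is made up, the sample numbers are invented, and the cutoff of three standard deviations is just a common starting point, not a recommendation for your system.

```python
from statistics import mean, stdev


def is_unusual(history, current, tolerance=3.0):
    """Return True if current is more than `tolerance` sample standard
    deviations away from the mean of history.

    history is a list of earlier measurements of the same thing
    (for example, the run time of a nightly job in seconds).
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    spread = stdev(history)
    if spread == 0:
        # Every baseline sample is identical, so any change stands out.
        return current != history[0]
    return abs(current - mean(history)) > tolerance * spread


if __name__ == "__main__":
    # Invented numbers: response times in seconds from earlier runs.
    baseline = [1.9, 2.1, 2.0, 2.2, 1.8]
    for sample in (2.3, 4.0):
        verdict = "worth a look" if is_unusual(baseline, sample) else "within normal range"
        print(f"{sample} seconds: {verdict}")
```

The point is not the arithmetic. It is that a complaint can be checked against numbers you collected before anyone complained.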
In our Example: I am
assuming that because the readers responded to my auto-start and stop article,
they actually had a system they have to start by hand. In this case, I
would congratulate the readers on finding my article and asking the question.
They obviously were doing research and trying to put the pieces together. Now
if the readers were just trying to find fault with my article, which does
happen quite often, then all I can say is skip to number three below.
3. Understand what you’re working with. Know your system, trust your research, and
converge the two. This is probably the most difficult part of problem solving.
You have to merge what you know, what your system is doing, and the research
you have found. A miscalculation could be disastrous. That is why within this
step you must bring these items together and actually test what you postulate.
Thorough trial and error, especially when first starting out, is the only way
you can validate what you have and where you want to go.
In our Example: This is the
part that honestly confuses me a bit. Regardless of the
scenario in step 2, all researchers should validate their hypothesis. This is
where you gain wisdom through trial and error. Through investigation and
understanding what they have to work with, these readers should have soon found
out that the dbstart script with Oracle 11 does a call to start the listener;
so no need to put the call in my script. Let me make this clear, it isn’t about
my article against someone else’s. It’s about making sure your problem
can be remedied by my solution. Taking this a step further, when I answered
these readers, I didn’t just type out a response re-stating my
solution/position. I stripped down my machine and re-tested what I wrote. Then
and only then did I answer the question that was posed to me. Yes, this took
about an hour of my time. Regardless, those of us who write solutions need to be
very careful not to lead someone astray.
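One quick way to check a claim like this on your own installation is to look inside dbstart itself rather than take anyone’s word for it, mine included. The sketch below is only that, a sketch: the function name `lines_mentioning` is made up for illustration, and it assumes dbstart sits at `$ORACLE_HOME/bin/dbstart`, which you should confirm on your own system.

```python
import os
from pathlib import Path


def lines_mentioning(script_path, needle):
    """Return (line_number, text) pairs for lines of script_path containing needle.

    The comparison is case-insensitive so that lsnrctl and LSNRCTL both match.
    """
    hits = []
    text = Path(script_path).read_text(errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if needle.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumption: ORACLE_HOME is set and dbstart lives in its bin directory.
    oracle_home = os.environ.get("ORACLE_HOME", "")
    dbstart = Path(oracle_home) / "bin" / "dbstart"
    if not oracle_home or not dbstart.is_file():
        print("Could not find dbstart; point this at your own installation.")
    else:
        for number, line in lines_mentioning(dbstart, "lsnrctl"):
            print(f"{number}: {line}")
```

A match only tells you the script mentions the listener. You still have to stop and start a test system and watch what actually comes up, which is exactly the re-test described above.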
4. Plan of attack. This should be semi-obvious but, depending on whether you are in the
testing stage or the pre-deployment-to-production stage, you should be planning
what you will do. I suggest an almost cook-book approach, putting your steps on
paper and then following them up with outputs as you execute the steps in your
process.
In our Example: I have
already stated that I re-tested my scripts before giving an answer back to my
readers. My readers should have done the same. Moreover, when I re-tested, I
did follow my step-by-step approach, giving commands and following up with
output and explanation. Your plan of attack, no matter how much you trust your
understanding of a solution, should always be ready to catch flaws that might
creep in because of some unknown variable.
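The cook-book idea of writing each step down and following it with its output can be sketched in a few lines. This is one possible shape, not a prescribed tool: the names `run_step` and `run_plan` are made up, and the two steps in the example are harmless placeholders that you would swap for the commands in your own written plan.

```python
import subprocess
import sys
from datetime import datetime


def run_step(description, command, log):
    """Run one command, append what happened to log, and return the exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    log.append({
        "when": datetime.now().isoformat(timespec="seconds"),
        "step": description,
        "command": " ".join(command),
        "exit_code": result.returncode,
        "stdout": result.stdout.strip(),
        "stderr": result.stderr.strip(),
    })
    return result.returncode


def run_plan(steps):
    """Run (description, command) pairs in order, stopping at the first failure.

    Returns the log, which includes the failing step if there was one.
    """
    log = []
    for description, command in steps:
        if run_step(description, command, log) != 0:
            break
    return log


if __name__ == "__main__":
    # Placeholder steps only; replace them with your own plan.
    plan = [
        ("Confirm the interpreter runs", [sys.executable, "--version"]),
        ("Print a marker", [sys.executable, "-c", "print('step two ran')"]),
    ]
    for entry in run_plan(plan):
        print(f"[{entry['when']}] {entry['step']} (exit {entry['exit_code']})")
        print(f"  $ {entry['command']}")
        print(f"  {entry['stdout'] or entry['stderr']}")
```

Stopping at the first failing step is a deliberate choice here: when an unknown variable creeps in, you want the record to end where the plan and reality parted ways.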
5. Execution with validation: Ok, just do it. But again, for posterity’s sake, record
your findings, make sure they align with your test results, and then sleep
easy.
In solving any problem, research is inevitable. Trusting and
validating that research is another thing. It is hard to validate everything we
come up against. Systems are just too complex to have one sitting in a corner
waiting to be used. However, when it comes to solutions that will be deployed
to a production environment, it is our responsibility as DBAs to test them
before we accept them as fact.