DBA Call to Action: How You Do What You Do
September 10, 2008
As a DBA: Do you take things at face value? Do you try things first? Why is it you do what you do?
It has been quite a while since I have made an installment in my DBA Call to Action series. I never intended to write a series but I am confronted with things, just as we all are, day to day that make me question why I do things the way I do them; and sometimes, I question why my readers do the things they do. I really do enjoy questions from my readers; everyone needs help now and then, including me. However, sometimes I just wonder if we all give ourselves enough credit to be able to find solutions on our own.
Problem solving is, in my opinion, high on the list of skills that all DBAs should have. DBAs don't necessarily need to know how to solve everything, but they should have the ability to define a problem, research solutions, and ultimately take action. I am often puzzled by the approach many take instead. They will:
1. Assume something is not working
2. Do a lazy search on the net for a solution and ask friends and family
3. Implement the solution
This scares the dickens out of me as there are definite holes in this approach:
1. How do we validate that something is actually wrong or not working?
2. Consider the source. Not everything on the Net is true: much has never been tested, much exists to sell you something, many authors have no knowledge of YOUR problem, and the list goes on.
3. The result of a simple search, without any validation in test or QA, gets implemented without question, opening up a complete can of worms and inviting utter disaster.
So, as many people before me have done, I'll attempt to explain the problem-solving cycle, with a bit of DBA clarity to boot. As an example, I'll be going through a recent question I received from a reader about something I wrote.
Our Example: To get the use case going, I recently wrote about how to configure Oracle to automatically start and stop on a Linux system. I provided step by step instructions along with a start/stop script called dbora, a common practice. In my script, I also called the dbstart script, which every DBA should have at least heard of. Well, to get things going here, the reader questioned me about not putting a call to start the Oracle listener in my dbora script. As did a previous reader, I might add.
1. Know the issue inside and out. This is the proverbial "look before you leap." Validate the reasons for your understanding of a problem or issue. So often, I see engineers begin to solve a problem only because someone said there was a problem. There is no validation of the problem itself. We are often incident driven, meaning some event happens and then we react, only to find out later that it was a one-time occurrence.
In our Example: My readers obviously questioned whether I should be putting a call to start the Oracle listener in my dbora script. They even pointed to an external website, which I won't mention, to prove their point that I should be including this call to LSNRCTL. Now the real question here is: how well did these readers actually research and investigate the perceived problem? I never did go off and read the article that they pointed to. Wait one second, let me go look now. OK, I'm back. You'll never guess this one (or maybe you would have), but right at the top of the script, this article references an Oracle 9.2.0 database, AND at the top of the article it talks about Oracle 10.2. Now, I haven't read the whole article, and depending on what they are trying to do, this 10.2 vs. 9.2 difference might be totally in line. However, I do know that my article was written for Oracle 11, so things could be different. This in itself should have triggered at least a slight concern for the readers. Maybe it did; I'm just pointing out the version issue. Moreover, there are other issues that should make you question a lot of what you find out on the Net. Basically, ask yourself: does the Web content match my particular situation? Trust me, things move too fast, and not much hits 100% all the time.
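The version question above can be turned into a quick sanity check. What follows is only a minimal sketch in Python: the function names `major_version` and `same_major_version` are made up for illustration, and comparing just the leading number is an assumption about what counts as "close enough." A matching major version does not prove an article applies to your system; a mismatch is simply a reason to slow down and test.

```python
def major_version(version):
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major_version(article_version, system_version):
    """True when the article and the local system share a major version.

    Hypothetical helper: a mismatch is a warning sign, not a verdict.
    """
    return major_version(article_version) == major_version(system_version)


# The external article referenced 9.2.0 and 10.2; my article targeted Oracle 11.
for article in ("9.2.0", "10.2"):
    if not same_major_version(article, "11"):
        print(f"Article written for {article}; re-test before trusting it on 11")
```

Both article versions differ from 11 in the leading number, so both trigger the reminder to re-test.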
2. Understand what you've got. This is all about knowing if you really have a problem. This goes back to asking yourself how you determine a problem really exists. Does your system behave a particular way all the time, some of the time, or just when you hear it from some user? If you don't have anything in place to tell you when something is wrong, how do you know when something IS wrong? Clock time, user opinion, and hot C-level breath do not necessarily mean something is wrong.
In our Example: I am assuming that because the readers responded to my auto-start and stop article, they actually had a system they had to start by hand. In this case, I would congratulate the readers on finding my article and asking the question. They obviously were doing research and trying to put the pieces together. Now if the readers were just trying to find fault with my article, which does happen quite often, then all I can say is skip to number three below.
3. Understand what you're working with. Know your system, trust your research, and converge the two. This is probably the most difficult part of problem solving. You have to merge what you know, what your system is doing, and the research you have found. A miscalculation could be disastrous. That is why within this step you must bring these items together and actually test what you postulate. Thorough trial and error, especially when first starting out, is the only way you can validate what you have and where you want to go.
In our Example: This is the part that I honestly have to be a bit confused about. Regardless of the scenario in step 2, all researchers should validate their hypothesis. This is where you gain wisdom through trial and error. Through investigation and an understanding of what they had to work with, these readers should have soon found out that the dbstart script with Oracle 11 does a call to start the listener, so there is no need to put the call in my script. Let me make this clear: it isn't about my article against someone else's. It's about making sure your problem can be remedied by my solution. Taking this a step further, when I answered these readers, I didn't just type out a response restating my solution and position. I stripped down my machine and re-tested what I wrote. Then and only then did I answer the question that was posed to me. Yes, this took about one hour of my time. Regardless, those of us who write solutions need to be very careful not to lead someone astray.
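One low-cost way to check the listener claim for yourself is to read the dbstart script on your own system rather than take anyone's word for it, mine included. Here is a minimal sketch in Python, assuming the listener would be started by a line that invokes lsnrctl with a start argument. The function name `lines_starting_listener` and the sample script text are invented for illustration and are not a verbatim excerpt of any Oracle release. It is a text heuristic only; it does not prove the line is actually reached when the script runs.

```python
import re

# Matches a line that mentions lsnrctl followed later by the word "start".
LISTENER_START = re.compile(r"\blsnrctl\b.*\bstart\b", re.IGNORECASE)


def lines_starting_listener(script_text):
    """Return non-comment lines that appear to invoke `lsnrctl ... start`."""
    hits = []
    for line in script_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and whole-line shell comments.
        if not stripped or stripped.startswith("#"):
            continue
        if LISTENER_START.search(stripped):
            hits.append(stripped)
    return hits


# Illustrative text only, NOT copied from a real dbstart script.
sample_script = """#!/bin/sh
# a comment that mentions lsnrctl start should be ignored
$ORACLE_HOME_LISTNER/bin/lsnrctl start LISTENER >> $LOG 2>&1 &
echo "databases started"
"""

for hit in lines_starting_listener(sample_script):
    print("possible listener start:", hit)
```

To try it on a real system, pass in the contents of your own script, for example the text read from the dbstart file under your Oracle home's bin directory. If nothing is reported, that is your cue to investigate further, not proof either way.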
4. Plan of attack. This should be semi-obvious, but whether you are in the testing stage or the pre-deployment-to-production stage, you should be planning what you will do. I suggest an almost cookbook approach: put your steps on paper, and then follow each one up with its output as you execute the steps in your process.
In our Example: I have already stated that I re-tested my scripts before giving an answer back to my readers. My readers should have done the same. Moreover, when I re-tested, I did follow my step-by-step approach, giving commands and following up with output and explanation. Your plan of attack, no matter how much you trust your understanding of a solution, should always be ready to catch flaws that might creep in because of some unknown variable.
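The cookbook approach of writing down each step and following it with its output can be partly automated. The sketch below, in Python, is one possible way to do it and not the process I used for my article. The names `run_step` and `format_runbook` are hypothetical, and the single step shown is a harmless placeholder command, not an Oracle command.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_step(description, argv):
    """Run one command and return a record of what was run and what came back."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(argv, capture_output=True, text=True)
    return {
        "description": description,
        "command": " ".join(argv),
        "started": started,
        "exit_code": result.returncode,
        "output": result.stdout + result.stderr,
    }


def format_runbook(records):
    """Render the recorded steps as plain text: step, command, then output."""
    lines = []
    for number, record in enumerate(records, start=1):
        lines.append(f"Step {number}: {record['description']}")
        lines.append(f"  command:   {record['command']}")
        lines.append(f"  started:   {record['started']}")
        lines.append(f"  exit code: {record['exit_code']}")
        for out_line in record["output"].splitlines():
            lines.append(f"  | {out_line}")
    return "\n".join(lines)


# Placeholder step; replace with the commands from your own plan.
records = [
    run_step("confirm the interpreter runs", [sys.executable, "-c", "print('ok')"]),
]
print(format_runbook(records))
```

Keeping the record from the test run gives you something concrete to compare against when you repeat the same steps before production.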
5. Execution with validation. OK, just do it. But again, for posterity's sake, record your findings, make sure they align with your test results, and then sleep easy.
In solving any problem, research is inevitable. Trusting and validating that research is another thing. It is hard to validate everything we come up against. Systems are just too complex to have one sitting in a corner waiting to be used. However, when it comes to solving problems that will be deployed to a production environment it is our responsibility as DBAs to test these solutions before we accept them as fact.