Monitoring SQL Servers Availability

July 31, 2006

How many SQL Server instances are you running? Do you ever have customers calling you stating that their application is not working, then when you research the problem you find that the instance that supports their application is unavailable? Have you ever had someone mistakenly shutdown the SQL Agent service and forget to restart it, causing a number of scheduled jobs to not be run? One of the tasks of a DBA is to monitor the availability of all SQL Server instances and services. If you have a large number of instances this monitoring task could take you a fair amount of time. This article will discuss why a DBA should monitor instance availability, and provides an approach to automating this process.

Why Monitor Server Availability

A DBA needs to monitor the availability of the SQL Server machines they support. Without monitoring, you can never be sure whether all your SQL Server machines are available and performing as expected. If you don’t monitor then your first indication of a problem is when someone reports they cannot connect to one of your servers. No DBA wants to hear that an instance of SQL Server is not available.

If you are going to monitor, you need to do it routinely. The goal of monitoring is to identify a problem soon after it occurs. By monitoring routinely and frequently, you can be more pro-active in resolving server availability issues. The sooner you know about a problem, the sooner you can resolve it. Ideally, you want to resolve problems before they are discovered by your customers. Monitoring server and service availability can be a time consuming task. Because of this, it is best if the monitoring process can be automated.

What to Monitor

One of the first things you need to determine is what should you monitor. If you think about what needs to be available to make your customer applications work then you can easily develop a list of things to monitor. In my shop, I wanted to make sure that an application/customer could connect to SQL Server. If an application/customer could not connect then either the MSSQLSERVER service was not running, the box was shutdown, powered off, or there was some kind of network connectivity issue. I also wanted to verify that SQL Agent was running. If SQL Agent is not running then routine scheduled jobs might not be started as scheduled.

The Automated process

The automated process to monitor my servers consisted of a table, 4 different stored procedures, and two different monitoring servers. Let me discuss each one of these pieces.

The two different monitoring servers are just two different SQL Server machines, on two different networks with different power supplies. One server is defined as the primary monitoring server, and the other one as the backup monitoring server. The reason for two monitoring servers is to allow the backup monitoring server to take over the duties of monitoring should the primary monitoring server become unavailable. Each SQL Server machine has connectivity to all servers that are being monitored. Linked server definitions are defined on the two monitoring servers for every machine that is being monitored. These linked server definitions are used to submit commands to verify that the monitoring server can connect to the server being monitored, and that all necessary services are running.

On each monitoring server, I have created the following table:

 CREATE TABLE MonitorServerServices (
 Server nvarchar (100),
 Service nvarchar (100))

This table contains a list of servers and services that will be monitored. Each monitoring server has an identical list of servers and services. Here might be a typical set of records for this table for a shop that only has three different servers to monitor:

Server                      Service
--------------------------- --------------------------------
SERVERA                     MSSQLSERVER 
SERVERA                     SQLSERVERAGENT
SERVERB                     MSSQLSERVER    
SERVERB                     SQLSERVERAGENT 
SERVERC                     MSSQLSERVER
SERVERC                     SQLSERVERAGENT 

Here you can see I am only monitoring the "MSSQLSERVER" and "SQLSERVERAGENT" services. If in your shop, you want to monitor additional services on your machines this process can handle that by just adding additional table entries for the additional services you wish to monitor.

The 4 different stored procedures to support this process are: "sp_send_cdosysmail", "usp_check_connection", "usp_services_monitor", and "usp_service_monitor_backup". I have placed each of these stored procedures on both monitoring servers. Although the "usp_service_monitor_backup" stored procedure (SP) really only needs to be placed on the backup monitoring server. The first two stored procedures play a supporting role in the monitoring process. The third SP, in the list above, does the actual monitoring, while the forth SP is used to detect whether the primary monitoring server is working.

The "sp_send_cdosysmail" SP provides a mechanism to send SMTP mail. I obtained this SP from Microsoft; therefore, I have not included the code for this SP. I like using SMTP mail, instead of using "xp_sendmail". If you want to use SMTP mail and do not already have this SP, you can find the code at the link below. Or, you can replace my executions of this SP with however your shop supports mail from T-SQL. Here is the link to a Microsoft page that describes using SMTP mail and the "sp_send_cdosysmail" SP:

http://support.microsoft.com/default.aspx?scid=kb;en-us;312839&sd=tech

The next supporting SP is "usp_check_connection". This SP is used to verify that the monitoring SP can connect to the servers being monitored. This SP is critical to this process since it keeps the process from failing when it cannot connect to a server being monitored. Testing connectivity is accomplished using OLE automation. This SP tests by connecting to the server being monitored via a linked server definition from the primary or backup monitoring servers. This allows for this process to not to have to store passwords, since the passwords are stored in the linked server security definitions. Here is the code for this SP:

CREATE proc usp_check_connection ( 
  @server varchar(100),
  @linkedserver varchar(100))
as
--
-- Written by: Gregory A. Larsen   Date: 12/29/2004
-- This procedures is used to determine if a server (@server), 
-- Can connect to a linked server (@linkedserver).  If 
-- the server cannot connect to the linked server then it 
-- sets the return code to 99.
--
-- Parameters: 
--    @server       - This variable identifies the server that   
--                    this sp will try to connect to, to verify the 
--                    linked server connection. It is either the
--                    primary or backup monitoring server.
--    @linkedserver - This variable identifies the linked 
--                    server that the sp tries to connect to.
-- 
---------------------
-- Begin Section A --
---------------------
-- Declare Local Variables
DECLARE @object int
DECLARE @hr int
DECLARE @src varchar(255), @desc varchar(255)
DECLARE @location varchar(100)
DECLARE @CMD nvarchar(1000)
DECLARE @objresults int
-- Set the location in the code for error processing
set @location = 'Create SQLServer Object'
-- Create a SQL Server object 
EXEC @hr = sp_OACreate 'SQLDMO.SQLServer', @object OUT
-- check to see if object creation had an error
IF @hr <> 0 goto ERROR
-- set the Connection properties to use Window Authentication
EXEC @hr = sp_OASetProperty @object, 'LoginSecure', 'True'
-- Set the location in the code for error processing
set @location = 'Connect to Server - ' + rtrim(@server)
-- Connect to server
EXEC @hr = sp_OAMethod @object, 'Connect',Null,@server
-- Check to see if connection to server failed
IF @hr <> 0 goto ERROR
-------------------
-- End Section A --
-------------------
---------------------
-- Begin Section B --
---------------------
-- Set the location in the code for error processing
set @location = 'Execute TSQL to Linked Server - ' + rtrim(@linkedserver)
-- Set the command to be executed on the linked server
set @cmd = 'ExecuteWithResults("select top 1 name From ' + 
           rtrim(@linkedserver) + '.master.dbo.sysobjects")'
-- execute command on linked server
EXEC @hr = sp_OAMethod @object,@cmd, @objResults OUT
 
-- check to see if process could connect to linked server
IF @hr <> 0 goto ERROR
-- Successfully connected, return without error code
RETURN
-------------------
-- End Section B --
-------------------
---------------------
-- Begin Section C --
---------------------
-- Error Handling Routine
ERROR:
-- Print the location of the error
Print @location
-- Print SQL-DMO error
EXEC sp_OAGetErrorInfo @object, @src OUT, @desc OUT 
SELECT hr=convert(varbinary(4),@hr), Source=@src, Description=@desc
-- Connection Failure, return with a 99 return code
RETURN 99
-------------------
-- End Section C --
-------------------

As you can see this SP creates a SQLServer object, sets the "LoginSecure" property to "True" and then makes a connection to the monitoring server using Windows Authentication. Once connected to the monitoring server this SP builds a simple dynamic query using a linked server "SELECT" statement to determine if the server being monitored is reachable from the monitoring server. If it cannot connect to either the monitoring server or the server being monitored then an error is returned to the calling SP.

The reason I go through all this logic just to verify that a server is up and running is because a normal inline T-SQL linked server "SELECT" command would cause my SP to stop executing if the "SELECT" command failed because a server was down. By using OLE automation to submit my T-SQL command, this SP is able to detect the outage, and continue running, so it can monitor the rest of the servers. Now that you understand the supporting SPs, let's look at the two different monitoring stored procedures.

Prior to reviewing the code in the monitoring SP, let me first explain the method I use to determine if a particular service is running. Microsoft provides an undocumented extended stored procedure called "xp_servicecontrol". This undocumented stored procedure can perform a number of service related functions. One of these functions, "QueryState", returns the status of a particular service. A status for a service like "SQLSERVERAGENT" can have 3 different possible values: "Running.", "Stopped.", or "Starting...". I use the "QueryState" function of "xp_servicecontrol" to verify that the service I want to monitor has a status of "Running".

As mentioned earlier, I have a stored procedure "usp_service_monitor_backup" that verifies that the primary monitor appears to be functioning. The "usp_service_monitor_backup" SP is scheduled to run every 10 minutes, on the backup server, via a SQL Agent job. If this SP cannot connect to the primary server, it assumes it needs to take over monitoring duties and executes "usp_service_monitor" the primary monitoring SP. Here is the code for this backup monitoring SP:

CREATE proc usp_service_monitor_backup (
@masterserver varchar(100),
@email_dist varchar(1000))
as 
-- Written by: Gregory A. Larsen
--
-- Date: January 3, 2005
--
-- Description:
--
--   This SP is used as a backup monitor for the availability of windows services of specific SQL Server machine.
--   This backup monitor should be run on a different server, different network, and different power source
--    then the primary monitor.  This is done some in case the server running the primary monitor is down
--    this backup monitor will be able to perform the monitoring.  This backup monitor only monitors if 
--    if cannot contact the server doing the primary monitoring. 
--
-- Parameters:
--   
--  @masterserver - name of server doing primary monitoring.
--  @email_dist -  This parameter identifies the distribution list for any emails sent by this process.
set nocount on 
--------------------
-- Begin Section 1 -
--------------------
-- declare local variables
declare @rc int
declare @cmd nvarchar(1000)
declare @cmd2 nvarchar(1000)
declare @state1 varchar(100)
-- determine if backup monitor can connect to primary monitor
exec @rc=dba.dbo.usp_check_connection @@servername,@masterserver
-- Is the server doing primary monitoring unreachable?
if @rc = 99
begin 
-- Display message that primary monitor is down
  print 'Master server "' + rtrim(@masterserver) + '" for monitoring is not available.'
-- Perform monitoring
  exec  dba.dbo.usp_service_monitor @email_dist
end
------------------
-- End Section 1 -
------------------
--------------------
-- Begin Section 2 -
--------------------
else begin
  -- Build command to determine state of SQLSERVERAGENT service on Master Server
  SET @CMD = 'create table #state (state varchar(2000))' + char(10) + 
             'declare @cmdx varchar(1000)' + char(10) + 
             'insert into #state  EXEC master..xp_servicecontrol ''''QueryState'''', ''''SQLSERVERAGENT''''' + 
             + char(10) + 'select @state=state from #state' + char(10) + 
             'drop table #state' 
  -- Build command to execute command that determines state of service being monitored
  set @cmd2 = 'declare @state varchar(100)' + char(10) +
              'exec ' + rtrim(@masterserver) + '.master.dbo.sp_executesql N''' + @CMD + ''',' + 
              'N''@state varchar(100) out'',' +  
              '@state out' + char(10) +
              'set @state1 = @state' 
  -- Execute command and return state of service being monitored 
  exec master.dbo.sp_executesql @cmd2,N'@state1 varchar(100) out',@state1 out
  -- Is the service that was monitored not running
  if @state1 <1 'Running.' 
  begin
  -- Display message that primary monitor is down
    print 'Master server "' + rtrim(@masterserver) + '" for monitoring is not available.'
  
  -- Perform monitoring
    exec  dba.dbo.usp_service_monitor @email_dist
  end
  else
    print 'Master Server is running'
end
------------------
-- End Section 2 -
------------------

Here you can see I use the "usp_check_connection" SP to determine if I can connect to the primary monitoring machine from the backup monitoring machine. If I cannot then this SP assumes it needs to perform the monitoring and executes the "usp_service_monitor" SP, which is the main monitoring SP. If this SP can connect to the primary monitor, it then checks to verify that the SQL Server Agent is up and running. If SQL Server Agent is not running then this SP takes over the primary monitoring duties by executing "usp_service_monitor". If this SP finds that SQL Server Agent is running then it assumes that the primary monitor is functioning normally and therefore the backup machine does not need to take over primary monitoring duties.

I have already mentioned that the "usp_service_monitor" SP is the primary SP for monitoring SQL Server services. This SP needs to be placed on both the primary and backup monitoring machines. Here is the code for this SP:

CREATE proc usp_service_monitor 

@email_dist varchar(1000) 

as 

-- Written by: Gregory A. Larsen

--

-- Date: Dec 22, 2004

--

-- Description:

--

--   This SP is used to monitor the availability of windows services of specific SQL Server machine.

--    If a service is not available then an email is send to the DBAs.  The SQL Server machines to check

--    and the services to monitor are contained in table "MonitorServerService" in the DBA database.

--

-- Parameters:

--   

--  @email_dist -  This parameter identifies the distribution list for any emails sent by this process.

set nocount on 

--------------------

-- Begin Section 1 -

--------------------

-- declare local variables needed

declare @state1 varchar(100)

declare @CMD nvarchar(4000)

declare @CMD2 nvarchar(4000)

declare @server varchar(100)

declare @subject varchar(100)

declare @body varchar(1000)

declare @save_server varchar(100) 

declare @service varchar(100)

declare @save_service varchar(100)

declare @rc int

-- Initialize the variables used to control the while loop and reading of servers and services

-- from table MonitorServerServices

set @save_server = ''

set @save_service = ''

-- get the first server and service to monitor

select top 1 @server=Server,@service=Service  from dba.dbo.MonitorServerServices 

order by Server, Service 

-- Process all servers and services

while @save_server + @save_service < @server + @service

begin

------------------

-- End Section 1 -

------------------

--------------------

-- Begin Section 2 -

--------------------

  

-- Are we processing the same server

  if @save_server <> @server

-- Display server being processed.

    print 'Processing Server ' +  rtrim(@server) + '-----------------------------------'

-- Display service being processed.

  print 'Checking Service ' + rtrim(@service)

-- Save the current server and service being monitored

  set @save_server = @server

  set @save_service = @service

-- Determine if server being process is reachable from monitoring server

  exec @rc=usp_check_connection @@servername,@server

------------------

-- End Section 2 -

------------------

--------------------

-- Begin Section 3 -

--------------------

-- Is server reachable

  if @rc <> 99

  Begin

-- Initialize state of service

    set @state1 = ''

-- Build command to determine state of service being monitored.

    SET @CMD = 'create table #state (state varchar(2000))' + char(10) + 

               'declare @cmdx varchar(1000)' + char(10) + 

               'insert into #state  EXEC master..xp_servicecontrol ''''QueryState'''', ''''' + 

               rtrim(@service) + '''''' + char(10) + 

               'select @state=state from #state' + char(10) + 

               'drop table #state' 

-- Build command to execute command that determines state of service being monitored

    set @cmd2 = 'declare @state varchar(100)' + char(10) +

                'exec ' + rtrim(@server) + '.master.dbo.sp_executesql N''' + @CMD + ''',' + 

                'N''@state varchar(100) out'',' +  

                '@state out' + char(10) +

                'set @state1 = @state' 

-- Execute command and return state of service being monitored 

    exec master.dbo.sp_executesql @cmd2,N'@state1 varchar(100) out',@state1 out

------------------

-- End Section 3 -

------------------

--------------------

-- Begin Section 4 -

--------------------

-- Is the service that was monitored not running

    if @state1 <> 'Running.' 

    begin

-- Build and send email for service not running

      set @subject = rtrim(@server) + '- ' + rtrim(@service) + ' service Not Running'

      set @body = 'The ' + rtrim(@service) + ' service is not running on machine ' + rtrim(@server) + '.'  + char(10) +

                           'The ' + rtrim(@service) + ' has the current state - ' + @state1

      exec master.dbo.sp_send_cdosysmail 

          @From='ServiceMonitor@dbazine.com',

          @To=@email_dist,

          @Subject=@subject,

          @Body=@body

-- Display message about service not running

      Print @body

    end  

    else

-- Print state of service

      Print 'The ' + rtrim(@service) + ' is ' + rtrim(@state1) + '.'

  End

------------------

-- End Section 4 -

------------------

--------------------

-- Begin Section 5 -

--------------------

 

  else

  begin

-- build and send mail message if the server can't be reached

    set @subject = rtrim(@server) + '- Can not connect to Server'

    set @body = 'Can not seem to connect to server ' + rtrim(@server) +

              '.  Either the MSSQLSERVER services is not running, the ' + 

              ' machine is powered off, or the machine can''t be reached' +

              ' from machine ' + rtrim(@@servername) + '.'  

    exec master.dbo.sp_send_cdosysmail 

         @From='SQLServerMonitor@dbazine.com',

         @To=@email_dist,

         @Subject=@subject,

         @Body=@body

        Print @body

  end

-- get next server and services to monitor

  select top 1 @server=Server,@service=Service  from dba.dbo.MonitorServerServices 

    where @save_server + @save_service < server + service

    order by Server, Service

end

------------------

-- End Section 5 -

------------------


This code processes through a list of servers and services found in table "MonitorServerServices" one record at a time. For each record, this code calls "usp_check_connection" to determine if the server to be monitored is reachable. If this SP cannot connect to the server being monitored then an error message is sent to the email distribution address passed as a parameter to this SP using the "sp_send_cdosysmail" SP. If the server is reachable, then this SP uses the "xp_servicecontrol" SP to determine the STATE of the service being monitored. If the service being monitored is found to not be running then this SP sends an email to the email distribution list. This email notifies the individuals on the distribution list of the status of the service, so they can quickly respond to SQL Server service problems. If the current service is running then the next server and service to be monitored is obtained from the "MonitorServerServices" table and the while loop is repeated. This process continues until all servers and services have been monitored

Conclusion

I have found this monitor process provides an early indication on whether MSSQLSERVER or SQLSERVERAGENT services are not running. Every so often, our System Administrators will take down both SQL Agent and SQL Server, but only restart SQL Server. When this happens, this process lets the DBAs know that SQL Server Agent needs to be started to prevent possible loss of work due to SQL Server Agent being down. If you have experienced outages of services on your critical SQL Server machines then you already know how nice it would be to have a service monitoring process.

» See All Articles by Columnist Gregory A. Larsen








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers