DBAs are often faced with the encompassing task of trying to reduce the use of system resources. In doing so, DBAs have the ominous task of having to relate the consumption of system resources by a database to the available resources on the database server it runs on. Sometimes this is a very difficult task.
Take for instance the use of network resources. When we look at database statistics accumulating from within the database we get a glimpse of what is actually happening with network traffic for the database as a whole or for individual sessions. What is often looked at are the SQL*Net message from/to series of statistics. Take for example two of the most common of these types of statistics:
Common Network Database Statistics
SQL*Net message from client – This is the time it took for a message to arrive from the client since the last message was sent to the client.
SQL*Net message to client – This is the time it took for the sending of a message to the client.
While these give an indication of how long it took for messages to send, these statistics really don’t give a good indication of how busy the network is for database activity. For this type of information we must begin our investigation with statistics that are measured at the operating system / network level. From this level we can then relate network resource consumption, possibly abusers of the network, to database sessions that are executing within the database. Take the following example:
Like any database monitoring endeavor, trapping the use of a resource is the first trick in determining who is using the resource and who might be abusing it. For our purposes, the UNIX utility netstat (network statistics) is a sufficient tool to trap network users. The netstat utility is a command line utility that displays both incoming and outgoing network traffic and is often the utility of choice for finding problems within the network and determining the amount of traffic within the network that might be impacting performance. Like any utility, when issuing the command there are many options, netstat is no different. I’d encourage you to investigate the various options. For purposes of this example, we will focus on the following parameters:
Netstat Parameters
-t : Displays only TCP connections
-p : Shows which processes are using which sockets
Netstat Statistics Returned from -t & -p Parameter Options
Proto : The protocol (tcp, udp, raw) used by the socket, for our example this will be tcp
Recv-Q : Receiving queue, the count of bytes not copied by the user program connected to this socket
Send-Q : Sending queue, the count of bytes not acknowledged by the remote host
Local Address : Address and port number of the local end of the socket, this is the address/connection on the database server side
Foreign Address : Address and port number of the remote end of the socket, this is the address of the issuing client
State : State of the socket
PID/Program name : PID (process ID) and program name, very important for linking PID on the operating system side to the session/process within the database
To trap output from those UNIX commands that do not have a refresh parameter, I like to put together a little script that will do something similar. For netstat I use the following, which accepts three parameters: parameters for the sleep time between samples, the number of rows to return for display, and the network queue you are interested in. When executed, this script behaves a lot like the top command; continually refreshing the screen for the number of top sessions you’ve selected.
#!/bin/sh # -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- # monitorNETSTAT.sh # Use netstat to display tcp network usage by process # -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- if [ $# != 3 ] then echo "Usage monitorNETSTAT.sh sleep topN sortQ" echo "Where sleep - seconds to sleep between samples" echo " topN - top number of rows to return" echo " sortQ - [r|s] to sort on (r)ecv-q or (s)end-q" exit 1 fi SLEEP=$1 TOPN=$2 if [ "${3}" = "r" ] then SORTQ=2 else SORTQ=3 fi while [ 1 ] do netstat -t -p | grep Recv-Q > netstatHEAD.lst netstat -t -p | grep tcp | sort -k ${SORTQ} -g -r > netstatDETAIL.lst clear cat netstatHEAD.lst head -n ${TOPN} netstatDETAIL.lst echo " -- -- -- -- -- -- -" echo "Status Counts" echo " -- -- -- -- -- -- -" cat netstatDETAIL.lst | grep tcp | cut -c 77-88 | sort -u | while read netstatSTATUS do statusCnt=`cat netstatDETAIL.lst | grep $netstatSTATUS | wc -l` echo $netstatSTATUS $statusCnt done sleep ${SLEEP} done
But before executing this script, let’s first log into a database and gather some static information on the session we’d like to capture for this example. To log into the database I use the following connect string:
C:Usersjkoopmann>sqlplus auser/apassword@//192.168.35.112:1521/ORADB SQL*Plus: Release 11.2.0.2.0 Production on Fri Apr 6 18:27:16 2012 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production
Now, I can execute the following SQL to show my current session information. Obviously you won’t know the session / process that is causing network usage problems but this gives us a nice starting point to just take a look at what we should expect.
SQL> set heading off SQL> select 'Sid, Serial#, Aud sid : '|| s.sid||' , '||s.serial#||' , '||s.audsid a, 'DB User / OS User : '||s.username||' / '||s.osuser a, 'OS Process Ids : '||s.process||' (Client) '||p.spid||' (Server)' a, 'Client Program Name : '||s.program a from v$process p, v$session s where p.addr = s.paddr and s.audsid = userenv('SESSIONID'); Sid, Serial#, Aud sid : 1897 , 16112 , 206249940 DB User / OS User : AUSER / jkoopmann OS Process Ids : 6576:3416 (Client) 4107 (Server) Client Program Name : sqlplus.exe
Key points to notice from the above sequence is that I’ve connected to a database as user AUSER, the IP address of the database, the port number, the service name, and the OS process ID (4107).
It is equally important to see what the IP address is on my local client, as this will also be reported within the netstat output. For this I just issue the ipconfig command and not that my local IP:
C:Usersjkoopmann>ipconfig /all Ethernet adapter Local Area Connection 2: …. Connection-specific DNS Suffix . : somewhere.net Description . . . . . . . . . . . : Cisco Systems VPN Adapter for 64-bit Windows Physical Address. . . . . . . . . : xx-xx-xx-xx-xx-xx DHCP Enabled. . . . . . . . . . . : No Autoconfiguration Enabled . . . . : Yes Link-local IPv6 Address . . . . . : xxxx::xxxx:xxxx:xxxx:xxxx%xx (Preferred) IPv4 Address. . . . . . . . . . . : 192.168.5.123(Preferred) Subnet Mask . . . . . . . . . . . : 255.255.255.0
To generate some network traffic, just issue some form of large Cartesian join:
SQL> SELECT a.*, b.*, c.*, d.* FROM dba_tables a, dba_tables b, dba_tables c, dba_tables d;
Run the monitorNETSTAT.sh script:
C:Usersjkoopmann> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 13969 192.168.35.112:1521 192.168.5.123:58097 ESTABLISHED 4107/oracleORADB tcp 0 360 ::ffff:192.168.35.112:22 ::ffff:192.168.5.123:53822 ESTABLISHED - tcp 0 17 192.168.35.112:1521 192.168.35.234:3083 ESTABLISHED 15296/oracleORADB
Notice that the Send-Q for PID 407 is backlogged for my connection to the ORADB.
Key points to notice are that you can see the IP of the database server, the IP address of my client machine, and most importantly the PID (process ID) for the connection. This PID can now be taken and used with any of your favorite SQL scripts that use V$PROCESS.SPID and monitor session activity within the database.
By using netstat, you can easily see if there is a network bottleneck. Recv-Q and Send-Q for all intents and purposes should nearly always be zero unless there is a network issue. Under normal circumstances Recv-Q and Send-Q may have some value but they should not pile up and remain non-zero for long. If these queues remain non-zero it is a simple enough task to find the process ID within the database and determine the abusive SQL. Now, the next time someone asks you if the database is consuming network resources or is having network usage issues, you can easily turn to netstat for your answer.