Queries On Multiple Databases

I work with an accounting system that is currently comprised of 150 databases and growing
by three or four databases per year. My coworkers and I frequently need queries to run against some or all of the databases at once. Since the system’s report writer is slow and constrained, I’ve developed both scripts and stored
procedures, relying on either unions or cursors, as an alternative.

Unions

Unions are straightforward and well-documented in BOL. They can easily reach across multiple databases, and even multiple servers.

If I have three databases with similar schema, and I want a count of rows in a particular table, by database,
I could run the following query:

Click here for code example 1.

I specify the object I want by using its "fully qualified"
name,
more commonly called three-part; i.e., NJ1.dbo.apinpchg. NJ1 is the
database,
dbo is the object’s owner, and apinpchg is the object, a table in this case.
On
my server, the dbo owns every object except INFORMATION_SCHEMA’s views, so I
could refer to this table as NJ1..apinpchg. Don’t assume the dbo owns
everything
on your server(s) or that there aren’t identically named objects owned by
different users; there are several counter-examples elsewhere in my
company.

One point of
interest is MA1, which is on another server, srv2. I access it with
four-part
syntax. The SA would have had to set up
srv2 as a linked server and mapped my local login to a login on srv2 that
had
SELECT rights on MA1.dbo.apinpchg.

I recommend using ALL with the UNION operator
if
possible to increase performance. ALL tells the server
not to
remove duplicate rows from the union, an unproductive and potentially costly
operation.

The reader is referred to BOL for criteria for using UNION.
Note
that the SELECT statements can be as complex as you need. Also
note
that the tables within each SELECT can be drawn from multiple
databases.

Click here for code example 2.

Cursors

I do most of my work with cursors and dynamic SQL. Large unions are
tiresome
to code, read, and maintain. They may not run at all on 6.5, even if they’re
within the maximum batch size.

The following stored procedure returns every transaction in every company
for
the specified GL account for FY2000. It can dump the results into Excel via
Microsoft Query. The only way to do this with my system’s report writer is
to
run 150 transaction detail reports and copy and paste them together.

The SP is an example. It doesn’t have even rudimentary error checking,
and is
inflexible.

Click here for code example 3.

A few notes. The cursor needs a table of database names to work with. If
necessary, you can use sysdatabases in the master database and control the
rows
with a WHERE. I’ve also used derived tables occasionally.

This example uses exec. Microsoft recommends
sp_executesql
if your server is 7.0 or greater. I’m used to exec, and in any
event I doubt you will notice a performance difference between the
alternatives.
Like everything else in SQL Server, performance is mainly governed by
whether
the data is cached and the quality of your T-SQL, indexes, and
statistics.

Unless your users work in ISQL/W or Query Analyzer, you’ll likely want to
develop a cursor solution as a stored procedure. Be aware that the SP won’t
give
users rights to databases and objects that they don’t already have;
exec
checks the rights of the user who invoked the procedure, not the user who
created it.

Which to Use?











 

Union

Cursor

Advantages



  1. Easy to follow–not much more complex than one SELECT
    statement
  2. Easy to code. Mostly a copy-and-paste operation in Windows
  3. Works well with MSQuery and other programs that allow only one
    statement per batch
  4. Can be put in a view
  5. Server can show you the entire query plan if you need
  6. Good choice for fixed queries on a small number of databases


  1. Very flexible. You can control the databases queried at runtime
    and
    you have the power of dynamic SQL available
  2. No meaningful limit on the number of databases that can be
    queried
    at once
  3. The logic is contained in a loop rather than sprawled out over
    hundreds of lines.
  4. Modifications only have to be done in one place

  5. Good choice for a large number of databases or ad hoc queries or
    both

Disadvantages



  1. Very cumbersome syntax as the number of databases or the query
    complexity or both increases
  2. Inflexible–the particular databases queried are hardcoded
  3. Modifications may have to be made throughout the union
  4. In unusual circumstances, the server may not be able to handle
    the
    query, even though it’s within the maximum batch size. I had
    considerable problems with 6.5 giving me “DBPROCESS is dead”
    errors.


  1. Requires some expertise to code and debug
  2. Must often be put into a stored procedure

  3. Creates a batch every time exec runs, which the
    server
    must compile

  4. More difficult to optimize than a union

Latest Articles