So far in our series of articles, we have been discussing relatively generic
SQL Server 2000 security related issues that every database administrator is
likely to encounter on a daily basis. Now it is time to look into less common
scenarios that deal with one specific area of management – database
replication. We will start by presenting its architecture and implementation
options, and follow with their security implications.
The main purpose of replication is data distribution. Placing multiple
copies of the same data at different locations allows you to load balance or
reduce network traffic generated by remote database clients, merge data from multple
sources into one comprehensive target, or separate databases based on their
purpose (e.g. OLTP access can be isolated from read only queries by creating
two data sources, with the read only one being periodically updated). Copying
critical data to a secure location also serves frequently as the basis for
disaster recovery or high availability solutions.
There are different variations of replication, which differ in scope of data
(from stored procedures or user defined functions to multiple tables),
direction of data transfer (one or two directional, with merge functionality
and conflict resolution provisions), propagation method (push or pull), and
update mechanism (more about this shortly). In order to make principles of
replication easier to understand, SQL Server designers decided to apply
terminology derived from the publishing industry when naming its main
components:
-
publisher – SQL Server instance that serves as a source of
replicated data, -
distributor -SQL Server instance which manages and monitors data
flow from a publisher to subscribers, using the distribution database created
specifically for this purpose, -
subscriber – SQL Server instance functioning as the target for
replicated data, -
article of publication – the smallest unit of replicated data,
consisting of a table, its subset formed by either selected columns (filtered
vertically), rows (filtered horizontally), or combination of both, a view,
user-defined function, or stored procedure. -
publication – collection of one or more articles grouped in order
to be replicated together.
Keep in mind that the three architectural elements of replication –
publisher, distributor and subscriber – are, in essence, roles that a
particular SQL Server can serve. These roles are not mutually exclusive – there
are situations where the same system functions as a publisher and a distributor
or as a publisher and a subscriber within the same instance of replication.
Similarly, with multiple replication instances, a system can be a subscriber
for one, and a distributor for another.
There are three types of replication:
-
snapshot – is the simplest type from the conceptual and
implementation perspective. It involves capturing content of a publication at a
point in time and copying it to subscribers. This content consists of scripts
to create database objects included in a publication that are subsequently
executed on the subscriber, as well as data residing in these objects. Its main
benefit is simplified maintenance and troubleshooting as well as decreased load
on replication participants, since they are not responsible for keeping track
of database changes in between snapshots. On the other hand, lack of
transactional support is also the main drawback of snapshot replication. Since
incremental changes to a publisher are not reflected in the subscriber until
the next snapshot, data between the two tend to be out of synch most of the
time. In addition, size of replicated data is the same size as the size of the publication,
every time the snapshot gets replicated – affecting network bandwidth and
processor utilization across all replication participants. This is the main
reason that snapshot replication is typically used with small and relatively
static publications. There is no mechanism to propagate updates on subscribers back
to the publisher. -
transactional – overcomes the limitations of snapshot replication
by keeping track of incremental changes to a publication (taking advantage of
database transaction logs), propagating them to distributor, which in turn
distributes them to subscribers, shortly after they take effect (note that
transactional replication starts with a single snapshot, which provides initial
synchronization between publisher and subscribers). This ensures that
subscribers represent an up-to-date state of publisher (degree of
synchronization between the two is configurable). At the same rate, though,
this places an additional load on publisher, distributor, and subscribers, and
increases complexity of configuration and management. This type of replication is
used in scenarios where data on subscribers needs to remain consistent with the
original publication. Subscribers are typically treated as read only, although
there are (relatively limited) configuration options (immediate updating
subscribers and queued updating subscribers) allowing updates applied to
subscribers to replicate back to the publisher. In general, though, it is
recommended to handle updateable subscribers by configuring merge replication
(presented next). -
merge – intended for scenarios where content of a publication can
be updated at multiple locations. Initial snapshot is used to synchronize
remote subscribers with a publisher, but from this point on, the distinction
between them becomes less clear. Changes can occur at multiple locations (all
subscribers and publisher), which introduces a number of issues that need to be
properly addressed, such as mutual synchronization and conflict resolution. In
addition, merge replication offers some unique enhancements, such as dynamic
filtering (which filters replicated data based on parameters provided
dynamically by subscribers). This type of replication is used in situations
that involve remote offices or mobile users working with subsets of a
publication, which are rolled back into the publisher.
Actual work involved in replication is performed by a number of automated
processes, which are implemented as SQL Server Agent jobs, created
automatically when you set up replication via SQL Server Enterprise Manager.
This implies that they run in the security context of the SQL Server Agent
account (on Windows NT and later operating systems) or a logged on user (on
Widows 9x computers). These jobs fall into several distinct categories, known
as replication agents, each based on a separate executable from Program
FilesMicrosoft SQL Server80COM folder or a stored procedure from the
distribution database, and delivering different type of functionality. The
primary ones are as follows:
-
snapshot agent (SNAPSHOT.EXE) – running on distributor, it is
responsible for creating a snapshot of a publication. The snapshot is then
copied to a shared folder on distributor, which subsequently is copied to
subscribers with the help of distribution agent (refer to the next item below).
This agent is used by every type of replication, since each one involves the creation
of a snapshot at least once. -
distribution agent (DISTRIB.EXE) – copies and applies the initial
snapshot created by the snapshot agent, and, if appropriate, transactional
changes, prepared by the log reader agent (discussed in the next item below),
from distributor to subscribers. It is used by snapshot and transactional
replication. You can configure either one shared agent serving all subscribers
running on distributor (push subscription) or individual agents operating
locally on each subscriber (pull subscription). -
log reader agent (LOGREAD.EXE) – running on distributor, log
reader agent pulls entries from the transaction log on publisher (the ones
marked for replication), making them available to the distribution agent. It is
used only in transactional replication. -
merge agent (REPLMERG.EXE) – specific to merge replication,
synchronizes publication content between publisher and its subscribers. Note
that, in addition to snapshot agent, this is the only one that merge
replication relies on (providing combined functionality of log reader and
distribution agents used with transactional replication). Merge agent can
reside either on a publisher (push subscriptions) or on subscribers (pull
subscriptions). -
queue reader agent (QRDRSVC.EXE) – is optional in transactional
replication (not used for the other two types), applicable only when queued
updating subscribers in transactional replication are configured (in cases
where transactional changes to subscribers need to be replicated back to the
publisher).
There are also several other replication agents whose roles are secondary,
performing mostly cleanup and monitoring tasks, such as agent history clean up
agent, distribution clean up agent, expired subscription clean up agent,
reinitialize subscriptions having data validation failures agent, or
replication agent checkup agent, which are not relevant to our security-focused
discussion, so we will not go beyond just listing them here.
This introductory information will serve as the basis for a discussion on
replication security, in our next article.