In our series dedicated to SQL Server 2005 Express Edition, we have been
focusing mainly on characteristics that emphasize its unique capabilities,
making it a preferred choice in variety of scenarios, despite the apparent
superiority of its full-featured counterparts. However, it is fairly obvious
that its small footprint and free-of-charge use also have negative
implications, imposing restrictions on functionality available in the Standard
or Enterprise editions. This is especially conspicuous in the case of
replication, which we will explore starting with this installment.
Before we turn our attention to aspects of replication specific to SQL
Server 2005 Express Edition, let’s briefly review its terminology and
principles that are important to our discussion. The main purpose of
replication is to provide a mechanism for creating multiple copies of the same
data (including underlying metadata) and keeping them synchronized, regardless
of where updates originate. While its basic terminology is derived from the
world of traditional magazine publishing, some of its concepts appear to be
better represented by Web 2.0-based online technologies, such as wikis or blogs
(which allow partial or multi-directional updates to publications). In general,
though, both of these analogies work well when analyzed from the point of view
of three main entities participating in the replication:
-
publisher – a database instance, which contains the original
data. Its portions that are designated for replication are known as
publications and consist of one or more units called articles, representing a
database object such as a table, view, or stored procedure (which, incidentally,
can be further filtered to limit the amount of replicated data). -
subscriber – a database instance, which receives a copy of data
that was made available by a publisher. Depending on configuration, a
subscriber might be allowed to make changes to the local data and replicate it
either back to the publisher or to other subscribers (in this way assuming the
dual role of the publisher and subscriber) -
distributor – a database instance responsible for hosting
metadata needed to control and monitor data copy and its synchronization
between a publisher and its subscribers.
Note that, as we mentioned, some of these objectives can be assigned (for
example a subscriber can function as a publisher) to the same database,
although distributor is always separate and unique per publisher. (However,
this still allows for publisher databases to be hosted on the same SQL Server
2005 instance as the distributor, which, in such configurations, is known as
"local"). In addition, it is possible to intermingle multiple
replication arrangements (e.g. a single subscriber can receive distinct
publications from several, independent publishers). Despite the wide range of
possible variations, there are three basic replication categories, determined
by the roles of participants in the process of data modification as well as the
mechanism and frequency of updates:
-
snapshot – the simplest but also the least efficient model, which
involves copying entire publication (in the form of its snapshot) from a
publisher to its subscribers. This is applicable mainly in scenarios where the
frequency and number of changes is very limited (or one-time data copy is
needed). Since updates are propagated unidirectionally,
any changes to subscribers are automatically overwritten (hence, their data is
assumed to be read-only). -
transactional – relies on transactions to apply individual
modifications to subscribers shortly after they take place on a publisher (by
monitoring the transaction log of the original database and replaying each
entry that affects content of the publication). In the most standard form,
changes are only allowed on the publisher, but there are a couple of variations
to this basic scheme known as updating subscribers and peer-to-peer that allow
bidirectional updates. (Potential conflicts that occur when two replication
partners attempt to simultaneously modify the same article are resolved
according to an arbitrarily defined resolution policy) -
merge – the most complex but offering the highest degree of
flexibility, with changes that occur independently on a publisher and its
subscribers (accommodating scenarios when connection between them is available
only occasionally). Any intermediate updates are stored in system tables, which
are subsequently used during periodical replication to propagate changes and
apply conflict resolution on the row-by-row basis (rather than by employing
transactional mechanisms).
All the replication arrangements described above can be configured as either
push (with publisher and distributor controlling the frequency of updates) or
pull (changes are applied following explicit requests from a subscriber).
Individual tasks that facilitate data exchange are handled by specialized
processes called agents (implemented as separate executables located in the Program Files/Microsoft SQL Server/90/COM
folder and operating typically within the confines of SQL Server Agent jobs),
and include the following:
-
Snapshot Agent – running on distributor, it is responsible for
creating a copy of a publication (residing on a publisher) and transferring it
to an intermediary location (from which it is subsequently delivered by the
Distribution Agent to its subscribers via network or removable media, depending
on configuration). Note that this agent plays an active role in all three
replication types (although its use is considerably less frequent when
employing a transactional or merge approach). -
Distribution Agent – delivers a copy of the snapshot (in case of
snapshot and transactional replication) and transactions (for transactional
replication) from the distributor to subscribers. In the case of push
replication, it runs on the distributor, but with pull subscriptions, it
operates on subscribers. It is not involved in merge replication. -
Log Reader Agent – residing on distributor, handles parsing of
the transaction log on the publisher in transactional replication (it is not
used with other replication categories) in search for relevant entries (marked
for replication) and copies them to the distributor. -
Queue Reader Agent – is specific to the transactional replication
model with updating subscribers. It operates on the distributor, facilitating
updates initiated by subscribers by reading transactions recorded in a queue
hosted in a subscriber database and applying them to the publisher database. -
Merge Agent – applicable to merge replication only, it collects
incremental changes from all subscribers and the publisher since the most
recent reconciliation, resolves any potential conflicts, and produces a
consistent outcome that is applied to all replication partners. In the case of
the push replication scenarios, the agent runs on distributor, but with pull
implementations, it is hosted on subscribers. It also performs the initial copy
of a snapshot to subscribers, eliminating the need for involvement of the
Distribution Agent (which is responsible for this activity in snapshot and
transactional replication).
Following this brief introduction to replication, let’s concentrate on its
characteristics that are relevant within the context of our series. First of
all, it is important to realize that SQL Server 2005 Express Edition instances
are limited to the role of a subscriber, although with its ability to participate
as a subscriber in all types of replication topology, you can actually publish
its changes in a bi-directional merge replication. (Incidentally, this makes it
inferior to its predecessor – Microsoft SQL Server Desktop Engine, better known
as MSDE, which, in addition to functioning as a subscriber, supported the roles
of a publisher and a local distributor in snapshot or merge replication
arrangements). Furthermore, since SQL Server
2005 Express Edition does not include SQL Server Agent (also present in MSDE),
it does not directly support replication agents running on subscribers
(such as merge or distribution agents in the pull configuration). In order to
emulate their functionality, you will need to resort to workarounds that
involve the use of such mechanisms as Windows Synchronization Manager, invoking
replication-specific executables (such as replmerg.exe), or coding with
.NET-based Replication Management Objects (RMO). The Windows Synchronization
utility has been available since the release of Windows Server 2000 and
Internet Explorer 5.0, via Synchronize shortcut (rebranded as Sync Center in
Windows Vista) in the Accessories program group. While its primary purpose was
to provide a central point of administering synchronization for e-mail, offline
files, and Web pages, its capabilities have been extended to accommodate such
aspects of SQL Server 2005 Express Edition replication as synchronizing,
reinitializing, deleting, or modifying an update mode (when dealing with
updating subscribers) for its subscriptions. Replication Management Objects
assembly, combined with managed code, offers programming methods of
manipulating responsibilities of replication agents.
You should keep in mind that replication functionality is not incorporated
by default in the SQL Server 2005 Express Edition installation. The option
controlling this behavior is accessible by expanding the Database Services node
on the Feature Selection page of the setup wizard and can be modified by
assigning "Will be installed on local hard drive" value to its
Replication entry. In addition, if you intend to take advantage of the
connectivity and Replication Management Objects (RMO), you should apply the
same setting to the Connectivity Components subnode of the Clients Components node
on the same page of the wizard. In case you missed these steps during initial
setup, simply launch SQLEXPR32.EXE (or SQLEXPR.EXE for 64-bit systems) to
modify an existing instance (for the background information regarding this
process, refer to our
earlier article).
By leveraging technologies described in this article, you can deploy
arbitrarily filtered portions of a database residing on a designated SQL Server
2005 Standard or Enterprise instance to multiple SQL Server 2005 Express
Edition systems and maintain consistency between them. This approach provides a
cost-efficient solution in situations where remote users must be able to not
only remain fully operational even if network connectivity to a central office
is limited, but also maintain access to up-to-date data through on-demand
synchronization. In our next article of this series, we will review a few
examples illustrating such implementations.