SQL Server 2005 Express Edition – Part 14 – Replication Support

In our series dedicated to SQL Server 2005 Express Edition, we have been
focusing mainly on characteristics that emphasize its unique capabilities,
making it a preferred choice in variety of scenarios, despite the apparent
superiority of its full-featured counterparts. However, it is fairly obvious
that its small footprint and free-of-charge use also have negative
implications, imposing restrictions on functionality available in the Standard
or Enterprise editions. This is especially conspicuous in the case of
replication, which we will explore starting with this installment.

Before we turn our attention to aspects of replication specific to SQL
Server 2005 Express Edition, let’s briefly review its terminology and
principles that are important to our discussion. The main purpose of
replication is to provide a mechanism for creating multiple copies of the same
data (including underlying metadata) and keeping them synchronized, regardless
of where updates originate. While its basic terminology is derived from the
world of traditional magazine publishing, some of its concepts appear to be
better represented by Web 2.0-based online technologies, such as wikis or blogs
(which allow partial or multi-directional updates to publications). In general,
though, both of these analogies work well when analyzed from the point of view
of three main entities participating in the replication:

  • publisher – a database instance, which contains the original
    data. Its portions that are designated for replication are known as
    publications and consist of one or more units called articles, representing a
    database object such as a table, view, or stored procedure (which, incidentally,
    can be further filtered to limit the amount of replicated data).
  • subscriber – a database instance, which receives a copy of data
    that was made available by a publisher. Depending on configuration, a
    subscriber might be allowed to make changes to the local data and replicate it
    either back to the publisher or to other subscribers (in this way assuming the
    dual role of the publisher and subscriber)
  • distributor – a database instance responsible for hosting
    metadata needed to control and monitor data copy and its synchronization
    between a publisher and its subscribers.

Note that, as we mentioned, some of these objectives can be assigned (for
example a subscriber can function as a publisher) to the same database,
although distributor is always separate and unique per publisher. (However,
this still allows for publisher databases to be hosted on the same SQL Server
2005 instance as the distributor, which, in such configurations, is known as
"local"). In addition, it is possible to intermingle multiple
replication arrangements (e.g. a single subscriber can receive distinct
publications from several, independent publishers). Despite the wide range of
possible variations, there are three basic replication categories, determined
by the roles of participants in the process of data modification as well as the
mechanism and frequency of updates:

  • snapshot – the simplest but also the least efficient model, which
    involves copying entire publication (in the form of its snapshot) from a
    publisher to its subscribers. This is applicable mainly in scenarios where the
    frequency and number of changes is very limited (or one-time data copy is
    needed). Since updates are propagated unidirectionally,
    any changes to subscribers are automatically overwritten (hence, their data is
    assumed to be read-only).
  • transactional – relies on transactions to apply individual
    modifications to subscribers shortly after they take place on a publisher (by
    monitoring the transaction log of the original database and replaying each
    entry that affects content of the publication). In the most standard form,
    changes are only allowed on the publisher, but there are a couple of variations
    to this basic scheme known as updating subscribers and peer-to-peer that allow
    bidirectional updates. (Potential conflicts that occur when two replication
    partners attempt to simultaneously modify the same article are resolved
    according to an arbitrarily defined resolution policy)
  • merge – the most complex but offering the highest degree of
    flexibility, with changes that occur independently on a publisher and its
    subscribers (accommodating scenarios when connection between them is available
    only occasionally). Any intermediate updates are stored in system tables, which
    are subsequently used during periodical replication to propagate changes and
    apply conflict resolution on the row-by-row basis (rather than by employing
    transactional mechanisms).

All the replication arrangements described above can be configured as either
push (with publisher and distributor controlling the frequency of updates) or
pull (changes are applied following explicit requests from a subscriber).
Individual tasks that facilitate data exchange are handled by specialized
processes called agents (implemented as separate executables located in the Program Files/Microsoft SQL Server/90/COM
folder and operating typically within the confines of SQL Server Agent jobs),
and include the following:

  • Snapshot Agent – running on distributor, it is responsible for
    creating a copy of a publication (residing on a publisher) and transferring it
    to an intermediary location (from which it is subsequently delivered by the
    Distribution Agent to its subscribers via network or removable media, depending
    on configuration). Note that this agent plays an active role in all three
    replication types (although its use is considerably less frequent when
    employing a transactional or merge approach).
  • Distribution Agent – delivers a copy of the snapshot (in case of
    snapshot and transactional replication) and transactions (for transactional
    replication) from the distributor to subscribers. In the case of push
    replication, it runs on the distributor, but with pull subscriptions, it
    operates on subscribers. It is not involved in merge replication.
  • Log Reader Agent – residing on distributor, handles parsing of
    the transaction log on the publisher in transactional replication (it is not
    used with other replication categories) in search for relevant entries (marked
    for replication) and copies them to the distributor.
  • Queue Reader Agent – is specific to the transactional replication
    model with updating subscribers. It operates on the distributor, facilitating
    updates initiated by subscribers by reading transactions recorded in a queue
    hosted in a subscriber database and applying them to the publisher database.
  • Merge Agent – applicable to merge replication only, it collects
    incremental changes from all subscribers and the publisher since the most
    recent reconciliation, resolves any potential conflicts, and produces a
    consistent outcome that is applied to all replication partners. In the case of
    the push replication scenarios, the agent runs on distributor, but with pull
    implementations, it is hosted on subscribers. It also performs the initial copy
    of a snapshot to subscribers, eliminating the need for involvement of the
    Distribution Agent (which is responsible for this activity in snapshot and
    transactional replication).

Following this brief introduction to replication, let’s concentrate on its
characteristics that are relevant within the context of our series. First of
all, it is important to realize that SQL Server 2005 Express Edition instances
are limited to the role of a subscriber, although with its ability to participate
as a subscriber in all types of replication topology, you can actually publish
its changes in a bi-directional merge replication. (Incidentally, this makes it
inferior to its predecessor – Microsoft SQL Server Desktop Engine, better known
as MSDE, which, in addition to functioning as a subscriber, supported the roles
of a publisher and a local distributor in snapshot or merge replication
arrangements). Furthermore, since SQL Server
2005 Express Edition does not include SQL Server Agent (also present in MSDE),
it does not directly support replication agents running on subscribers
(such as merge or distribution agents in the pull configuration). In order to
emulate their functionality, you will need to resort to workarounds that
involve the use of such mechanisms as Windows Synchronization Manager, invoking
replication-specific executables (such as replmerg.exe), or coding with
.NET-based Replication Management Objects (RMO). The Windows Synchronization
utility has been available since the release of Windows Server 2000 and
Internet Explorer 5.0, via Synchronize shortcut (rebranded as Sync Center in
Windows Vista) in the Accessories program group. While its primary purpose was
to provide a central point of administering synchronization for e-mail, offline
files, and Web pages, its capabilities have been extended to accommodate such
aspects of SQL Server 2005 Express Edition replication as synchronizing,
reinitializing, deleting, or modifying an update mode (when dealing with
updating subscribers) for its subscriptions. Replication Management Objects
assembly, combined with managed code, offers programming methods of
manipulating responsibilities of replication agents.

You should keep in mind that replication functionality is not incorporated
by default in the SQL Server 2005 Express Edition installation. The option
controlling this behavior is accessible by expanding the Database Services node
on the Feature Selection page of the setup wizard and can be modified by
assigning "Will be installed on local hard drive" value to its
Replication entry. In addition, if you intend to take advantage of the
connectivity and Replication Management Objects (RMO), you should apply the
same setting to the Connectivity Components subnode of the Clients Components node
on the same page of the wizard. In case you missed these steps during initial
setup, simply launch SQLEXPR32.EXE (or SQLEXPR.EXE for 64-bit systems) to
modify an existing instance (for the background information regarding this
process, refer to our
earlier article
).

By leveraging technologies described in this article, you can deploy
arbitrarily filtered portions of a database residing on a designated SQL Server
2005 Standard or Enterprise instance to multiple SQL Server 2005 Express
Edition systems and maintain consistency between them. This approach provides a
cost-efficient solution in situations where remote users must be able to not
only remain fully operational even if network connectivity to a central office
is limited, but also maintain access to up-to-date data through on-demand
synchronization. In our next article of this series, we will review a few
examples illustrating such implementations.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles