The Missing Sync

Rich Internet Applications
The Mobile Connection
Data Synchronization: A Surprisingly Knotty Problem
A Simple Example
What Next?
6.Maintaining Data
7. And After That…

Rich Internet Applications

Once upon a time, there were two
competing visions of computer applications. The powerful but aging desktop
application had rich feature sets and history on its side. The energetic and
brash but still scrawny web-based application had key benefits on its side such
as ease of software updates, availability from any computer, and ease of
collaboration, but it was obviously lacking in features.

However, this competition doesn’t end with a single victor;
instead, each competitor takes on facets of the other. Desktop applications are
integrating internet features, including storage “in the cloud”. Meanwhile
web-based applications are adding offline capabilities and building in a
desktop component. There is a single emerging hybrid architecture called Rich
Internet Application (RIA) that has the best features of desktop and online
applications.

Major companies like Google, Adobe and Microsoft are
investing in this hybrid architecture. They are building platforms for RIAs
that take advantage of almost-permanent connectivity and also provide a local offline
store in which to store data for offline use or to give a performance boost.
The system as a whole is not isolated (like a traditional desktop application)
or tightly connected (like a hosted web-based application) but is loosely
connected. Live data resides in many places, and must move around the system as
and when it can.

This loosely connected architecture promises many benefits
for application users. From the web-based world, it gets collaboration, secure
storage, and access from any computer. From the desktop world, it gets a rich
feature set, the performance benefit of local storage, and access at any time
whether or not connected.

The Mobile Connection

The loosely connected architecture is not new. One part of
the computer industry has had to deal with issues of intermittent connectivity
for years, as well as balancing the costs and benefits of network access: the
mobile enterprise computing industry. Many mobile enterprise applications have adopted
an “always-available” or “occasionally-connected” architecture: they store data
on the device so that the application can be used whether or not a network is
available, and synchronize that data periodically. As with web-based
applications, the ultimate home of the data is not the device but is a central
data store on a server. As with other collaborative applications, any piece of
data may be shared among many devices.

Looking at the occasionally connected mobile computing
architecture shines a light on a key technology that is still unrealized in the
RIA world: data synchronization. In the RIA world, data synchronization is
still on the horizon; those who have to tackle it try to do so in an ad-hoc
manner as best they can and those who are developing the platforms, while
realizing how important data synchronization is, are unsure where it belongs.
Is it an application feature or is it a platform-level feature? Moreover, if it
belongs at the platform level, then how should it be implemented?

The experience of mobile enterprise computing shows that
while much of the logic of synchronization is application specific, there is a
common set of core features that belongs in the platform. It is time to take a
closer look at the data synchronization problem, and the mobile computing world
is a good place to start.

Data Synchronization: A Surprisingly Knotty Problem

At first blush, data synchronization does not sound too
complex. You need to send data from the application’s local store to a server,
and also send data from the server down to the local store. That doesn’t sound
too challenging does it? Upload a few items; download a few items. Done. However,
it’s not so simple. Data synchronization is more complicated than you might
think and online applications cannot be automatically and simply ported to work
in an “occasionally connected” manner. Writing a data synchronization layer is
a bit like writing a database – usually, it is not something you want to do
yourself. Data management and data synchronization should be considered a
discrete layer (a data platform) below the general application layer; this article
looks at what services that layer needs to provide.

A Simple Example

To get a feel for data synchronization, let’s look at a
simple example that is one small piece of many applications: a list of
contacts.

At this stage, we’ll say nothing about how the data is
stored: in principle, you could store a set of these records as a plain text
file, an XML file, a database, or in an object store. But, being a list, it
makes sense to store each contact in a separate record. Most of the fields in
the record will be descriptive information, such as name, address, time of most
recent contact, and so on. In addition, the record itself needs to be
distinguished so that it can be uniquely accessed (there may be two Jane Smiths,
or two contacts at the same address, and so on), so we give it a unique ID
value.

A typical record looks like this:

Contact ID	Name	Address	City	Last Contact
102	Jane Smith	123 Evergreen Terrace	Springfield	2007-05-31 10:00

The “home” of the contact list is on the servers of your
organization, and the first challenge for data synchronization is to deliver to
each mobile user a copy of the contacts that he or she needs, and only those.

We’ll divide up the contacts by city, so that one group of users
gets the contacts in Springfield, another group gets the contacts in
Shelbyville, a group of managers gets both sets of contacts, and so on. This
kind of division, in varying forms, is typical for all sorts of data in
business applications: think of delivery locations, customer addresses, and so
on.

The second challenge for data synchronization is to keep
this list of contacts correct as changes are made. Some changes are made at the
server (imagine a contact calling the company to tell them of a change of
address, for example) and some are made at the application (the Last Contact time
will be updated when an application user talks to a customer). To see what these
two synchronization challenges really involve, lets walk through a few simple
scenarios and see what jobs the data synchronization layer has to do.

4.1. Downloading the Contact Lists

The first time a user starts her application, it downloads
the list of contacts from the server. The server could store the data in any
format but for now let’s assume it’s a database.

It would be possible to just download the complete contact
list to each and every user, but that would be wasteful in a number of ways:

Network traffic – downloading unnecessary data over a wireless
connection is a waste of money, and over any connection, it is a waste of time.
Data storage – storing unnecessary data on devices uses up memory
and may require the purchase of more expensive hardware. For some applications,
unfiltered data will not fit on a single device.
Application performance – filtering unnecessary items out of
searches or lists will slow down all aspects of application behavior.
User experience – if the application exposes irrelevant data to
the user, it also has to help them navigate around this data.

To solve the problem, the synchronization layer needs to partition
the data. When a mobile user connects, they need to be identified to the
system. Information associated with that user can be exploited to download the correct
information. For example, the user could be mapped to a separate city (or list
of cities) in a table in the consolidated database.

Row ID	User ID	City
1	0001	Springfield
2	0002	Shelbyville
3	0003	Springfield
4	0003	Shelbyville
5	…	…

The rows for user 0002 can then be downloaded by matching
all rows with a City field of Shelbyville, while user 0003 gets both the
Shelbyville and the Springfield contacts.

One way a data synchronization layer may handle this is by
using a SQL query against the database to select the rows to be downloaded; SQL
allows them to use joins to reach across tables and download rows associated
with a given user, even if there is nothing in the particular row that directly
identifies the user.

4.2. Adding a Contact

Now that each user has their list of contacts, what happens
when one of them adds a new contact?

Imagine user 0002 adding this contact:

Contact ID	Name	Address	City	Last Contact
903	Eric New	123 Deciduous Drive	Shelbyville	2007-05-31 2:15PM

This new record needs to be uploaded to the server, and it
also needs to go to user 0003.

Thinking about this problem makes it clear that what is
being sent back and forth during efficient data synchronization is not “the
data” but changes to the data: we want to send just the new row and
no others to the server. A data synchronization system needs to find a way
to pick out the new contacts from all the others. If the data synchronization
technology is separate from the data management technology then this task has
to be bolted on to the side – perhaps by adding a timestamp to the table and
tracking the last time synchronization took place. If the data synchronization
layer is integrated with data management then the change tracking can be
implemented at a more efficient lower level.

Here is another challenge: looking at the new contact
record, you will see that it is assigned a Contact ID value of 903. This value must
be unique across the entire system, and yet the application is not guaranteed
to have access to records being added by other users on other devices. How can
we guarantee the uniqueness of key values?

One mechanism is to construct keys as Universal Unique
Identifiers (UUIDs) – long strings of alphanumeric values constructed from
device-specific and time-specific data in such a way as to be guaranteed unique.
Another way is to partition the set of possible keys across the applications and
maintain a pool of ID values in each local store. Whichever is right for your
case, the synchronization layer has to help manage it.

4.3. Deleting a Contact

Things get more tricky when you think about deleting
records. Imagine that an application user learns that a contact has resigned
from his job, so that his contact information needs to be deleted from the
whole system. The user deletes the record from the data store in her application.
That delete needs to be sent up to the server as part of the synchronization,
but (of course) the record is no longer present to be sent. How can we send up
a record that no longer exists?

If you were implementing your own synchronization system,
you would need to hold a tracking table that keeps deleted rows around until
the delete operation has been sent to the server and the server has confirmed
receipt, and then clears out the tracking table so that it doesn’t get sent
again. A synchronization layer needs to have this feature built in as part of its
general change-tracking mechanism, so that application developer does not need
to implement the additional code to track these deletes.

If a contact is deleted by a user accessing the server directly,
rather than an application, then a separate solution is needed to ensure
that the corresponding row is deleted at the application. A solution may
involve accessing transaction logs from the server, or maintaining shadow
tables or a status column.

4.4. Updating a Contact

Something similar happens when you update a contact’s
information. Imagine that two users update information about the same contact,
but the first one to update their local data is unable to synchronize until the
next day, while the second user synchronizes immediately.

When the second user synchronizes, he will send up an
updated “Last Contact” time, which replaces the value at the server. When the
first user synchronizes the next day, her (earlier) “Last Contact” time is sent
up. In this case, the correct behaviour is that the older “Last Contact” time
should not replace the newer “Last Contact” time at the server.

The synchronization layer needs to do two things here. The
first is to identify the fact that a conflict has occurred: a record has been
changed at two separate applications, and just applying the changes in the
order they are synchronized does not always do the right thing. The second is
to take the right action to resolve the conflict: in this case, keep the later
of the two times.

Depending on the nature of the data that is in conflict, and
the business rules governing that data, different rules need to be implemented
to resolve the change. For example, in an inventory table you might want
uploads to be additive while in another situation you may want one user’s input
to overwrite another’s.

To identify a conflict, you need to send up the “old”
version of the values as well as the “new” version, so that you can check if
the data on the server has been changed since the last time it was downloaded
to the device. This is similar to the case of deleted records: information that
is no longer in the database must nevertheless be sent up to the server. The
role of a synchronization layer is to handle such problems automatically, or to
provide the application developer with a mechanism for implementing custom
conflict resolution mechanisms.

4.5. Non-synchronized Deletes

We have walked through the synchronization of deletes, but
there are often cases where, for reasons of performance, space or security, you
want to delete rows at the application but not at the server. A typical
example is if you are keeping a diary of events, and only want to keep the most
recent month’s events locally. Obviously, you don’t want to wipe out the old
events from the entire system, just from your local store.

In this case, a synchronization system must provide a way to
turn off change-tracking so that you can clean out old records, and then resume
change-tracking. It’s just one more non-obvious feature that you need to look
for in a serious data synchronization system.

Synchronization is starting to look complicated now: tracking
changes at the application and at the server, sending the right changes, making
sure that unique key values are preserved, identifying and resolving conflicts
when they occur – this is a substantial list of tasks for a synchronization
system to implement.

But we are not finished yet.

What Next?

The simple changes we have looked at so far are all for a
single list, with a well-defined set of rules for who gets which contacts. However,
in the real world things change. Some users will be added, others will change
duties; new versions of the application will be rolled out, from minor tweaks
to major revisions. A data synchronization system has to provide the facilities
you need to ensure that your application can continue to evolve.

5.1. Reassigning Data

Here is one common scenario: a promotion results in user
0002 working with contacts from Springfield rather than from Shelbyville. At
the next synchronization, his Shelbyville contacts need to be deleted from his
application and the Springfield contacts need to be downloaded.

This kind of reassignment of data happens in many circumstances.
For example, schedule changes for field service workers can lead to a
rearrangement of routes. If a repair worker is held up at an appointment, other
customer visits have to be reassigned to other workers’ schedules.

The synchronization system has to make the following
changes:

1. Complete
a final upload from user 0002’s device of the “old” set of data (any changes to
Shelbyville contacts).

2. Download
a set of operations that delete the Shelbyville records.

3. Download
the Springfield contacts to user 0002.

Doing this for the contact list is one thing. If each
contact has separate information associated with them, held in other tables
(items purchased, say, or appointments) then that information must be cleared
up as well, while respecting foreign key constraints.

In this situation, no records have been deleted from the
system or added to the system. Nevertheless, records do need to be deleted from
and added to individual devices. What’s more, this is not even a matter of
changing updates (change of owner) into deletes or inserts: some records that
have not even been updated (such as appointments) linked to the user ID will
need to be deleted from some devices and added to others.

To implement this kind of feature, a synchronization system
must have a separate mechanism for downloading deletes so that the reassignment
can trigger the appropriate operations at the applications. When a contact is
deleted from an application, relational database features such as cascading
deletes are one way of ensuring that all associated data in other tables can be
automatically cleaned out, without having to take up the network bandwidth of
downloading the deletes for each record explicitly. (The synchronization
software must be smart enough, of course, not to synchronize these deletes back
up to the server). It must also be able to identify all the new data at the
server that is needed on each application. Data reassignment is a problem that
needs to be thought through properly in any synchronization system.

5.2. Application changes

Applications are constantly changing, and data
synchronization has to be able to handle application changes, whether they are minor
or major system upgrades, perhaps pilot projects, and other special cases where
a non-standard application must be used. A new application may need different
synchronization logic to the old application. Additional columns, new tables,
and more elaborate logic may be added on.

The upgrade problem is complicated by the loosely-connected nature
of RIA applications: an upgrade cannot happen instantaneously. For some period,
there will inevitably be two versions of an application running against the
same server. If you carry out pilot projects with small portions of the
workforce, then multiple application versions may be the norm.

The ability to implement multiple sets of synchronization
logic on a single server requires a separation of the logic from the underlying
schema and also the ability to upgrade the schema of local data stores.

6.Maintaining Data

Getting synchronization logic to work in a testing
environment, with a reliable high-speed network and a handful of users, is one
thing. Making it work in a production environment – with many users,
intermittent networks, no IT access to the applications and so on – is something
else. Events that are rare in small-scale, controlled environments become
important in the field. Here are a few such events.

6.1. Interrupted synchronization

If network coverage is lost part-way during a
synchronization, there are several concerns:

The data on each side must be left in a correct and consistent
state so that applications can continue working properly. Anything that leaves
a possibility of an inconsistent state is going to lead to problems. Incorrect
query results are one symptom of inconsistent data, of course, but application
errors can also result when – for example – a row that is expected to be
present does not exist. Moreover, once you start making changes to incorrect
data and synchronizing those changes, the lack of correctness can propagate
throughout the system.
To enforce data correctness, each data upload or download must be
atomic (take place in a single transaction). If referential integrity is broken
at the server, for example, the consistency of the central repository of the
data can be compromised. Conflict resolution must take place within the same
transaction as the other changes that are being made. On the other hand, error
reporting operations must take place outside the main transaction so that the
error report is not lost if the transaction fails.
The synchronization system must successfully track what the state
of that data is so that it can send the proper set of changes in subsequent
synchronizations. This requires an acknowledgement step that is guaranteed to be
atomic.
If a partial download is accomplished, the application should not
have to download that data again. A resumable download feature is particularly
important over wireless networks, where data transfer can be expensive.
The dropped connection must not tie up resources, particularly at
the server.

These are each challenging issues. What if changes to a
two-table database are being downloaded and the connection is dropped? If a new
contact is downloaded, but the company they work for is not, can the application
handle that?

7. And After That…

The bookkeeping required to track changes, track the state
of each client at the server, and maintain the state of synchronization, can
cripple some synchronization systems. The bottom line is that a large-scale
synchronization system involves many applications making changes to a single
shared set of data, and scalability is going to be a challenge in such
environments.

Whether it is separate threads to manage database
connections and client connections; configurable timeouts to ensure that
resources are used properly; a robust protocol to make sure that connections
are kept alive if expensive operations are being carried out on device or in
the server; or all the gear to ensure the integrity of the data is maintained
no matter what happens in the synchronization environment, a production-quality
data synchronization system is a complex and demanding piece of infrastructure.

This article just scrapes the surface of what is needed in a
robust data synchronization system. As you develop mobile or rich internet applications,
it becomes obvious that there are other features you may want to tap into.
Perhaps some changes are more urgent than others are, and need to be
synchronized with higher priority than the remainder. What about secure
authentication at each point in the chain? Does your data need to be encrypted?
And what about high-availability, or synchronizations initiated by the server
rather than by the client? As loosely connected applications grow in scope and
complexity, these and many other core capabilities are best built into a data
synchronization layer.

About the author:

After writing SQL Anywhere documentation for a decade Tom Slee moved into product management in 2005, specializing in mobile database management and data synchronization. Before joining Sybase, he carried out research in theoretical chemistry at the Universities of Waterloo and Oxford. He is also the author of No One Makes You Shop at Wal-Mart: The Surprising Deceptions of Individual Choice (2006) which has been used in university sociology, philosophy, and economics courses.