The Missing Sync
September 25, 2008
by Tom Slee
Rich Internet Applications
Once upon a time, there were two competing visions of computer applications. The powerful but aging desktop application had rich feature sets and history on its side. The energetic and brash but still scrawny web-based application had key benefits on its side such as ease of software updates, availability from any computer, and ease of collaboration, but it was obviously lacking in features.
However, this competition doesnt end with a single victor; instead, each competitor takes on facets of the other. Desktop applications are integrating internet features, including storage in the cloud. Meanwhile web-based applications are adding offline capabilities and building in a desktop component. There is a single emerging hybrid architecture called Rich Internet Application (RIA) that has the best features of desktop and online applications.
Major companies like Google, Adobe and Microsoft are investing in this hybrid architecture. They are building platforms for RIAs that take advantage of almost-permanent connectivity and also provide a local offline store in which to store data for offline use or to give a performance boost. The system as a whole is not isolated (like a traditional desktop application) or tightly connected (like a hosted web-based application) but is loosely connected. Live data resides in many places, and must move around the system as and when it can.
This loosely connected architecture promises many benefits for application users. From the web-based world, it gets collaboration, secure storage, and access from any computer. From the desktop world, it gets a rich feature set, the performance benefit of local storage, and access at any time whether or not connected.
The Mobile Connection
The loosely connected architecture is not new. One part of the computer industry has had to deal with issues of intermittent connectivity for years, as well as balancing the costs and benefits of network access: the mobile enterprise computing industry. Many mobile enterprise applications have adopted an always-available or occasionally-connected architecture: they store data on the device so that the application can be used whether or not a network is available, and synchronize that data periodically. As with web-based applications, the ultimate home of the data is not the device but is a central data store on a server. As with other collaborative applications, any piece of data may be shared among many devices.
Looking at the occasionally connected mobile computing architecture shines a light on a key technology that is still unrealized in the RIA world: data synchronization. In the RIA world, data synchronization is still on the horizon; those who have to tackle it try to do so in an ad-hoc manner as best they can and those who are developing the platforms, while realizing how important data synchronization is, are unsure where it belongs. Is it an application feature or is it a platform-level feature? Moreover, if it belongs at the platform level, then how should it be implemented?
The experience of mobile enterprise computing shows that while much of the logic of synchronization is application specific, there is a common set of core features that belongs in the platform. It is time to take a closer look at the data synchronization problem, and the mobile computing world is a good place to start.
Data Synchronization: A Surprisingly Knotty Problem
At first blush, data synchronization does not sound too complex. You need to send data from the applications local store to a server, and also send data from the server down to the local store. That doesnt sound too challenging does it? Upload a few items; download a few items. Done. However, its not so simple. Data synchronization is more complicated than you might think and online applications cannot be automatically and simply ported to work in an occasionally connected manner. Writing a data synchronization layer is a bit like writing a database usually, it is not something you want to do yourself. Data management and data synchronization should be considered a discrete layer (a data platform) below the general application layer; this article looks at what services that layer needs to provide.
A Simple Example
To get a feel for data synchronization, lets look at a simple example that is one small piece of many applications: a list of contacts.
At this stage, well say nothing about how the data is stored: in principle, you could store a set of these records as a plain text file, an XML file, a database, or in an object store. But, being a list, it makes sense to store each contact in a separate record. Most of the fields in the record will be descriptive information, such as name, address, time of most recent contact, and so on. In addition, the record itself needs to be distinguished so that it can be uniquely accessed (there may be two Jane Smiths, or two contacts at the same address, and so on), so we give it a unique ID value.
A typical record looks like this:
The home of the contact list is on the servers of your organization, and the first challenge for data synchronization is to deliver to each mobile user a copy of the contacts that he or she needs, and only those.
Well divide up the contacts by city, so that one group of users gets the contacts in Springfield, another group gets the contacts in Shelbyville, a group of managers gets both sets of contacts, and so on. This kind of division, in varying forms, is typical for all sorts of data in business applications: think of delivery locations, customer addresses, and so on.
The second challenge for data synchronization is to keep this list of contacts correct as changes are made. Some changes are made at the server (imagine a contact calling the company to tell them of a change of address, for example) and some are made at the application (the Last Contact time will be updated when an application user talks to a customer). To see what these two synchronization challenges really involve, lets walk through a few simple scenarios and see what jobs the data synchronization layer has to do.
4.1. Downloading the Contact Lists
The first time a user starts her application, it downloads the list of contacts from the server. The server could store the data in any format but for now lets assume its a database.
It would be possible to just download the complete contact list to each and every user, but that would be wasteful in a number of ways:
To solve the problem, the synchronization layer needs to partition the data. When a mobile user connects, they need to be identified to the system. Information associated with that user can be exploited to download the correct information. For example, the user could be mapped to a separate city (or list of cities) in a table in the consolidated database.
The rows for user 0002 can then be downloaded by matching all rows with a City field of Shelbyville, while user 0003 gets both the Shelbyville and the Springfield contacts.
One way a data synchronization layer may handle this is by using a SQL query against the database to select the rows to be downloaded; SQL allows them to use joins to reach across tables and download rows associated with a given user, even if there is nothing in the particular row that directly identifies the user.
4.2. Adding a Contact
Now that each user has their list of contacts, what happens when one of them adds a new contact?
Imagine user 0002 adding this contact:
This new record needs to be uploaded to the server, and it also needs to go to user 0003.
Thinking about this problem makes it clear that what is being sent back and forth during efficient data synchronization is not the data but changes to the data: we want to send just the new row and no others to the server. A data synchronization system needs to find a way to pick out the new contacts from all the others. If the data synchronization technology is separate from the data management technology then this task has to be bolted on to the side perhaps by adding a timestamp to the table and tracking the last time synchronization took place. If the data synchronization layer is integrated with data management then the change tracking can be implemented at a more efficient lower level.
Here is another challenge: looking at the new contact record, you will see that it is assigned a Contact ID value of 903. This value must be unique across the entire system, and yet the application is not guaranteed to have access to records being added by other users on other devices. How can we guarantee the uniqueness of key values?
One mechanism is to construct keys as Universal Unique Identifiers (UUIDs) long strings of alphanumeric values constructed from device-specific and time-specific data in such a way as to be guaranteed unique. Another way is to partition the set of possible keys across the applications and maintain a pool of ID values in each local store. Whichever is right for your case, the synchronization layer has to help manage it.
4.3. Deleting a Contact
Things get more tricky when you think about deleting records. Imagine that an application user learns that a contact has resigned from his job, so that his contact information needs to be deleted from the whole system. The user deletes the record from the data store in her application. That delete needs to be sent up to the server as part of the synchronization, but (of course) the record is no longer present to be sent. How can we send up a record that no longer exists?
If you were implementing your own synchronization system, you would need to hold a tracking table that keeps deleted rows around until the delete operation has been sent to the server and the server has confirmed receipt, and then clears out the tracking table so that it doesnt get sent again. A synchronization layer needs to have this feature built in as part of its general change-tracking mechanism, so that application developer does not need to implement the additional code to track these deletes.
If a contact is deleted by a user accessing the server directly, rather than an application, then a separate solution is needed to ensure that the corresponding row is deleted at the application. A solution may involve accessing transaction logs from the server, or maintaining shadow tables or a status column.
4.4. Updating a Contact
Something similar happens when you update a contacts information. Imagine that two users update information about the same contact, but the first one to update their local data is unable to synchronize until the next day, while the second user synchronizes immediately.
When the second user synchronizes, he will send up an updated Last Contact time, which replaces the value at the server. When the first user synchronizes the next day, her (earlier) Last Contact time is sent up. In this case, the correct behaviour is that the older Last Contact time should not replace the newer Last Contact time at the server.
The synchronization layer needs to do two things here. The first is to identify the fact that a conflict has occurred: a record has been changed at two separate applications, and just applying the changes in the order they are synchronized does not always do the right thing. The second is to take the right action to resolve the conflict: in this case, keep the later of the two times.
Depending on the nature of the data that is in conflict, and the business rules governing that data, different rules need to be implemented to resolve the change. For example, in an inventory table you might want uploads to be additive while in another situation you may want one users input to overwrite anothers.
To identify a conflict, you need to send up the old version of the values as well as the new version, so that you can check if the data on the server has been changed since the last time it was downloaded to the device. This is similar to the case of deleted records: information that is no longer in the database must nevertheless be sent up to the server. The role of a synchronization layer is to handle such problems automatically, or to provide the application developer with a mechanism for implementing custom conflict resolution mechanisms.
4.5. Non-synchronized Deletes
We have walked through the synchronization of deletes, but there are often cases where, for reasons of performance, space or security, you want to delete rows at the application but not at the server. A typical example is if you are keeping a diary of events, and only want to keep the most recent months events locally. Obviously, you dont want to wipe out the old events from the entire system, just from your local store.
In this case, a synchronization system must provide a way to turn off change-tracking so that you can clean out old records, and then resume change-tracking. Its just one more non-obvious feature that you need to look for in a serious data synchronization system.
Synchronization is starting to look complicated now: tracking changes at the application and at the server, sending the right changes, making sure that unique key values are preserved, identifying and resolving conflicts when they occur this is a substantial list of tasks for a synchronization system to implement.
But we are not finished yet.
The simple changes we have looked at so far are all for a single list, with a well-defined set of rules for who gets which contacts. However, in the real world things change. Some users will be added, others will change duties; new versions of the application will be rolled out, from minor tweaks to major revisions. A data synchronization system has to provide the facilities you need to ensure that your application can continue to evolve.
5.1. Reassigning Data
Here is one common scenario: a promotion results in user 0002 working with contacts from Springfield rather than from Shelbyville. At the next synchronization, his Shelbyville contacts need to be deleted from his application and the Springfield contacts need to be downloaded.
This kind of reassignment of data happens in many circumstances. For example, schedule changes for field service workers can lead to a rearrangement of routes. If a repair worker is held up at an appointment, other customer visits have to be reassigned to other workers schedules.
The synchronization system has to make the following changes:
1. Complete a final upload from user 0002s device of the old set of data (any changes to Shelbyville contacts).
2. Download a set of operations that delete the Shelbyville records.
3. Download the Springfield contacts to user 0002.
Doing this for the contact list is one thing. If each contact has separate information associated with them, held in other tables (items purchased, say, or appointments) then that information must be cleared up as well, while respecting foreign key constraints.
In this situation, no records have been deleted from the system or added to the system. Nevertheless, records do need to be deleted from and added to individual devices. Whats more, this is not even a matter of changing updates (change of owner) into deletes or inserts: some records that have not even been updated (such as appointments) linked to the user ID will need to be deleted from some devices and added to others.
To implement this kind of feature, a synchronization system must have a separate mechanism for downloading deletes so that the reassignment can trigger the appropriate operations at the applications. When a contact is deleted from an application, relational database features such as cascading deletes are one way of ensuring that all associated data in other tables can be automatically cleaned out, without having to take up the network bandwidth of downloading the deletes for each record explicitly. (The synchronization software must be smart enough, of course, not to synchronize these deletes back up to the server). It must also be able to identify all the new data at the server that is needed on each application. Data reassignment is a problem that needs to be thought through properly in any synchronization system.
5.2. Application changes
Applications are constantly changing, and data synchronization has to be able to handle application changes, whether they are minor or major system upgrades, perhaps pilot projects, and other special cases where a non-standard application must be used. A new application may need different synchronization logic to the old application. Additional columns, new tables, and more elaborate logic may be added on.
The upgrade problem is complicated by the loosely-connected nature of RIA applications: an upgrade cannot happen instantaneously. For some period, there will inevitably be two versions of an application running against the same server. If you carry out pilot projects with small portions of the workforce, then multiple application versions may be the norm.
The ability to implement multiple sets of synchronization logic on a single server requires a separation of the logic from the underlying schema and also the ability to upgrade the schema of local data stores.
Getting synchronization logic to work in a testing environment, with a reliable high-speed network and a handful of users, is one thing. Making it work in a production environment - with many users, intermittent networks, no IT access to the applications and so on is something else. Events that are rare in small-scale, controlled environments become important in the field. Here are a few such events.
6.1. Interrupted synchronization
If network coverage is lost part-way during a synchronization, there are several concerns:
These are each challenging issues. What if changes to a two-table database are being downloaded and the connection is dropped? If a new contact is downloaded, but the company they work for is not, can the application handle that?
7. And After That...
The bookkeeping required to track changes, track the state of each client at the server, and maintain the state of synchronization, can cripple some synchronization systems. The bottom line is that a large-scale synchronization system involves many applications making changes to a single shared set of data, and scalability is going to be a challenge in such environments.
Whether it is separate threads to manage database connections and client connections; configurable timeouts to ensure that resources are used properly; a robust protocol to make sure that connections are kept alive if expensive operations are being carried out on device or in the server; or all the gear to ensure the integrity of the data is maintained no matter what happens in the synchronization environment, a production-quality data synchronization system is a complex and demanding piece of infrastructure.
This article just scrapes the surface of what is needed in a robust data synchronization system. As you develop mobile or rich internet applications, it becomes obvious that there are other features you may want to tap into. Perhaps some changes are more urgent than others are, and need to be synchronized with higher priority than the remainder. What about secure authentication at each point in the chain? Does your data need to be encrypted? And what about high-availability, or synchronizations initiated by the server rather than by the client? As loosely connected applications grow in scope and complexity, these and many other core capabilities are best built into a data synchronization layer.
About the author:
After writing SQL Anywhere documentation for a decade Tom Slee moved into product management in 2005, specializing in mobile database management and data synchronization. Before joining Sybase, he carried out research in theoretical chemistry at the Universities of Waterloo and Oxford. He is also the author of No One Makes You Shop at Wal-Mart: The Surprising Deceptions of Individual Choice (2006) which has been used in university sociology, philosophy, and economics courses.