Synopsis:
The latest article in our series discusses improvements in the transactional support implemented in SQL Server 2005 Integration Services. The more granular scope of transactions, which now can be assigned to arbitrarily designated Control Flow tasks and containers, makes it is possible to limit their impact or execute several of them in parallel. However, even with this improved functionality, you need to keep the blocking issues they might potentially introduce in mind.
Transactional functionality is essential in database operations, in order to satisfy atomicity, consistency, isolation and durability requirements (which are frequently referred to collectively using the acronym ACID). These terms describe features inherent to data processing, which ensure that each modification has a clearly defined outcome, resulting in either success or failure (atomicity), thus preventing potential corruption of data (consistency), executes independently from other changes (isolation), and, once completed, leaves underlying data in the same state until another transaction takes place (durability). Complying with these rules is especially important when dealing with multiple, interdependent changes. The scope of these changes can vary, ranging from a few database objects within the same database or server instance, to any number of distributed systems, as is frequently the case in online transaction processing scenarios. Incorporating several distinct activities into the same transaction guarantees that if any of them fails, all of the remaining ones are rolled back, returning the system to its original state (in other words, all of them have to either succeed or fail – no other outcome is possible). One common example of applying this principle involves asynchronous environments where tasks are initiated via message queuing. By including task processing into the scope of transactions, you can ensure that messages will remain in the queue until their content is properly handled.
The same concepts are relevant within the context of SQL Server 2005 Integration Services. Transactions are typically employed in order to complete a sequence of tasks as a single unit of work that either succeeds or fails as a whole. Some of the actions which are executed as part of a task might not be transactional by nature (and therefore their rollback is simply not possible), but adding them into the scope of a transaction will result in having their outcome affecting other tasks, with full transactional capabilities.
Even though transactions were supported in Data Transformation Services (predecessor to SQL Server 2005 Integration Services), their functionality was somehow limited (for a more detailed review of their features, refer to one of our earlier DatabaseJournal articles). Among the most relevant improvements introduced in SSIS is the more granular scope of transactions, which can now be assigned to arbitrarily designated Control Flow tasks and containers. While the option to configure transactional support on the package level still exists (as was the case with Data Transformation Services), the main difference is the ability to have concurrent transactions associated with “sibling” containers, executing in parallel (DTS packages were restricted to one active transaction only). Transactional behavior is controlled by assigning one of three possible values to the TransactionOption property of tasks, containers, or a package (this is done from Transactions section of their respective Properties window within the Business Intelligence Development Studio):
- Required – results either in initiating a new transaction when the scope of task or container is entered by the Control Flow processing – if one has not yet been started by a parent container – or in joining an already existing transaction otherwise. This means that, regardless of circumstances, the content of a task or container will always execute as part of a transaction. Such configuration is useful if you want to ensure that a specific task will be rolled back in case of its failure or you have a group of several tasks that should be enclosed within a single transaction (to accomplish this, place them in a single container and assign Required value to its TransactionOption property).
- Supported – causes a task or container to join a transaction, providing that one has been already initiated by its parent. If this is not the case, a new transaction is not started. This setting should be assigned to a child container or task if you want them to join a transaction initiated by the package or their parent container configured with a Required value of the TransactionOption property (as in the scenario described in the previous bullet point).
- NotSupported – forces a container to execute without transactional support, even if a transaction has been started by its parent container. Note that setting this value on a parent container (including the package) does not prevent its children (tasks or containers) from launching their own transactions.
In addition, the Transactions section in the Properties window of a task, container and package includes the IsolationLevel property. As the name indicates, its role is to set the transaction isolation level, which determines the locking mechanism used by SELECT statements for the duration of task or container execution (in other words, a protection mechanism used to isolate transaction activities from other processes accessing the same data), and can have one of the following values:
- ReadUncommitted – allows viewing uncommitted changes caused by other processes (running within their own transactions) executing in parallel with the current one.
- ReadCommitted – this setting results in a situation where changes applied by other processes (executing within their own transactions) in parallel with the current one are not visible until these transactions commit.
- RepeatableRead – ensures that data being read by the current transaction stays consistent until this transaction commits (by preventing changes to it attempted by other transactions).
- Chaos – like ReadUncommitted, permits viewing uncommitted changes, but without holding update locks until the end of transaction.
- Serializable – the most restrictive isolation level, ensures complete independence of the current transaction (highest locking level). This is accomplished by placing an update lock on the data to which changes are applied, effectively blocking all other transactions from accessing it as well as preventing reads against data modified (but not yet committed) by them.
- Snapshot – introduced in SQL Server 2005, it is based on the concept of taking a “snapshot” of target objects affected by transactional changes at the designated point in time (when a transaction starts). Their original status and all subsequent changes to them are tracked using versioned rows in tempdb database (which might have a negative impact on performance and size of the tempdb database, if modifications are frequent). This provides a consistent view of data that existed when the transaction started, which remains available until its completion. The main benefit of this approach is providing the ability to run multiple, parallel queries without placing exclusive or sharing locks, which addresses situations where long running transactions interfere with the need for concurrent read access to frequently updated data.
The less restrictive options (read uncommitted and chaos) improve concurrency (the ability to execute multiple transactions simultaneously) but increase the possibility of inconsistent reads (such as non-repeatable reads, dirty reads, and phantom reads), while the more restrictive options (read committed, repeatable read, or serializable) have the opposite effect. The snapshot isolation features superior concurrency without negative impact on inconsistent reads; however, its benefits need to be considered in the context of its implications on performance and tempdb size increase.
As with SQL Server 2000 Data Transformation Services, SSIS transactional features we described require that Distributed Transaction Coordinator service is running on the system where the packages are executed. This service is responsible for coordinating transactions that span multiple resource managers, such as databases and message queues. Since by default, the service is configured with manual startup type, you should consider changing it to automatic (and make sure it is started prior to launching your package).
Note that, in general, you should limit the use of transactions to an absolute minimum, since they affect concurrency through a /resource locking mechanism (even Snapshot isolation benefits only activities that query the data being modified – simultaneous update attempts still result in blocking). Strive to include only tasks that require it and fully support rollback, in a transaction. Keep in mind that certain T-SQL statements are not allowed in transactions (for their listing, refer to SQL Server 2005 Books Online). Be aware of blocking issues that might force you to divide longer transactions into smaller execution units. In our next article, we will provide examples demonstrating the use of transactions in SSIS packages and introduce the concept of checkpoints, which constitute an alternative method of controlling package execution.