SQL Server 2005 Part 4 – High Availability and Scalability Enhancements – Online Indexing, Fast Recovery, Database Snapshots, and Snapshot Isolation

We are continuing our series which covers new and enhanced functionality
implemented in SQL Server 2005 (based on its Beta 2 release), focusing on high
availability and scalability area. We have presented so far database
mirroring and failover clustering
– now it is time for other features
falling into the same category. In this article, we will describe improvements
in online indexing and recovery procedures, as well as newly introduced
database snapshots and snapshot transaction isolation level:

  • online indexing – allows execution of index data definition
    language statements in parallel with modifications and queries against
    underlying tables or views. In previous versions of SQL Server, such operations
    were not possible or limited at best. Creating, dropping, or rebuilding (which
    effectively drops and recreates) an index (typically performed as part of
    maintenance tasks geared towards optimizing query performance caused by index
    fragmentation) were precluding (or restricting to SELECT statements only) any
    type of access to indexed data (because of exclusive or shared locks that have
    been placed on it). Such behavior resulted, for example, from running DBCC
    DBREINDEX or dropping and recreating indexes. You could sometime improve the
    situation by running DBCC INDEXDEFRAG, which held locks for a much shorter time
    than the other two methods (although efficiency of this approach was limited
    when dealing with higher levels of fragmentation). SQL Server 2005 eliminates
    the need for exclusive locks completely by maintaining two copies of an index.
    While one of them is being rebuilt, the other can be used for supporting
    current activities (this one is subsequently dropped once the rebuild of the
    first one is completed).

    To rebuild an index, you can either execute ALTER
    INDEX T-SQL statement with the REBUILD clause (with serves as a replacement for
    DBCC DBREINDEX) or invoke the CREATE INDEX T-SQL statement with the DROP_EXISTING
    clause. (You can also split the process into two separate steps – first
    dropping and then recreating the index with DROP INDEX and CREATE INDEX
    statements, respectively, but this approach is least efficient and therefore
    not recommended). All of these activities can be performed on-line by adding the
    WITH (ONLINE=ON) clause to each statement (the clause can be combined with
    CREATE INDEX, DROP INDEX, ALTER INDEX REBUILD, ALTER INDEX DISABLE, ALTER TABLE
    ADD CONSTRAINT, and ALTER TABLE DROP CONSTRAINT).

    Note that online indexing has negative performance
    implications (due to the need of maintaining two copies of indexes), hence it
    is still advisable to perform it offline whenever possible. In addition, you
    should first evaluate, by determining the degree of fragmentation, whether
    there is a need for rebuilding an index or whether reorganizing the index
    instead, will suffice. This can be accomplished with sys.dm_db_index_physical_stats
    system function, which returns (among other values) the percentage of logical
    fragmentation for a particular (or all, if desired) indexes on a selected table
    or view. Index reorganization (recommended if fragmentation is no higher than
    30%), which always performs as on-line an operation, sorts existing index pages
    in place (by physically reordering leaf level pages to match the logical
    sequence of leaf nodes) without dropping an index and creating a new one (and
    functions as an equivalent of SQL Server 2000 DBCC INDEXDEFRAG). You should
    resort to rebuilding an index if fragmentation percentage exceeds a 30%
    threshold (as explained, whether this operation is performed off- or on-line
    depends on whether the WITH ONLINE clause is applied to the associated T-SQL
    statement).

  • early restore access – allows connecting to a database during
    restore or recovery (invoked automatically whenever a database is brought
    online, for example, as the result of restarting SQL Server) as soon as
    committed entries in a transaction log are rolled forward. During the recovery
    process, entries in the transaction log designating transactions committed
    since the most recent checkpoint are reapplied to the database (during
    so-called "redo" phase), which is followed by rollback of all
    incomplete transactions (known as the "undo" phase). Both types of
    actions are necessary in order to maintain database consistency. Prior to SQL
    Server 2005, the database remained inaccessible for applications until the
    whole procedure was completed. With the new recovery model (available at this
    point only in SQL Server 2005 Enterprise Edition) this changes – the database
    becomes available as soon as the first "redo" part (involving
    replaying transactions committed after the most recent checkpoint) is finished.
    This does not mean, however, that access to all pages is granted – in order to
    prevent data consistency problems, data pages affected by uncommitted transactions
    are locked for the duration of the second part (requests for access to these
    pages need to wait until uncommited transactions are rolled back).

  • database snapshots – provides a read-only, "virtual"
    copy of a database, representing its state at a particular time. This
    functionality can be very useful in a number of scenarios, ranging from
    simplifying creation of historical reports reflecting content of the database
    at a specific moment in time to ensuring rapid recovery from risky system
    upgrades or accidental user errors.

    Snapshot is different from a traditional backup due
    to the way it is implemented (which also affects its size). Initiation of a
    database snapshot marks a point after which the first change to any data page
    triggers creation of its copy, which retains its original content. Snapshot
    consists of all such copies, in addition to pages that have not been modified
    yet (including those that would be affected by transactions not committed yet
    at the time of snapshot creation). Creation of duplicate pages is accomplished
    using copy-on-write approach, where a copy of each database page, which is
    supposed to be modified, is saved as part of the snapshot before modification
    is applied. Since, initially, the snapshot refers to the same pages as its
    source, space requirements as well as the amount of time and processing
    resources needed to create it are minimal. However, following its creation,
    first-time modification to original pages increase the amount of related I/O
    operations and space occupied by it.

    Snapshots work in different server configurations,
    including database mirroring and failover clustering. As a matter of fact, as
    we mentioned in our
    article on database mirroring
    , snapshots can be used to create a read-only
    copy of a mirror (the mirror can not be accessed directly since it is in
    recovery mode as long as the principal is operational), which in turn provides the
    ability to perform data reporting activities against it (to lower utilization
    of the principal).

    At the same time, however, snapshots have some
    important limitations. Among the most apparent ones is their read-only status
    and reliance on underlying database, which have a number of other implications
    (for example, their properties, such as permissions or online/offline status
    are inherited and can not be independently changed). Snapshots also cannot be
    created for any of the system databases (model, master, and temp). It is not
    possible to back them up and they should never be viewed as a substitute for
    backup – if the source database fails, its snapshot is no longer accessible.
    While their dependency on NTFS (because copied pages are stored in "sparse
    files," which are NTFS-specific) is unlikely to constitute a major issue
    in most implementations, it will prevent you from using RAW partitions for
    database storage.

    To create a database snapshot, use the standard
    CREATE DATABASE T-SQL statement including the unique name of the snapshot,
    logical name(s) of underlying database file(s), name of a sparse file where the
    snapshot will be stored, and the AS SNAPSHOT OF clause followed by the database
    name. (As mentioned before, you can create a snapshot of a mirror, in which
    case you use its name instead). For example, to create a snapshot of the pubs
    database, you could execute the following:


    CREATE DATABASE [pubs_ss120604] ON
    (NAME = pubs,
    FILENAME = ‘C:Program FilesMicrosoft SQL ServerMSSQLdatapubs_ss120604.ss’)
    AS SNAPSHOT OF pubs;
    GO

    A list of existing snapshots appears in the
    Database Snapshots folder of SQL Server Management Studio. Once snapshots are
    created, you can take advantage of them by either reading their content (users
    can access both the copy and the source, with the latter exhibiting the same
    characteristics as any standard database) or by reverting to them, overwriting
    effectively current data in the original database.

    The procedure of reverting has several
    prerequisites. First, you need to ensure that only a single snapshot for the
    target database exists, which means that you need to delete all of its
    remaining snapshots (to delete a snapshot, use the DROP DATABASE T-SQL
    statement including the snapshot name). You also need to remove all full-text
    catalogs from the database. After reverting is completed, the state of data
    reflects data retained in the snapshot, which means that all database changes
    following snapshot creation are lost. This includes entries in the transaction
    log, since reverting also clears its content. Therefore, it is recommended to
    perform full database backup immediately afterwards (no incremental or
    differential backups are possible at that point). To revert to a database snapshot,
    you need to run the RESTORE DATABASE T-SQL statement with the FROM
    DATABASE_SNAPSHOT clause, followed by its name.

  • snapshot isolation – a new type of transaction isolation level.
    Transactions can be viewed as a unit of work that satisfies four characteristics
    of the ACID model:

    • atomicity (stating that the transaction can either succeed in its
      entirety or fail, without leaving any changes)

    • consistency (conveying the notion that each transaction leaves a
      database in consistent state)

    • isolation (indicating that multiple transactions scheduled to
      take place in parallel execute independently)

    • durability (ensuring that the effect of each committed
      transaction is recorded and stored in the database).

    Transaction isolation levels protect data from
    consistency and integrity problems in a multi-user environment and impact
    primarily locking mechanisms used by the database engine to manage simultaneous
    requests for read access to the same rows or tables by multiple transactions.
    Raising the isolation level increases the potential for blocking but decreases the
    likelihood of negative concurrency side effects (such as dirty reads or lost
    updates). Analogically, decreasing isolation level makes it possible for
    multiple transactions to operate on the same data, but at the cost of possible
    update or read anomalies.

    SQL Server 2005 still supports levels available in
    earlier versions of SQL Server, such as Serializable, Repeatable Read, Read
    Committed, and Read Uncommitted (listed in the decreasing sequence of
    concurrency capabilities), however with the snapshot isolation it introduces a
    different approach to managing concurrent data access. Snapshot isolation does
    not lock rows of data involved in the transaction, but instead it creates their
    copies in tempdb database, assigning a version number to each. The copy is
    maintained until either transaction modifying the data is committed (at which
    point, it is deleted) or rolled back (resulting in restoring the original
    value). Snapshot isolation level is intended primarily for frequently repeating
    read operations on relatively dynamic data (which, with other isolation levels,
    would likely cause contention issues) and for databases where long-running
    transactions are common.

    You can apply snapshot isolation on the session
    level, using the SET TRANSACTION ISOLATION LEVEL SNAPSHOT T-SQL statement, as
    long as the ALLOW_SNAPSHOT_ISOLATION database option has been turned ON (note
    that this can be further modified for individual SELECT statements with table-level
    locking hints). In addition, if the READ_COMMITTED_SNAPSHOT database option is
    ON, then all transactions operating with read-committed isolation level also
    use row versioning instead of locking.

In our next article, we will discuss the remaining scalability and
high-availability features in SQL Server 2005 Beta 2.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles