SQL Server 2005 Part 4 - High Availability and Scalability Enhancements - Online Indexing, Fast Recovery, Database Snapshots, and Snapshot Isolation

We are continuing our series covering new and enhanced functionality
implemented in SQL Server 2005 (based on its Beta 2 release), focusing on high
availability and scalability. So far we have presented database mirroring and
failover clustering; now it is time for other features falling into the same
category. In this article, we will describe improvements in online indexing
and recovery procedures, as well as the newly introduced database snapshots
and snapshot transaction isolation level:

online indexing – allows index data definition language statements to run
in parallel with modifications and queries against the underlying tables or
views. In previous versions of SQL Server, such operations were either not
possible or limited at best. Creating, dropping, or rebuilding an index (a
rebuild effectively drops and recreates the index, and is typically performed
as a maintenance task to restore query performance degraded by index
fragmentation) either blocked access to the indexed data entirely or
restricted it to SELECT statements, because of the exclusive or shared locks
placed on that data. This was the behavior of, for example, DBCC DBREINDEX or
dropping and recreating an index. You could sometimes improve the situation by
running DBCC INDEXDEFRAG, which held locks for a much shorter time than the
other two methods (although this approach was less effective at higher levels
of fragmentation). SQL Server 2005 removes the need to hold exclusive locks
for the duration of the operation by maintaining two copies of an index. While
one of them is being rebuilt, the other supports current activity (and is
dropped once the rebuild of the first one is completed).
To rebuild an index, you can either execute the ALTER INDEX T-SQL statement
with the REBUILD clause (which serves as a replacement for DBCC DBREINDEX) or
invoke the CREATE INDEX T-SQL statement with the DROP_EXISTING clause. (You
can also split the process into two separate steps, first dropping and then
recreating the index with DROP INDEX and CREATE INDEX statements,
respectively, but this approach is the least efficient and therefore not
recommended.) All of these activities can be performed online by adding the
WITH (ONLINE=ON) clause to each statement (the clause can be combined with
CREATE INDEX, DROP INDEX, ALTER INDEX REBUILD, ALTER INDEX DISABLE, ALTER TABLE
ADD CONSTRAINT, and ALTER TABLE DROP CONSTRAINT).
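The two rebuild methods described above might look roughly as follows. This is
a minimal sketch: the table dbo.Orders, the column CustomerID, and the index
IX_Orders_CustomerID are hypothetical names chosen for illustration, and the
syntax shown is that of the released SQL Server 2005, which may differ in
details from Beta 2.

```sql
-- Hypothetical objects: dbo.Orders, CustomerID, and IX_Orders_CustomerID
-- are illustrative names only, not taken from the article.

-- Method 1: rebuild an existing index while the table stays available.
ALTER INDEX IX_Orders_CustomerID
    ON dbo.Orders
    REBUILD WITH (ONLINE = ON);

-- Method 2: recreate the index over the existing one in a single statement.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (DROP_EXISTING = ON, ONLINE = ON);
```

Omitting ONLINE = ON (or setting it to OFF) runs the same statements as
offline operations.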
Note that online indexing carries a performance cost (due to the need to
maintain two copies of an index), hence it is still advisable to perform index
operations offline whenever possible. In addition, you should first determine
the degree of fragmentation in order to evaluate whether an index needs to be
rebuilt or whether reorganizing it will suffice. This can be accomplished with
the sys.dm_db_index_physical_stats system function, which returns (among other
values) the percentage of logical fragmentation for a particular index (or for
all indexes, if desired) on a selected table or view. Index reorganization
(recommended if fragmentation is no higher than 30%) always runs as an online
operation. It sorts existing index pages in place (by physically reordering
leaf-level pages to match the logical sequence of leaf nodes) without dropping
the index and creating a new one, and is the equivalent of DBCC INDEXDEFRAG in
SQL Server 2000. You should resort to rebuilding an index if the fragmentation
percentage exceeds the 30% threshold (as explained, whether this operation is
performed offline or online depends on whether the WITH (ONLINE=ON) clause is
applied to the associated T-SQL statement).
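The decision between reorganizing and rebuilding could be sketched as shown
below. The table dbo.Orders and the index IX_Orders_CustomerID are again
hypothetical names, and the example assumes the function signature of the
released SQL Server 2005 (database ID, object ID, index ID, partition number,
scan mode).

```sql
-- Hypothetical objects: dbo.Orders and IX_Orders_CustomerID are
-- illustrative names only.
DECLARE @db_id int, @object_id int;
SET @db_id = DB_ID();                        -- current database
SET @object_id = OBJECT_ID(N'dbo.Orders');

-- Report logical fragmentation for every index on the table.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id;

-- Fragmentation at or below roughly 30%: reorganize (always online).
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;

-- Fragmentation above roughly 30%: rebuild, online if access is required.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD WITH (ONLINE = ON);
```

In practice only one of the two ALTER INDEX statements would be run for a
given index, depending on the percentage returned by the query.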
early restore access – allows connecting to a database during restore or
recovery (which is invoked automatically whenever a database is brought
online, for example, as the result of restarting SQL Server) as soon as
committed entries in the transaction log are rolled forward. During the
recovery process, entries in the transaction log designating transactions
committed since the most recent checkpoint are reapplied to the database (the
so-called "redo" phase), which is followed by the rollback of all incomplete
transactions (known as the "undo" phase). Both types of actions are necessary
in order to maintain database consistency. Prior to SQL Server 2005, the
database remained inaccessible to applications until the whole procedure was
completed. With the new recovery behavior (available at this point only in SQL
Server 2005 Enterprise Edition) this changes: the database becomes available
as soon as the "redo" phase is finished. This does not mean, however, that
access to all pages is granted. To prevent data consistency problems, data
pages affected by uncommitted transactions remain locked for the duration of
the "undo" phase (requests for access to these pages need to wait until the
uncommitted transactions are rolled back).
database snapshots – provide a read-only, "virtual" copy of a database,
representing its state at a particular time. This functionality can be very
useful in a number of scenarios, ranging from simplifying the creation of
historical reports reflecting the content of the database at a specific moment
in time to ensuring rapid recovery from risky system upgrades or accidental
user errors.
A snapshot differs from a traditional backup in the way it is implemented
(which also affects its size). Creating a database snapshot marks a point
after which the first change to any data page triggers the creation of a copy
that retains the page's original content. The snapshot consists of all such
copies, in addition to the pages that have not been modified yet (including
those that would be affected by transactions not yet committed at the time the
snapshot was created). Duplicate pages are created using a copy-on-write
approach: before a modification is applied to a database page, a copy of that
page is saved as part of the snapshot. Since the snapshot initially refers to
the same pages as its source, the space, time, and processing resources needed
to create it are minimal. After its creation, however, each first-time
modification of an original page adds I/O operations and increases the space
the snapshot occupies.
Snapshots work in different server configurations, including database
mirroring and failover clustering. As a matter of fact, as we mentioned in our
article on database mirroring, snapshots can be used to create a read-only
copy of a mirror (the mirror cannot be accessed directly since it is in
recovery mode as long as the principal is operational), which in turn makes it
possible to run reporting activities against it (and so lower the utilization
of the principal).
At the same time, however, snapshots have some important limitations. Among
the most apparent ones are their read-only status and their reliance on the
underlying database, which have a number of other implications (for example,
their properties, such as permissions or online/offline status, are inherited
and cannot be changed independently). Snapshots also cannot be created for any
of the system databases (model, master, and tempdb). It is not possible to
back them up, and they should never be viewed as a substitute for backup: if
the source database fails, its snapshot is no longer accessible. While their
dependency on NTFS (because copied pages are stored in "sparse files," which
are NTFS-specific) is unlikely to constitute a major issue in most
implementations, it will prevent you from using RAW partitions for database
storage.
To create a database snapshot, use the standard CREATE DATABASE T-SQL
statement, specifying the unique name of the snapshot, the logical name(s) of
the underlying database file(s), the name of the sparse file where the
snapshot will be stored, and the AS SNAPSHOT OF clause followed by the name of
the source database. (As mentioned before, you can create a snapshot of a
mirror, in which case you use its name instead.) For example, to create a
snapshot of the pubs database, you could execute the following:
__ATLAS_CODE_BLOCK__0__
A list of existing snapshots appears in the Database Snapshots folder of SQL
Server Management Studio. Once snapshots are created, you can take advantage
of them either by reading their content (users can access both the copy and
the source, with the latter behaving like any standard database) or by
reverting to them, which effectively overwrites the current data in the
original database.
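Reading from a snapshot works like reading from any other database. The sketch
below assumes a snapshot named pubs_snapshot (a hypothetical name; substitute
whatever name was used when the snapshot was created) and relies on the
source_database_id column that sys.databases exposes in the released SQL
Server 2005.

```sql
-- Hypothetical name: pubs_snapshot stands for a snapshot of pubs.

-- List all snapshots on the instance together with their source databases.
SELECT  s.name AS snapshot_name,
        d.name AS source_database,
        s.create_date
FROM    sys.databases AS s
JOIN    sys.databases AS d
        ON d.database_id = s.source_database_id;

-- Compare the state captured by the snapshot with the current state.
SELECT COUNT(*) AS rows_at_snapshot FROM pubs_snapshot.dbo.authors;
SELECT COUNT(*) AS rows_now         FROM pubs.dbo.authors;
```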
Reverting has several prerequisites. First, you need to ensure that only a
single snapshot of the target database exists, which means deleting all of its
other snapshots (to delete a snapshot, use the DROP DATABASE T-SQL statement
with the snapshot name). You also need to remove all full-text catalogs from
the database. After reverting is completed, the data reflects the state
retained in the snapshot, which means that all database changes made after the
snapshot was created are lost. This includes entries in the transaction log,
since reverting also clears its content. Therefore, it is recommended to
perform a full database backup immediately afterwards (no incremental or
differential backups are possible at that point). To revert to a database
snapshot, run the RESTORE DATABASE T-SQL statement with the FROM
DATABASE_SNAPSHOT clause, followed by the snapshot name.
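The reverting procedure could be sketched as follows. The snapshot names
pubs_snapshot and pubs_snapshot_old and the backup path are assumptions made
for illustration, and the syntax is that of the released SQL Server 2005.

```sql
-- Hypothetical names: pubs_snapshot_old, pubs_snapshot, and the backup
-- file path are illustrative only.
USE master;

-- Remove every snapshot of pubs except the one to revert to.
DROP DATABASE pubs_snapshot_old;

-- Revert pubs to the state captured by the remaining snapshot.
RESTORE DATABASE pubs
    FROM DATABASE_SNAPSHOT = 'pubs_snapshot';

-- Reverting clears the transaction log, so take a full backup right away.
BACKUP DATABASE pubs
    TO DISK = N'C:\Backup\pubs_full.bak';
```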
snapshot isolation – a new type of transaction isolation level. A transaction
can be viewed as a unit of work that satisfies the four characteristics of the
ACID model:
- atomicity (stating that the transaction can either succeed in its
  entirety or fail, without leaving any changes)
- consistency (conveying the notion that each transaction leaves a
  database in a consistent state)
- isolation (indicating that multiple transactions scheduled to
  take place in parallel execute independently)
- durability (ensuring that the effect of each committed
  transaction is recorded and stored in the database).
Transaction isolation levels protect data from consistency and integrity
problems in a multi-user environment and primarily affect the locking
mechanisms used by the database engine to manage simultaneous requests for
access to the same rows or tables by multiple transactions. Raising the
isolation level increases the potential for blocking but decreases the
likelihood of negative concurrency side effects (such as dirty reads or lost
updates). Conversely, lowering the isolation level makes it easier for
multiple transactions to operate on the same data at the same time, but at the
cost of possible update or read anomalies.
SQL Server 2005 still supports the levels available in earlier versions of SQL
Server, namely Serializable, Repeatable Read, Read Committed, and Read
Uncommitted (listed in order of decreasing isolation and, therefore,
increasing concurrency). With snapshot isolation, however, it introduces a
different approach to managing concurrent data access. A transaction reading
under snapshot isolation does not take shared locks on the rows it reads.
Instead, whenever a row is modified, the database engine keeps a copy of its
previous version in the tempdb database, assigning a version number to each
copy, so that readers see the data as it existed when their transaction
started. A copy is kept only as long as some transaction may still need it; if
the modifying transaction is rolled back, the original value is restored.
Snapshot isolation is intended primarily for frequently repeated read
operations on relatively dynamic data (which, with other isolation levels,
would likely cause contention issues) and for databases where long-running
transactions are common.
You can apply snapshot isolation at the session level, using the SET
TRANSACTION ISOLATION LEVEL SNAPSHOT T-SQL statement, as long as the
ALLOW_SNAPSHOT_ISOLATION database option has been turned ON (note that this
can be further modified for individual SELECT statements with table-level
locking hints). In addition, if the READ_COMMITTED_SNAPSHOT database option is
ON, then all transactions operating at the read committed isolation level also
use row versioning instead of locking for read operations.
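The snapshot isolation settings described above might be applied roughly as
follows. This is a sketch only: it uses the pubs sample database and its
dbo.titles table as an assumed target and the option syntax of the released
SQL Server 2005.

```sql
-- Assumption: pubs and dbo.titles serve as the example database and table.
USE master;

-- Allow sessions to request snapshot isolation in this database.
ALTER DATABASE pubs SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optional: make read committed use row versioning as well. This statement
-- may wait until other connections to the database are closed.
ALTER DATABASE pubs SET READ_COMMITTED_SNAPSHOT ON;

USE pubs;

-- Request snapshot isolation for the current session.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
    -- Both reads see the data as of the start of the transaction, even if
    -- other sessions modify dbo.titles between the two statements.
    SELECT COUNT(*) AS title_count FROM dbo.titles;
    SELECT COUNT(*) AS title_count FROM dbo.titles;
COMMIT TRANSACTION;
```

Because row versions are stored in tempdb, enabling either option adds load to
that database, which is worth monitoring on busy systems.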

In our next article, we will discuss the remaining scalability and
high-availability features in SQL Server 2005 Beta 2.

Marcin Policht
