table and index partitioning – intended to improve the performance of
operations performed on large tables. The basic concept is straightforward and
involves splitting a table into several units (called partitions), which can be
accessed independently of each other, limiting the impact of I/O intensive
activities performed on the table's data (queries, data loads, backups and
restores, maintenance tasks such as index rebuilds and defragmentation, as
well as operations that would otherwise result in lock escalation to the table
level). The most common method of splitting data is horizontal partitioning, in
which rows of a table matching mutually exclusive criteria (such as a range of
dates or letters of the alphabet, for datetime and character data,
respectively) are placed in designated partitions.
The SQL Server 2000 implementation of horizontal
partitioning of tables, known as Updateable Distributed Partitioned Views
(supporting data modification statements, unlike its SQL Server 7.0
predecessor, which was capable of querying data only), was based on creating
identically structured tables across separate databases. Separation of the data
loaded into these tables was enforced by a CHECK constraint associated with one
of their columns. The same partitioned view was then defined on each of the
servers hosting partitioned data (this required creating linked server entries
for each participant). Once the view was created, users could access the
underlying data transparently, without being required to know its physical
location. Updateable Distributed Partitioned Views were subject to several
limitations (outlined in more detail in the SQL Server 2000 Books Online); the
most relevant one resulted from the fact that partitions had to be spread out
across multiple SQL Server instances.
SQL Server 2005 extends the horizontal
partitioning concept by allowing partitions of the same table to be placed
within a single database. In order to fully realize performance and scalability
advantages, such partitions should be assigned to separate filegroups based on
appropriately defined criteria, which are established by partition functions
and associated partition schemes. The partition function determines the
distribution of rows of a table or index across partitions and is applied by
comparing its boundary values against values in a designated column (called the
partitioning column). While the partitioning column will typically contain
numeric or date/time values, other types are also permitted, with the exception
of text, ntext, image, XML, timestamp, varchar(max), nvarchar(max),
varbinary(max), CLR user-defined, and alias data types. A partition
scheme assigns each of the partitions to a specific filegroup. Note that in
order to maximize performance and scalability benefits, it is
important to properly select both the partitioning column and the
partition function associated with it. The choice should depend on data
distribution and the type of activities that are typically performed against
the target table. For example, if data is used to produce monthly reports, then
partitioning based on a date column, using monthly intervals as partition
boundaries, would be a very likely candidate.
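To illustrate this scenario, a monthly partition function might be defined along the following lines (the function name and boundary dates are hypothetical; the RANGE RIGHT option places each boundary value, i.e. the first day of a month, in the partition to its right):
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('20060101', '20060201', '20060301')
This would yield four partitions: rows dated before January 1, 2006, rows for each of January and February 2006, and rows dated March 1, 2006 or later.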
The process of partitioning involves four main
steps. The first, preliminary step consists of creating filegroups and their
corresponding files. During the second step, you define a partition function
using the CREATE PARTITION FUNCTION T-SQL statement, with ranges of values (set
by the FOR VALUES clause) that determine the beginning and end of each
partition (and, consequently, their number). For example, the following
statement:
CREATE PARTITION FUNCTION pfIncome (money)
AS RANGE LEFT FOR VALUES ('$30,000', '$90,000')
is intended to create three
partitions: the first one containing values less than or equal to $30,000
(boundary values belong to the partition on their left, as indicated by the
LEFT keyword in the AS RANGE clause), the second one containing values greater
than $30,000 and less than or equal to $90,000, and the third one encompassing
values greater than $90,000.
In the third step, you create a partition scheme using the CREATE PARTITION
SCHEME T-SQL statement, which associates the partition function with
filegroups, determining the physical layout of data. The association between
partitions and filegroups is typically one-to-one, although other arrangements
are also possible (including specifying an extra, not yet assigned filegroup
for future growth, as illustrated in the second example below). Following
our earlier example, we could use the following statement to create a
partition scheme:
CREATE PARTITION SCHEME psIncome
AS PARTITION pfIncome
TO ('fgIncome1', 'fgIncome2', 'fgIncome3')
This would result in rows within
each range being redirected to their respective filegroups. By specifying a
different sequence of (already existing) filegroups, e.g. TO ('fgIncome1',
'fgIncome1', 'fgIncome2'), you could alter the mapping, allocating the first
two partitions to the first filegroup.
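To accommodate future growth, you could instead list one more filegroup than the number of partitions defined by the function (fgIncome4 in this hypothetical variation); the extra filegroup is not initially mapped to any partition, but is marked as NEXT USED and will host the new partition produced by a subsequent ALTER PARTITION FUNCTION ... SPLIT RANGE operation:
CREATE PARTITION SCHEME psIncome
AS PARTITION pfIncome
TO ('fgIncome1', 'fgIncome2', 'fgIncome3', 'fgIncome4')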
Once partitions are associated with their respective filegroups, you can
proceed with the fourth step, in which you create a partitioned table or
index using the CREATE TABLE or CREATE INDEX statement. The ON clause within
these statements identifies the target partition scheme and the partitioning
column, whose values are compared against the boundaries defined by the
partition function.
For example, the following
statement would result in the creation of a table with the second column
designated as the partitioning one:
CREATE TABLE tbIncome (cSSN char(9), mIncome money)
ON psIncome(mIncome)
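Similarly, an index created with its ON clause pointing to the same partition scheme (the index name below is hypothetical) will be aligned with the table, with each index partition stored in the same filegroup as the corresponding data partition:
CREATE NONCLUSTERED INDEX ixIncome
ON tbIncome (mIncome)
ON psIncome(mIncome)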
Once you start performing operations on the target
table (such as data additions, modifications, or queries), the database engine
transparently redirects them to the appropriate partitions. For more
information on the procedures required to set up or modify data partitioning
(including converting existing non-partitioned tables to partitioned ones and
vice versa), refer to the relevant topics in the Books Online. There you will
also find a more elaborate partitioning example, illustrating this process
with the AdventureWorks sample database included with the installation of SQL
Server 2005.
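As a quick way of verifying the resulting distribution of rows, you can query the $PARTITION function (qualified with the name of the partition function), which returns the number of the partition that a given value of the partitioning column maps to:
SELECT $PARTITION.pfIncome(mIncome) AS PartitionNumber, COUNT(*) AS RowCnt
FROM tbIncome
GROUP BY $PARTITION.pfIncome(mIncome)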
backup media set mirroring – increases the reliability of backups
by performing a concurrent dump of a database to multiple (up to four) backup
devices, without significant impact on performance. This greatly increases the
chances of avoiding backup failures caused by faulty tape media or devices.
Note, however, that mirrored media sets require equivalent backup devices
(in the case of tape drives, it is recommended to use the same hardware
model from the same manufacturer). In order to perform a mirrored backup of a
database, you add the MIRROR TO clause to the BACKUP DATABASE T-SQL
statement, followed by the paths of the DISK or TAPE backup devices. If a
backup media error is detected during a restore, you can replace the faulty
media with its mirror.
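For example, the following statement (with hypothetical disk paths; the FORMAT option is required when creating a new mirrored media set) writes identical copies of the backup to two devices:
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks.bak'
MIRROR TO DISK = 'E:\Backups\AdventureWorks.bak'
WITH FORMAT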
Another method of increasing the reliability of
backups involves checksum generation when running BACKUP, which can be enabled
by including the CHECKSUM option (backup is also capable of verifying existing
page checksums and detecting torn pages, assuming that the database PAGE_VERIFY
option is set to CHECKSUM or TORN_PAGE_DETECTION, respectively). With this
option present, the generated checksum is recorded on the backup media and can
subsequently be used to verify the restore operation. Keep in mind, though,
that this feature has negative performance implications (and hence is disabled
by default), so you should evaluate its impact before implementing it. You can
confirm successful completion of backups and media set readability (including
checksum verification, if applicable) by executing RESTORE VERIFYONLY against a
media or backup set. This statement, enhanced in SQL Server 2005, emulates the
restore procedure without actually performing it. Furthermore, provided that
the checksum information was recorded during backup, you can take advantage of
it by including the CHECKSUM option when running RESTORE. A potential problem
with this approach, however, is that even a single error encountered during the
restore will result in its failure. If such behavior is not desired, you can
prevent it by applying the CONTINUE_AFTER_ERROR option, allowing the restore to
complete (while this is likely to leave the database in an inconsistent
state, you can follow it with a manual repair).
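The following sequence (with a hypothetical disk path) illustrates these options, generating checksums during backup, verifying them without performing the restore, and finally restoring despite any detected errors:
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM

RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM

RESTORE DATABASE AdventureWorks
FROM DISK = 'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM, CONTINUE_AFTER_ERROR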
Another backup reliability enhancement introduced
in SQL Server 2005 is the integration of full-text catalogs into database
backups. Full-text catalogs contain full-text indexes, which are used to speed
up keyword-based queries of text data stored in SQL Server databases in char,
varchar, nvarchar, and formatted binary columns. While full-text catalogs are
stored outside of SQL Server databases (as part of the file system), they are
managed by one of the SQL Server 2005 components (the Microsoft Full-Text
Engine for SQL Server service) and administered with SQL Server utilities.
Integrating full-text catalogs into SQL Server backups and restores resolves
the synchronization issues present in SQL Server 2000, where each of these
procedures had to be performed separately. You can also include full-text
catalogs when transferring databases by combining DETACH and ATTACH operations
(which was not possible in SQL Server 2000).