SHARE

Partitioning in SQL Server – Part 2

Written By

Dec 27, 2012

8 minute read

Introduction
Understanding Partitioning Concepts
Partition Functions
Partition Schemes
Partition Tables
Partitioned Indexes
Partition-aligned Indexes
Storage-aligned Indexes
Conclusion
Resources

Introduction

When a table grows large or huge in size it becomes really difficult to load new data, remove old data, maintain indexes and queries involving this table, and runs extremely slow. In this situation, we can partition the table, which essentially breaks the table and its indexes into logically smaller chunks called partitions. Further, with a partitioned table, maintenance works, lock escalation settings and compression settings can be applied individually at each partition rather than on a single huge table. Not only that, Query Optimizer references only partitions needed to serve the query requests and eliminates the partitions that don’t contain data needed by the query during query optimization and execution, and also it uses parallelism to process multiple partitions in parallel. SQL Server 2008 introduced partition table parallelism for better performance and for better resource utilization (of modern multi-processors hardware). With SQL Server 2012, we are now allowed to even create up to 15K partitions on a single table.

In my last article of the series, I discussed what partitioning in SQL Server is, the different kinds of partitioning options available, why and when we should go for partitioning and all of the benefits partition table/index provide. In this article I am going to examine the different concepts of partitioning in SQL Server and provide a step-by-step guide on creating a partition table/index.

Understanding Partitioning Concepts

Now that we understand the scenarios where partitioning will yield benefits, we next need to understand certain concepts/components before we actually start partitioning a table. But even before that, we also need to first identify a column of the table on which we want the partition to be created because choosing a different partitioning column at a later stage will require you to re-create the table, re-load the data and re-create all of the indexes. Here are some of notes, which you can consider when determining the partitioning column:

You should strive to choose the column that is often used in filter conditions; this ensures query optimizer will consider only the partitions needed to serve the request, eliminatint the rest during query optimization and execution. Also this column should divide the data evenly and should help to logically archive the data.
The column, in consideration, must be a single column; if it’s not, then you must create a persisted computed column including all of the columns you want to consider and use this persisted computed column as a single partition column.
There are certain constraints on the data type of the column, which can be used as a partition column. This means, all data types are valid except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types. Though it is very common to partition data on a date or time data type column, it is not a rule.
For Partitioned Indexes

When partitioning a clustered index, the clustering key must contain the partitioning column. When partitioning a non-unique clustered index and the partitioning column is not explicitly specified in the clustering key, SQL Server adds the partitioning column by default to the list of clustered index keys. If the clustered index is unique, you must explicitly specify that the clustered index key contains the partitioning column. More details can be found here.
When partitioning a unique non-clustered index, the index key must contain the partitioning column. When partitioning a non-unique, non-clustered index, SQL Server adds the partitioning column by default as a non-key (included) column of the index, to make sure the index is aligned with the base table. SQL Server does not add the partitioning column to the index if it is already present in the index. More details can be found here.

Partition Functions

Partition function defines the logic to be used to distribute data across partitions. With partition function we define range boundaries for data distribution or we specify the total number of partitions to be created. Below is the general syntax for creating partition function:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type   )
AS RANGE [ LEFT | RIGHT ]   
FOR VALUES ( [ boundary_value [ ,...n ] ] ) 
[ ; ]

When creating partition function we define the RANGE (LEFT or RIGHT), which actually dictates where the boundary value should fall.

LEFT: It can be considered as “<=” and it means the boundary value falls to the left side of the partition. This is the default setting if not specified.

RIGHT: It can be considered as “<” and it means the boundary value falls to the right side of the partition.

Please note, any rows whose partitioning column has NULL values are placed in the left-most partition, unless NULL is specified as a boundary value and RIGHT is indicated. In that case, the left-most partition is an empty partition, and NULL values are placed in the following partition.

If you want to know the partitioning number into which a particular partitioning column value will be mapped, you can use $PARTITION function. It gives the partition number for any valid value irrespective of whether the value already exists in the partitioned table or indexes that uses the partition function.

The number of partitions created will be equal to N + 1 where N is the total number of boundary values specified in the partition function. SQL Server 2012 allows us to create 15000 partitions whereas in earlier versions we could only create maximum 1000 partitions; hence we can specify 14999 boundary values in SQL Server 2012 whereas in earlier version we could only specify up to 999 boundary values.

Partition Schemes

The Partition Scheme maps the partitions of the partitioned table or index to particular file groups (basically the physical locations). As I said above, N number of boundary values creates N+1 partitions and hence you also need to specify N+1 file groups with partition scheme definition, one for each partition. You can specify the same file groups (using the ALL keyword or specifying file groups multiple times) for all the partitions or you can have different file groups, one for each partition or combination of both. Below is the general syntax for creating a partition function:

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ]   TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )
[ ; ]

Before creating a partition scheme, the partition function and file groups must exist. A partition scheme must refer to one and only one partition function whereas partition functions can be referenced by more than one partition scheme. Please note, though we can create partitions on a PRIMARY file group, it’s recommended to have them on different file groups. This will simplify your recovery process in case any disaster happens. Also, using multiple file groups (which have files spread across multiple disks) helps Query Optimizer to parallel process the request.

Partition Tables

In order to create a partitioned table, you need to specify the partition scheme name with the ON clause of the CREATE TABLE command along with the name of the column from the table that will be used as the partitioning column. If you already have a table, then you need to drop and re-create the table’s clustered index using the partition scheme.

We can even create a partitioned index though an index does not have to have the same partition function or partition scheme. But using the same partition function and partition scheme as its table, we say the index is aligned; more on partition aligned index and storage aligned index later in this article.

Once you are done with identifying the partitioning column, you need to follow these steps to create the partitioned table:

Create a partition function that defines or sets the correct boundaries for data distribution.
Create a partition scheme that ties the boundaries of the partition function with the file groups.
Create the table or alter the clustered index of the table to use the partition scheme created in the above step.

It’s not mandatory to a create partition function or partition scheme to create a partition table, you can use the existing partition function or partition scheme if you have already created one. Further, a partition function can be referenced by more than one partition scheme, likewise a partition scheme can be referenced by more than one table for partitioning as shown in the image below.

Relationship between the partition function, partition scheme and partition table

As shown above, as there is a relationship between the partition function, partition scheme and partition table, and hence a partition scheme cannot be dropped if it is referenced by any table; likewise a partition function cannot be dropped if it is being referenced by any partition scheme.

Partitioned Indexes

Not only you can create partitions on a table but you can also create partitioned indexes on the table. The indexes on the partitioned table might fall in these categories:

Indexes not aligned to partitioned table
Partition-aligned indexes
Storage-aligned index

Partition-aligned Indexes

In addition to partitioning the data set of a table, indexes of the table can also be partitioned either using the same partition function (it doesn’t need to be exactly the same partition function; another function with the same definition – like same number, type and values of boundaries values – works the same way) as the base table or with different a partition function. Partitioning indexes using the same partition function is recommended and then it is said to be a partition aligned index. In order to create a partition index, the table does have to be partitioned. Also, in order to enable partition switching, all indexes on the table must be aligned. The benefits of creating aligned indexes are:

When a partition is switched out or in, all the related data and indexes are moved effectively by SQL Server.
Helps Query Optimizer to eliminate inapplicable partitions/index during query optimizer and execution.

Storage-aligned Indexes

An index is said to be storage aligned if the index uses not only the same partition function but also the same partition scheme. Having the index storage aligned (table and index data on the same files or file groups) helps SQL Server to resolve conflicts in data access as all the required table and index data would be collocated on the same disk.

There are some special guidelines for creating a partitioned index here.

Conclusion

In this article of the series, I talked about different partitioning concepts like partition function, partition scheme, choosing partitioning column and creating a partition on a table or index. We also learned about partitioned index and how an index is aligned and storage aligned with the base table partitioning. In the next article of the series, I am going to talk in detail about Partitioned Index Views and guide you step by step in creating a partitioned table using T-SQL commands.

Resources

Using Partitioned Views

Partitioned Tables and Indexes

Create Partitioned Tables and Indexes

See all articles by Arshad Ali

Arshad Ali

Arshad Ali works with Microsoft India R&D Pvt Ltd. He has 8+ years of experience, mostly on Microsoft Technologies. Most recently, as a SQL Developer and BI Developer he has been working on a Data Warehousing project. Arshad is an MCSD, MCITP: Business Intelligence, MCITP: Database Developer 2008 and MCITP: Database Administrator 2008 certified and has presented at several technical events including SQL-School. On an educational front, he has an MCA (Master in Computer Applications) and MBA-IT. Disclaimer : I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.