Introduction
With every new release of SQL Server, partitioning has reached new heights of improvement. For example, though we could create partitioned views for better manageability and scalability since SQL Server 7.0, SQL Server 2005 started with native support for Table Partitioning (more on this later in this article). SQL Server 2008 introduced partition table parallelism for better performance and for better resource utilization (of modern multi-processors hardware). With SQL Server 2012, we are now allowed to even create up to a 15K partition on a single table.
In this series of articles, I will discuss in detail the Why, What, When and How of Partitioning in SQL Server.
The Problems We are Facing and Why We Need Partitioning…
First of all, before we start to delve into the details of Partitioning in SQL Server, we need to first understand why there is need to partition a table or create a partitioned view and what problems we might face if we keep all of the data in a single large table.
There are several reasons why we need to use Partitioning in SQL Server; some of them are listed below:
- Management of a single large table becomes a pain for a DBA if the data in the table continues to grow. For example, the necessity for a longer maintenance window during backup/restore, longer downtime when creating index on a large non-partitioned table etc.
- On a large table, only a subset of data will be volatile/read-write and rest, may be up to 90% or even more, will be read-only data. With varying access patterns you need more indexes for read-only data whereas you need fewer indexes for read-write data.
- Creating and maintaining (Rebuild/Reorganize) indexes on a single large table takes significantly longer time and even increases the downtime window.
- Backup and restore takes significantly longer as it needs to back or restore the whole data every time.
- Frequent lock escalation issue at table level.
Understanding Partitioning in SQL Server
Partitioning in SQL Server is not a new concept and has improved with every new release of SQL Server. Partitioning is the process of dividing a single large table into multiple logical chunks/partitions in such way that each partition can be managed separately without having much overall impact on the availability of the table. Partitioning improves the manageability and availability of table as well as the performance of the queries running against this partitioned table.
For example, as you can see below, the first row represents a single large table, which contains sales data for 2008, 2009, 2010, 2011 and 2012 whereas the second row represents a table with 5 partitions, each partition contains sales data for each year (even if you think a yearly partition is too large to manage, you can partition it on monthly basis, the choice is yours based on your requirements).
SQL Server supports two types of partitioning:
- Partitioned Views – Partition view can be created by UNION’ing tables with similar structure either from the same database or from different databases to horizontally partitioned data and it appears as a single table to its end-users. For example, for the above example we can create 5 different tables with the same structure where each table will contain data for each year (CHECK constraint must be put to ensure one table contains data for one and only one year and to get the benefit of table/partition elimination during query optimizer); then we can create a view by UNION’ing them together so that for end users it appears that data is coming from a single table.
If all the member tables are from the same instance it is called Local Partitioned View whereas if member tables are from different instances, it is called Distributed Partitioned View. As a Partition view might contain tables from different databases/instances, it is also called scale-out partitioning; more about Partitioned View can be found here.
- Partition Tables and Indexes – Partition tables and Indexes were first introduced in SQL Server 2005 and enhanced further in SQL Server 2008 and SQL Server 2012. Partition table/index is an enterprise edition feature and natively supported by database engine. When we create partition table/index, data is horizontally divided into units (called partitions) that can be spread across more than one file group in a single database. As partitions of a table don’t cross the boundary of the database, it is also called scale-up partitioning.
Partitioning table/index makes the large table/index more manageable or scalable as each partition can be managed separately. For example, we can create indexes; manage compression settings on each partition separately without having much impact on the whole table.
Though there are no strict rules that dictate when a table needs to be partitioned, but when a table grows big enough in size such that manageability, ensuring availability becomes a challenge for you or when your users report slow query performance against a large single table then you need to think of partitioning the table. Ideally, when your single table grows beyond 50 GB, you should consider partitioning it but it all depends on your requirements and environment.
Benefits of Partitioning in SQL Server
Even though SQL Server natively supports table/index partitioning and manage partitions internally you should not partition all the tables as it has little overhead in managing them at the DBA level as well. You should consider partitioning a large table/index that is becoming difficult to manage, doesn’t scale well and queries against this table are having poor performance. There are many benefits of using partitioning table/index and some of them are discussed below:
- Manageability– Manageability of partition table/index became easier as you can rebuild/re-organize indexes of each partition separately. You can manage each partition separately; you can take a back-up of only the file-groups that contain partitions having volatile data etc.
- Query Performance– The query optimizer uses techniques to optimize and improve the query performance. For example,
- Partition elimination – Partition Elimination is a technique used by query optimizer to not consider partitions that don’t contain data requested by the query. For example, if a query requests data for only the years 2010 and 2011, in that case only two partitions will be considered during query optimization and execution unlike a single large table where query optimizer will consider the whole dataset; the other partitions (2008, 2009 and 2012) will be simply ignored.
- Parallel Processing – Query Optimizer uses a technique to process each partition in parallel or even multiple CPU cores can work together to work on a single partition. With this, the query optimizer tries to utilize modern hardware resources efficiently. For example, if a query requests data for only the years 2010 and 2011, in that case only two partitions will be considered during query optimization and suppose if you have 8 cores machine, all 8 cores can work together to produce the result from the two identified partitions.
- Indexes– You can have different settings (FILLFACTOR) or different numbers of indexes for each partition of a table. For example, the most recent year partition will have volatile data and will be both read and write intensive data and used by OLTP applications and hence you should have the minimum number of indexes, whereas older partitions will have mostly read only data and be used by Analytical applications and hence you can create more indexes to make your analytical queries run faster.
- Compression– Compression is new feature introduced with SQL Server 2008. It minimizes the need for storage space at the cost of additional CPU cycles whenever data is read or written. Again, the most recent year partition will have volatile data and be accessed frequently so ideally you should not compress it, whereas the older partitions will not be accessed frequently and hence you can compress them to minimize the storage space requirement.
- Minimized time for Backup/Restore– For a large table, normally only the latest few partitions will be volatile and hence you can take a regular backup of the file group (read-write) that contains this volatile data whereas you can take occasional backups of the file group (read-only) that contains non-volatile data. This way, we can minimize the downtime window and reduce the backup and restore time.
- Loading data to/from is fast– Data load in the partition table takes only seconds, instead of the minutes or hours of operation when you have a non-partitioned table, using a technique called SWITCH-IN. I will talk in detail about this in my next article.
- Data archival– Data archival from a partitioned table again takes only seconds, instead of the minutes or hours of operation when you have a non-partitioned table, using a technique called SWITCH-OUT. I will talk in detail about this in my next article.
Conclusion
In this article of the series, I talked about what partitioning in SQL Server is, the different kinds of partitioning options that we have, why and when we should go for partitioning and what benefits a partition table/index provides. In the next article of the series, I am going to discuss the different concepts of partitioning in SQL Server and provide a step-by-step guide on creating partition table/index.