Introduction to MSSQL Server 2000 Analysis Services: Performing Incremental Cube Updates - An Introduction - Page 2
November 8, 2004
Overview and Discussion
As we noted before, processing a cube recalculates its aggregations, and then loads the data and aggregations to the cube. The steps that MSAS undertakes in this process include filling in of the various dimension levels in the cube with data that it reads from the dimension tables, as well as reading data from the fact table, calculating the designed aggregations, and then populating the cube with the results.
We have seen that a cube must be processed before it can be queried, in numerous scenarios we have encountered in our series to date. To expand upon this a bit more precisely, any of the following actions can, if performed on a cube, force the processing of the cube before it can be queried or browsed within MSAS, as we have noted individually in other articles:
For reasons that are obvious, changes in the data mart or warehouse underlying a cube are common grounds for processing. The synchronization of any cube with the data from which it is constructed insures that the cube accurately and completely reflects the data that it exists to present. Periodicity of these synchronization updates relies upon the frequency of change in the cube's source data, among other possible factors.
In summary, then, we see that any change to the source data that underlies a cube, and many structural changes we make in Analysis Manager, force processing of the cube to ensure that the changes are, in turn, synchronized, and cube updates completed, before presentation of the cube data to information consumers. Let's take a look next at processing options that MSAS makes available.
MSAS offers numerous processing options, as we shall see in this section. The best way to synchronize our cubes with the underlying data, and to ensure that they reflect completely the dimensional structure that we have established in Analysis Manager, is to "toss" the cube entirely, and rebuild from scratch. There are many times when this is not desirable, however, and so other options exist. The time it takes to perform a Full Processing cycle (say, longer than the overnight period that lies between hours that information consumers are expected to need access to the cube we are updating) may be prohibitive, although updates are vital. Moreover, storage space may be a consideration. The accumulation of aggregations in a Full Processing cycle requires the creation of temporary files that, in combination with the independent copies of cube and dimension files that accompany a full build, can become major resource challenges.
One way to manage challenges of this sort within MSAS is to perform "piecemeal processing" of certain components of the MSAS database. Of the four options available for these independent processing approaches, three are mutually exclusive, and include Full Process, Refresh Data and Incremental Update. The fourth option, Incrementally Update the Dimensions of this Cube can be performed along with any of the first three options, to incrementally update the cube's dimensions as part of the cube processing.
Our exposure to cube processing within my articles largely centers upon the Full Process option. Unsurprisingly, Full Processing is the path we take most of the time, because it affords us a simple approach, within the context of articles that focus on another aspect of cube design and development entirely, to updating the cube involved, so that we can continue with the primary focus of the article without too much distraction. As we have discussed, Full Processing completely rebuilds a cube from scratch, reconstructs it based upon the current MSAS definition, recalculates its data and aggregations, and repopulates the new structure with the data and aggregations. In the business environment, other processing options can be of value for various reasons.
When we simply need to clear out the data in a cube, then to reload the data with its recalculated aggregations, (as in a case where the underlying source data has changed, but the cube structure has not been altered), we have the Refresh Data option. This would be the case, say, where we want to update a cube for each month's activity (new data), and to recalculate all aggregations surrounding the data, and where we are certain that no changes have been made in the structure of the cube. This approach is intended for the "steady state" operations of many organizations.
In my opinion, little is gained by choosing Refresh Data over Full Process because both processing options completely rebuild the aggregations tables from scratch. If testing on your local environment shows little difference, you might simply use the Full Process option for similar update needs: at least with Full Process you get an automatic check of the dimension structure to ascertain that no changes have, in reality, been made.
An Incremental Update adds new data and updates aggregations. Structural changes, such as those made to dimensions, measures, and so forth, are not updated with an Incremental Update. Moreover, Incremental Updates do not update changes to the cube's existing, underlying source data. We will examine the Incremental Update further, within the context of walking through the steps of the Incremental Update Wizard, in our next section. The important characteristic of an Incremental Update to keep in mind is that it merges new data into an existing partition, adding the data to, and updating the aggregations of, the cube for which it is initiated.
This management of "what has changed only" in the warehouse / mart, for addition to the cube, has its obvious attractions when the goal is to reduce cube processing time, but only if the way that it works is understood in detail, and if accommodation for the process is made prior to its use. To summarize its operation in simple terms, we can look at the Incremental Update process as being pointed at a different fact table than that in which the "existing" / already processed data is being stored - or, if pointed at the same fact table, pointed only at the "new" subset of the data via a filter we put in place. New files are created by the process - cube files that are identical to those produced in a Full Process build.
The Incremental Update process, in contrast with Full Process, does not simply swap the new files it creates with the ones that make up the previous cube. Remember, these cube files presumably represent "new" data that is not summarized in the existing cubes. The Incremental Update process creates yet another set of files, composed of a combination of the original cube files and the "new" cube set. Because we have, at least temporarily, three full sets of files, Incremental Updates on larger cubes may not act to relieve the disk space issues that cause problems with a Full Process approach. In addition, other potential dangers can arise within the process by which the Incremental Update creates a temporary partition to accomplish the merge with the existing data, which is housed within its own partition(s). We will touch upon this further at the appropriate point in our practice exercise within the next section.
NOTE: Once the "combination cube" is born, the original and "complementary" (or "delta") cube files are removed completely from the drive. The combination cube, now alone, is then named to identify it as the cube it replaces.
We will discuss the Incremental Update process in more detail, when it is relevant to our introduction to the Incremental Update Wizard, in the next section. As I have stated, I address detailed specific incremental processing, optimization and other strategies and approaches in other articles. Our focus in this article is to introduce the concept of incremental updates, and the use of the Incremental Update Wizard, and, therefore, our efforts rely upon the underlying assumption of an Incremental Update.
Incrementally Update the Dimensions of this Cube is a supplementary action that, as we have stated before, can be performed along with any of the available options to incrementally update the cube's dimensions. This is done as part of cube processing under the respective option, and is not the type of Incremental Update for which we use the Incremental Update Wizard. It simply represents a means of adding in new dimension members that have come along. This "add-on" feature within the primary update options exists to enable us to easily handle simple member adds that do not alter the dimension structure enough to drive a forced reprocessing of the cube.