Incremental Processing
Overview and Discussion
As we
noted before, processing a cube recalculates its aggregations, and then loads
the data and aggregations to the cube. The steps that MSAS undertakes in this
process include filling in of the various dimension levels in the cube with
data that it reads from the dimension tables, as well as reading data from the
fact table, calculating the designed aggregations, and then populating the cube
with the results.
We have seen that a cube
must be processed before it can be queried, in numerous scenarios we have
encountered in our series to date. To expand upon this a bit more precisely,
any of the following actions can, if performed on a cube, force the processing
of the cube before it can be queried or browsed within MSAS, as we have noted individually
in other articles:
-
The initial
build of the cube;
-
Designing
storage options and aggregations for the cube (in conjunction with an initial
build or not);
-
Changing the
cube's structure (measures, dimensions, and so on) and saving the changes to
the cube;
-
Making
structural changes to a shared dimension used within the cube.
For reasons that are
obvious, changes in the data mart or warehouse underlying a cube are common grounds
for processing. The synchronization of any cube with the data from
which it is constructed insures that the cube accurately and completely reflects
the data that it exists to present. Periodicity of these synchronization
updates relies upon the frequency of change in the cube's source data, among
other possible factors.
In summary, then, we see
that any change to the source data that underlies a cube, and many structural
changes we make in Analysis Manager, force processing of the cube to ensure
that the changes are, in turn, synchronized, and cube updates completed, before
presentation of the cube data to information consumers. Let's take a look next
at processing options that MSAS makes available.
Processing Options
MSAS offers numerous processing
options, as we shall see in this section. The best way to synchronize our
cubes with the underlying data, and to ensure that they reflect completely the
dimensional structure that we have established in Analysis Manager, is to "toss"
the cube entirely, and rebuild from scratch. There are many times when this is
not desirable, however, and so other options exist. The time it takes to perform
a Full Processing cycle (say, longer than the overnight period
that lies between hours that information consumers are expected to need access
to the cube we are updating) may be prohibitive, although updates are vital.
Moreover, storage space may be a consideration. The accumulation of
aggregations in a Full Processing cycle requires the creation of
temporary files that, in combination with the independent copies of cube and
dimension files that accompany a full build, can become major resource
challenges.
One way to manage
challenges of this sort within MSAS is to perform "piecemeal processing"
of certain components of the MSAS database. Of the four options available for
these independent processing approaches, three are mutually exclusive, and
include Full Process, Refresh Data and Incremental Update.
The fourth option, Incrementally Update the Dimensions of this Cube can
be performed along with any of the first three options, to incrementally update
the cube's dimensions as part of the cube processing.
Our exposure to cube
processing within my articles largely centers upon the Full Process
option. Unsurprisingly, Full Processing is the path we take most of the
time, because it affords us a simple approach, within the context of articles
that focus on another aspect of cube design and development entirely, to
updating the cube involved, so that we can continue with the primary focus of
the article without too much distraction. As we have discussed, Full
Processing completely rebuilds a cube from scratch, reconstructs it based
upon the current MSAS definition, recalculates its data and aggregations, and
repopulates the new structure with the data and aggregations. In the business
environment, other processing options can be of value for various reasons.
When
we simply need to clear out the data in a cube, then to reload the data with
its recalculated aggregations, (as in a case where the underlying source data
has changed, but the cube structure has not been altered), we have the Refresh
Data option. This would be the case, say, where we want to update a cube
for each month's activity (new data), and to recalculate all aggregations
surrounding the data, and where we are certain that no changes have been made
in the structure of the cube. This approach is intended for the "steady
state" operations of many organizations.
In my
opinion, little is gained by choosing Refresh Data over Full Process
because both processing options completely rebuild the aggregations tables from
scratch. If testing on your local environment shows little difference, you
might simply use the Full Process option for similar update needs: at
least with Full Process you get an automatic check of the dimension
structure to ascertain that no changes have, in reality, been made.
An Incremental
Update adds new data and updates aggregations. Structural
changes, such as those made to dimensions, measures, and so forth, are not
updated with an Incremental Update. Moreover, Incremental Updates
do not update changes to the cube's existing, underlying source data.
We will examine the Incremental Update further, within the context of
walking through the steps of the Incremental Update Wizard, in our next
section. The important characteristic of an Incremental Update to keep
in mind is that it merges new data into an existing partition, adding
the data to, and updating the aggregations of, the cube for which it is
initiated.
This
management of "what has changed only" in the warehouse / mart, for
addition to the cube, has its obvious attractions when the goal is to reduce
cube processing time, but only if the way that it works is understood in detail,
and if accommodation for the process is made prior to its use. To summarize
its operation in simple terms, we can look at the Incremental Update
process as being pointed at a different fact table than that in which the "existing"
/ already processed data is being stored - or, if pointed at the same fact
table, pointed only at the "new" subset of the data via a filter we
put in place. New files are created by the process - cube files that are
identical to those produced in a Full Process build.
The Incremental
Update process, in contrast with Full Process, does not simply swap
the new files it creates with the ones that make up the previous cube.
Remember, these cube files presumably represent "new" data that is
not summarized in the existing cubes. The Incremental Update process
creates yet another set of files, composed of a combination of the original
cube files and the "new" cube set. Because we have, at least
temporarily, three full sets of files, Incremental Updates on larger
cubes may not act to relieve the disk space issues that cause problems with a Full
Process approach. In addition, other potential dangers can arise within the
process by which the Incremental Update creates a temporary partition to
accomplish the merge with the existing data, which is housed within its own
partition(s). We will touch upon this further at the appropriate point in our
practice exercise within the next section.
NOTE: Once the "combination cube"
is born, the original and "complementary" (or "delta") cube
files are removed completely from the drive. The combination cube, now alone,
is then named to identify it as the cube it replaces.
We will discuss the Incremental
Update process in more detail, when it is relevant to our introduction to
the Incremental Update Wizard, in the next section. As I have stated, I address
detailed specific incremental processing, optimization and other strategies and
approaches in other articles. Our
focus in this article is to introduce the concept of incremental updates, and the
use of the Incremental Update Wizard, and, therefore, our efforts rely
upon the underlying assumption of an Incremental Update.
Incrementally Update
the Dimensions of this Cube is a
supplementary action that, as we have stated before, can be performed along
with any of the available options to incrementally update the cube's dimensions.
This is done as part of cube processing under the respective option, and is not
the type of Incremental Update for which we use the Incremental
Update Wizard. It simply represents a means of adding in new dimension
members that have come along. This "add-on" feature within the
primary update options exists to enable us to easily handle simple member adds
that do not alter the dimension structure enough to drive a forced reprocessing
of the cube.