Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Create Aging "Buckets" in a Cube - Page 2
August 8, 2005
An Approach to Aging within a Cube
Overview and Discussion
In this article, we will focus upon an accounts receivable aging scenario, largely because of the pervasiveness of such agings in the business environment, but also because it will work slightly better within the constraints of the sample databases (both relational and Analysis Services), among other sample objects, that are available to virtually anyone who has installed Analysis Services. Regardless of the type of aging we need to create within the cubes we design for clients and employers, the general concept is virtually always the same: to create "buckets" to group, and typically aggregate, transactions based upon their ages. In this article, we will work within an accounts receivable scenario, but I have applied the same principles within inventory agings, where I have assisted clients in determining risk of obsolescence, as well as an ongoing basis for write downs and other adjustments, etc. in the inventory realm. Furthermore, I have implemented underlying structural support in numerous other scenarios where I have applied aging concepts within cube design and development.
Within an accounts receivable aging, customer transactions are classified according to the date of sale, and, although presentations are legion, the end objective is to present an aging schedule of some sort that lists accounts and quantifies the length of time balances / transactions have been outstanding. Because the risk of uncollectibility increases the longer accounts remain unpaid, aging schedules help the organization to minimize risk by revealing patterns of delinquency and helping to create a focus upon where collection efforts should be concentrated.
A well-maintained accounts receivable aging can also help the enterprise in the prevention of loss of future sales, as some customers / clients might be tempted to seek other sources of supply and "jump" to another supplier if, say, their balances reach high levels and other circumstances emerge (recessionary phases in the economy, reduced demand for the customer's goods, general instability, and other factors). The objective of aging is to minimize bad debt on the books, as well as to optimize cash flows and the overall health of the organization.
The approach we take to achieve our ends in this article represents but one of several that I have taken for aging within Analysis Services cubes. It is, in my opinion, both easy to understand and straightforward to implement, given a meaningful transaction date within the data upon which an aging can be based. In our example, we will be "aging" the purchases of customers in a manner that is intended to simulate accounts receivable. The transactions and structures within the FoodMart Sales sample cube are, to say the least, a highly simplified reflection of the real world, but nevertheless provide the basis we need to create a scenario that can be extrapolated to accounts receivable, as well as other accounts where we might wish to similarly age transactions. The idea is to demonstrate a method in a way that is of benefit to virtually anyone who has installed Analysis Services with its accompanying samples.
In this article, we will examine the use of a dimension whose members are based upon conditional logic to attain our objectives. As I said earlier, this is but one of several approaches I have used to perform aging processes in Analysis Services (I have also done it with calculated measures, as well as using hybrid approaches where I perform the aging of the transactions at an intermediate view level within the RDBMS, for various reasons). In the same manner that derived measures can be based upon conditional logic (as an example, I often use CASE statements in derived measures for this purpose, when the RDBMS under consideration is MSSQL Server), we can use conditional logic to create the members of a dimension outfitted for the purpose of housing aging groups.
NOTE: For more information on the derived measures in general, see my article Derived Measures vs. Calculated Measures, also within this Database Journal series.
The approach to take in creating aging "buckets," be it through the method we will examine in this article, or through alternatives we will suggest here and in later articles, depends upon many factors, chief of which are perhaps optimization considerations. Our intent in this article will not be to focus upon optimization per se, as this is an area that can vary greatly from environment to environment. In this article, we will present an approach, an option to achieving aging accumulations. It is up to the reader to determine whether the support structure itself (a dimension based upon conditional logic) is appropriate, or if another method is a better "fit" within their own setting.
For purposes of our practice procedure, we will assume that we have been contacted by a client to assist in the addition of aging capabilities to a cube they already have in place. We meet with representatives of the Accounting department of the FoodMart organization, who inform us that they wish to age accounts receivable based upon customer transaction dates, over the last six months of their current year, 1997. The consumers emphasize repeatedly that queries are to process as rapidly as possible.
The data under consideration, we are told, resides within the Sales cube (which accompanies the installation of Analysis Services as a sample). The consumers tell us that they wish to age the customer accounts into the following relatively common aging buckets:
Limitations of the sample Sales cube require us to ignore the fact that there is no "sold" date (indeed, the simple business scenario presented within the cube gives us little more than a "cash basis" depiction of business. The sales transactions that underlie the Sales cube do not indicate important information that would be necessary to generate conventional accounts receivable agings or other attributes. This, however, is what we have to work with within a sample set that everyone has. From the perspective of the data source (FoodMart 2000.mdb), the transactiondate that we will use resides within the time_by_day table, joined to the sales_fact_1997 table as shown in Illustration 1.
Because the data source provides no "collection" date for the "receivables" generated on a given transaction date (again, it appears that sales are "cash basis" in this simple data), we will need to make some assumptions. Keep in mind that the idea here is to demonstrate a method for creating aging buckets, and not to model an accounts receivable system, and so forth.
We will focus on the last six months of 1997, and, for purposes of our practice example, assume the unlikely circumstance that no receipts have been posted on the sales that have taken place within those six months. Again, we are limited by the simplicity of the data sources, and are simply practicing the generation of aging accumulations via one straightforward method. The scenario within which we will work is hardly a reflection of a real world financial system and the data it houses. In the real world, collection dates, as well as "status" flags (such as "received," "receipt matched to receivable transaction," and so forth, would provide us the capability of selecting only "unpaid" or "outstanding" receivables, which we would be able to filter out of the population of "all receivables" as of a given date for aging purposes. (Integrated financial databases, such as CODA-Financials, even allow us to determine other factors within the receivables collection cycle, such as time elapsed between the initial transaction date and the collection date, which are captured as part of the match between receipts and associated receivables, which also marks a transaction as fully / partially paid, etc. The storage of these events and dates provide us the capability to generate useful metrics regarding general cash flows, customer payment trends, and other opportunities for analysis).
As a part of using a dimension whose members are "derived" to perform our aging accumulations, we will also focus upon the underlying RDBMS, the FoodMart MS Access database. The approach we use within our dimension to generate our aging accumulations will differ, somewhat, within the far more likely scenario for a data source within the business environment, in which we would be generating a cube based upon an MSSQL Server database (or perhaps use another common enterprise RDBMS as the primary data source). We will offer an approach that would be more appropriate in constructing a similar set of dimensional members for an MSSQL Server relational data source at the appropriate time, as an added tip in extrapolating the concepts of the lesson to your own business environments.
We confirm our understanding of the requirement with the Accounting department representatives, and set about the creation of the required capabilities.
Considerations and Comments
For purposes of our practice exercise, we will be working with the Sales cube, within a copy of the FoodMart 2000 Analysis Services database. The Sales cube, as most of us are aware, is a member of a group of working samples that accompanies a typical installation of Analysis Services. If the sample FoodMart MSAS database, the FoodMart .mdb that underlies it, or any other sample object to which we refer in our practice exercise, was not installed, or was removed prior to your beginning this article, please see the MSAS documentation, including the Books Online, for the straightforward procedure to restore the database from the archive (.cab) file containing the samples or elsewhere. As of this writing, a copy of the archive / other components can be obtained from the installation CD or via download from the appropriate Microsoft site(s).
We will construct a shared dimension, whose members are derived from conditional logic that will generate aged accumulations. As I stated earlier, I often use calculated members to achieve a similar effect depending upon the environment involved, and the specific optimization considerations therein. While the focus of this article is not optimization, it might be useful to provide a brief treatment of the high-level considerations. A good starting point is to compare and contrast dimensions and calculated members in general, as well as from within the current perspective.
Unlike calculated members (including, of course, calculated measures), whose values are created at runtime based upon the MDX expression(s) they contain, a dimension, just like a measure, is calculated during the cube processing cycle, and is stored within Analysis Services. In the case of the calculated member, only its definition is stored, and the values it renders are calculated in memory when specified in a query ("runtime"). Because only the definition is stored, cube processing time ("build time") is not materially increased, and, indeed, depending upon the environment, this might be an appropriate tradeoff. In our present scenario, however, query processing time might represent a more important concern, so we lose a possible benefit behind the choice of a calculated measure to provide the needed values.
As we will see in the practice example, the approach of deriving the members of our Aging Periods dimension will take advantage of the flexibility that is offered in by the member key column for dimensions in Analysis Services, which we will leverage to house the conditional logic we will use for deriving the aging buckets. In general, our approach might make more sense if the aging aggregations are frequently called upon for reporting, and if query speed is important, which, as we have learned from our hypothetical information consumer group, is the case within the scenario of our practice example. Calculated members might be a better choice for values that we expect to be used infrequently, again because the tradeoff is tipped more in favor of a lower overhead on the cube processing side, where query response time might not be as high a priority.
The "calculation" behind derived members within a dimension can extend well beyond simple concatenation and the like, as we shall see in our example, and can apply conditional logic and other nuances in derivation of the members. The syntax obviously has to fit the database our practice example will leverage MS Access friendly syntax, because the FoodMart 2000 data source ships as an MS Access .mdb.