Process Tracking

Introduction

One of the constructs that I find myself building quite often is a process that periodically retrieves an incremental amount of data. So many reports that I build, whether for myself or for others, are looking for the new data from the last x amount of time. It might be the sales for the last day, changes on a particular table, etc., but usually it is the change in some data over the last day.

Of course, this is usually a fairly easy report structure to build. I have routines that determine the previous day's date, resetting the time to midnight (00:00:00), and then select all data that is new since this date. Since most of the tables I am working with have some sort of datetime field to track changes, this results in a query that looks something like:

-- Convert to char and keep only the date portion, so the time
-- resets to midnight (00:00:00)
declare @dt datetime
select @dt = cast( substring( cast( dateadd( day, -1, getdate()) as char( 20)), 1, 12) as datetime)

select *
from sales
where SaleDate > @dt

Here @dt is the previous day's date, with the time set to midnight.

The Problem

This works great for the first week that it is implemented. The next Monday, when someone comes into work and runs this report, they usually see a smaller report than they are expecting. Why?

Well, in a few e-commerce applications I have worked with, there is a bunch of activity on Friday and Saturday. I guess it's the end of the week and people do more surfing and purchasing. If we are calculating the previous day's date, then on Monday we get all the sales for Sunday. This leaves out the sales from Friday and Saturday.

No problem, we can easily change our code that calculates the beginning date to account for weekends. With the following IF statement, we can decide whether we need to get one day or three days' worth of data.

-- @d must be int (not tinyint) since it holds a negative offset
declare @dt datetime, @d int
if datepart( weekday, getdate()) = 2
  select @d = -3
else
  select @d = -1

select @dt = cast( substring( cast( dateadd( day, @d, getdate()) as char( 20)), 1, 12) as datetime)

This will check for a Monday (datepart = 2 with the default DATEFIRST setting) and then set the number of days to go back for the next statement.

Since most of my processes are automated and I send an incremental amount of data to a user automatically on some schedule, this used to be my preferred solution. I have used this and it usually works until one of two things happens: either the task fails for some reason and it takes a day or two to get fixed, or a holiday occurs. If the task breaks on Monday and it does not run again until Tuesday (this usually happens when I go on vacation), the user gets a report on Tuesday that shows Monday's sales, but is missing all the weekend sales. After a holiday, the user gets two (or more) reports instead of a single consolidated report.

The Solution

I came up with this solution while building an extract of data for a business partner. They wanted to receive an incremental load of mail information every two weeks rather than a complete load of data. Since our mail information was growing rapidly, this was a reasonable request, and so I was tasked with building this.

I used one of the techniques above, which worked for a month, and then the task broke. By the time I had fixed it, a couple of days had gone by and I had to manually generate the report; otherwise my task would have missed a few days of data. It was then that I decided to build my process tracking table. Here is the code:

Create table PrcssTrckr
(    PrcssNm varchar( 80),
     LastXfr datetime,
     LastStat tinyint
)
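
Since PrcssNm is described below as carrying a unique index, the setup would also include something like:

-- One row per process: enforce uniqueness on the process name
create unique index PrcssTrckrIdx on PrcssTrckr( PrcssNm)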

Nothing fancy, but the columns are defined as follows:

PrcssNm (Process Name) - This is an ad-hoc field with a unique index that describes which process I am tracking.

LastXfr (Last Transfer Date) - This is the last date when I transferred the data using the particular process.

LastStat (Last Status) - I keep this column updated with an exit code from the process. When it breaks, I can then check this to see what the result code was. The result codes vary from process to process and are not necessarily stored procedure return codes.

How I Use This

This table allows me flexibility in a few ways. One, I can use the same data set, like mail information, for a number of different processes. Suppose I have snail mail and email extraction processes that run on the same customer information. For these two data feeds, I would add two rows to my PrcssTrckr table as follows:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     01-01-1900 00:00:00     0
EmailFeed         01-01-1900 00:00:00     0
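
Seeding these initial rows is a one-time setup step. A minimal sketch, where the 1900 date is just a floor that predates all of the data:

-- Seed each feed with a date older than any customer row,
-- so the first run returns everything
insert PrcssTrckr ( PrcssNm, LastXfr, LastStat)
values ( 'SnailMailFeed', '19000101', 0)

insert PrcssTrckr ( PrcssNm, LastXfr, LastStat)
values ( 'EmailFeed', '19000101', 0)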

With the initial rows in place, we now start the snail mail export process, which runs a query something like the following:

select name, address, city, state, country, zip
from Customer c, PrcssTrckr p
where c.created > p.LastXfr
and p.PrcssNm = 'SnailMailFeed'

This returns a list of all members (since none were created before the beginning of the last century). After this data is extracted, the LastXfr date is set to GETDATE() (the update is sketched after the table) and the table looks like the following:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     02-10-2001 08:15:00     0
EmailFeed         01-01-1900 00:00:00     0
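
The update that stamps this row is the key step after each successful extract. A minimal sketch, assuming an exit code of 0 means success:

-- Record a successful run: stamp the transfer time and a clean status
-- (the 0 = success convention is an assumption; codes vary by process)
update PrcssTrckr
set LastXfr = getdate(),
    LastStat = 0
where PrcssNm = 'SnailMailFeed'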

Now we know that this process last completed at the time stamped on the first row. If we now run the same process tomorrow, the query returns an incremental feed of data containing customers created between 2-10-01 and 2-11-01. If I were to wait and run this every other day, then the incremental feed would include two days of customers.

The process now tracks its own progress and always returns an incremental feed of data. This is independent of the email process, which could be running on a completely different schedule.

This also allows the process to fail and, when it is fixed, still return the correct incremental amount of data to the user.
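
Putting the pieces together, the shape of one of these self-tracking jobs might look like the sketch below. The procedure name and the 0-means-success convention are my assumptions; the table and feed names match the examples above.

-- A sketch of a self-tracking extract (procedure name and error
-- handling convention are assumptions, not part of the original process)
create procedure SnailMailExtract
as
declare @err int

-- Pull everything created since the last successful run
select c.name, c.address, c.city, c.state, c.country, c.zip
from Customer c, PrcssTrckr p
where c.created > p.LastXfr
and p.PrcssNm = 'SnailMailFeed'

select @err = @@error

if @err = 0
  -- Success: move the high-water mark forward
  update PrcssTrckr
  set LastXfr = getdate(), LastStat = 0
  where PrcssNm = 'SnailMailFeed'
else
  -- Failure: leave LastXfr alone so the next run covers the gap,
  -- but record the error code for troubleshooting
  update PrcssTrckr
  set LastStat = @err
  where PrcssNm = 'SnailMailFeed'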

Conclusion

Over time, I have found myself using this technique for many different processes, from data extractions to customized reports. Anyplace incremental data feeds are needed, this has proven to be a valuable technique. It even ensured correct reports when my server was down for a few days after our company moved to a new office. As with most techniques, I am sure that many of you will find even more and better uses for this than I have.

As always, I welcome feedback.

Steve Jones
February 2001
