As you can see within the displayed window, the multiphase data pump is
divided into the following phases and sub-phases:
-
Pre Source Data - The first phase, executed only once per
task, before the first row is fetched from the source. This can be used, for
example, to create a log file, initialize data, or write a header row.
-
Row Transform - The second phase, encompassing the functionality
of the entire SQL Server 7.0 Transform Data task, launched every time a new row
is retrieved from the data source and completed prior to writing it to the
destination.
-
Post Row Transform - The third phase, performing
additional processing depending on the outcome of the second phase. This
outcome determines which one of its sub-phases (On Insert Success, On Insert
Failure, On Transform Failure) is invoked. On Insert Success is triggered when
processing was successful, On Insert Failure indicates problems at the
destination (such as inserting null data in a non-nullable column), and
Transform Failure is commonly the result of type conversion errors (e.g.
character to integer). Post Row Transform is typically followed immediately by
the Row Transform phase, fetching the next row from data source, although
whether this happens depends on a couple of factors. More specifically,
Transform Failure or Insert Failure could result in task termination, depending
on the value of the (previously discussed) max error count parameter
(configurable from the Options tab of the Transform Data Task Properties dialog
box). Other possibilities include the Batch Complete phase, if the batch size
has been reached (discussed next) or the Post Source phase if the last row has
been processed.
-
Batch Complete - The fourth phase, executing once per
batch, once it is completed. This is where the Insert batch size parameter,
mentioned earlier in this article (available from the Options tab of the
Transform Data Task Properties dialog box), comes into play. With this
functionality, you can divide your entire data load into more manageable pieces
and handling cleaning and record keeping tasks separately for each.
-
Pump Complete - This takes place after the last row of the
data is processed. Since actions performed during this phase have no access to
source or destination, they are used typically for data-independent cleanup
activities, such as global variable manipulation (which can be used as error
counters).
-
Post Source Data - The final stage of the task, which
provides data access. It is frequently used to write footer rows to the
destination (although it can also include a number of other activities, such as
global variable cleanup).
Division of processing into phases provides several benefits. It offers more
control over data manipulation as well as error handling and recovery.
Transformation can be restarted from the point of failure, without reloading
already processed data.
Let's take a look at a sample ActiveX Script demonstrating the multi-phase
capabilities of the Transform Data Task. For the sake of example, let's assume
we will use an existing transform, producing table consisting of the content of
the Shippers table with the total number of orders for each ShipperID, using
the batch size of two (which should produce two batches). We will record the
processing information in a text file (which name is stored in a global
variable "LogFile") including such values as "TotalBatches",
"TotalRows", and "FailedRows". Prior to any customization, auto
generated code of our ActiveX Script transformation has the following format
(Function Main is executed during the Row Transform phase):
Function PreSourceMain()
PreSourceMain = DTSTransformstat_OK
End Function
Function Main()
DTSDestination("Orders") = DTSSource("Orders")
DTSDestination("Phone") = DTSSource("Phone")
DTSDestination("CompanyName") = DTSSource("CompanyName")
DTSDestination("ShipperID") = DTSSource("ShipperID")
Main = DTSTransformStat_OK
End Function
Function TransFailureMain()
TransFailureMain = DTSTransformstat_OK
End Function
Function InsertSuccessMain()
InsertSuccessMain = DTSTransformstat_OK
End Function
Function InsertFailureMain()
InsertFailureMain = DTSTransformstat_OK
End Function
Function BatchCompleteMain()
BatchCompleteMain = DTSTransformstat_OK
End Function
Function PumpCompleteMain()
PumpCompleteMain = DTSTransformstat_OK
End Function
Function PostSourceMain()
PostSourceMain = DTSTransformstat_OK
End Function
Start by creating the "LogFile" global variable from the DTS
Package Properties dialog box (you can find more information on the subject in one of our
earlier articles) of type string and set its value appropriately (e.g.
"d:\logs\MPDPump.log"). Then modify the ActiveX Script so it matches
the following code:
' Copy each source column to the destination column
Function PreSourceMain()
Const Overwrite = TRUE
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oLogFile = oFSO.CreateTextFile(DTSGlobalVariables("LogFile").Value, Overwrite)
DTSGlobalVariables("TotalBatches").Value = 0
DTSGlobalVariables("TotalRows").Value = 0
oLogFile.Close
Set oFSO = Nothing
PreSourceMain = DTSTransformstat_OK
End Function
' Copy each source column to the destination column
Function Main()
DTSDestination("Orders") = DTSSource("Orders")
DTSDestination("Phone") = DTSSource("Phone")
DTSDestination("CompanyName") = DTSSource("CompanyName")
DTSDestination("ShipperID") = DTSSource("ShipperID")
DTSGlobalVariables("TotalRows").Value = _
DTSGlobalVariables("TotalRows").Value + 1
MsgBox "Row #: " & DTSGlobalVariables("TotalRows").Value
Main = DTSTransformStat_OK
End Function
Function TransFailureMain()
TransFailureMain = DTSTransformstat_OK
End Function
Function InsertSuccessMain()
InsertSuccessMain = DTSTransformstat_OK
End Function
Function InsertFailureMain()
InsertFailureMain = DTSTransformstat_OK
End Function
Function BatchCompleteMain()
DTSGlobalVariables("TotalBatches").Value = _
DTSGlobalVariables("TotalBatches").Value + 1
MsgBox "Batch #: " & DTSGlobalVariables("TotalBatches").Value
BatchCompleteMain = DTSTransformstat_OK
End Function
Function PumpCompleteMain()
Const ForReading = 1, ForWriting = 2, ForAppending = 8
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oLogFile = oFSO.OpenTextFile(DTSGlobalVariables("LogFile").Value, ForWriting)
oLogFile.WriteLine "Total Rows:" & vbTab & DTSGlobalVariables("TotalRows").Value
oLogFile.WriteLine "Total Batches:" & vbTab & DTSGlobalVariables("TotalBatches").Value
oLogFile.Close
Set oFSO = Nothing
PumpCompleteMain = DTSTransformstat_OK
End Function
Function PostSourceMain()
PostSourceMain = DTSTransformstat_OK
End Function
This trivial example, which displays rows and batch numbers as they are
being processed, and writes the total number of batches and rows to a log file,
should help you to create scripts that are more elaborate. Refer to the Phased
Transformation Samples topic in the SQL Server 2000 Books Online for additional
help on this subject (you will find there samples demonstrating TransformFailed
and InsertFailed functions). In our next article, we will cover the remaining
two DTS tasks.
»
See All Articles by Columnist Marcin Policht