Import Utility Interface
The word "utility"
should be taken with a grain of salt here. It is a utility in the
sense that it is MY utility to import MY data into MY database. To employ this
process with your data, you will have to virtually rewrite the entire
module that performs the data pump. However, before we get into that, I
wanted to introduce you to the user interface I created for demonstrating
this process. It is available in the download for this article.
The first two buttons were added so that readers of this article could
view the XML schema and table relationships diagram more easily while
examining the sample code. The actual import is initiated by clicking
the Run Import button. Once clicked, it becomes disabled and the Cancel
button is enabled. Clicking Cancel interrupts the import process and
returns control to the user. This is critical when processing hundreds
or thousands of files, since that could require an hour or more, depending on
the complexity of your data. The sample data includes 18 XML files, two
of which contain errors.
The single text box above contains the path to the unpacked
XML files. Remember, for the sake of this article we are assuming
that the files are already unzipped. In my example, the source files, once
processed, remain if the folder for future test runs. You will need to modify
my sample code to either delete or move files once they have been processed.
The top listbox shows XML files that could not be processed because they were
ill formed or because they contained an illegal character. The bottom listbox
shows the records corresponding to the 16 files that were successfully
imported. The six right-most columns of this listbox display the
record count for each foreign key child table. Once finished, a log
file is created in the same directory as the utility to keep track of what
was just processed. This log file gave us an easy way to check to see if
the nightly import ran and if so, how large the import was.
How The Real Work Gets Done
Up to this point, everything
has been either theory or window dressing. The real work of this
utility is performed in the module named, BAS_IMPRT_PRBLM_XML by a procedure
called ProcessSingleProblemXML(). It has been reproduced below with
inline comments that explain each step of the process.
Private Function ProcessSingleProblemXML(ByVal sFile As String) As Boolean
On Error Resume Next
' This function will process a single Problem XML file, assuming
' its structure hasn't been changed. (While XML is touted as the
' greatest thing since sliced bread, it has this singular flaw:
' the file must never change or the code below will break!)
'
' Each Problem file has the following data:
'
' PROBLEM_DETAILS (one node per problem)
' CALL_NOTES (many nodes per problem)
' PROBLEM_NOTES (many nodes per problem)
' OUTAGES (many nodes per problem)
' RCA (many nodes per problem)
' PROBLEM_AUDIT_TRAIL (many nodes per problem)
' PROBLEM_HISTORY (many nodes per problem)
'
' XML DOM objects required for walking the Problem.xml tree.
Dim objDomDoc As New DOMDocument
Dim objNodeData As IXMLDOMNode
Dim objNodeItem As IXMLDOMNode
Dim objNodeProblem As IXMLDOMNode ' Top most node
Dim objNode_DETAILS As IXMLDOMNode ' Details node
Dim objNode_CALL_NTS As IXMLDOMNode ' Call Notes node
' The iElement variable will be used for each recordset. It will
' be used to loop through all elements of the node corresponding to
' each field in the corresponding temp table. This process will be
' repeated for each of the 7 primary tables.
Dim ielement As Integer
Dim sProblemID As String
Dim dbs As DAO.Database
Dim rst As DAO.Recordset
Dim sSQL As String
Dim sField As String
Dim sColumn As String
Dim sValue As String
' Open a DAO Database connection
Set dbs = CurrentDb
' Verify that such a file exists. If not, then exit
sFile = cXML_File_Folder & sFile
If Dir(sFile) = "" Then Exit Function
' Open DOM object by loading the xml file.
objDomDoc.async = False
objDomDoc.Load sFile
' Create the top-most node object from which all others will be derived
' I believe there's a way to reference nodes by name but I had difficulty
' getting this to work correctly. Accordingly, I refer to child nodes by
' their ordinal position throughout my code.
'
' I begin by setting a DOM Node object to the top-most parent Node.
Set objNodeProblem = objDomDoc.documentElement.childNodes.Item(0)
' If this XML file cannot produce the node object, report the error.
If Err.Number = 91 Then
Set rst = dbs.OpenRecordset("FIL_IMPRT_ERRRS", dbOpenDynaset)
rst.AddNew
rst!TYP = "PRBLM"
rst!BAD_FIL_NM = Replace(sFile, "Problem", "Reject")
rst!ERRR_DAT = Now()
rst.Update
rst.Close
Set rst = Nothing
ProcessSingleProblemXML = False
Exit Function
Else
ProcessSingleProblemXML = True
End If
' If we made it this far, then the file is valid and we may continue.
' Capture the PROBLEM_ID because without it, we cannot proceed
'
' Again, we're using ordinal location to navigate to that PROBLEM_ID value.
' objNodeProblem.childNodes(0) is the PROBLEM_DETAILS node
' objNodeProblem.childNodes(0).childNodes(0) is the DATA node
' objNodeProblem.childNodes(0).childNodes(0).childNodes(0).Text is PROBLEM_ID
sProblemID = Nz(objNodeProblem.childNodes(0).childNodes(0).childNodes(0).Text, "")
If sProblemID = "" Then
' Missing PROBLEM_ID from XML data file. Cannot continue
Exit Function
Else
' The XML file has a value for PRBLM_ID, so we may proceed.
' Start by deleting any existing record.
sSQL = "DELETE FROM PRBLMS WHERE [PRBLM_ID]='" & sProblemID & "'"
dbs.Execute sSQL
End If
' ///////////////////////////////////////////////////////////////////////////
' Process PROBLEM_DETAILS Node first, creating new PROBLEM_ID record.
' NOTE: Using WHERE clause of 1=0 returns an empty recordset, ready for
' record additions. Saves on the overhead of large recordsets.
'
' REMEMBER: There is only one PROBLEM_DETAIL per record. (parent record)
' As a sort of shorthand, I'm using the objNodeProblem DOM object created
' above as a starting point from which I grab the DATA grandchild.
Set objNode_DETAILS = objNodeProblem.childNodes(0).childNodes(0)
sSQL = "SELECT * FROM PRBLMS WHERE 1=0"
Set rst = dbs.OpenRecordset(sSQL, dbOpenDynaset)
rst.AddNew
' The PROBLEM table has 52 columns. The child nodes are processed in
' what ever order they come. The corresponding table field is looked
' up through the GetFieldName() function. (see below)
For ielement = 0 To 51
Set objNodeItem = objNode_DETAILS.childNodes(ielement)
If Nz(objNodeItem.Text, "") <> "" Then
sField = objNodeItem.nodeName
' As mentioned above, I use aliases for all of the field names.
' This function, GetFieldName() returns the field name alias.
sColumn = GetFieldName("PRBLMS", sField)
' I won't bother to explain this, other than to say that the
' data coming in was dirty and had to be "cleaned up".
' Only add valid, non zero length values
sValue = URLDecode(objNodeItem.Text)
If sValue <> "" Then rst(sColumn) = sValue
End If
Set objNodeItem = Nothing
Next ielement
rst.Update
rst.Close
Set objNode_DETAILS = Nothing
' ///////////////////////////////////////////////////////////////////////////
' Process CALL_NOTES Node second. Loop through for multiple records. This
' node represents a child table, which can have many records for each detail.
' Again, I derive the node object from objNodeProblem created above.
Set objNode_CALL_NTS = objNodeProblem.childNodes(1)
sSQL = "SELECT * FROM PRBLM_CALL_NTS WHERE 1=0"
Set rst = dbs.OpenRecordset(sSQL, dbOpenDynaset)
' This time, we loop through the child nodes, since we don't know how
' many (if any) exist.
For Each objNodeData In objNode_CALL_NTS.childNodes
rst.AddNew
' While the XML Schema doesn't include a PROBLEM_ID field with each of
' it's child nodes, my tables to. It is this field that is used to
' enforce referential integrity, so I add the ID to each new record.
rst!PRBLM_ID = sProblemID
' Again, we know ahead of time that this node has 6 child nodes.
' We MUST know this and it must be accurate for the code to work.
For ielement = 0 To 5
Set objNodeItem = objNodeData.childNodes(ielement)
If Nz(objNodeItem.Text, "") <> "" Then
sField = objNodeItem.nodeName
sColumn = GetFieldName("CALL_NT", sField)
' Only add valid, non zero length values
sValue = URLDecode(objNodeItem.Text)
If sValue <> "" Then rst(sColumn) = sValue
End If
Set objNodeItem = Nothing
Next ielement
rst.Update
Next
rst.Close
Set objNode_CALL_NTS = Nothing
' ///////////////////////////////////////////////////////////////////////////
' Process PROBLEM_NOTES Node third. Loop through for multiple records.
' ...
'
' From this point on, the above code is repeated for all nodes corresponding
' to all 6 child tables. It's nearly identical, so won't be repeated here.
'
' That's all there is to it, really.
In Conclusion
If I knew more about XML, I could probably have written
some really slick and efficient DOM code but unfortunately, I do not and
I did not. The good news is that if I can do this without knowing
a whole lot about XML and DOM, then so can you. The biggest trick
in the code above is figuring out how to walk the XML tree, finding the data
you need. As mentioned in the code comments above, I decided to do
that by identifying the ordinal position of the child node I was
looking for. If your data is stored in attributes instead of node
values, then you will have to figure out how to extract the values you
are looking for. (My favorite XML reference is a book by Mark and
Tracey Wilson named
XML
Programming with VB and ASP. It is very helpful for VB programmers
seeking to get up to speed on XML fast.)
While doing this article I searched the newsgroups looking for a utility or
at least some suggestions on how to approach this issue. I did not find
anything very different from what you see above. For now, at least,
this is going to be a very manual process. If your goal is similar
to mine, a regular import of similar XML files , which
share the same schema, then the paradigm described above will be of
use. If you want to casually import random XML files on an irregular
basis, then I suggest you play around with the XML import wizard in Access
2003. I tried it on the XML used for this article,
and the wizard fell on its face. I wish I had a simpler
solution and as usual, if any reader has a better suggestion, I am all
ears.
»
See All Articles by Columnist Danny J. Lesandrini