XML-illegal characters in database-legal field names?

January 30, 2003

[From XML.com]

John Simpson describes a technique for database and XML integration.

Q: I have a database that contains "unconventional" field names (e.g., Book_&_Page, Grantee's_ID_#). These field names do not meet the requirements for XML element names, so I am forced to run them through a "sanitizing" function before naming the element nodes. The function replaces or removes the characters offensive to XML. So the Book_&_Page field might become the Book__Page element, and Grantee's_ID_# might become Grantees_ID_no, depending on whether I remove or replace the offending characters; so far I've chosen to sanitize simply by removing them.

Unfortunately, this sanitizing process means I could end up with elements that share a name even though the corresponding database fields are named differently. For example, given original fields named Book_&_Page and Book_:_Page, removing the offending character would produce Book__Page for both. (Changing the field names in the database is NOT an option.)
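
To make the collision concrete, here is a minimal Python sketch of a removal-based sanitizer. The function names (sanitize_by_removal, find_collisions) and the choice of a conservative ASCII-only set of allowed characters are assumptions made for illustration, not the questioner's actual code. The colon is treated as offending here because, although the XML 1.0 Name production permits it, it is reserved for namespaces.

```python
import re

# Assumption: keep only a conservative ASCII subset of XML name characters.
# The colon is excluded on purpose (reserved for namespace prefixes).
_OFFENDING = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_by_removal(field_name):
    """Hypothetical sanitizer: drop every offending character."""
    return _OFFENDING.sub("", field_name)


def find_collisions(field_names):
    """Group field names by sanitized form; keep groups with 2+ members."""
    groups = {}
    for name in field_names:
        groups.setdefault(sanitize_by_removal(name), []).append(name)
    return {elem: names for elem, names in groups.items() if len(names) > 1}


fields = ["Book_&_Page", "Book_:_Page", "Grantee's_ID_#"]
collisions = find_collisions(fields)
print(collisions)
# {'Book__Page': ['Book_&_Page', 'Book_:_Page']}
```

Because removal discards information, no rule of this kind can be inverted: once two fields map to Book__Page, nothing in the element name says which field it came from.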

The initial dilemma is deciding what rules to apply to what characters in the sanitizing function. Coming up with a generic set of rules that could apply to multiple datasets over multiple systems seems risky. I am hoping that someone has come up with a way to "wrap" the element name so that these "illegal" characters can be included in the name. Maybe something along the lines of a CDATA section for the content of an element?
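
One way to avoid both the collisions and the per-dataset rule set is to escape offending characters reversibly instead of removing them. The sketch below is not necessarily the approach the article goes on to recommend; it is an illustration, similar in spirit to the _xHHHH_ escaping applied by .NET's XmlConvert.EncodeName but not byte-for-byte compatible with it. The names encode_name and decode_name are hypothetical, and the sketch assumes ASCII-only safe characters and field names limited to the Basic Multilingual Plane.

```python
import re

# An escape is "_x" + four hex digits + "_", e.g. "&" -> "_x0026_".
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def _is_safe(ch, first):
    """Assumption: allow only ASCII letters, digits, '_', '.', '-'."""
    if not ch.isascii():
        return False
    if ch.isalpha() or ch == "_":
        return True
    # Digits, '.' and '-' may not start an XML name.
    return (not first) and (ch.isdigit() or ch in ".-")


def encode_name(field_name):
    """Turn a database field name into an XML-safe element name."""
    out = []
    for i, ch in enumerate(field_name):
        if ord(ch) > 0xFFFF:
            raise ValueError("sketch only handles BMP characters")
        if ch == "_" and field_name[i + 1:i + 2] == "x":
            # A literal "_x" could be mistaken for an escape: escape the "_".
            out.append("_x005F_")
        elif _is_safe(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append("_x%04X_" % ord(ch))
    return "".join(out)


def decode_name(element_name):
    """Invert encode_name, recovering the original field name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), element_name)


print(encode_name("Book_&_Page"))     # Book__x0026__Page
print(encode_name("Book_:_Page"))     # Book__x003A__Page
print(encode_name("Grantee's_ID_#"))  # Grantee_x0027_s_ID__x0023_
print(decode_name("Book__x0026__Page"))  # Book_&_Page
```

Under these assumptions the mapping is one-to-one, so distinct field names always yield distinct element names, and the same rule applies to any dataset. The cost is readability: the element names are harder to scan than the sanitized ones.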

The article continues at http://www.xml.com/pub/a/2003/01/29/qa.html