Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Jul 12, 2006

Oracle: Separating Numbers and Letters

By Steve Callan


The time has come to make account "numbers" in your customer database real numbers or real alphanumeric strings across the board. Why would you ever choose to do something like that, especially if you are dealing with millions of records? One reason is that you may be forced into doing so because your company was acquired, and the database integration between your system and your new parent company requires such an undertaking. Another reason is based on the rampant use of Social Security Numbers (SSN) as a means of identification. Chances are you have online accounts, which use all or part of your SSN as the account number, which, for what it's worth, SSNs were never intended to be used for anything other than taxation and tracking Social Security benefits. Many colleges, as an example, have moved away from using the SSN as the primary means of identification for students.


Another reason for performing a conversion may be that you have exhausted the original pool of possible account numbers or that your random alphanumeric string generation scheme has been generating duplicates. So whatever the reason, how would you go about salvaging actual numeric strings and identifying those which need to be converted? Here is an example to help clarify the situation.

Suppose you have the following account numbers: A12345, 009431, and 783652. By quick inspection, A12345 obviously needs to be converted because of the letter "A." The third account number is a "real" number, so nothing needs to be done with it. The second account number is a bit tricky. The characters, so to speak, are all numeric, but is 009431 a number you really want to keep given that most systems ignore leading zeroes? Moreover, if the new system has account numbers all the same length (six in this scenario), then 9431 fails to meet the minimum length. The end result is that the salvageable account numbers, which are truly numeric, must also be tested for length (or a minimum value).

What schemes are at your disposal to identify strings one way or the other? Math, translation, date functions, and regular expressions are but four ways to approach and solve this problem.


Approach 1: Using TRANSLATE


This isn't pretty, but it will identify a real number.


SELECT <column> FROM <table> WHERE
length(<column>)- length( TRANSLATE(<column>, CHR(1)||TRANSLATE(<column>, CHR(1)||'1234567890', CHR(1) ), CHR(1) ) ) = 0;

If you want to test for a string, then use "> 0" instead of testing "= 0." I have inserted the three account numbers from before into a test table.

SQL> create table account_test
2  (acctnum varchar2(6));
Table created.
SQL> insert into account_test values('A12345');
1 row created.
SQL> insert into account_test values('009341');
1 row created.
SQL> insert into account_test values(783652);
1 row created.
SQL> commit;
Commit complete.

Note that in the last insert statement, the account number was inserted without the single quotation marks, that is, it went in as a number, so how does Oracle view this now: number or varchar2 string?

SQL> SELECT acctnum "ACCTNUM" FROM account_test WHERE
  2  length(acctnum)- 
  3  length(TRANSLATE(acctnum,CHR(1)||TRANSLATE(acctnum,CHR(1)||'1234567890',
     CHR(1)),CHR(1)))=0;
ACCTNUM
-------
009341
783652

This approach, although the length/translate part looks cumbersome, has one distinct advantage in that the results will always be one way or the other. How do you account for the length of the number so that 009341 is filtered out? This leads to a math approach.

Approach 2: Using Math

Adding an extra "and length(acctnum/1)=6" or similar operation has the desired effect of filtering 009341. A distinct disadvantage of the math approach is that when using it by itself, you will run into an error situation when trying to divide ‘A12345' by one.

SQL> select '009341'/1 from dual;
'009341'/1
----------
      9341
SQL> select 'A12345'/1 from dual;
select 'A12345'/1 from dual
       *
ERROR at line 1:
ORA-01722: invalid number

The math approach is still valid if you account for the ORA-01722 error, and one way to do this is to use a nested PL/SQL block within another PL/SQL block so you can deal with the exception without exiting the main processing loop.

SQL> set serveroutput on
SQL> declare
  2    v_result number;
  3    cursor c is
  4    select acctnum
  5    from account_test;
  6  begin
  7    for r in c loop
  8      begin
  9        select r.acctnum/1 into v_result from dual;
 10        dbms_output.put_line(r.acctnum||' is a number');
 11      exception
 12        when others then
 13          dbms_output.put_line(r.acctnum||' is a NOT number');
 14      end;
 15    end loop;
 16  end;
 17  /
A12345 is a NOT number
009341 is a number
783652 is a number
PL/SQL procedure successfully completed.

A slight modification to the select into v_result statement will perform the filtering operation from before, so 009341 will be flagged as not being a number (at least not in the sense of what we're looking for).

Using WHEN OTHERS has the undesired effect of masking other errors, so a better exception handling condition, that is, one which specifically catches "invalid number" conditions, would be ideal. Fortunately, there happens to be a built-in predefined handler named INVALID_NUMBER. If all you are familiar with is the OTHERS condition, you owe it to yourself to see what Oracle (specifically, PL/SQL) has to offer in its list of Predefined PL/SQL Exceptions.

So far, the results have been simply output to the user/terminal. Instead of using a select statement (by itself), the result set can be steered into one of two tables via an "insert into other_table select …" statement.

Approach 3: Using a Date Function

An infrequently used date format option is Julian date. A simple example of using the Julian date format to convert numbers to words is shown below.

SQL> select to_char(to_date(009341,'J'), 'JSP') "WORDS" from dual;
WORDS
-------------------------------------
NINE THOUSAND THREE HUNDRED FORTY-ONE

What happens when you try to convert A12345 into words using this approach? I will use the same type of PL/SQL block as before, but toss in an extra exception handling condition.

SQL> declare
  2    v_result varchar2(300);
  3    cursor c is
  4    select acctnum
  5    from account_test;
  6  begin
  7    for r in c loop
  8      begin
  9        select to_char(to_date(r.acctnum ,'J'), 'JSP') into v_result from dual;
 10        dbms_output.put_line(r.acctnum||' is a number');
 11      exception
 12        when value_error then
 13          dbms_output.put_line(r.acctnum||' is a NOT number');
 14        when others then
 15          dbms_output.put_line(sqlerrm);
 16      end;
 17    end loop;
 18  end;
 19  /
ORA-01858: a non-numeric character was found where a numeric was expected
009341 is a number
783652 is a number
PL/SQL procedure successfully completed.

The VALUE_ERROR handler didn't catch the ORA-01858 condition and is why the output shows the SQL error message. The inner exception block can be modified to catch/test for specific error numbers, but accounting for all of the possible types of errors of this nature quickly makes the date approach – at least as far as the account number conversion project is concerned – somewhat cumbersome. Our last approach is elegant and easy to implement.

Approach 4: Using Regular Expressions

A new feature in Oracle (which release?) deals with regular expressions. Long a mainstay of other languages (to include UNIX), regular expression functionality had been missing from Oracle. This approach is so simple and elegant to use that you may wonder why the other three were even mentioned in the first place. The only requirement for using REGEXP is that you must be using, obviously, a version of Oracle that supports it. You know there are companies still using version 7.3.4, and given that REGEXP first appeared several versions later, well, that answers why the other approaches merit consideration.

So what does REGEXP_LIKE do for us? Let's go straight to the "only digits" search condition.

SQL> select acctnum from account_test
  2  where regexp_like (acctnum,'^[[:digit:]]+$');
ACCTNUM
-------
009341
783652

Again, a slight modification to the query will filter out the too short 009341 number. The REGEXP_LIKE function performs exactly the same as the first approach using TRANSLATE, that is, results come back one way or the other.

A Slight Extension

Taking either the first or last approach and converting the query/returned result into a function call adds an extremely useful utility users/Oracle have been missing since day one. How do you test for "not a number" or "is a number" in Oracle? By using the first or last approach, you just performed that test. Turn it into a function returning true or false, and grant execute permissions as appropriate, and you can now perform NaN(whatever) tests.

In Closing

The examples in this article are not purely academic because they can be applied to many commonly encountered day-to-day situations. Data hygiene is of critical importance to a database. Suppose you are building (or managing) a database application, which takes orders from customers. How do you check if a phone number, bank routing number, bank account number, or credit card number is all digits? Alternatively, what if you need to identify customers by country (using the US and Canada as examples)? US ZIP codes are different from Canadian postal codes. A NON all-digit postal code is a good bet that you are dealing with a Canadian address. A common theme in my articles is that there is almost always more than one way to solve a problem, and the problem of separating numbers from letters is no exception.

» See All Articles by Columnist Steve Callan



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date