Oracle: Separating Numbers and Letters
July 12, 2006
Another reason for performing a conversion may be that you have exhausted the original pool of possible account numbers or that your random alphanumeric string generation scheme has been generating duplicates. So whatever the reason, how would you go about salvaging actual numeric strings and identifying those which need to be converted? Here is an example to help clarify the situation.
Suppose you have the following account numbers: A12345, 009431, and 783652. By quick inspection, A12345 obviously needs to be converted because of the letter "A." The third account number is a "real" number, so nothing needs to be done with it. The second account number is a bit tricky. The characters, so to speak, are all numeric, but is 009431 a number you really want to keep given that most systems ignore leading zeroes? Moreover, if the new system has account numbers all the same length (six in this scenario), then 9431 fails to meet the minimum length. The end result is that the salvageable account numbers, which are truly numeric, must also be tested for length (or a minimum value).
What schemes are at your disposal to identify strings one way or the other? Math, translation, date functions, and regular expressions are but four ways to approach and solve this problem.
Approach 1: Using TRANSLATE
This isn't pretty, but it will identify a real number.
SELECT <column> FROM <table> WHERE length(<column>)- length( TRANSLATE(<column>, CHR(1)||TRANSLATE(<column>, CHR(1)||'1234567890', CHR(1) ), CHR(1) ) ) = 0;
If you want to test for a string, then use "> 0" instead of testing "= 0." I have inserted the three account numbers from before into a test table.
SQL> create table account_test 2 (acctnum varchar2(6)); Table created. SQL> insert into account_test values('A12345'); 1 row created. SQL> insert into account_test values('009341'); 1 row created. SQL> insert into account_test values(783652); 1 row created. SQL> commit; Commit complete.
Note that in the last insert statement, the account number was inserted without the single quotation marks, that is, it went in as a number, so how does Oracle view this now: number or varchar2 string?
SQL> SELECT acctnum "ACCTNUM" FROM account_test WHERE 2 length(acctnum)- 3 length(TRANSLATE(acctnum,CHR(1)||TRANSLATE(acctnum,CHR(1)||'1234567890', CHR(1)),CHR(1)))=0; ACCTNUM ------- 009341 783652
This approach, although the length/translate part looks cumbersome, has one distinct advantage in that the results will always be one way or the other. How do you account for the length of the number so that 009341 is filtered out? This leads to a math approach.
Approach 2: Using Math
Adding an extra "and length(acctnum/1)=6" or similar operation has the desired effect of filtering 009341. A distinct disadvantage of the math approach is that when using it by itself, you will run into an error situation when trying to divide A12345' by one.
SQL> select '009341'/1 from dual; '009341'/1 ---------- 9341 SQL> select 'A12345'/1 from dual; select 'A12345'/1 from dual * ERROR at line 1: ORA-01722: invalid number
The math approach is still valid if you account for the ORA-01722 error, and one way to do this is to use a nested PL/SQL block within another PL/SQL block so you can deal with the exception without exiting the main processing loop.
SQL> set serveroutput on SQL> declare 2 v_result number; 3 cursor c is 4 select acctnum 5 from account_test; 6 begin 7 for r in c loop 8 begin 9 select r.acctnum/1 into v_result from dual; 10 dbms_output.put_line(r.acctnum||' is a number'); 11 exception 12 when others then 13 dbms_output.put_line(r.acctnum||' is a NOT number'); 14 end; 15 end loop; 16 end; 17 / A12345 is a NOT number 009341 is a number 783652 is a number PL/SQL procedure successfully completed.
A slight modification to the select into v_result statement will perform the filtering operation from before, so 009341 will be flagged as not being a number (at least not in the sense of what we're looking for).
Using WHEN OTHERS has the undesired effect of masking other errors, so a better exception handling condition, that is, one which specifically catches "invalid number" conditions, would be ideal. Fortunately, there happens to be a built-in predefined handler named INVALID_NUMBER. If all you are familiar with is the OTHERS condition, you owe it to yourself to see what Oracle (specifically, PL/SQL) has to offer in its list of Predefined PL/SQL Exceptions.
So far, the results have been simply output to the user/terminal. Instead of using a select statement (by itself), the result set can be steered into one of two tables via an "insert into other_table select " statement.
Approach 3: Using a Date Function
An infrequently used date format option is Julian date. A simple example of using the Julian date format to convert numbers to words is shown below.
SQL> select to_char(to_date(009341,'J'), 'JSP') "WORDS" from dual; WORDS ------------------------------------- NINE THOUSAND THREE HUNDRED FORTY-ONE
What happens when you try to convert A12345 into words using this approach? I will use the same type of PL/SQL block as before, but toss in an extra exception handling condition.
SQL> declare 2 v_result varchar2(300); 3 cursor c is 4 select acctnum 5 from account_test; 6 begin 7 for r in c loop 8 begin 9 select to_char(to_date(r.acctnum ,'J'), 'JSP') into v_result from dual; 10 dbms_output.put_line(r.acctnum||' is a number'); 11 exception 12 when value_error then 13 dbms_output.put_line(r.acctnum||' is a NOT number'); 14 when others then 15 dbms_output.put_line(sqlerrm); 16 end; 17 end loop; 18 end; 19 / ORA-01858: a non-numeric character was found where a numeric was expected 009341 is a number 783652 is a number PL/SQL procedure successfully completed.
The VALUE_ERROR handler didn't catch the ORA-01858 condition and is why the output shows the SQL error message. The inner exception block can be modified to catch/test for specific error numbers, but accounting for all of the possible types of errors of this nature quickly makes the date approach at least as far as the account number conversion project is concerned somewhat cumbersome. Our last approach is elegant and easy to implement.
Approach 4: Using Regular Expressions
A new feature in Oracle (which release?) deals with regular expressions. Long a mainstay of other languages (to include UNIX), regular expression functionality had been missing from Oracle. This approach is so simple and elegant to use that you may wonder why the other three were even mentioned in the first place. The only requirement for using REGEXP is that you must be using, obviously, a version of Oracle that supports it. You know there are companies still using version 7.3.4, and given that REGEXP first appeared several versions later, well, that answers why the other approaches merit consideration.
So what does REGEXP_LIKE do for us? Let's go straight to the "only digits" search condition.
SQL> select acctnum from account_test 2 where regexp_like (acctnum,'^[[:digit:]]+$'); ACCTNUM ------- 009341 783652
Again, a slight modification to the query will filter out the too short 009341 number. The REGEXP_LIKE function performs exactly the same as the first approach using TRANSLATE, that is, results come back one way or the other.
A Slight Extension
Taking either the first or last approach and converting the query/returned result into a function call adds an extremely useful utility users/Oracle have been missing since day one. How do you test for "not a number" or "is a number" in Oracle? By using the first or last approach, you just performed that test. Turn it into a function returning true or false, and grant execute permissions as appropriate, and you can now perform NaN(whatever) tests.
The examples in this article are not purely academic because they can be applied to many commonly encountered day-to-day situations. Data hygiene is of critical importance to a database. Suppose you are building (or managing) a database application, which takes orders from customers. How do you check if a phone number, bank routing number, bank account number, or credit card number is all digits? Alternatively, what if you need to identify customers by country (using the US and Canada as examples)? US ZIP codes are different from Canadian postal codes. A NON all-digit postal code is a good bet that you are dealing with a Canadian address. A common theme in my articles is that there is almost always more than one way to solve a problem, and the problem of separating numbers from letters is no exception.