Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum Rss Feed

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Rss Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted July 12, 2018

Oracle's ASCIISTR() and Unicode Characters

By David Fitzjarrell

Oracle provides an interesting function, ASCIISTR(), to return ASCII strings from a VARCHAR2 or CLOB column, and in general it does an admirable job. There are non-printing characters however, that 'put a spanner in the works', returning HEX strings instead of characters. Since each HEX string is five bytes long, such strings can significantly increase the overall length of the text so that it won't 'fit' in the allocated buffer space. Let's look at one of those 'annoying' characters, look at its source and provide a possible solution.

Word-processing programs (which will remain unnamed), spreadsheets, PDFs and HTML documents are great tools, that is until text from any of those sources is inserted into a text or CLOB column in an Oracle database using the popular 'copy and paste' procedure. Many times, those programs use what is called a 'non-breaking space' to separate words in a line of text, and if the Oracle database in question is configured to use UTF-8 character encoding those non-breaking spaces get inserted into the column without error or translation. A simple select of the column data won't reveal these insidious pests, but ASCIISTR() will. This is mentioned in the Oracle online documentation; the description from that document is reproduced below:

ASCIISTR takes as its argument a string, or an expression that resolves to a string, in any character set and returns an ASCII
version of the string in the database character set. Non-ASCII characters are converted to the form \xxxx, where xxxx represents
a UTF-16 code unit.

So what is a 'non-breaking space'? According to Wikipedia:

In word processing and digital typesetting, a non-breaking space (" "), also called no-break space, non-breakable space (NBSP),
hard space, or fixed space,[1] is a space character that prevents an automatic line break at its position. In some formats,
including HTML, it also prevents consecutive whitespace characters from collapsing into a single space.

In the database world the ASCIISTR() Function can't translate it into an ASCII character so it returns a text string of the HEX representation of it, which is '\00A0'. Concatenate enough of those spaces into a single line and the line length expands, sometimes dramatically. When there are many of these in a line, paragraph, or block of text Oracle can throw the following errors from ASCIISTR():

ORA-24920: column size too large for client. 
ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: %s, maximum: %s)

Reports can fail because of those errors, and to the user, the cause is unknown because these 'non-breaking spaces' look like regular spaces.

Can this issue be fixed? Yes, but it may not be a really simple fix, depending upon the configured database block size. For block sizes of 8192 or less the string will need to be 'walked' in 'chunks' to update it completely. If the block size is 16K, successful single-statement updates are possible. Looking at an example that searches the first 2400 characters of a string for '\00A0' the PL/SQL block might look like this:

        v_chunk         number:=2400;
        v_start_pos     number:=1;
        v_next_pos      number:=0;
        v_passes        number:=0;
	-- Set the number of passes through the loop
        select round(length(col)/v_chunk,0) into v_passes from mytable;
	-- Execute the loop, walking the string in pre-defined blocks
        for i in 1..v_passes loop
                v_next_pos := v_start_pos + v_chunk;
                update mytable
                set col = replace(asciistr(substr(col, v_start_pos, v_chunk)), 'Z','')
                where id in
                        (select id from mytable where asciistr(substr(col, v_start_pos, v_chunk)) like '%Z%');
        -- Set the new starting position

                v_start_pos := v_next_pos;
        end loop;
        -- Commit the changes

Feel free to use this code, making any changes to table and column names or chunk size.

The above script works because it calculates the number of 'steps' it will take to update the entire string, based on the string length and the 'chunk' size. This determines the number of steps required to 'walk' the entire string. The loop itself calculates the starting position and positions the 'pointer' where it needs to be to pick up from the previous update. Eventually the entire string is 'walked' and any errant characters replaced.

Of course, not everyone will have need of this script; this problem arises when the copy and paste operations described above are in use at insurance companies, medical offices, law offices and others who regularly insert text into a VARCHAR2 or CLOB column that is found online or in word processing/spreadsheet/PDF files.

Although this may seem like a remote issue, to some it can be a problem on a daily, weekly, or monthly basis depending upon when reports where this character is liberally sprinkled are generated. It's nice to know, though, that if this problem ever does appear there's a possible solution handy.

See all articles by David Fitzjarrell

Oracle Archives

Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM