Compressing files in Oracle

Introduced in Oracle 10g,
the UTL_COMPRESS package can be used to compress and uncompress large objects
(raw, blob or bfile). For all practical purposes, you can think of UTL_COMPRESS
as PL/SQL’s means of compressing or zipping files. When looking at the various
procedures and functions, many are prefixed with LZ. The algorithm used to
perform the compression is based on the Lempel-Ziv algorithm, and when used,
the code is based on the LZW or Lempel-Ziv-Welch implementation.

Oracle’s documentation for
this package is sparse in terms of showing a full-scale example. Sparse isn’t
even the correct word here. Nonexistent is a better descriptor. A MetaLink note
(249974.1) shows two examples, and various pages on the Internet can be
searched.

The actual implementation of
this package is pretty easy to use once you see an example. Many of the
examples shown elsewhere include a length comparison among the input, the
compressed, and the uncompressed lengths. Ideally, the input and uncompressed
lengths should be the same. You may find that the compressed length is longer
than the input length. This occurs when the input length is small or short. The
overhead of building a dictionary can make the compressed length longer than the
input length.

In this simple example, we’ll
create our own input, compress it, uncompress it, and evaluate the lengths of
each.

SET SERVEROUTPUT ON
DECLARE
l_in_blob            BLOB;
l_compressed_blob    BLOB;
l_uncompressed_blob  BLOB;
BEGIN
— Set some values
l_in_blob     := TO_BLOB(UTL_RAW.CAST_TO_RAW
(‘This is a long string of words used for this example’));
l_compressed_blob   := TO_BLOB(‘0’);
l_uncompressed_blob := TO_BLOB(‘0’);
— Compress the string
UTL_COMPRESS.lz_compress
(src => l_in_blob, dst => l_compressed_blob);
— Uncompress the string
UTL_COMPRESS.lz_uncompress
(src => l_compressed_blob, dst => l_uncompressed_blob);
— Compare the results with the input
DBMS_OUTPUT.put_line(‘Input length is    :
	‘ || LENGTH(l_in_blob));
DBMS_OUTPUT.put_line(‘Compressed length  :
	‘ || LENGTH(l_compressed_blob));
DBMS_OUTPUT.put_line(‘Uncompressed length:
	‘ || LENGTH(l_uncompressed_blob));
— Caller responsibility to free up temporary LOBs
— See Operational Notes in the documentation
DBMS_LOB.FREETEMPORARY(l_in_blob);
DBMS_LOB.FREETEMPORARY(l_compressed_blob);
DBMS_LOB.FREETEMPORARY(l_uncompressed_blob);
END;
/
Input length is    : 52
Compressed length  : 67
Uncompressed length: 52
PL/SQL procedure successfully completed.

Note that the compressed length
is longer than the input length. Adding a few more characters to the input
string yields the following.

SET SERVEROUTPUT ON
DECLARE
l_in_blob            BLOB;
l_compressed_blob    BLOB;
l_uncompressed_blob  BLOB;
BEGIN
— Set some values
l_in_blob     := TO_BLOB(UTL_RAW.CAST_TO_RAW
(‘This is a long string of words used for this example.
Now is the time for all good men to come to the aid of their country’));
l_compressed_blob   := TO_BLOB(‘0’);
l_uncompressed_blob := TO_BLOB(‘0’);
— Compress the string
UTL_COMPRESS.lz_compress
(src => l_in_blob, dst => l_compressed_blob);
— Uncompress the string
UTL_COMPRESS.lz_uncompress
(src => l_compressed_blob, dst => l_uncompressed_blob);
— Compare the results with the input
DBMS_OUTPUT.put_line(‘Input length is    : ‘ || LENGTH(l_in_blob));
DBMS_OUTPUT.put_line(‘Compressed length  : ‘ || LENGTH(l_compressed_blob));
DBMS_OUTPUT.put_line(‘Uncompressed length: ‘ || LENGTH(l_uncompressed_blob));
— Caller responsibility to free up temporary LOBs
— See Operational Notes in the documentation
DBMS_LOB.FREETEMPORARY(l_in_blob);
DBMS_LOB.FREETEMPORARY(l_compressed_blob);
DBMS_LOB.FREETEMPORARY(l_uncompressed_blob);
END;
/
Input length is    : 122
Compressed length  : 113
Uncompressed length: 122
PL/SQL procedure successfully completed.

That little bit extra for
the input string pushed us over the top in terms of having the compression take
any real effect.

Note that the LZ_COMPRESS
subprogram is overloaded (more than one signature method can be used to invoke
it). The “quality” parameter is set to a default of 6. This parameter provides
a trade-off between speed of compress and quality of compression. It takes
quite a few more words or length of input before you see a difference in what
the quality input does.

As an example, we can take
the text about quality in the documentation and use that as the input string,
and vary the quality from 1 to 9. If you use this block of text as an example,
be sure to remove the single quotes around quality near the end.

Quality is an optional
compression tuning value. It allows the UTL_COMPRESS user to choose between
speed and compression quality, meaning the percentage of reduction in size. A
faster compression speed will result in less compression of the data. A slower
compression speed will result in more compression of the data. Valid values are
[1..9], with 1=fastest and 9=slowest. The default ‘quality’ value is 6.

At a setting of 1, the
compressed length is 248, and at 9, the length becomes 245. The strength or utility
of using UTL_COMPRESS comes into play when dealing with truly large objects and
not just experimenting with contrived strings. For this use case, we can use a
document, but then, we’ll also need a table.

The setup steps are to
create a table to hold the BLOB, create a directory with read/write for the
user, put a file into the directory, initialize the record, insert a BLOB,
compress it, and compare the lengths.

set serveroutput on
create table compress_blob (indx integer, y blob);
create directory MYDIR as ‘C:\temp’;
–We assume the user doing this will have read/write on MYDIR
–Copy a file into the directory, e.g., A_57KB_Word_doc.doc
–This block will take care of the insert for you, or you
–could create a separate procedure to do this
–This is simple code, does not address other PL/SQL errors
DECLARE
  ablob blob;
  abfile bfile := bfilename(‘MYDIR’, ‘A_57KB_Word_doc.doc’);
  — Gets a pointer to the file.
  a_compressed_blob blob;
  amount  integer;
  asize   integer;
  quality integer := 9;
  cursor blob_cur is select * from compress_blob;
BEGIN
—
— compress_blob table is initialized with one record because
— the PL/SQL BLOB locator (ablob) must point to a specific
— EXISTING NON-NULL database BLOB.
—
— initialize the blob locator
  insert into compress_blob values (1, empty_blob());
  select y into ablob from compress_blob where indx = 1;
  — open the bfile and get the initial file size
  dbms_lob.fileopen(abfile);
  asize := dbms_lob.getlength(abfile);
  dbms_output.put_line(‘Size of input file: ‘ || asize);
  — load the file and get the size
  dbms_lob.loadfromfile(ablob, abfile, asize);
  dbms_output.put_line(‘After loadfromfile’);
  asize := dbms_lob.getlength(ablob);
  dbms_output.put_line(‘Size of blob: ‘ || asize);
  — compress the blob
  — you can experiment with varying the quality
  a_compressed_blob := utl_compress.lz_compress(ablob, quality);
  — insert the compressed blob
  insert into compress_blob values (2, a_compressed_blob);
  — compare the sizes of the blobs in the table
  dbms_output.put_line
  (‘Sizes before and after insertion/compression –>’);
  for c1_rec in blob_cur
    loop
    asize := dbms_lob.getlength(c1_rec.y);
    dbms_output.put_line(asize);
  end loop;
end;
/
Size of input file: 57856
After loadfromfile
Size of blob: 57856
Sizes before and after insertion/compression –>
18722
57856

When using a quality value
of 1, the compressed size was 20868, so a value of 9 represents a 10% or so
improvement. As file sizes increase, so will the amount of compression.

For production-type code,
you will want to handle errors such as TOO_MANY_ROWS, NO_DATA_FOUND and
DUP_VAL_ON_INDEX. For BLOB-related errors, a set of exception handlers (as
shown in the documentation) such as follows could be packaged up for re-use.

EXCEPTION
  when UTL_COMPRESS.INVALID_ARGUMENT then
  dbms_output.put_line(‘An argument was an invalid type or value.’);
  when UTL_COMPRESS.BUFFER_TOO_SMALL then
  dbms_output.put_line(‘Compressed representation is too big.’);
  when UTL_COMPRESS.DATA_ERROR then
  dbms_output.put_line(‘Input or output data stream has invalid format.’);
  when UTL_COMPRESS.STREAM_ERROR then
  dbms_output.put_line(‘Error in compression/uncompression of data stream’);
  when others then
  dbms_output.put_line(‘An exception occurred’);
  dbms_output.put_line(sqlcode || sqlerrm);

Pre-processing checks such
as checking if open (then closing) and vice versa could also be incorporated. Don’t
forget to use DBMS_LOB.FREETEMPORARY to release locked items (including the
source file).

In Closing

Armed with a couple of
examples, you should be able to take this code and modify it for your use. One
of the benefits of being able to compress and uncompress from within Oracle is
that file manipulation can be stored in one place as opposed to having to bounce
back and forth between a shell script and a SQL*Plus session.

»

See All Articles by Columnist Steve Callan

In Closing

Steve Callan

Company

Categories