Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Aug 19, 2009

Compressing files in Oracle

By Steve Callan

Introduced in Oracle 10g, the UTL_COMPRESS package can be used to compress and uncompress large objects (raw, blob or bfile). For all practical purposes, you can think of UTL_COMPRESS as PL/SQL’s means of compressing or zipping files. When looking at the various procedures and functions, many are prefixed with LZ. The algorithm used to perform the compression is based on the Lempel-Ziv algorithm, and when used, the code is based on the LZW or Lempel-Ziv-Welch implementation.

Oracle’s documentation for this package is sparse in terms of showing a full-scale example. Sparse isn’t even the correct word here. Nonexistent is a better descriptor. A MetaLink note (249974.1) shows two examples, and various pages on the Internet can be searched.

The actual implementation of this package is pretty easy to use once you see an example. Many of the examples shown elsewhere include a length comparison among the input, the compressed, and the uncompressed lengths. Ideally, the input and uncompressed lengths should be the same. You may find that the compressed length is longer than the input length. This occurs when the input length is small or short. The overhead of building a dictionary can make the compressed length longer than the input length.

In this simple example, we’ll create our own input, compress it, uncompress it, and evaluate the lengths of each.

SET SERVEROUTPUT ON
DECLARE
l_in_blob            BLOB;
l_compressed_blob    BLOB;
l_uncompressed_blob  BLOB;
BEGIN
-- Set some values
l_in_blob     := TO_BLOB(UTL_RAW.CAST_TO_RAW
('This is a long string of words used for this example'));
l_compressed_blob   := TO_BLOB('0');
l_uncompressed_blob := TO_BLOB('0');

-- Compress the string
UTL_COMPRESS.lz_compress 
(src => l_in_blob, dst => l_compressed_blob);

-- Uncompress the string
UTL_COMPRESS.lz_uncompress 
(src => l_compressed_blob, dst => l_uncompressed_blob);

-- Compare the results with the input
DBMS_OUTPUT.put_line('Input length is    : 
	' || LENGTH(l_in_blob));
DBMS_OUTPUT.put_line('Compressed length  : 
	' || LENGTH(l_compressed_blob));
DBMS_OUTPUT.put_line('Uncompressed length: 
	' || LENGTH(l_uncompressed_blob));

-- Caller responsibility to free up temporary LOBs
-- See Operational Notes in the documentation
DBMS_LOB.FREETEMPORARY(l_in_blob);
DBMS_LOB.FREETEMPORARY(l_compressed_blob);
DBMS_LOB.FREETEMPORARY(l_uncompressed_blob);
END;
/
Input length is    : 52
Compressed length  : 67
Uncompressed length: 52

PL/SQL procedure successfully completed.

Note that the compressed length is longer than the input length. Adding a few more characters to the input string yields the following.

SET SERVEROUTPUT ON
DECLARE
l_in_blob            BLOB;
l_compressed_blob    BLOB;
l_uncompressed_blob  BLOB;
BEGIN
-- Set some values
l_in_blob     := TO_BLOB(UTL_RAW.CAST_TO_RAW
('This is a long string of words used for this example.
Now is the time for all good men to come to the aid of their country'));
l_compressed_blob   := TO_BLOB('0');
l_uncompressed_blob := TO_BLOB('0');

-- Compress the string
UTL_COMPRESS.lz_compress 
(src => l_in_blob, dst => l_compressed_blob);

-- Uncompress the string
UTL_COMPRESS.lz_uncompress 
(src => l_compressed_blob, dst => l_uncompressed_blob);

-- Compare the results with the input
DBMS_OUTPUT.put_line('Input length is    : ' || LENGTH(l_in_blob));
DBMS_OUTPUT.put_line('Compressed length  : ' || LENGTH(l_compressed_blob));
DBMS_OUTPUT.put_line('Uncompressed length: ' || LENGTH(l_uncompressed_blob));

-- Caller responsibility to free up temporary LOBs
-- See Operational Notes in the documentation
DBMS_LOB.FREETEMPORARY(l_in_blob);
DBMS_LOB.FREETEMPORARY(l_compressed_blob);
DBMS_LOB.FREETEMPORARY(l_uncompressed_blob);
END;
/

Input length is    : 122
Compressed length  : 113
Uncompressed length: 122

PL/SQL procedure successfully completed.

That little bit extra for the input string pushed us over the top in terms of having the compression take any real effect.

Note that the LZ_COMPRESS subprogram is overloaded (more than one signature method can be used to invoke it). The “quality” parameter is set to a default of 6. This parameter provides a trade-off between speed of compress and quality of compression. It takes quite a few more words or length of input before you see a difference in what the quality input does.

As an example, we can take the text about quality in the documentation and use that as the input string, and vary the quality from 1 to 9. If you use this block of text as an example, be sure to remove the single quotes around quality near the end.

Quality is an optional compression tuning value. It allows the UTL_COMPRESS user to choose between speed and compression quality, meaning the percentage of reduction in size. A faster compression speed will result in less compression of the data. A slower compression speed will result in more compression of the data. Valid values are [1..9], with 1=fastest and 9=slowest. The default 'quality' value is 6.

At a setting of 1, the compressed length is 248, and at 9, the length becomes 245. The strength or utility of using UTL_COMPRESS comes into play when dealing with truly large objects and not just experimenting with contrived strings. For this use case, we can use a document, but then, we’ll also need a table.

The setup steps are to create a table to hold the BLOB, create a directory with read/write for the user, put a file into the directory, initialize the record, insert a BLOB, compress it, and compare the lengths.

set serveroutput on
create table compress_blob (indx integer, y blob);
create directory MYDIR as 'C:\temp';
--We assume the user doing this will have read/write on MYDIR
--Copy a file into the directory, e.g., A_57KB_Word_doc.doc
--This block will take care of the insert for you, or you
--could create a separate procedure to do this
--This is simple code, does not address other PL/SQL errors
DECLARE
  ablob blob;
  abfile bfile := bfilename('MYDIR', 'A_57KB_Word_doc.doc');  
  -- Gets a pointer to the file.
  a_compressed_blob blob;
  amount  integer;
  asize   integer;
  quality integer := 9;
  cursor blob_cur is select * from compress_blob;
BEGIN
--
-- compress_blob table is initialized with one record because  
-- the PL/SQL BLOB locator (ablob) must point to a specific 
-- EXISTING NON-NULL database BLOB.  
--
-- initialize the blob locator
  insert into compress_blob values (1, empty_blob());
  select y into ablob from compress_blob where indx = 1;

  -- open the bfile and get the initial file size
  dbms_lob.fileopen(abfile);
  asize := dbms_lob.getlength(abfile);
  dbms_output.put_line('Size of input file: ' || asize);

  -- load the file and get the size
  dbms_lob.loadfromfile(ablob, abfile, asize);
  dbms_output.put_line('After loadfromfile');
  asize := dbms_lob.getlength(ablob);
  dbms_output.put_line('Size of blob: ' || asize);

  -- compress the blob
  -- you can experiment with varying the quality
  a_compressed_blob := utl_compress.lz_compress(ablob, quality);

  -- insert the compressed blob
  insert into compress_blob values (2, a_compressed_blob);

  -- compare the sizes of the blobs in the table 
  dbms_output.put_line
  ('Sizes before and after insertion/compression -->');
  for c1_rec in blob_cur
    loop
    asize := dbms_lob.getlength(c1_rec.y);
    dbms_output.put_line(asize);
  end loop;
end;
/
Size of input file: 57856
After loadfromfile
Size of blob: 57856
Sizes before and after insertion/compression -->
18722
57856

When using a quality value of 1, the compressed size was 20868, so a value of 9 represents a 10% or so improvement. As file sizes increase, so will the amount of compression.

For production-type code, you will want to handle errors such as TOO_MANY_ROWS, NO_DATA_FOUND and DUP_VAL_ON_INDEX. For BLOB-related errors, a set of exception handlers (as shown in the documentation) such as follows could be packaged up for re-use.

EXCEPTION
  when UTL_COMPRESS.INVALID_ARGUMENT then
  dbms_output.put_line('An argument was an invalid type or value.');

  when UTL_COMPRESS.BUFFER_TOO_SMALL then
  dbms_output.put_line('Compressed representation is too big.');

  when UTL_COMPRESS.DATA_ERROR then
  dbms_output.put_line('Input or output data stream has invalid format.');

  when UTL_COMPRESS.STREAM_ERROR then
  dbms_output.put_line('Error in compression/uncompression of data stream');

  when others then
  dbms_output.put_line('An exception occurred');
  dbms_output.put_line(sqlcode || sqlerrm);

Pre-processing checks such as checking if open (then closing) and vice versa could also be incorporated. Don’t forget to use DBMS_LOB.FREETEMPORARY to release locked items (including the source file).

In Closing

Armed with a couple of examples, you should be able to take this code and modify it for your use. One of the benefits of being able to compress and uncompress from within Oracle is that file manipulation can be stored in one place as opposed to having to bounce back and forth between a shell script and a SQL*Plus session.

» See All Articles by Columnist Steve Callan



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date