Storing Word Documents in Oracle

Have you ever wondered about
storing documents into your Oracle database and just didn’t know where to
start? Here is a quick introduction to the basics you need to know.

Manipulating
Oracle Files with UTL_FILE
showed you how to read the alert log
and do some manipulation on the file while it was external to the database. You
should review this article as it contains some background information you will
need to know, along with some explanation of some of the procedures in this
code that I will not go into here. The next logical extension to the last
article is the manipulation of external files, such as documents, and the
storage in the database. This article will take you through a brief overview of
the datatypes and procedures in order to store word documents within the
database.

The Datatypes

When talking about manipulating documents within a database,
there are only a few choices for a datatype that can handle a large document.
These large objects (LOBs) can use any one of the four datatypes depending on
the characteristics of the object you are storing. These large objects can be
in the form of text, graphics, video or audio.


Datatype


Description

BLOB

Used to store unstructured binary data up to 4G. This
datatype stores the full binary object in the database.

CLOB/NCLOB

Used to store up to 4G of character data. This datatype
stores the full character data in the database.

BFILE

Used to point at large objects that are external to the
database and in operating system files. The BFILE column also contains binary
data and cannot be selected.

Benefits of LOBs

It use to be that the largest object you could store in the
database was of the datatype LONG. Oracle has for the last few releases kept
telling us to convert our LONG datatypes to a LOB datatype (maybe they will
too). The reason for converting our LONGs to LOBs can be seen in this short
list of benefits.

  1. LOB
    columns can reach the size of 4G.

  2. You
    can store LOB data internally within a table or externally.

  3. You
    can perform random access to the LOB data.

  4. It
    is easier to do transformations on LOB columns.

  5. You
    can replicate the tables that contain LOB columns.

Create a Table to Store the Document

In order to store the documents into the database you must
obviously first create an object to store the information. Following is the DDL
to create the table MY_DOCS. You will notice that there is a holder for the
bfile location and a column (DOC_BLOB) to hold the document.


CREATE TABLE my_docs
(doc_id NUMBER,
bfile_loc BFILE,
doc_title VARCHAR2(255),
doc_blob BLOB DEFAULT EMPTY_BLOB() );

The Load Procedure

The load procedure takes as arguments the document name and
an id number for the document. The procedure will then prime a row for update
based on the document id, BFILE location and document name (which becomes the
document title). The procedure will then open internal and external BLOBs and
load the internal from the external. At this point, the document has been
loaded into the database table.

Code
Meaning

bfile_loc := BFILENAME('DOC_DIR', in_doc);

In order to load the document, you must first point to the
external object through a BFILE locator. The BFILENAME procedure takes a
directory location and the document name.

INSERT INTO
  my_docs (doc_id, bfile_loc, doc_title) 
  VALUES (1, bfile_loc, in_doc);

This statement is to prime the row into which the external
object will be inserted.

SELECT doc_blob INTO temp_blob 
  FROM my_docs WHERE doc_id = in_id
  FOR UPDATE;

Associate the temporary blob object to the table blob
object for updating.

DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY);

Open the external blob object for reading.

DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE);

Open the temporary blob object for reading and writing.

DBMS_LOB.LOADFROMFILE
  (temp_blob, bfile_loc, Bytes_to_load);

Copy the entire external blob object (BFILE) into the
internal temporary blob object.

The Search Procedure

The search procedure takes as arguments a document id and a
search string. The search procedure takes as arguments a document id and a search string. The procedure then converts the search string into raw format and places it into the variable named PATTERN. Once the variable PATTERN is populated, it is used for searching the loaded temporary BLOB DOC_BLOB to see if the particular pattern exists.

Pattern    := utl_raw.cast_to_raw(in_search);

Take the input search characters and convert them to raw
characters that can be used to search your document.

SELECT doc_blob INTO lob_doc
  FROM my_docs WHERE doc_id = in_id;

Put the document into a temporary BLOB for manipulation.

DBMS_LOB.OPEN (lob_doc, DBMS_LOB.LOB_READONLY);

Open the temporary BLOB for reading.

Position := DBMS_LOB.INSTR
  (lob_doc, Pattern, Offset, Occurrence);

Search the temporary BLOB for the supplied search string.
If it finds the pattern then the variable POSITION will be not 0.

How to Use the code.

The procedures that I have given you are very simplistic in
nature and are intended to be part of a larger application for managing
external documents within a database. They are intended to setup a directory
where your documents live, load the documents into a database, and then search
for string patterns in the document id provided. I personally can see you taking
out the reliance of supplying a document id and allowing the search to span
multiple documents within your library. Below I have given a brief description
on how to use the code as is but feel free to modify and integrate into your
own set of procedures.

How to Use

  1. log into your database of choice as the SYS user
  2. compile the package
	SQL> @mydocs.sql
  1. set serveroutput on
	SQL> set serveroutput on
  1. initial setup of directory object where your documents
    live
	SQL> exec mydocs.doc_dir_setup
  1. Check to make sure you can read one of your documents on
    disk.
	SQL> exec mydocs.list(Your Document Here.doc');
  1. Load your document into the database
	SQL> exec mydocs.load(Your Document Here.doc', 1);
  1. Search your documents for a string pattern
	SQL> exec mydocs.search(Search Pattern', 1);



The Code


CREATE OR REPLACE PACKAGE
mydocs
AS
PROCEDURE doc_dir_setup;
PROCEDURE list (in_doc IN VARCHAR2);
PROCEDURE load (in_doc IN VARCHAR2,
in_id IN NUMBER);
PROCEDURE search (in_search IN VARCHAR2,
in_id IN NUMBER);
END mydocs;
/
CREATE OR REPLACE PACKAGE BODY
mydocs
AS
vexists BOOLEAN;
vfile_length NUMBER;
vblocksize NUMBER;

PROCEDURE doc_dir_setup IS
BEGIN
EXECUTE IMMEDIATE
‘CREATE DIRECTORY DOC_DIR AS’||
”'”E:jkoopmannpublishdatabasejournalOracle””’;
END doc_dir_setup;

PROCEDURE list (in_doc IN VARCHAR2) IS
BEGIN
UTL_FILE.FGETATTR(‘DOC_DIR’,
in_doc,
vexists,
vfile_length,
vblocksize);
IF vexists THEN
dbms_output.put_line(in_doc||’ ‘||vfile_length);
END IF;
END list;

PROCEDURE load (in_doc IN VARCHAR2,
in_id IN NUMBER) IS
temp_blob BLOB := empty_blob();
bfile_loc BFILE;
Bytes_to_load INTEGER := 4294967295;
BEGIN
bfile_loc := BFILENAME(‘DOC_DIR’, in_doc);
INSERT INTO my_docs (doc_id, bfile_loc, doc_title)
VALUES (in_id, bfile_loc, in_doc);
SELECT doc_blob INTO temp_blob
FROM my_docs WHERE doc_id = in_id
FOR UPDATE;
DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY);
DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE);
DBMS_LOB.LOADFROMFILE(temp_blob, bfile_loc, Bytes_to_load);
DBMS_LOB.CLOSE(temp_blob);
DBMS_LOB.CLOSE(bfile_loc);
COMMIT;
END load;

PROCEDURE search (in_search VARCHAR2,
in_id NUMBER) IS
lob_doc BLOB;
Pattern VARCHAR2(30);
Position INTEGER := 0;
Offset INTEGER := 1;
Occurrence INTEGER := 1;
BEGIN
Pattern := utl_raw.cast_to_raw(in_search);
SELECT doc_blob INTO lob_doc
FROM my_docs WHERE doc_id = in_id;
DBMS_LOB.OPEN (lob_doc, DBMS_LOB.LOB_READONLY);
Position := DBMS_LOB.INSTR(lob_doc, Pattern, Offset, Occurrence);
IF Position = 0 THEN
DBMS_OUTPUT.PUT_LINE(‘Pattern not found’);
ELSE
DBMS_OUTPUT.PUT_LINE(‘The pattern occurs at ‘|| position);
END IF;
DBMS_LOB.CLOSE (lob_doc);
END search;

BEGIN
DBMS_OUTPUT.ENABLE(1000000);
END mydocs;
/

»


See All Articles by Columnist
James Koopmann

James Koopmann
James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Latest Articles