Storing Word Documents in Oracle
February 5, 2004
Have you ever wondered about storing documents into your Oracle database and just didn't know where to start? Here is a quick introduction to the basics you need to know.
Manipulating Oracle Files with UTL_FILE showed you how to read the alert log and do some manipulation on the file while it was external to the database. You should review this article as it contains some background information you will need to know, along with some explanation of some of the procedures in this code that I will not go into here. The next logical extension to the last article is the manipulation of external files, such as documents, and the storage in the database. This article will take you through a brief overview of the datatypes and procedures in order to store word documents within the database.
When talking about manipulating documents within a database, there are only a few choices for a datatype that can handle a large document. These large objects (LOBs) can use any one of the four datatypes depending on the characteristics of the object you are storing. These large objects can be in the form of text, graphics, video or audio.
Benefits of LOBs
It use to be that the largest object you could store in the database was of the datatype LONG. Oracle has for the last few releases kept telling us to convert our LONG datatypes to a LOB datatype (maybe they will too). The reason for converting our LONGs to LOBs can be seen in this short list of benefits.
Create a Table to Store the Document
In order to store the documents into the database you must obviously first create an object to store the information. Following is the DDL to create the table MY_DOCS. You will notice that there is a holder for the bfile location and a column (DOC_BLOB) to hold the document.
CREATE TABLE my_docs (doc_id NUMBER, bfile_loc BFILE, doc_title VARCHAR2(255), doc_blob BLOB DEFAULT EMPTY_BLOB() );
The Load Procedure
The load procedure takes as arguments the document name and an id number for the document. The procedure will then prime a row for update based on the document id, BFILE location and document name (which becomes the document title). The procedure will then open internal and external BLOBs and load the internal from the external. At this point, the document has been loaded into the database table.
The Search Procedure
The search procedure takes as arguments a document id and a search string. The search procedure takes as arguments a document id and a search string. The procedure then converts the search string into raw format and places it into the variable named PATTERN. Once the variable PATTERN is populated, it is used for searching the loaded temporary BLOB DOC_BLOB to see if the particular pattern exists.
How to Use the code.
The procedures that I have given you are very simplistic in nature and are intended to be part of a larger application for managing external documents within a database. They are intended to setup a directory where your documents live, load the documents into a database, and then search for string patterns in the document id provided. I personally can see you taking out the reliance of supplying a document id and allowing the search to span multiple documents within your library. Below I have given a brief description on how to use the code as is but feel free to modify and integrate into your own set of procedures.
How to Use
SQL> set serveroutput on
SQL> exec mydocs.doc_dir_setup
SQL> exec mydocs.list(Your Document Here.doc');
SQL> exec mydocs.load(Your Document Here.doc', 1);
SQL> exec mydocs.search(Search Pattern', 1);
CREATE OR REPLACE PACKAGE mydocs AS PROCEDURE doc_dir_setup; PROCEDURE list (in_doc IN VARCHAR2); PROCEDURE load (in_doc IN VARCHAR2, in_id IN NUMBER); PROCEDURE search (in_search IN VARCHAR2, in_id IN NUMBER); END mydocs; / CREATE OR REPLACE PACKAGE BODY mydocs AS vexists BOOLEAN; vfile_length NUMBER; vblocksize NUMBER; PROCEDURE doc_dir_setup IS BEGIN EXECUTE IMMEDIATE 'CREATE DIRECTORY DOC_DIR AS'|| '''"E:\jkoopmann\publish\databasejournal\Oracle"'''; END doc_dir_setup; PROCEDURE list (in_doc IN VARCHAR2) IS BEGIN UTL_FILE.FGETATTR('DOC_DIR', in_doc, vexists, vfile_length, vblocksize); IF vexists THEN dbms_output.put_line(in_doc||' '||vfile_length); END IF; END list; PROCEDURE load (in_doc IN VARCHAR2, in_id IN NUMBER) IS temp_blob BLOB := empty_blob(); bfile_loc BFILE; Bytes_to_load INTEGER := 4294967295; BEGIN bfile_loc := BFILENAME('DOC_DIR', in_doc); INSERT INTO my_docs (doc_id, bfile_loc, doc_title) VALUES (in_id, bfile_loc, in_doc); SELECT doc_blob INTO temp_blob FROM my_docs WHERE doc_id = in_id FOR UPDATE; DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY); DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE); DBMS_LOB.LOADFROMFILE(temp_blob, bfile_loc, Bytes_to_load); DBMS_LOB.CLOSE(temp_blob); DBMS_LOB.CLOSE(bfile_loc); COMMIT; END load; PROCEDURE search (in_search VARCHAR2, in_id NUMBER) IS lob_doc BLOB; Pattern VARCHAR2(30); Position INTEGER := 0; Offset INTEGER := 1; Occurrence INTEGER := 1; BEGIN Pattern := utl_raw.cast_to_raw(in_search); SELECT doc_blob INTO lob_doc FROM my_docs WHERE doc_id = in_id; DBMS_LOB.OPEN (lob_doc, DBMS_LOB.LOB_READONLY); Position := DBMS_LOB.INSTR(lob_doc, Pattern, Offset, Occurrence); IF Position = 0 THEN DBMS_OUTPUT.PUT_LINE('Pattern not found'); ELSE DBMS_OUTPUT.PUT_LINE('The pattern occurs at '|| position); END IF; DBMS_LOB.CLOSE (lob_doc); END search; BEGIN DBMS_OUTPUT.ENABLE(1000000); END mydocs; /