The Dictionary in the Data Dictionary

Today’s journey takes on a
trip into Oracle’s data dictionary, specifically, the dictionary part or aspect
of it. It’s interesting and informative to take a peak under the covers and see
how Oracle, (the product, that is), is put together. Aside from trying to
reverse engineer things, looking at the underlying components presents an “If
Oracle did it this way, maybe I can too” learning paradigm. For example,
normalization is a huge part of database design, so it’s fair to ask (or look,
in this case) how well the SYS tables are normalized. Much, and practically all
for that matter, of what you query from or against in the data dictionary is
based on views, and as we all know, views are usually based on tables. What is
in those tables and how does it all fit together?

DICTIONARY and DICT

The DICTIONARY view contains
metadata about other data dictionary items. Most people familiar with Oracle’s
data dictionary will say that the data dictionary is based entirely on views on
tables owned by SYS. Well, yes, but to be more precise (using the DICTIONARY
“view” in this case), that’s not entirely correct. Instead of being a view, to
you as a member of PUBLIC, DICTIONARY from your perspective is a synonym. This
synonym points to the SYS-owned view named DICTIONARY, and by the order in
which Oracle resolves
an object name, a public synonym is evaluated before a table or view. If
you were to query against SYS.DICTIONARY, you would be accessing the view named
DICTIONARY owned by SYS.

Closely related to
DICTIONARY is a public synonym named DICT. This synonym is nothing more than a
shortcut name for DICTIONARY. Can you think of another shortcut name of a view
in the data dictionary? The TAB synonym is a pointer to part of what is in the
DBA/ALL/USER_TABLES views. So DICT or DICTIONARY, either one suffices as both
ultimately are derived from several SYS-owned tables.

The columns of DICTIONARY
are table_name and comments. In Oracle 10g R2, there are 659 records, of which,
interestingly enough, 91 have no comments. What good is a dictionary without
words? Almost all table name-only records are from the DBA/ALL/USER family
(just an observation, no real reason why they’re from this category).

Where do the comments come
from? This is where we start looking at the underlying tables. The source code
behind this view can be found in the catalog.sql script. The top part of the
create view statement is shown below.

remark
remark  VIEW “DICTIONARY”
remark  Online documentation for data dictionary tables and views.
remark  This view exists outside of the family schema.
remark
/* Find the names of public synonyms for views owned by SYS that
have names different from the synonym name.  This allows the user
to see the short-hand synonyms we have created.
*/
create or replace view DICTIONARY
    (TABLE_NAME, COMMENTS)
as
select o.name, c.comment$
from sys.obj$ o, sys.com$ c
where o.obj# = c.obj#(+)
  and c.col# is null
  and o.owner# = 0
  and o.type# = 4
  …continues…

The OBJ$ table (owned by
SYS) is like the uber table of the data dictionary. Every object has an ID
number, an owner number, and a type number (among other attributes). If OBJ#
were a primary key, we would expect to see primary key type of information
about that column. As it turns out, OBJ# is not the primary key per a “this
column is defined to the primary key for this table.” What does a primary key
have in terms of other constraints? OBJ# does have a unique index and a not
null check constraint, so that must be close enough from Oracle’s perspective.

The OBJ$ table also has
another interesting index related characteristic, and that is the index named
I_OBJ2. This index uses six columns, which is fairly rare in terms of the
number of columns involved. The vast majority of indexes I see are limited to
one, two or three columns, with the exception of some data warehouse indexes
with five or six columns. All of the objects users create must be registered in
this table, so this is a table that will continue to grow. How and when are
statistics created for the table? That depends on the version of Oracle you’re
using. From older to newer, the advice or guidance has ranged from “don’t do it”
to “have at it.”

The difference between TYPE#
and OWNER# is that the object types are known ahead of time. Oracle only has so
many types, so that information is static, whereas owners (aside from the
internal or fixed type users of Oracle) come and go. With this difference in
mind, we’d expect to see a lookup table that relates a textual name versus an
owner number. The USER$ table serves this function.

Is there a lookup table for
types? We see what the types are in object tables (the object_type column), but
internally, at least as far as the label versus ID number, Oracle does not have
this. So where does the type name come from? The DECODE of type as seen in the
view creation statements for the _OBJECTS views is what produces the text
labels we see.

decode(o.type#, 0, ‘NEXT OBJECT’, 1, ‘INDEX’, 2, ‘TABLE’, 3, ‘CLUSTER’,
    4, ‘VIEW’, 5, ‘SYNONYM’, 6, ‘SEQUENCE’,
    7, ‘PROCEDURE’, 8, ‘FUNCTION’, 9, ‘PACKAGE’,
    11, ‘PACKAGE BODY’, 12, ‘TRIGGER’,
    13, ‘TYPE’, 14, ‘TYPE BODY’,
    19, ‘TABLE PARTITION’, 20, ‘INDEX PARTITION’, 21, ‘LOB’,
    22, ‘LIBRARY’, 23, ‘DIRECTORY’, 24, ‘QUEUE’,
    28, ‘JAVA SOURCE’, 29, ‘JAVA CLASS’, 30, ‘JAVA RESOURCE’,
    32, ‘INDEXTYPE’, 33, ‘OPERATOR’,
    34, ‘TABLE SUBPARTITION’, 35, ‘INDEX SUBPARTITION’,
    40, ‘LOB PARTITION’, 41, ‘LOB SUBPARTITION’,
    42, NVL((SELECT distinct ‘REWRITE EQUIVALENCE’
             FROM sum$ s
             WHERE s.obj#=o.obj#
                   and bitand(s.xpflags, 8388608) = 8388608),
            ‘MATERIALIZED VIEW’),
    43, ‘DIMENSION’,
    44, ‘CONTEXT’, 46, ‘RULE SET’, 47, ‘RESOURCE PLAN’,
    48, ‘CONSUMER GROUP’,
    51, ‘SUBSCRIPTION’, 52, ‘LOCATION’,
    55, ‘XML SCHEMA’, 56, ‘JAVA DATA’,
    57, ‘SECURITY PROFILE’, 59, ‘RULE’,
    60, ‘CAPTURE’, 61, ‘APPLY’,
    62, ‘EVALUATION CONTEXT’,
    66, ‘JOB’, 67, ‘PROGRAM’, 68, ‘JOB CLASS’, 69, ‘WINDOW’,
    72, ‘WINDOW GROUP’, 74, ‘SCHEDULE’, 79, ‘CHAIN’,
    81, ‘FILE GROUP’,
   ‘UNDEFINED’),

Going back to the question
posed earlier about where the comments come from, we can see that a table name
COM$ is referenced in the create dictionary view statement. The COM$ table
consists of OBJ#, COL# and a COMMENT$ columns. In a higher level
normalized form, we’d expect to see a table with OBJ#, COL# and a
comment number or ID, along with a table that then matches the comment number
versus text string or description.

The screenshot below shows
how Oracle (at least this table, but there are others like it) did not go to
third normal form. How many places does a string like “A date function used to
compute the NEXT_DATE” occur? Another string, “Name of the object,” appears 85
times. Doesn’t that clearly violate what every book on database design and
normalization talks about?

Hacking the Data Dictionary

Perhaps you’d be interested
in hacking the data dictionary and adding your own comments into the COM$
table. I wouldn’t recommend doing that (who would?), so if you wanted to fill
in the missing comments, it would be better to create another view that used a
support table to supply that information (and add your own). If you were going
to do it (using DUAL as an example), find the object number for DUAL and then
use that value for the insert into the COM$ table.

SQL> select obj#, owner#, name
from obj$ where name = ‘DUAL’;
      OBJ#     OWNER# NAME
———- ———- ——
       258          0 DUAL
       259          1 DUAL
      9635         24 DUAL
      9857         25 DUAL
     40433         34 DUAL
     40682         35 DUAL
     41037         36 DUAL
     41990         38 DUAL
     42756         43 DUAL
     45858         46 DUAL
     47876         47 DUAL
     50711         51 DUAL
     52789         54 DUAL
     54234         55 DUAL
     52511         56 DUAL
     54804         63 DUAL
16 rows selected.

Note that the DUAL table
appears more than once, so be sure to tag the one owned by SYS (the owner
number for SYS is zero). When inserting into the COM$ table, don’t include a COL#. The
statement that creates the DICTIONARY view matches on where the COL# number
is null. The “@dict” script (mine) just selects from DICTIONARY where the table
name is equal to what was entered.

SQL> insert into com$ (obj#,comment$)
values (258,’This the dual table’);
1 row created.
SQL> @dict
Enter value for table: dual
TABLE_NAME                     COMMENTS
—————————— ——————–
DUAL                           This the dual table

The views with missing
comments are relatively unimportant, so performing this hack using existing
objects is certainly not essential. As for adding new lines to the DICTIONARY
view via inserts into the COM$ table, as long as you’re using existing object
numbers, you’re probably safe. Keep track of what you’ve inserted because if
the data dictionary were to be rebuilt, your work is likely to get
overwritten/deleted.

In Closing

Overall, it looks like the data
dictionary, specifically, the SYS schema, is a lot like schemas you encounter
in the workplace, that is, some objects were created “following the rules,” and
some were not. Are queries against the data dictionary faster because there
aren’t lookup tables? Or, is the lack of a lookup table for comments more of a
convenience for the engineers who work on the SYS schema tables? Regardless, at
least now you can reverse engineer how the dictionary was put together.

»

See All Articles by Columnist Steve Callan

The Dictionary in the Data Dictionary

DICTIONARY and DICT

Hacking the Data Dictionary

In Closing

Steve Callan

Company

Categories