The Dictionary in the Data Dictionary
September 25, 2007
Todays journey takes on a trip into Oracles data dictionary, specifically, the dictionary part or aspect of it. Its interesting and informative to take a peak under the covers and see how Oracle, (the product, that is), is put together. Aside from trying to reverse engineer things, looking at the underlying components presents an If Oracle did it this way, maybe I can too learning paradigm. For example, normalization is a huge part of database design, so its fair to ask (or look, in this case) how well the SYS tables are normalized. Much, and practically all for that matter, of what you query from or against in the data dictionary is based on views, and as we all know, views are usually based on tables. What is in those tables and how does it all fit together?
DICTIONARY and DICT
The DICTIONARY view contains metadata about other data dictionary items. Most people familiar with Oracles data dictionary will say that the data dictionary is based entirely on views on tables owned by SYS. Well, yes, but to be more precise (using the DICTIONARY view in this case), thats not entirely correct. Instead of being a view, to you as a member of PUBLIC, DICTIONARY from your perspective is a synonym. This synonym points to the SYS-owned view named DICTIONARY, and by the order in which Oracle resolves an object name, a public synonym is evaluated before a table or view. If you were to query against SYS.DICTIONARY, you would be accessing the view named DICTIONARY owned by SYS.
Closely related to DICTIONARY is a public synonym named DICT. This synonym is nothing more than a shortcut name for DICTIONARY. Can you think of another shortcut name of a view in the data dictionary? The TAB synonym is a pointer to part of what is in the DBA/ALL/USER_TABLES views. So DICT or DICTIONARY, either one suffices as both ultimately are derived from several SYS-owned tables.
The columns of DICTIONARY are table_name and comments. In Oracle 10g R2, there are 659 records, of which, interestingly enough, 91 have no comments. What good is a dictionary without words? Almost all table name-only records are from the DBA/ALL/USER family (just an observation, no real reason why theyre from this category).
Where do the comments come from? This is where we start looking at the underlying tables. The source code behind this view can be found in the catalog.sql script. The top part of the create view statement is shown below.
remark remark VIEW "DICTIONARY" remark Online documentation for data dictionary tables and views. remark This view exists outside of the family schema. remark /* Find the names of public synonyms for views owned by SYS that have names different from the synonym name. This allows the user to see the short-hand synonyms we have created. */ create or replace view DICTIONARY (TABLE_NAME, COMMENTS) as select o.name, c.comment$ from sys.obj$ o, sys.com$ c where o.obj# = c.obj#(+) and c.col# is null and o.owner# = 0 and o.type# = 4 ...continues...
The OBJ$ table (owned by SYS) is like the uber table of the data dictionary. Every object has an ID number, an owner number, and a type number (among other attributes). If OBJ# were a primary key, we would expect to see primary key type of information about that column. As it turns out, OBJ# is not the primary key per a this column is defined to the primary key for this table. What does a primary key have in terms of other constraints? OBJ# does have a unique index and a not null check constraint, so that must be close enough from Oracles perspective.
The OBJ$ table also has another interesting index related characteristic, and that is the index named I_OBJ2. This index uses six columns, which is fairly rare in terms of the number of columns involved. The vast majority of indexes I see are limited to one, two or three columns, with the exception of some data warehouse indexes with five or six columns. All of the objects users create must be registered in this table, so this is a table that will continue to grow. How and when are statistics created for the table? That depends on the version of Oracle youre using. From older to newer, the advice or guidance has ranged from dont do it to have at it.
The difference between TYPE# and OWNER# is that the object types are known ahead of time. Oracle only has so many types, so that information is static, whereas owners (aside from the internal or fixed type users of Oracle) come and go. With this difference in mind, wed expect to see a lookup table that relates a textual name versus an owner number. The USER$ table serves this function.
Is there a lookup table for types? We see what the types are in object tables (the object_type column), but internally, at least as far as the label versus ID number, Oracle does not have this. So where does the type name come from? The DECODE of type as seen in the view creation statements for the _OBJECTS views is what produces the text labels we see.
decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER', 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE', 7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE', 11, 'PACKAGE BODY', 12, 'TRIGGER', 13, 'TYPE', 14, 'TYPE BODY', 19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB', 22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE', 28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE', 32, 'INDEXTYPE', 33, 'OPERATOR', 34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION', 40, 'LOB PARTITION', 41, 'LOB SUBPARTITION', 42, NVL((SELECT distinct 'REWRITE EQUIVALENCE' FROM sum$ s WHERE s.obj#=o.obj# and bitand(s.xpflags, 8388608) = 8388608), 'MATERIALIZED VIEW'), 43, 'DIMENSION', 44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN', 48, 'CONSUMER GROUP', 51, 'SUBSCRIPTION', 52, 'LOCATION', 55, 'XML SCHEMA', 56, 'JAVA DATA', 57, 'SECURITY PROFILE', 59, 'RULE', 60, 'CAPTURE', 61, 'APPLY', 62, 'EVALUATION CONTEXT', 66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW', 72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN', 81, 'FILE GROUP', 'UNDEFINED'),
Going back to the question posed earlier about where the comments come from, we can see that a table name COM$ is referenced in the create dictionary view statement. The COM$ table consists of OBJ#, COL# and a COMMENT$ columns. In a higher level normalized form, wed expect to see a table with OBJ#, COL# and a comment number or ID, along with a table that then matches the comment number versus text string or description.
The screenshot below shows how Oracle (at least this table, but there are others like it) did not go to third normal form. How many places does a string like A date function used to compute the NEXT_DATE occur? Another string, Name of the object, appears 85 times. Doesnt that clearly violate what every book on database design and normalization talks about?
Hacking the Data Dictionary
Perhaps youd be interested in hacking the data dictionary and adding your own comments into the COM$ table. I wouldnt recommend doing that (who would?), so if you wanted to fill in the missing comments, it would be better to create another view that used a support table to supply that information (and add your own). If you were going to do it (using DUAL as an example), find the object number for DUAL and then use that value for the insert into the COM$ table.
SQL> select obj#, owner#, name from obj$ where name = 'DUAL'; OBJ# OWNER# NAME ---------- ---------- ------ 258 0 DUAL 259 1 DUAL 9635 24 DUAL 9857 25 DUAL 40433 34 DUAL 40682 35 DUAL 41037 36 DUAL 41990 38 DUAL 42756 43 DUAL 45858 46 DUAL 47876 47 DUAL 50711 51 DUAL 52789 54 DUAL 54234 55 DUAL 52511 56 DUAL 54804 63 DUAL 16 rows selected.
Note that the DUAL table appears more than once, so be sure to tag the one owned by SYS (the owner number for SYS is zero). When inserting into the COM$ table, dont include a COL#. The statement that creates the DICTIONARY view matches on where the COL# number is null. The @dict script (mine) just selects from DICTIONARY where the table name is equal to what was entered.
SQL> insert into com$ (obj#,comment$) values (258,'This the dual table'); 1 row created. SQL> @dict Enter value for table: dual TABLE_NAME COMMENTS ------------------------------ -------------------- DUAL This the dual table
The views with missing comments are relatively unimportant, so performing this hack using existing objects is certainly not essential. As for adding new lines to the DICTIONARY view via inserts into the COM$ table, as long as youre using existing object numbers, youre probably safe. Keep track of what youve inserted because if the data dictionary were to be rebuilt, your work is likely to get overwritten/deleted.
Overall, it looks like the data dictionary, specifically, the SYS schema, is a lot like schemas you encounter in the workplace, that is, some objects were created following the rules, and some were not. Are queries against the data dictionary faster because there arent lookup tables? Or, is the lack of a lookup table for comments more of a convenience for the engineers who work on the SYS schema tables? Regardless, at least now you can reverse engineer how the dictionary was put together.