If all databases are SQL92-compliant, then they must be the same.
At least that is the assumption made many times. In this section
I would like to dispel that myth.
SQL92 is an ANSI/ISO standard for databases. It is the successor
to the SQL89 ANSI/ISO standard. It defines a language (SQL) and
behavior (transactions, isolation levels, and so on) that tell
you how a database will behave. Did you know that many
commercially available databases are SQL92-compliant? Did you
know that it means very little as far as query and application
portability goes?
Starting with the standard, we will find that the SQL92 standard
has four levels:
- Entry-level — This is the level to which most
vendors have complied. This level is a minor enhancement of the
predecessor standard, SQL89. No database vendors have been
certified higher and in fact the National Institute of Standards
and Technology (NIST), the agency that used to certify for SQL-
compliance, does not even certify anymore. I was part of the team
that got Oracle 7.0 NIST-certified for SQL92 entry-level
compliance in 1993. An entry level compliant database has the
feature set of Oracle 7.0.
- Transitional — This
is approximately 'halfway' between entry-level and intermediate-
level as far as a feature set goes.
- Intermediate — this adds many features including
(not by any means an exhaustive list):
- Dynamic SQL
- Cascade DELETE for referential integrity
- DATE and TIME data types
- Domains
- Variable length character strings
- A CASE expression
- CAST functions between data types
- Full — Adds provisions for (again, not
exhaustive):
- Connection management
- A BIT string data type
- Deferrable integrity constraints
- Derived tables in the FROM clause
- Subqueries in CHECK clauses
- Temporary tables
The entry-level standard does not include features such as outer
joins, the new inner join syntax, and so on. Transitional does
specify outer join syntax and inner join syntax. Intermediate
adds more, and Full is, of course all of SQL92. Most books on
SQL92 do not differentiate between the various levels leading to
confusion on the subject. They demonstrate what a theoretical
database implementing SQL92 FULL would look like. It makes it
impossible to pick up a SQL92 book, and apply what you see in the
book to just any SQL92 database. For example, in SQL Server the
'inner join' syntax is supported in SQL statements, whereas in
Oracle it is not. But, they are both SQL92-compliant databases.
You can do inner joins and outer joins in Oracle, you will just
do it differently than in SQL Server. The bottom line is that
SQL92 will not go very far at the entry-level and, if you use any
of the features of intermediate or higher, you risk not being
able to 'port' your application.
You should not be afraid to make use of vendor-specific features
— after all, you are paying a lot of money for them. Every
database has its own bag of tricks, and we can always find a way
to perform the operation in each database. Use what is best for
your current database, and re-implement components as you go to
other databases. Use good programming techniques to isolate
yourself from these changes. The same techniques are employed by
people writing OS-portable applications. The goal is to fully
utilize the facilities available to you, but ensure you can
change the implementation on a case-by-case basis.
For example, a common function of many database applications is
the generation of a unique key for each row. When you insert the
row, the system should automatically generate a key for you.
Oracle has implemented the database object called a SEQUENCE for
this. Informix has a SERIAL data type. Sybase and SQL Server have
an IDENTITY type. Each database has a way to do this. However,
the methods are different, both in how you do it, and the
possible outcomes. So, to the knowledgeable developer, there are
two paths that can be pursued:
- Develop a totally database-independent method of generating a
unique key.
- Accommodate the different implementations and use different
techniques when implementing keys in each database.
The theoretical advantage of the first approach is that to move
from database to database you need not change anything. I call it
a 'theoretical' advantage because the 'con' side of this
implementation is so huge that it makes this solution totally
infeasible. What you would have to do to develop a totally
database-independent process is to create a table such as:
create table id_table ( id_name varchar(30), id_value number );
insert into id_table values ( 'MY_KEY', 0 );
Then, in order to get a new key, you would have to execute the
following code:
update id_table set id_value = id_value + 1 where id_name = 'MY_KEY';
select id_value from id_table where id_name = 'MY_KEY';
Looks simple enough, but the outcome is that only one user at a
time may process a transaction now. We need to update that row to
increment a counter, and this will cause our program to serialize
on that operation. At best, one person at a time will generate a
new value for this key. This issue is compounded by the fact that
our transaction is much larger then we have outlined above. The
UPDATE and SELECT we have in the example are only two statements
of potentially many other statements that make up our
transaction. We have yet to insert the row into the table with
this key we just generated, and do whatever other work it takes
to complete this transaction. This serialization will be a huge
limiting factor in scaling. Think of the ramifications if this
technique was used on web sites that processed orders, and this
was how we generated order numbers. There would be no multi-user
concurrency, so we would be forced to do everything sequentially.