An instance by any other name is still an instance...or is it
October 29, 2003
Whenever you start using a new database, you will more than likely begin to relate it to other databases that you have used. This allows you to gain trust of the new vendor quickly and learn where particular pieces fit in relationship to the administrative tasks that you are used to performing. This article examines the terminology of PostgreSQL in comparison to the other large vendors, to get you up to speed with PostgreSQL terminology.
Every database vendor has the notion of an instance. An instance is typically the processes running on a host computer system that manages everything within the database you are using. It will usually also manage connections, recoverability and logging. While this may be true, many of the vendors will still use a slight twist on the term. Let's take a look.
In our last article, we learned how to start a PostgreSQL server through the postmaster. The server that gets started by spawning operating system processes is called a PostgresSQL instance. It is this instance in PostrgreSQL that manages all connections into the server, the physical structures of the server such as data files and the logical structures such as tables and indexes. It also carries out the instructions that you or an application gives to manipulate and retrieve information. A server process, which manages the database files, accepts connections to the database from client applications, and performs actions on the database on behalf of the clients. The database server program is called postmaster.
In Oracle, the instance is considered the memory structures defined in the system global area (SGA) and all of the background processes started on the operating system level and the threads that access the server. These memory structures and background processes are responsible for managing a database's data and serving up the data through user and application requests. It is possible to have multiple instances running on the same computer. Typically, one instance is associated with one database.
For UDB, the instance is considered the set of code instructions and associated data structures that manage the data and access to it. Each instance is managed, at the operating system level, by a unique user ID and it is this operating system user ID that identifies the uniqueness of the instance through a set of files that describe the instance. Multiple instances are possible using this unique user account. The instance is considered a complete environment solely unto itself. This instance is also referred to as the database manager.
Within SQL Server, the instance is a copy of SQL Server instructions running on a computer. The instance also consists of operating system processes and threads running on the computer. A single computer can run many instances of the SQL Server 2000 but only one instance of the SQL Server version 7.0 and earlier. Within SQL Server, there are four system databases and one or more databases associated with an instance.
For Sybase Adaptive Server an instance is the executables that possibly communicate with other Adaptive Server engines through shared memory structures. There can also be more than one Sybase engine running on a computer system. The instance is also known as the collection of datafiles or other wise known as devices.
The database is typically all the underlying structures that facilitate the storing and retrieval of information. This is always tables and indexes but also contains things like procedures, triggers, recovery logs, control files and configuration files. Again, let's take a look at how each of the vendors define the term.
Within PostgreSQL an instance of PostgreSQL server may have one or many databases. A database consists of an organized set of database objects such as tables, indexes and functions that are organized under a schema. Users and applications connect to a database, and use the database objects to store and retrieve information. Typically, a single database is created to serve a distinct application or single user purpose.
In Oracle, a database is considered a collection of data that is treated as a single unit. It is the database's responsibility to store information and the ability to find and present the information back to a user. The database is all of the structures that make it possible to store and retrieve information. It is the catalog, the set of system level data files, redo log files and control files that make up a database. The database is both physical structures like system level data files and logical structures such as tablespaces.
For UDB, a database is the set of system catalog tables that describe the logical and physical structures for access to data. Also part of the UDB database is the configuration file and recovery log for keeping track of transactions. A UDB instance can have multiple databases defined but none exists when an instance is created.
In SQL Server, a database is also a collection of information. It is the tables and objects that are construed to keep track of data and for the purpose of retrieval back to a user or application. A SQL Server instance may contain many databases per instance. When an instance is created, four system databases are created also. When a user or application needs a database a separate database is created. Thus, one instance of SQL Server may contain many user databases that can be granted access to users or applications.
For Sybase a database is the set of related data tables and other database objects that are created, organized, and made available for users and applications and are designed to provide a distinct application purpose. The database is and created on one or many devices that are related to the database.
It is in the storage area that things start to diverge for each of the database vendors. Table 1 gives a quick look at the different terminology each of the vendors use and how the terms relate across vendor platforms.
Individual database vendor storage terminology
In PostgreSQL a database cluster is created that designates where the database will live on disk. An initialization procedure must be done on this area before you create an instance. You then create the databases you wish, all consuming this area of disk. So in fact, a database cluster is a group of databases that consume a single area on disk. This database cluster when talked about in file system terms is called a data directory or data area. When a table is created, it is assigned its own heap disk file. If there are long columns defined on this table then it also gets associated to a TOAST file. Each of the tables and indexes are comprised of pages of information that are typically 8K in size.
In Oracle, database storage is composed of logical structures called tablespaces that relate directly to one or more physical datafiles. When an object such as a table is created in Oracle, it is placed in a tablespace and the logical unit of storage called a segment is created to keep track of the consumed storage for that object. Every time an object requires more space through inserts or updates, extents are taken within the tablespace / datafiles to accommodate the extra space required. These logical extents are composed of another logical structure called a data block that is typically anywhere from 2K to 32K in size. Data blocks are composed of one or more operating system blocks.
For UDB, database storage is composed of tablespaces that point to containers which are devices or data files. When an object is created, it is placed in a tablespace where it can extend and take extents. These extents are composed of logical structures called pages, which are one or more operating system blocks.
In SQL Server, file groups are created that contain one or more datafiles. When an object such as a table is created, it is placed in a file group and is composed of extents. When an object requires more space, another extent is taken which consists of 8 contiguous pages. These pages are composed of one or more operating system blocks.
For Sybase, devices are created that point to any piece of disk through a data file. When objects are created, they consume space on a device. When a table or index requires space a block of 8 pages are taken to accommodate the increase in size, this increase in size is called an extent. The size of the extent is dependent on the page size being used. This page size directly relates to and uses one or more operating system blocks to accommodate the page size.
No matter what database you are currently using someone somewhere will ask about the relationship of one database to another, the ins and the outs, the benefits, and the problems. Your ability to recognize the individual components is of utmost important if you are to speak intelligently on the subject. You need to understand what makes up a database from the instance level down to the operating system blocks. Many of the database vendors have the same terminology or something very similar. Take your favorite database and compare to the others as I have and just remember, it's all in the terminology.