Whenever you start using a new database, you will more than
likely begin to relate it to other databases that you have used. This allows you to gain trust of the new
vendor quickly and learn where particular pieces fit in relationship to the
administrative tasks that you are used to performing. This article examines the
terminology of PostgreSQL in comparison to the other
large vendors, to get you up to speed with PostgreSQL
terminology.
Terminology Set
Instance
Every database vendor has the notion of an instance. An
instance is typically the processes running on a host computer system that
manages everything within the database you are using. It will usually also
manage connections, recoverability and logging. While this may be true, many of
the vendors will still use a slight twist on the term. Let’s
take a look.
PostgreSQL
In our last article, we learned how to start a PostgreSQL server through the postmaster. The server that
gets started by spawning operating system processes is called a PostgresSQL instance. It is this instance
in PostrgreSQL that manages all connections into the
server, the physical structures of the server such as data files and the
logical structures such as tables and indexes. It also carries out the
instructions that you or an application gives to manipulate and retrieve
information. A server process, which manages the database files, accepts
connections to the database from client applications, and performs actions on
the database on behalf of the clients. The database server program is called postmaster.
Oracle
In Oracle, the instance is considered the memory structures
defined in the system global area (SGA) and all of the background processes
started on the operating system level and the threads that access the server.
These memory structures and background processes are responsible for managing a
database’s data and serving up the data through user and application requests.
It is possible to have multiple instances running on the same computer. Typically,
one instance is associated with one database.
UDB
For UDB, the instance is considered the set of code
instructions and associated data structures that manage the data and access to
it. Each instance is managed, at the operating system level, by a unique user
ID and it is this operating system user ID that identifies
the uniqueness of the instance through a set of files that describe the
instance. Multiple instances are possible using this unique user
account. The instance is considered a complete environment solely unto itself.
This instance is also referred to as the database manager.
SQL Server
Within SQL Server, the instance is a copy of SQL Server
instructions running on a computer. The instance also consists of operating
system processes and threads running on the computer. A single computer can run
many instances of the SQL Server 2000 but only one instance of the SQL Server
version 7.0 and earlier. Within SQL Server, there are four system databases and
one or more databases associated with an instance.
Sybase
For Sybase Adaptive Server an instance is the executables
that possibly communicate with other Adaptive Server engines through shared
memory structures. There can also be more than one Sybase engine running on a
computer system. The instance is also known as the collection of datafiles or other wise known as devices.
Database
The database is typically all the underlying structures that
facilitate the storing and retrieval of information. This is always tables and
indexes but also contains things like procedures, triggers, recovery logs,
control files and configuration files. Again, let’s
take a look at how each of the vendors define the term.
PostgreSQL
Within PostgreSQL
an instance of PostgreSQL server may have one or many
databases. A database consists of an organized set of database objects such as
tables, indexes and functions that are organized under a schema. Users and
applications connect to a database, and use the database objects to store and
retrieve information. Typically, a
single database is created to serve a distinct application or single user
purpose.
Oracle
In Oracle, a database is considered a collection of data
that is treated as a single unit. It is
the database’s responsibility to store information and the ability to find and
present the information back to a user. The database is all of the structures
that make it possible to store and retrieve information. It is the catalog, the
set of system level data files, redo log files and
control files that make up a database. The database is both physical structures
like system level data files and logical structures such as tablespaces.
UDB
For UDB, a database is the set of system catalog tables that
describe the logical and physical structures for access to data. Also part of
the UDB database is the configuration file and recovery log for keeping track
of transactions. A UDB instance can have multiple databases defined but none
exists when an instance is created.
SQL Server
In SQL Server, a database is also a collection of
information. It is the tables and objects that are construed to keep track of
data and for the purpose of retrieval back to a user or application. A SQL
Server instance may contain many databases per instance. When an instance is
created, four system databases are created also. When a user or application
needs a database a separate database is created. Thus, one instance of SQL
Server may contain many user databases that can be granted access to users or
applications.
Sybase
For Sybase a database is the set of related data tables and
other database objects that are created, organized, and made available for
users and applications and are designed to provide a distinct application
purpose. The database is and created on one or many devices
that are related to the database.
Storage
It is in the storage area that things start to diverge for
each of the database vendors. Table 1 gives a
quick look at the different terminology each of the vendors use and how the
terms relate across vendor platforms.
Table 1
Individual database vendor storage
terminology
PostgreSQL Term |
Oracle Term |
SQL Server Term |
Sybase Term |
UDB Term |
instance |
instance |
instance |
instance |
instance |
database |
database |
database |
database |
database |
database cluster |
tablespace |
filegroup |
device |
tablespace |
heap disk file |
datafile |
database data file |
database file. |
container |
schema |
schema |
owner |
schema |
schema |
segment |
segment |
segment |
||
extent |
extent |
extent |
extent |
|
page |
data block |
page |
page |
Page |
Table |
Table |
Table |
Table |
Table |
Index |
Index |
Index |
Index |
Index |
PostgreSQL
In PostgreSQL
a database cluster is created that designates where the database will live on
disk. An initialization procedure must be done on this area before you create an
instance. You then create the databases you wish, all consuming this area of
disk. So in fact, a database cluster is a group of databases that consume a
single area on disk. This database cluster when talked about in file system
terms is called a data directory or data area. When a table is created, it is
assigned its own heap disk file. If there are long columns defined on this
table then it also gets associated to a TOAST file. Each of the tables and
indexes are comprised of pages of information that are typically 8K in size.
Oracle
In Oracle, database storage is composed of logical
structures called tablespaces that relate directly to one or more physical datafiles. When an object such as a table is created in Oracle,
it is placed in a tablespace and the logical unit of storage called a segment
is created to keep track of the consumed storage for that object. Every time an
object requires more space through inserts or updates, extents are taken within
the tablespace / datafiles to accommodate the extra
space required. These logical extents are composed of another logical structure
called a data block that is typically anywhere from 2K to 32K in size. Data blocks are composed of one or more operating
system blocks.
UDB
For UDB, database storage is composed of tablespaces that
point to containers which are devices or data files. When an object is created,
it is placed in a tablespace where it can extend and take extents. These
extents are composed of logical structures called pages, which are one or more
operating system blocks.
SQL Server
In SQL Server, file groups are created that contain one or
more datafiles. When an object such as a table is
created, it is placed in a file group and is composed of extents. When an
object requires more space, another extent is taken which consists of 8 contiguous pages. These pages are composed of one or more
operating system blocks.
Sybase
For Sybase, devices are created that point to any piece of
disk through a data file. When objects are created, they consume space on a
device. When a table or index requires space a block of 8
pages are taken to accommodate the increase in size, this increase in size is
called an extent. The size of the extent is dependent on the page size being
used. This page size directly relates to and uses one or more operating system
blocks to accommodate the page size.
No matter what database you are currently using someone
somewhere will ask about the relationship of one database to another, the ins
and the outs, the benefits, and the problems. Your ability to recognize the
individual components is of utmost important if you are to speak intelligently
on the subject. You need to understand what makes up a database from the
instance level down to the operating system blocks. Many of the database
vendors have the same terminology or something very similar. Take your favorite
database and compare to the others as I have and just remember, it’s all in the terminology.