We all know the path up the
physical and logical chains of an Oracle database. A fairly typical statement
in most database administration books states that a tablespace is where the
physical meets the logical. Further, these books go into great detail about the
building blocks of each chain. The lowest starting point on the physical chain
is the operating system block and how it compares to a database block (usually
one-to-one or where the database block is a multiple of the operating system
block size). What’s typically left out of the physical description is something
lower than the block, and that is what comprises a block, namely bytes. The
ordering of bytes is an important aspect of how programs and operating systems
deal with data, and this is one of those areas where Oracle does things that
users take for granted.
A byte is a byte, right? For
the most part this is true, but when you factor in the platform on which a byte
lives, the details of what a byte is starts to differ. It’s still eight bits to
a byte, and the ordering of bits is generally not an issue. What is an issue is
the ordering of bytes, where byte ordering is referred to as endianness. Let’s
use a common analogy about computer words and spoken words. In a language, how
would you say aloud the number 24? In English, we say twenty-four. In other
languages, 24 is spoken as four and twenty. The difference between the two
methods is based on where the most significant digit or part occurs. In a
language where twenty comes first, being where the most significant part comes
first (20 is more significant that 4), the ordering is big endian. In the
opposite case, where the least significant digit comes first, the ordering is said
to be little endian (or Little-Endian or little-endian, same formatting for
big).
Defining Endianness
In a white paper,
Intel defines endianness as follows.
Endianness is
the format to how multi-byte data is stored in computer memory. It describes
the location of the most significant byte (MSB) and least
significant byte (LSB) of an address in memory. Endianness is dictated by the CPU
architecture implementation of the system. The operating system does not
dictate the endian model implemented, but rather the endian model of the CPU
architecture dictates how the operating system is implemented.
There is no better or worse
involved when deciding which ordering to use, as each has distinct advantages.
Little-Endian form supports multiple precision math routines, while Big-Endian
form is very good at performing binary to decimal conversions (plus the
advantage of how it writes out values from big to small because that is how the
bytes are arranged). If you’ve looked through any trace or dump files, more
than likely you’ve seen values written in hexadecimal format (“words” beginning
with 0x signify hexadecimal format).
The number 287454020, for
example, can be written as 0x11223344. Computer science books where computer
math is involved typically have conversion problems related to the base of a
number. Convert 1234 to base 2, convert 12.34 to binary, and convert 287454020
to base 16 are examples. What is assumed when converting base whatever into
base 16 or hexadecimal is the byte ordering. In a Big-Endian system, the
conversion of 287454020 to 0x11223344 would be correct. But what if your system
was little-endian? The hex value would be stored as 0x44332211.
Practical Implications
The most important sentence
in the white paper should immediately bring to mind something Oracle deals with.
If the target
application is currently running on a Big Endian platform and the goal is to
port to a Little Endian platform, or visa-versa, byte ordering may become an
issue.
For those who
are relatively new to Oracle, this statement precisely frames the issue of
using transportable tablespaces. If anyone ever tells you that *NIX (whatever
version of UNIX) is all the same except for some commands and how shells
behave, ask them about byte ordering or the endianness of brand A versus that
of brand B. The bottom line is that with respect to UNIX, there is a significant
difference when it comes to how data is stored. In platform ID order (what was
platform 14?), a query out of V$TRANSPORTABLE_PLATFORM shows the following.
PLATFORM_ID PLATFORM_NAME ENDIAN_FORMAT
———– ——————————– ————–
1 Solaris[tm] OE (32-bit) Big
2 Solaris[tm] OE (64-bit) Big
3 HP-UX (64-bit) Big
4 HP-UX IA (64-bit) Big
5 HP Tru64 UNIX Little
6 AIX-Based Systems (64-bit) Big
7 Microsoft Windows IA (32-bit) Little
8 Microsoft Windows IA (64-bit) Little
9 IBM zSeries Based Linux Big
10 Linux IA (32-bit) Little
11 Linux IA (64-bit) Little
12 Microsoft Windows 64-bit for AMD Little
13 Linux 64-bit for AMD Little
15 HP Open VMS Little
16 Apple Mac OS Big
17 Solaris Operating System (x86) Little
Solaris and HP flavored UNIX
have to deal with both formats, and so does Linux, depending on where it comes
from (thanks, IBM). Windows is consistent (and we’re not going into the debate
about why one should or should not put a production system on Windows), and
Oracle on the Apple Mac OS will be a short-lived side venture.
The Conversion Command
So what if you were doing a
double-sided migration, that is, going from Oracle version X to X plus one, and
going from HP Tru64 to 64-bit Linux? The endianness of the operating systems differs
(technically, the difference lies within the CPU architecture), so that
requires a transformation or conversion – and that is if the versions of Oracle
support that to begin with. To use a simple case, let’s suppose a tablespace
named USERS has one datafile. The problem comes down to the “how” of “How do
you convert the little-endian datafile to one that is big-endian?” The answer
is simple: do some byte swapping.
Fortunately, Oracle 10g
abstracts this process for us via RMAN. It may seem odd at first that RMAN
would be involved in converting a file from one format to another, but in a way
it makes sense when considering what RMAN can do with respect to managing
files. Although there are some restrictions (see the details in the Backup
and Recovery Reference), the ability of Oracle to perform the endian
conversion for us lifted a major limitation on using transportable tablespaces.
In prior versions, you had to be on the same platform, period. With the CONVERT
command in RMAN, under the right, but not so narrow conditions, you can perform
a database version upgrade in significantly less time than what it had been.
The growth in the size of databases
reminds me of the old joke about how money is tossed around by Congress. A
billion here, a billion there, and pretty soon you’re talking about some real
money. The same thing applies with data. A terabyte here, a terabyte there, and
pretty soon we’re talking about some real size. If you’ve ever had to use
export and import on a near terabyte-sized database, the number of hours
involved was daunting. Transportable tablespaces helped a good bit, but you
were limited to being on the same platform. The ability to go cross-platform
now gives you much more freedom with respect to choosing an operating system
that suits your needs.
In Closing
Knowing why (in the past)
you were limited to using transportable tablespaces on the same platform is
easily understood now. Quite simply, Oracle didn’t have the capability to
perform the byte swapping needed to go from one endian format to the other.
Knowing the limitations or features of older versions of Oracle has relevance
in at least two areas. One is that this knowledge gives you an appreciation of
how Oracle has developed over the years, and the second is that one day you may
find yourself working with an older version.
And where did “endian” come
from? This explanation comes from Webopedia.
The terms big-endian and little-endian
are derived from the Lilliputians of Gulliver’s Travels, whose major political
issue was whether soft-boiled eggs should be opened on the big side or the
little side. Likewise, the big-/little-endian computer debate has much more to
do with political issues than technological merits.