Nathan Segal interviews Benny Souder, Vice President of Distributed Database Development
for Oracle,
and Jeff Jones of IBM.
According to Benny Souder, Vice President of Distributed Database Development for
Oracle, “Grid Computing is where you have a network
of computers which tap into a main server. The concept comes from the
electrical grid and would be arranged in a system that functions in a similar
fashion. If you take an appliance and plug it into a wall outlet, then you
become a client of the electrical grid. As a client, you don’t know how the
grid is implemented, whether the power station is in the next state or next
door. All you want is power; you plug in and you get it. That’s the highest
logical level of Grid Computing.”
N:
How do you maximize the potential of Grid Computing?
S:
“Through centralization. This
includes consolidation, centralization, and cost savings. As the nodes or
points in the grid get bigger and you have a small number of large nodes, you
can do a more effective job of Grid Computing, just as a power company has a
small number of large power generators, rather than a power generator per
house. The power company works this way because they’re trying to get real
efficient utilization of their resources, because that keeps the rates down.”
“If you have little
islands of computation, you have to size them for peak, but most of the time
they’re pretty idle. A good way to get high utilization is to pool these
islands into larger nodes. If you then have the right technology and software,
you can dynamically allocate these computers to the priorities of your
business.”
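To put numbers on the pooling argument, here is a minimal Python sketch. The island names and demand figures are invented for illustration and do not come from the interview. The only idea taken from the text is that separate islands must each be sized for their own peak, while a pool only has to cover the peak of the combined load.

```python
def capacity_if_separate(workloads):
    """Each island is sized for its own peak demand."""
    return sum(max(demand) for demand in workloads.values())


def capacity_if_pooled(workloads):
    """A shared pool is sized for the peak of the combined demand."""
    combined = [sum(period) for period in zip(*workloads.values())]
    return max(combined)


def average_utilization(workloads, capacity):
    """Fraction of the installed capacity that is busy, averaged over all periods."""
    periods = len(next(iter(workloads.values())))
    total_demand = sum(sum(demand) for demand in workloads.values())
    return total_demand / (capacity * periods)


# Hypothetical CPU demand in two periods; the two islands peak at different times.
workloads = {
    "island_a": [90, 20],
    "island_b": [10, 80],
}

separate = capacity_if_separate(workloads)  # 90 + 80 = 170
pooled = capacity_if_pooled(workloads)      # max(90 + 10, 20 + 80) = 100

print(separate, average_utilization(workloads, separate))
print(pooled, average_utilization(workloads, pooled))
```

The gain depends entirely on the islands peaking at different times. If both peaked in the same period, the pooled capacity would equal the sum of the separate peaks.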
N:
Can you offer an example of Grid Computing in actual practice?
S: “Yes. Let’s pretend that you’re an Internet retailer
selling books on the web and you’ve got two databases, one that powers your
website and keeps track of all the books, and another that is a data warehouse
of all the click stream data, etc. Right now, you need every computer you’ve
got powering your web site, because if the website is slow, people are going to
leave.”
“In December, a
mountain of data is collected about transactions on your website, but in
January, you’ll want to analyze that data and begin planning for next
Christmas. If you use separate SMP (symmetric multiprocessor) machines for
those two databases, it’s very hard to put all your CPUs behind the website
and then switch them 30 days later and have almost all your CPUs on the data
warehouse.”
“To get around the
problem, you would use Oracle technology and some new hardware called Server
Blades. You could do it with SMP, but you’d have to take the cabinet and machines
apart. That’s a big job and while you’re doing it, the website’s down for sure.”
N:
What is the advantage of using Server Blades?
S: “Server Blades are like a computer on a
board, with a CPU, some memory, a local disk for caching stuff and a backplane
plug. These blades (about the size of a skinny pizza box) plug into a rack,
which has a power supply, a cooling fan and a network connection. Typically,
there are 10-30 blades in a rack. Since there are commodity CPUs on these
boards and they share a common power supply, they’re very economical to make, about
80-90 percent cheaper than SMP.”
“With the blade
technology, we can run our database as well as real applications. Other
database vendors will tell you that these blades are great, but don’t put the
database on them. The reason is that their database on blades doesn’t run real
applications. Their cluster database runs benchmarks. There’s no application
vendor that’s certified on their cluster database. Whereas on our database,
what we call Real Application Clusters, SAP is certified, as well as Oracle
applications, and we have more than a hundred production
reference customers who are running their business on this cluster database.”
N:
What happens if you attempt to run applications that are not certified?
S: “They don’t work. If you call up SAP, they will
tell you that it’s not supported. We can take a blade off or add a blade to our
database while it’s running. So if you’re running your website and data
warehouse on our blades, you can move the blades back and forth without any
down time. That means it’s really easy to allocate computing to what your
business priority is. That’s the first thing we’ve got for grid computing.”
“The second thing
is information sharing technology. For example, we have this stuff called
Transportable Tablespaces. This lets you snap data off one database and snap it
onto another. The file is on a disk, meaning that you don’t have to load or
unload the data. We also have Oracle Streams, which is a complete solution for
asynchronous information sharing. It does messaging, replication, events,
publishing, subscribing, and has a rules engine all in one integrated database.”
“The third thing is
that we’re completely portable. So the application you’ve already written on
your SMP machine ports right into this grid technology; you don’t have to
rewrite the application.”
“The fourth thing
we’ve got is Globus,
a small organization that’s trying to develop open source software for grid
computing. They built this thing called the Globus Toolkit that we’ve
integrated with the Oracle database. We have a free, downloadable, customized,
integrated version of the Globus Toolkit with the Oracle database, so you don’t
have to figure out how to make these two things work together. We do that for
you.”
A different perspective
was shared by Jeff Jones of IBM.
He said: “Grid Computing is an effort to make computing resources appear to be
utilities that you tap into as necessary. In DB2 (Version 8), several aspects
have been enhanced, making it a good candidate for that type of processing. The
first is scalability. A grid requires and expects an enormous amount of data to
be supported and an enormous number of users to be coming after that data. So
very large scale processing is the norm in a grid.”
“Some of our experiences
with Grid Computing include the life sciences grid that Oxford University in England built with us to support breast cancer research. Another
is a mammography sharing grid at the University of Pennsylvania. Both
have been built on DB2.”
N:
How does DB2 work with Grid Computing?
J: “With DB2, we have a Shared Nothing
architecture. Here, any physical number of servers can be clustered together
and you can run one instance of DB2 across all of it. One server takes the
requests and breaks them up into pieces and farms the pieces out to all the
other servers to work in parallel, then reassembles everything at the end and
provides a complete answer back when questions are asked.”
“Each server in the
cluster receives an independent subset of the complete set of data and operates
separately and independently on its subset of the problem to be solved. This
form of independent cluster processing is extremely scalable with little loss
of efficiency as you add more servers to the cluster.”
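The scatter-and-gather pattern Jones describes can be sketched in a few lines of Python. This is an illustration only, not DB2 code: the function names are hypothetical, threads stand in for separate servers, and assigning rows by `id % num_servers` is an assumed partitioning rule.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_servers):
    """Give each server its own independent subset of the rows."""
    partitions = [[] for _ in range(num_servers)]
    for row in rows:
        # Assumed placement rule: row id modulo the number of servers.
        partitions[row["id"] % num_servers].append(row)
    return partitions


def local_sum(partition_rows):
    """Work done by one server, using only its own subset of the data."""
    return sum(row["amount"] for row in partition_rows)


def coordinator_total(partitions):
    """Farm the request out in parallel, then reassemble a single answer."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_sums = list(pool.map(local_sum, partitions))
    return sum(partial_sums)


# Hypothetical data: eight rows spread over four "servers".
rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
partitions = partition(rows, num_servers=4)
total = coordinator_total(partitions)
print(total)
```

Because no server ever reads another server's rows, the only shared step is the final combination of partial results, which is why the approach is described as scaling with little loss of efficiency.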
N:
How does this compare to Oracle?
J: “Their approach is to have a very
large common memory. Each instance of the database shares a common memory and
is being gone after by the same user population, so traffic management becomes
the hard problem to be solved.”
N:
Oracle spoke about server blades. Would you have to shut your system down to
add more servers?
J:
“No. Server blades are new form
factors for servers that can be rack mounted in very large numbers and can be
pulled in and pulled out and plugged back in again. It’s not a new paradigm;
it’s just a more efficient way of clustering hardware. Their approach and our
approach both enable clusters to be grown or shrunk with not nearly as much
agony as in the past.”
“With DB2, our
approach is to offer utilities. When we add a new server to a cluster, we apply
a rebalancing utility that allows you to redistribute the data and populate the
new server. This type of housecleaning has to be done on anybody’s system. You
shouldn’t let any vendor convince you that it’s painless, but today both
vendors have made it much more bearable and much of the process can be done
with nothing coming down.”
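As a rough picture of what rebalancing involves, the Python sketch below redistributes keys when a cluster grows from three servers to four and reports which keys had to move. It is not the DB2 utility: the function names are hypothetical and the modulo placement rule is an assumption chosen for simplicity.

```python
def assign(keys, num_servers):
    """Place each key on a server using an assumed modulo rule."""
    placement = {server: [] for server in range(num_servers)}
    for key in keys:
        placement[key % num_servers].append(key)
    return placement


def rebalance(placement, new_num_servers):
    """Recompute the placement for a larger cluster and list the keys that moved."""
    old_home = {key: server for server, keys in placement.items() for key in keys}
    new_placement = assign(sorted(old_home), new_num_servers)
    moved = [
        key
        for server, keys in new_placement.items()
        for key in keys
        if old_home[key] != server
    ]
    return new_placement, moved


# Hypothetical cluster: twelve keys on three servers, then a fourth server is added.
before = assign(range(12), 3)
after, moved = rebalance(before, 4)

print(after)
print(len(moved), "of 12 keys moved")
```

With this naive rule most keys change servers when the cluster grows, which fits the remark that the housecleaning is never painless. Placement schemes that move less data exist, but they are outside what the interview describes.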
“In a fault
tolerance sense, this is good, because servers can be paired together and one
can serve as the idle standby for the other. This is something that both Oracle
and IBM do in a similar fashion. You can have an eight server cluster where
four of the servers are actually doing work, while the other four are twins of
the first four, waiting to be failed over to if something goes wrong. This is a
very common high-tolerance, high-availability configuration for servers. And
racks and blade servers simply make that more efficient.”
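The paired arrangement can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the class, the server names and the manual `fail_over` call are all hypothetical, and a real cluster would detect failures and switch over automatically.

```python
class ServerPair:
    """One working server plus an idle twin that takes over on failure."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.active = primary

    def fail_over(self):
        """Switch work to the standby twin."""
        self.active = self.standby

    def handle(self, request):
        return f"{self.active} handled {request}"


# Hypothetical eight-server cluster: four working servers, four idle twins.
cluster = [ServerPair(f"server{i}", f"server{i}-twin") for i in range(4)]

cluster[0].fail_over()  # simulate a failure of server0
answers = [pair.handle("query") for pair in cluster]
print(answers)
```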
Souder described
the goal of Grid Computing as a state where “you want information, answers,
computation, and get it. That’s the fundamental idea, the dream and the goal. We’re
a long way from being there, but that’s the direction that we’re moving in.”