By James Lee
Generating and storing large volumes of data has reached a
critical mass. Where companies once managed megabytes and gigabytes, they now
handle terabytes and petabytes of data and storage. Comprehensive application
software, the Internet and new computing and storage technologies have made it
easier to create, collect and store all types of data. In addition to
duplicating data for backup and recovery purposes, enterprise data must also be
retained for years to comply with government-mandated retention requirements.
The meaningful collection of data comprises the information
that companies rely on to improve decision-making and to gain a competitive
advantage. As companies accumulate increasing volumes of enterprise data,
managing, accessing and storing this information cost effectively is one of the
most critical IT challenges. As a result, the trend toward Information
Lifecycle Management (ILM) is giving rise to a class of solutions designed to
help companies meet these goals, and database archiving is a critical component
in any ILM implementation.
Reducing Storage Costs throughout the Information Lifecycle
The concept that all information has a lifecycle is not really
new. This lifecycle begins when data or information is acquired and ends when
it is no longer needed and can be deleted. Over time, as the information is
accessed less frequently, its business value decreases. However, when this
rarely accessed data is needed, its value increases immediately, and it must be
accessible on demand. In fact, penalties may be involved if the data is not
readily available. For this reason, most companies continue to store rarely
accessed data in high-cost, fast performance systems, increasing operational
costs and wasting valuable processing resources.
ILM has evolved to define an approach for managing and
storing information on the most cost-effective storage medium over time, based
on its business value and access requirements. At first glance, ILM shares many
similarities with the more familiar concept of Hierarchical Storage Management,
which allows for automatically managing information and moving it to a higher
or lower performance storage medium based on access rates. However, while HSM
is best suited for managing files, ILM is designed to manage all types of data
and provides a framework for data and storage management.
Although a simple concept, ILM is not without its challenges
in today’s heterogeneous IT environments because data can be managed and stored
in structured relational databases; in semi-structured file systems, such as
email; and as unstructured fixed content, like documents and graphic files. As
companies strive to implement ILM, they need to select storage and data
management solutions carefully. Among the storage strategies for implementing
ILM, the ability to archive information is of paramount importance for reducing
costs, managing data efficiently and complying with data/information retention
requirements.
How to Get Started Implementing ILM
Before developing an ILM strategy, an organization must
first identify all the types of enterprise information managed in its
environment. This process provides a comprehensive understanding of what the
data is and how it is used, as well as its data retention and storage
requirements.
Typical corporate information includes all transactional
data from enterprise business applications and the associated databases, such
as payroll, customer information systems and purchasing systems. In addition to
mission-critical systems, companies must also consider their online
collaboration information, such as email, instant messaging (IM) and web-based
conferencing. Information residing on local workstations, including documents,
spreadsheets and presentations, must also be considered. It is critical to
account for all types of information in order to determine the lifecycle,
retention requirements and appropriate archiving and storage strategies for
managing each type of data.
During this analysis, companies can identify the data that
must remain online, as well as the data that should be archived. Upon
completing an analysis of corporate data retention and archiving requirements,
the next step is to perform gap and risk analysis to identify the types of data
that must be archived and data that does not need to be archived immediately.
For example, if the decision is to keep data online rather than archiving it, a
company must assess the risk of not archiving, in the context of degraded
application performance and availability, as well as the increased cost of
operations. The goal of an effective ILM strategy is to keep historical data as
long as required, but no longer. Consider this approach "Just-in-Time"
data accessibility. Information retained past the required time can increase
costs associated with storing and managing the data after it is no longer
needed.
Understanding Data Retention Requirements
Managing and storing structured, semi-structured and
unstructured fixed content data presents unique challenges. In addition there
are associated costs for storage and easy access to comply with regulatory
requirements. For example, data retention requirements can range from seven
years for financial data, 20 years for pharmaceutical research data, and 50+
years for data associated with nuclear facilities. The following questions can
help determine how to manage different types of data to comply with regulatory
requirements and eliminate unnecessary costs:
-
What are the specific regulatory requirements that pertain to
each type of data and how do they affect data retention? -
After the data retention period has expired, when can data be
deleted? -
Are there business reasons for keeping data beyond the time
required to comply with regulatory data retention requirements? -
Are there penalties associated with retaining this data beyond
the retention period? - What is the risk of not archiving this data?
-
What is the risk if this data is archived and cannot be retrieved
in a timely manner? -
How often will access to this data be required and what is the
expected response time for data access? -
What type of media will be used for storage (tape,
magneto-optical, compact disk, etc.)? - What is the shelf life of the selected storage media?
- Can the shelf life be extended through media renewal (copy)?
- Is technology obsolescence a consideration?
- Is off-site storage required?
- Which applications are affected and how are they impacted?
- Are there application requirements for accessing older data?
- What are the application compatibility rules?
After the data has been analyzed, the next step is to select
the appropriate solution(s) to help achieve these objectives. The remainder of
this discussion will focus on archiving structured data stored in relational
databases.
What is Database Archiving?
Companies invest millions of dollars each year in
maintaining and upgrading business critical applications that rely on complex
relational databases. These databases collect increasing amounts of data for
business operations and decision-making. As a consequence, overloaded databases
degrade performance and limit the availability of the comprehensive
capabilities these applications were designed to deliver. Ironically, most of
this data is stored online in production databases but is rarely accessed.
Database archiving allows for archiving and removing this
rarely accessed data and storing it on a variety of storage mediums while
providing easy access – a critical requirement of most data retention
legislation. IT organizations can analyze the best mix of storage alternatives.
Maintaining current, active data online and selecting the most appropriate
storage medium for archived data ensures a cost-effective balance throughout
the information lifecycle. This process also ensures that enterprise
application databases are maintained at a manageable size to improve
performance and availability of critical systems.
Database archiving also allows IT organizations to maximize
the benefits of existing SANs, NAS and HSM storage solutions. Database
archiving complements these technologies, especially HSM systems, to enable a
best-practice "staged" approach for managing historical relational
data. This approach can be an integral part of enterprise ILM.
Selecting an Ideal Database Archiving Solution
In selecting the ideal database archiving solution,
companies should consider whether the technology under consideration allows
them to:
-
Safely archive and remove precise subsets of rarely used data
from complex relational databases with 100 percent accuracy, -
Preserve the referential integrity and business context of the
data even for the most complex data model, - Intelligently index archived data to ensure fast retrieval,
-
Selectively delete data to retain relevant contextual information
in the production database and remove only specific subsets, -
Preview the data after it is archived and before it is deleted
from the production database to prevent deleting data inadvertently, -
Store archived data on a variety of alternative storage mediums
and keep it "active" for easy access when needed, -
Quickly locate, browse and manage specific archived data no
matter where it is stored, -
Selectively restore referentially intact subsets of archived data
in a single step. (It should not be necessary to restore large amounts of data
for the sake of a few rows), -
Accommodate current and future archiving needs across
applications, databases, operating systems and hardware platforms.
Benefits of ILM and Database Archiving
Database archiving is a key component for implementing ILM
to manage complex relational data. Safely archiving and removing rarely
accessed data frees processing power to improve application performance and
availability and to implement new applications – all without expensive hardware
and capacity upgrades. In addition, regularly scheduled database archiving
frees up significant amounts of disk capacity that can be made available for
other uses, saving millions of dollars in hardware and software upgrades.
Database archiving provides an effective long-term solution to the problem of
explosive database growth, while reducing the Total Cost of Ownership (TCO).
A comprehensive enterprise database archiving methodology
must provide the capability to archive data from a variety of relational databases
and platforms. The ideal database archiving solution must also ensure the
referential integrity and business context of the archived data and provide for
easy access. In addition, there must be a capability for managing and storing
archived data on the most cost-effective storage medium (online in an archive
database, near-line on a file server, optical devices, or offline to tape or
disk-based WORM devices). Implementing database archiving is an integral part
of every ILM strategy to ensure that data and storage resources are well
managed throughout the information lifecycle.
James Lee
Jim Lee joined Princeton Softech in 1997 and under his direction as the
Development Manager, the project teams have delivered several new releases of
Princeton Softech’s Relational Tools" and Archive for DB2". He also
contributed his technical expertise at several Database Archiving
Workshops" to assist companies in determining the benefits of an effective
database archiving strategy.
In his new role, Jim will direct Princeton Softech’s Product Marketing efforts,
coordinating with Development, Marketing Communications and Sales. With over 15
years of experience in application development and consulting, Jim’s background
includes application development, short and long-term product planning, risk
assessment, cost-benefit analysis, customer consulting and evaluating emerging
technologies.
Jim holds a Bachelor of Science Degree in Computer Applications and Information
Systems and a Bachelor of Science Degree in Accounting from New York University, College of Business and Public
Administrations.