Some of the possibilities with the IBM Data Studio vision
Let's assume you're a retailer and you offer credit cards as
a form of payment, and thus need to comply with the PCI standard. Let's further
assume you use IBM Rational Data Architect (IBM RDA) to model your physical and
logical design.
A moment on PCI: It's
outside the scope of this article to delve into the details of the PCI
standard. In a nutshell, major credit card companies have mandated that vendors
who use their services must comply with the PCI standard. The ultimate goal is to
provide coaching to those companies that handle credit card data so they protect
cardholder information via secure networks, encryption, and so on.
Vendors who fail to comply with the
PCI standard are subject to some really hefty fines; in addition, any retailer
found to have leaked sensitive cardholder data can be required to submit to
external audits or risk losing their right to offer credit card services (from
those credit vendors that participate in the program). Most recently, a major US
retailer's credit card processing company was fined $880,000 in the summer of
2007 and will continue to be fined $100,000/month until compliance is met. The IT
challenges posed by the PCI standard are significant: recent studies estimate
that only 40% of the major retailers in North America are compliant!
There are 12 parts to the PCI
standard. One example of its declarations: on any report seen by someone
without a need to know, the leading digits of a credit card number must be
replaced with Xs, leaving only the last block of digits visible.
For example, a billing report:
Another declaration states that
PIN codes, such as those you enter to access your account online, must be
represented with *s, ·s, or
some other masking symbol.
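As an illustration only (the formatting conventions and function names below are my assumptions, not part of the standard), masking routines along these lines could satisfy both declarations:

```python
def mask_card_number(pan: str) -> str:
    """Replace every digit of a card number with 'X' except the last
    four, preserving separators such as dashes or spaces."""
    total_digits = sum(1 for c in pan if c.isdigit())
    keep_from = total_digits - 4  # index of the first digit left visible
    seen = 0
    out = []
    for c in pan:
        if c.isdigit():
            out.append(c if seen >= keep_from else "X")
            seen += 1
        else:
            out.append(c)  # separators pass through unchanged
    return "".join(out)

def mask_pin(pin: str) -> str:
    """Represent a PIN entirely with a masking symbol."""
    return "*" * len(pin)

print(mask_card_number("4520-1234-5678-9898"))  # XXXX-XXXX-XXXX-9898
print(mask_pin("8870"))                         # ****
```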
Another specification (Part 3.1)
notes that the storage of cardholder data should be kept to a minimum, and that network
diagrams that illustrate all connections to cardholder data (Part 1) are
required too. There are a lot of other declarations within this standard, such as
how to handle test data. You can learn more about the PCI standard
at: www.pcisecuritystandards.org/.
So let's assume you are building a retail database. What if you
could specify in the IBM RDA data model that a specific column contains credit
card numbers, or that another column contains PINs? If you could, a governance
tool would know that these two fields need to be governed by the PCI standard
and could represent these columns in your business glossary. The
governance solution could then automatically put fine-grained access control (FGAC)
rules in place to mask the data whenever these columns are accessed (except for
applications that are specifically authorized to bypass these rules).
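To make the idea concrete, here is a minimal sketch of how a governance layer might consult model metadata to decide which columns to mask at access time. The tag names, column names, and application names are all hypothetical; in a real deployment the FGAC rules would be enforced inside the data server, not in application code:

```python
# Column-level metadata as it might be captured in the data model.
# The tag values ("pci.pan", "pci.pin") are made up for illustration.
COLUMN_TAGS = {
    "CARD_NUMBER": "pci.pan",
    "PIN": "pci.pin",
}

# Applications explicitly authorized to bypass the masking rules.
AUTHORIZED_APPS = {"settlement-engine"}

def apply_fgac(row: dict, app_name: str) -> dict:
    """Return a copy of the row with PCI-tagged columns masked,
    unless the calling application is authorized to see them."""
    if app_name in AUTHORIZED_APPS:
        return dict(row)
    masked = {}
    for col, value in row.items():
        tag = COLUMN_TAGS.get(col)
        if tag == "pci.pan":
            masked[col] = "XXXX-XXXX-XXXX-" + value[-4:]
        elif tag == "pci.pin":
            masked[col] = "*" * len(value)
        else:
            masked[col] = value
    return masked

row = {"NAME": "J. Doe", "CARD_NUMBER": "4520-1234-5678-9898", "PIN": "8870"}
print(apply_fgac(row, "billing-report"))
```

The point of the sketch is that the masking policy is driven entirely by metadata tags attached to columns in the model, so the application itself never has to know which fields are sensitive.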
Consider another scenario for this same retailer. The PCI
standard notes that live customer data cannot be used as test data. This means that
real credit card numbers, real addresses, real names, and so on, cannot be used;
if they are, you can be fined. Now your IT department is faced with a challenge:
creating a test database for the development organization. This is a perfect
example of different roles being able to leverage the same toolset to solve a
business problem. The DBA has to change the real data such that when it goes
into the test database it is PCI-compliant. But what would happen if the DBA didn't
do this and instead loaded up a new sample database with the live data? Typically, nothing
would happen, which could lead to some large fines. Now imagine a scenario
where that same governance tool from the previous paragraph could detect that
live, PCI-governed data was being loaded into a test database. If that tool could flag
this violation at data movement time, it would be a very valuable asset.
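A governance check at data movement time could look something like the sketch below: compare the columns being copied against the model's PCI tags and refuse to load tagged columns into a target marked as a test environment. Every name here (tables, databases, the tag set) is an assumption for illustration:

```python
# Hypothetical model metadata: which columns are governed by PCI,
# and which databases are designated test environments.
PCI_GOVERNED_COLUMNS = {"CARD_NUMBER", "PIN", "CARDHOLDER_NAME"}
TEST_DATABASES = {"RETAIL_TEST", "DEV_SANDBOX"}

class GovernanceViolation(Exception):
    """Raised when a data movement would break a governance rule."""

def check_data_movement(source_columns, target_db):
    """Flag any attempt to land live PCI-governed columns in a test database."""
    if target_db not in TEST_DATABASES:
        return  # production-to-production movement is not this check's concern
    leaked = PCI_GOVERNED_COLUMNS.intersection(source_columns)
    if leaked:
        raise GovernanceViolation(
            f"PCI-governed columns {sorted(leaked)} cannot be loaded "
            f"into test database {target_db} without masking"
        )

check_data_movement(["ORDER_ID", "SKU"], "RETAIL_TEST")  # fine: nothing sensitive
try:
    check_data_movement(["ORDER_ID", "CARD_NUMBER"], "RETAIL_TEST")
except GovernanceViolation as e:
    print("flagged:", e)
```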
Now let's assume that you want to extract all the records in
your database for an advertising campaign but are not governed by the PCI
standard. However, your enterprise has its own set of privacy rules that
guarantee clients that their information won't be distributed to other vendors
without their permission. Now expand this scenario to the data model. If the
data server understands that column A is the opt-in or opt-out field to make
your home address visible, then every time a query is run against the data
server for a campaign extract, the data server can consult this field to see if
the address should be returned to the application or not.
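Sketched in code, the query path for a campaign extract might consult the opt-in column before returning an address. The table layout and column names ("column A" becomes `ADDR_OPT_IN` here) are illustrative assumptions:

```python
# Each customer row carries an opt-in flag (the "column A" of the text)
# that governs whether the home address may be released.
customers = [
    {"NAME": "A. Singh", "ADDRESS": "12 Elm St", "ADDR_OPT_IN": True},
    {"NAME": "B. Chen",  "ADDRESS": "9 Oak Ave", "ADDR_OPT_IN": False},
]

def campaign_extract(rows):
    """Return rows for the campaign extract, suppressing the address for
    any customer who has not opted in to having it shared."""
    out = []
    for r in rows:
        out.append({
            "NAME": r["NAME"],
            "ADDRESS": r["ADDRESS"] if r["ADDR_OPT_IN"] else None,
        })
    return out

for rec in campaign_extract(customers):
    print(rec)
```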
In addition to this retail scenario, there are lots of other
examples from other industries. The Health Insurance Portability and
Accountability Act (HIPAA) is a medical privacy initiative that, roughly speaking, says
you can't show medical records in reports (similar to the PCI standard for
credit card information). If a column has personal medical history information,
that column can only be shown to the patient's physician. You can't show it to
just anyone who has DBA privileges. Same idea -- different industry.
The goal is to make the design layer smart enough to understand
the different kinds of fields that have these kinds of characteristics. That
will provide you with a very efficient environment.
Now, don't let all my talk about the tooling and the role
identification fool you; IBM is still focusing on the client-side runtime APIs,
because if you can't run the application in a robust and high-performing
manner, there is no sense in having any tooling whatsoever. We plan to
enhance our already strong API story with a new suite of tools built around the jobs
and roles of the personnel who actually do the work.
Note: The DB2 and Visual Studio .NET experience has
had a long and rich history of deep integration. Since Visual Studio is the IDE
where .NET developers tend to work, IBM has provided plug-ins for this
environment since the DB2 UDB Version 8.1.2 release. I've written extensively on this
particular integration, and on this API and IDE, so it isn't discussed in this
article.
Here's my attempt to express IBM's intentions for the new
IBM Data Studio:
The goal of a comprehensive solution for the entire life
cycle of an application is unique and ambitious. I'm talking about everything
from early up-front design of business processes and logical data models for
all of your applications, to the time when you start to design, code, and test
those applications. But the process isn't finished at that point. You then have
to create a physical database model, including its schemas; you have to
allocate storage. You have to perform a lot of security work. You have to
accommodate the day-to-day management of these systems, such as backup and
recovery or meeting service level agreements. Finally, don't forget all the governance
work you have to perform (a growing percentage of the work required in the
application life cycle) to satisfy the regulatory compliance decrees placed
upon your business, and more.
While there are lots
of tools out there that address part of the application life cycle,
no other single toolset currently covers the full life cycle.
Indeed, as I look across the market, most tools are piece-part solutions, which
means you need two or more of them to address the methodology I outlined in
this article.
Quite frankly, the
IBM Data Studio vision is an ambitious undertaking, but one with extremely high
value and high potential for our clients, since no one else really has this
vision. What's more, the vision has the potential to grow beyond the set of relational
IBM data servers that IBM Data Studio currently supports. There would be a lot
of value in a cross-data server toolset that solves an inefficiency in how
enterprises align their data server personnel today: skills and
resources tend to be aligned and constrained in direct correlation to a
database vendor. In my opinion, this hurts productivity. A cross-brand full life
cycle toolset would lead to better control over staffing, better control over
resource allocation, and an unbeatable edge over the competition!
How the year 2007 will finish up for IBM Data Studio
IBM has now announced the general availability of IBM Data
Studio and, best of all, it's free for everyone. Although the first iteration of
this IDE is far from the full life cycle vision I've discussed in this article,
some valuable stuff is being delivered. For example, Data Studio contains a
subset of some of the functionality in IBM RDA. (See Part
3 in this series for an example of the overview data dependency diagrams,
which allow you to see tables and their relationships, and so on.) It also has entity-relationship
diagrams, a simple data distribution viewer (which allows you to see your
data's distribution), and more.
From a development perspective, you'll get most of the things
that you're used to seeing as a developer, such as building SQL and XQuery
statements, routines, and so on. In addition to what its predecessor, DB2 9
Developer Workbench, gave you, IBM Data Studio has a set of rich enhancements designed
to make developers even more productive. There's also a new Web services
framework called IBM Data Web Services. Finally, a lot of features were added
for DBA activities. From a governance perspective, IBM Data Studio gives you
the ability to work with security roles.
Obviously, I can't detail all the features in this article
(that's what the rest of this series is for), but the following figure
should give you an idea of the things you can do free of charge with IBM Data
Studio and an IBM data server:
The capabilities outlined in the previous figure are all
free. Since the toolset is extensible, you can buy add-ons that add new
features to context menus or enable disabled features (in the same way that you
can add to the base tooling provided by vendors such as Quest or Embarcadero).
For example, when IBM Data Studio became generally available, IBM announced two
such chargeable add-ons: IBM Data Studio
Developer and IBM Data Studio
pureQuery Runtime, which are geared towards the incredibly cool Java
enhancements that go with pureQuery
(a subject for a future installment of this series).
Where is IBM Data Studio heading?
In this section, I outline some of the top priorities for
IBM Data Studio. (Of course, these are not commitments but rather goals.)
The first goal is to reduce the footprint associated with
this toolset; and, since client-side offerings aren't tied to server releases,
IBM Data Studio can have its own availability schedule that's more aggressive
than that of a data server.
In addition, DB2 or Informix IDS must be seen as the unquestionable
choice for IBM WebSphere Application Server. The debut of pureQuery is a great
first step here, but expect to see initiatives along other integration points
too. Some possible examples include JDBC Capture to enable pureQuery for any
Java program; significant performance monitoring and problem determination
aids; and OpenJPA, Spring, and iBATIS support for pureQuery. Also being considered is tight integration with ObjectGrid
(SQL syntax for caching, DB2 or Informix IDS persistence options,
replication from DB2 or Informix IDS to ObjectGrid, and more).
Since IBM Data Studio now represents a standard toolset with
a pluggable interface for all sorts of roles and features, you're likely to see
key IBM Information Management products integrated into this toolset, such as DB2
Change Management Expert, DB2 Performance Expert, the DB2 Design Studio from
DB2 Warehouse Edition, the IBM Optim suite (from the Princeton Softech
acquisition), and DB2 High Performance Unload, to name a few. Expect to see
some sort of performance management add-ons for IBM Data Studio too. For
example, such a feature could provide SQL statement-level performance details
and historical trend analysis. In addition, data server personnel could have
the ability to report database resource usage in multiple facets such as by SQL
statement, package or collection, application, application server, Java class
name, and more.
Finally, look for expanded functionality in the
administration piece with alter anything capabilities (essentially,
integrating the DB2 Change Management Expert in the toolset): the ability to
view and change registry, database, and database manager configuration
parameters; FTAs for deadlock and timeout events; and other items from the DB2 Control
Center that improve the up-and-running experience for an IBM data server. Remember, these are goals, not commitments.
Wrapping it up
In this article, I tried to describe the vision behind the
newly announced IBM Data Studio. I attempted to outline just how unique and
beneficial our envisioned toolset platform would be for all those involved in
the application life cycle. Specifically, such a toolset would help by:
- Slashing development time by up to 50% with an integrated data management environment
- Promoting collaboration across roles to optimize data server and application performance
- Accelerating Java development productivity with new pureQuery data access
- Simplifying development of applications by implementing industry-specific XML standards
- Monitoring data server operation and performance anywhere, anytime, from a Web browser
IBM Data Studio is expected to simplify and speed the development
of new skills by providing a "learn once, use with all supported data servers"
toolset that's engineered with an easy-to-use and integrated user interface
and is compatible with the IBM Rational software development platform.
Finally, IBM Data Studio is expected to accelerate information
as a service by allowing you to develop and publish data-related services as Web
services without programming; and since it's Info 2.0 ready, with support for
Web 2.0 protocols and formats, it's going to power your applications into the
next technology wave.
With all this excitement about IBM Data Studio, you should
know where to get it. Check out www.ibm.com/software/data/studio
for IBM Data Studio-related FAQs, tutorials, downloads, blogs, and more. A
supporting forum and set of blogs is available at: www.ibm.com/developerworks/forums/dw_forum.jsp?forum=1086&cat=19.
See All Articles by Columnist Paul C. Zikopoulos
About the Author
Paul C. Zikopoulos, BA, MBA, is an award-winning writer and
speaker with the IBM Database Competitive Technology team. He has more than 13
years of experience with DB2 and has written more than 150 magazine articles
and is currently working on book number 12. Paul has authored the books
Information on Demand: Introduction to DB2 9.5 New Features, DB2 9 Database
Administration Certification Guide and Reference (6th Edition), DB2 9: New
Features, Information on Demand: Introduction to DB2 9 New Features, Off to the
Races with Apache Derby, DB2 Version 8: The Official Guide, DB2: The Complete
Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A
DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical
Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business
Intelligence and Database Administration). In his spare time, he enjoys all
sorts of sporting activities, including running with his dog Chachi, avoiding
punches in his MMA class, and trying to figure out the world according to Chloë,
his daughter. You can reach him at: paulz_ibm@msn.com.
Trademarks
IBM, DB2, DB2 Universal Database, i5/OS, Informix, pureXML, Rational,
Rational Data Architect, WebSphere, WebSphere Application Server, and z/OS are trademarks
or registered trademarks of International Business Machines Corporation in the United
States, other countries, or both.
Other company, product, or service names may be trademarks
or service marks of others.
Copyright International Business Machines Corporation, 2007.
All rights reserved.
Disclaimer
The opinions, solutions, and advice in this article are from
the author's experiences and are not intended to represent official
communication from IBM or an endorsement of any products listed within. Neither
the author nor IBM is liable for any of the contents in this article. The
accuracy of the information in this article is based on the author's knowledge
at the time of writing.