DB2 9.5 and IBM Data Studio - Things I Couldn't Tell You Before DB2 9.5 Was Announced - Page 2
December 4, 2007
Some of the possibilities with the IBM Data Studio vision
Lets assume youre a retailer and you offer credit cards as a form of payment and thus need to comply with the PCI standard. Lets further assume you use IBM Rational Data Architect (IBM RDA) to model your physical and logical design.
A Moment on PCI:Its outside the scope of this article to delve into the details of the PCI standard. In a nutshell, major credit card companies have mandated that vendors who use their services must comply with the PCI standard. The ultimate goal is to provide coaching to those companies that handle credit card data to protect cardholder information vis-à-vis secure networks, encryption, and so on.
Vendors who fail to comply with the PCI standard are subject to some really hefty fines; in addition, any retailer found to have leaked sensitive cardholder data can be required to submit to external audits or risk losing their right to offer credit card services (from those credit vendors that participate in the program). Most recently, a major US retailers credit card processing company was fined $880,000 in the summer of 2007 and will continue to be fined $100,000/month until compliance is met. The IT challenges posed by the PCI standard are significant: recent studies estimate that only 40% of the major retailers in North America are compliant!
There are 12 parts to the PCI standard. An example of just one of its declarations is that credit card numbers on any report for those without a need to know has to put Xs instead of the credit card numbers leading digits (except the last block of digits).
For example, a billing report:
Another declaration states that PIN codes such as those you enter to access your account online must be represented with *s, ·s, or some masking symbol.
Another specification (Part 3.1) notes that the storage of cardholder data should be kept to a minimum, and that network diagrams that illustrate all connections to cardholder data (Part 1) are required too. There are a lot of other declarations within this standard, such as how to handle test data. You can learn more about the PCI standard at: www.pcisecuritystandards.org/.
So lets assume you are building a retail database. What if you could specify in the IBM RDA data model that a specific column contains credit card numbers, or that another column contains PINs? If you could, then a governance tool could know that these two fields need to be governed by the PCI standard and represent these columns in your business glossary, the governance solution could automatically put fine-grained access control (FGAC) rules in place to mask the data whenever these columns are accessed (except for applications that are specifically authorized to bypass these rules).
Consider another scenario for this same retailer. The PCI standard notes that live customer data cannot be used as test data. This means that real credit card numbers, real addresses, real names, and so on, cannot be used. If you do, you can be fined. Now your IT department is faced with a challenge to create a test database for the development organization. This is a perfect example of different roles being able to leverage the same toolset to solve a business problem. The DBA has to change the real data such that when it goes into the test database it is PCI-compliant. But what would happen if the DBA didnt do this but instead loaded up a new sample database with this data? Typically, nothing would happen, which could lead to some large fines. Now imagine a scenario where that same governance tool from the previous paragraph could detect that PCI-compliant data was being loaded into a test database. If that tool could flag this violation at data movement time, it would be a very valuable asset.
Now lets assume that you want to extract all the records in your database for an advertising campaign but are not governed by the PCI standard. However, your enterprise has its own set of privacy rules that guarantee clients that their information wont be distributed to other vendors without their permission. Now expand this scenario to the data model. If the data server understands that column A is the opt-in or opt-out field to make your home address visible, then every time a query is run against the data server for a campaign extract, the data server can consult this field to see if the address should be returned to the application or not.
In addition to this retail scenario, there are lots of other examples from other industries. The Health Information Protection and Portability Act (HIPPA) is a medical privacy initiative that pretty much says you cant show medical records in reports (similar to the PCI standard for credit card information). If a column has personal medical history information, that column can only be shown to the patients physician. You cant show it to just anyone who has DBA privileges. Same idea -- different industry.
The goal is to make the design layer smart enough to understand the different kinds of fields that have these kinds of characteristics. That will provide you with a very efficient environment.
Now, dont let all my talk about the tooling and the role identification fool you; IBM is still focusing on the client-side runtime APIs because if you cant run the application in a robust and high-performing manner, there is no sense in having any tooling whatsoever. We plan to enhance our already strong API story with a new suite of tools around the jobs and roles of the personnel that actually do the work.
Note: The DB2 and Studio.NET experience has had a long and rich history of deep integration. Since Visual Studio is the IDE where .NET developers tend to work, IBM has provided plug-ins into this environment since the DB2 UDB Version 8.1.2 release. Ive written extensively on this particular integration and this API and IDE, so it isnt discussed in this article.
Heres my attempt to express IBMs intentions for the new IBM Data Studio:
The goal of a comprehensive solution for the entire life cycle of an application is unique and ambitious. Im talking about everything from early up-front design of business processes and logical data models for all of your applications, to the time when you start to design, code, and test those applications. But the process isnt finished at that point. You then have to create a physical database model including their schemas; you have to allocate storage. You have to perform a lot of security work. You have to accommodate for the day-to-day management of these systems such as backup and recovery or meeting service level agreements. Finally, dont forget all the governance work you have to perform (a growing percentage of the work required in the application life cycle) to satisfy the regulatory compliance decrees placed upon your business and more.
While there are lots and lots of tools out there that address part of the application life cycle, there are no other single toolsets that currently cover the full life cycle. Indeed, as I look across the market, most tools are piece-part solutions, which means you need two or more of them to address the methodology I outlined in this article.
Quite frankly, the IBM Data Studio vision is an ambitious undertaking, but one with extremely high value and high potential for our clients since no one else really has this vision. Whats more, the vision has the potential to grow beyond the set of relational IBM data servers that IBM Data Studio currently supports. There would be a lot of value in a cross-data server toolset to solve the inefficient conundrum with respect to how enterprises align their data server personnel today: skills and resources tend to be aligned and constrained in direct correlation to a database vendor. In my opinion, this hurts productivity. A cross-brand full life cycle toolset would lead to better control over staffing, better control over resource allocation, and an unbeatable edge over the competition!
How the year 2007 will finish up for IBM Data Studio
IBM has now announced the general availability of IBM Data Studio and best of all its free for everyone. Although the first iteration of this IDE is far from the full life cycle vision Ive discussed in this article, some valuable stuff is being delivered. For example, Data Studio contains a subset of some of the functionality in IBM RDA. (See Part 3 in this series for an example of the overview data dependency diagrams, which allow you to see tables and their relationships, and so on.) It also has entity-relationship diagrams, a simple data distribution viewer (which allows you to see your datas distribution), and more.
From a development perspective, youll get most of the things that youre used to seeing as a developer, such as building SQL and XQuery statements, routines, and so on. In addition to what its predecessor, DB2 9 Developer Workbench, gave you, IBM Data Studio has a set of rich enhancements designed to make developers even more productive. Theres also a new Web services framework called IBM Data Web Services. Finally, a lot of features were added for DBA activities. From a governance perspective, IBM Data Studio gives you the ability to work with security roles.
Obviously, I cant detail all the features in this article because thats what the point of this series is but the following figure should give you an idea of the things you can do free of charge with IBM Data Studio and an IBM data server:
The capabilities outlined in the previous figure are all free. Since the toolset is extensible, you can buy add-ons that add new features to context menus or enable disabled features (in the same way that you can add to the base tooling provided by vendors such as Quest or Embarcadero). For example, when IBM Data Studio became generally available, IBM announced two such chargeable add-ons: IBM Data Studio Developer and IBM Data Studio pureQuery Runtime, which are geared towards the incredibly cool Java enhancements that go with pureQuery (a subject for a future installment of this series).
Where is IBM Data Studio heading?
In this section, I outline some of the top priorities for the IBM Data Studio. (Of course, these are not commitments but rather goals.)
The first goal is to reduce the footprint associated with this toolset and, since client-side offerings arent tied to server releases, IBM Data Studio can have its own availability schedule thats more aggressive than that of a data server.
In addition, DB2 or Informix IDS must be seen as the unquestionable choice for IBM WebSphere Application Server. The debut of pureQuery is a great first step here, but expect to see initiatives along other integration points too. Some possible examples include JDBC Capture to enable pureQuery for any Java program, significant performance monitoring and problem determination aids, openJPA support for pureQuery, Spring, iBatis, and so on. Also being considered is tight integration with Object Grid (SQL syntax for caching, DB2 or Informix IDS persistence options, replication from DB2 or Informix IDS to Object Grid, and more).
Since IBM Data Studio now represents a standard toolset with a pluggable interface for all sorts of roles and features, youre likely to see key IBM Information Management products integrated into this toolset such as DB2 Change Management Expert, DB2 Performance Expert, the DB2 Design Studio from DB2 Warehouse Edition, IBM Optimum suite (from the Princeton Softech acquisition), and DB2 High Performance Unload, to name a few. Expect to see some sort of performance management add-ons for IBM Data Studio too. For example, such a feature could provide SQL statement-level performance details and historical trend analysis. In addition, data server personnel could have the ability to report database resource usage in multiple facets such as by SQL statement, package or collection, application, application server, Java class name, and more.
Finally, look for expanded functionality in the administration piece with alter anything capabilities (essentially, integrating the DB2 Change Management Expert in the toolset): the ability to view and change registry, database, and database manager configuration parameters; FTAs for deadlock and timeout events; and other items from the DB2 Control Center that improve the up-and-running experience for an IBM data server. Remember, these are goals, not commitments.
Wrapping it up
In this article, I tried to describe the vision behind the newly announced IBM Data Studio. I attempted to outline just how unique and beneficial our envisioned toolset platform would be for all those involved in the application life cycle. Specifically, such a toolset would help by:
IBM Data Studio is expected to simplify and speed the development of new skills by providing a learn once, use with all supported data servers toolset thats engineered with an easy-to-use and integrated user interface thats compatible with the IBM Rational Software development platform.
Finally, IBM Data Studio is expected to accelerate information as a service by allowing you to develop and publish data-related services as Web services without programming; and since its Info 2.0 ready, with support for Web 2.0 protocols and formats, its going to power your applications into the next technology wave.
With all this excitement about IBM Data Studio, you should know where to get it. Check out www.ibm.com/software/data/studio for IBM Data Studio-related FAQs, tutorials, downloads, blogs, and more. A supporting forum and set of blogs is available at: www.ibm.com/developerworks/forums/dw_forum.jsp?forum=1086&cat=19.
About the Author
Paul C. Zikopoulos, BA, MBA, is an award-winning writer and speaker with the IBM Database Competitive Technology team. He has more than 13 years of experience with DB2 and has written more than 150 magazine articles and is currently working on book number 12. Paul has authored the books Information on Demand: Introduction to DB2 9.5 New Features, DB2 9 Database Administration Certification Guide and Reference (6th Edition), DB2 9: New Features, Information on Demand: Introduction to DB2 9 New Features, Off to the Races with Apache Derby, DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). In his spare time, he enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA class, and trying to figure out the world according to Chloë his daughter. You can reach him at: firstname.lastname@example.org.
IBM, DB2, DB2 Universal Database, i5/OS, Informix, pureXML, Rational, Rational Data Architect, WebSphere, WebSphere Application Server, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Copyright International Business Machines Corporation, 2007. All rights reserved.
The opinions, solutions, and advice in this article are from the authors experiences and are not intended to represent official communication from IBM or an endorsement of any products listed within. Neither the author nor IBM is liable for any of the contents in this article. The accuracy of the information in this article is based on the authors knowledge at the time of writing.