There has never been a more busy time for data architects, database administrators and database developers than now. Programs and initiatives require more creative solutions and in the push to implement solutions that cost less and do more, some design fundamentals are being dropped from the development lifecycle. Read on to learn more.
Okay, so here we are in the 21st century, there has never been a
more busy time for data architects, database administrators and database
developers. There are more and more projects, programs and initiatives underway
that require us to deploy more creative solutions. There are so many new and
improved software solutions making large amounts of data immediately accessible
for cloud computing, federating more data that was once buried within legacy
applications and enabling data to become more agile.
However, in the push to implement solutions that cost less and do more, it
has been my experience that some design fundamentals were left out of the
development lifecycle. More specifically, database design fundamentals. Why? It
may be a result of aggressive project time frames, the decrease in performance
times for service level agreements or the reduction in the window for batch
processing. The root cause is yet to be determined.
The fact is that as projects go through the software development lifecycle
(SDLC), there are a number of tasks and activities that we as data architects
and database administrators should always incorporate that will ensure database
applications are designed within scope, cost and service level agreements for
performance. These tasks or activities are integral parts of specific layers of
the solutions architecture. What are those layers? The data layer and the
application layer. The tasks here can make or break the database application. They
are the fundamental building blocks to an effective database design where
refactoring and rework is reduced to significantly low levels. These tasks
enable the construction phase to be completed timely with little or no
guesswork about the design. It should be noted that one of the more important
threads throughout the execution of these tasks is documentation. It cannot be
emphasized enough how important documenting the design is. This point will be
discussed further in this article. So, let's look at some of the tasks that
should be incorporated to effectively design a database application system.
The Data Layer
This is traditionally the center of all database design activities. Once the
logical data model has been created, the database administrators are engaged to
create the physical data model, the database and the related objects. The work
that is involved in creating the physical data model is focused primarily on
translating the logical data model into an artifact that can be used to
generate the database definition language (DDL) and of course the database,
taking into account the features and constraints of the DBMS whether it's
Oracle, SQL Server or UDB. During the work effort to create the physical data
model, the team should be mindful of the project scope and business service
level agreements related to performance and accessibility. Where does this
information exist? It should be defined in the business requirements and
functional specification documents for the project. These documents will define
the type of database needed. So it should be pretty clear that the database
application being designed is data warehouse, a transactional database or a
data mart. Using this set of documentation and periodic reviews with the
business analyst(s), the database administrators and application development
team should be able to create a physical data model, the database and related
objects in phases.
The first phase is to ensure that all existing database standards are
applied to the physical data model. This includes naming conventions for
columns, primary and foreign key constraints, tables and indexes. The second
phase is to review the detailed business requirements and functional
specification documents to align the physical data model to data accessibility
requirements. For example, if the database application being designed is part
of a decision support system or a business intelligence framework, having
immediate access to data is a high priority. Elements to be considered should
include the use of surrogate keys for uniquely identifying each row of data and
natural keys (including composite keys) for the way data is accessed through
common business data dependent functions. Also included is the cardinality of
the columns that will be a part of the alternate keys. Namely, if a composite
key is required, the column with the highest cardinality must be the first
column in the index followed by the other columns such that the column with the
lowest cardinality is the last column of the index. This ensures that when this
specific index is used, that a smaller result set of data is created that is
specific to the request instead of a much larger result set with only a few
rows qualified for the request. Let's also not forget that during these
activities, the metadata repository is being augmented with information
generated from the physical data model design effort. The third phase is to
organize the tables in the physical model based on performance, scalability and
availability requirements. What this means in practice is organizing tables,
indexes, tablespaces and containers that possess certain characteristics. At
one end of the spectrum, tables that include small data volume, frequent access
and low volatility (i.e. updated infrequently) should be grouped together in
the same tablespace. In this way, the data accessed in these tables can be
brought into and kept in memory or cache thus reducing read I/O. At the other
end of the spectrum, tables that will possess large data volumes that must be
processed and made available within limited windows of time should be
partitioned. This ensures that these tables are segmented into manageable
chunks such that the data can be spread across multiple CPUs. The application
and data management utilities can now leverage parallel processing and to some
extent grid computing to ensure that performance SLAs are met and CPU-intensive
operations are completed within scheduled timeframes on large data volumes when
needed.
The Application Layer
Before I begin, I need to state that the details discussed here are specific
only to good coding practices for SQL. So now that I have cleared that up, let's
talk about the components of the application layer that actually touch the data
and the related tasks.
Designing the data layer is only part of the overall effort. Remember, the
database design is only as good as the application that uses it. Here too the
business requirements and functional specification documents provides
information that is used to design and build out the application layer. These
documents contain the specifications for how the data will be used.
Additionally, the design of the application layer should be collaborative. The
DBA, application developers and business analyst should be part of the design
team for the application layer. The creation of the application layer should
employ a top-down approach; the top being the scope of the project, with each
lower lever defining the components that will become the integral parts of the
application. This is an effective way of defining the tasks of creating an
application by deconstructing the business requirements and functional
specifications in order to define and create the processes required for the
project deliverables. The other set of tools needed is a set of guiding
principles that governs application design. The theme for these guiding
principles is that the application must be maintainable and efficiently use
resources.
Within this framework, the team, mindful of the project scope, should begin
by defining the application processes that will be required to provide a
particular set of services. Once this is done, the last set of tasks should
focus on identifying the data needs for each application process and determine
the best way to access and/or process that data. This means effectively using
resources to retrieve and process the data. It's about employing best practices
for coding SQL statements. For example, defining the specific columns to be
used in the "SELECT" statement instead of retrieving the entire row
by using "SELECT *" statements (only retrieve what is needed),
comparisons in the "WHERE" clause with columns that are of the same
data type and definition, limited use of negative logic in the
"WHERE" clause.
Documentation
So now all that all these activities have been completed, let's talk about
documentation. In my experience, this is always the weakest area in the design
phase of SDLC. It could be that as the deadline for delivery looms, it's often
considered as one of the products that is not mandatory and can always be
completed at a later date. Here's the thing, each phase of the SDLC creates
artifacts and documentation is just that, a project artifact. This
documentation pulls together the entire design effort. It aligns the design of
the data layer and the application layer to the business requirements and the
functional specification documents. Design documentation provides the
explanations of decisions made. It is also the forerunner of the production
runbook completed in the deployment phase of the project.
The Last Word
These activities are a cohesive set that work towards a common goal. That is
to ensure that the solution performs within scope and established service level
agreements with maintainability and scalability at its core.
»
See All Articles by Columnist
Denise Rogers