Relational Databases 101
October 31, 2006
Relational Databases 101 is excerpted from the Addison-Wesley title, Hitchhiker's Guide to Visual Studio and SQL Server.
Buy this book
Many of my readers come from backgrounds that don't include formal training on the best ways to design and create efficient, business-class relational databases. If you arrive here with Microsoft Access or FoxPro experience, you're at an advantageyou know that, for the most part, the process of creating a database is hidden from you by the application's IDEyou just use drag-and-drop or use wizards to build the databases and tables you want. That's not at all bad, but without an in-depth understanding of how to best create, tune, and protect a relational database, I suspect that the relational "normalcy," relational integrity constraints, performance, and scalability of the result might not be particularly stellar. And, more importantly, the data might not be particularly secure. By "normalcy," I mean how well the database conforms to the recognized standards of relational design where database designers attempt to "normalize" databases to at least the third level. If you're not sure how to do this, or even what this means, have no fearI'll explain this later. The SQL gurus with whom I work (like Peter Blackburn, Kimberly Tripp, and a litany of others) are convinced that more problems can be solved by efficient database design than by the cleverest, best-written application front-end.
Getting Started with Solid Database Design
The Microsoft Books Online (BOL) documentation seems to fall a bit short in this important subject, so this chapter might be helpful for those who need a more complete understanding of how to create a best-practice relational database. The problem faced by any database designer is knowing what's going to be stored ahead of timebefore the first table is created. That's always been (and always will be) a problem. As I've said before, a customer rarely knows what they want until they don't get it.
To get started on the right foot, I recommend a good course in relational theory like Extended Relational Analysis. This can do a world of goodbut its depth is well beyond the scope of this book. In courses like this, you learn how to ask the right questions for each "entity" you expect to store in the database.
I also think that using a (big) whiteboard to lay out the database with your team (or customer) can help visualize the data. Getting everyone who is going to consume the data is essential. How I design a database for a single-focused project is very different than the way it's designed for projects where a small multitude of groups expect to consume the data. Admittedly, database development by committee is tough, and one should try to avoid those situations, but leaving town might not be an option.
Before we get into the academics of normalization, let's spend a few moments in quiet contemplation and focus on a few guiding principles. As you design your database, you should keep these basic tenants in mind:
Understanding Relational Database Normalization
In a nutshell, building a "good" optimized, relational database is mostly about normalization. Once you understand the basic principles of normalization, SQL Server should be able to manage your data more efficiently, the applications you write should be able to return data more efficiently, and you'll find it a lot easier to protect your database's data and relational integrity.
So, what is "normalization" and how does it help performance and all that other good stuff? Well, normalization is simply the set of relational database techniques developed to efficiently organize the information you want to manage in a relational database. The academics talk about (at least) five "normal forms," but most database designers stick to the first three forms and seldom go further. The benefit of implementing further levels is usually not that great when compared to the costsespecially in smaller databases.
Here are basic tenants of the first three normal forms.
No, this is not an in-depth discussion of normalization or relational theory, but it's enough to get you started. It's also important to know that many database developers bend these rules from time to time to get more efficiency out of their databases. Sometimes, they add a bit more detail for an entity in a parent table, so it's not always necessary to JOIN to another table just to get one or two bits of information. Yes, these changes mean that the data must be kept current in two different tables, and if someone else comes along and does not realize what's going on....
Understand as well that stored procedures or object-based approaches can (and do) help resolve these issues. By blocking direct access to base tables, developers can write server-side code to dereference the data in the base tables and get away with some tactics that would cause quite a bit of trouble if direct table access were permitted.
Once you have decided what tables you need, you need to use one of the SQL Server or Visual Studio tools to create them (as I illustrate in Chapter 4, "Getting Started with Visual Studio"). But before doing that, I often draw these tables on a whiteboard, which makes it easier to "see" how the data is to be stored and how the tables are related. In Visual Studio, you can use the database diagramming tool to help at this phase, and the ink does not stain your fingers as much.