dcsimg

Sample Database Design

February 7, 2001

Introduction

I decided to start making notes as I designed a database to hold the articles that I am writing. When I started writing, it was easy to keep track of what I submitted. Not that I am writing for more than one publisher and they have different editorial cycles (some as long as 4-5 months) it is becoming increasingly difficult to keep track of what is submitted where and when it will be published. Actually, it is impossible given that I have a full-time job elsewhere and a busy family life.

So, as I design the database and build a front end to keep track of it, I will be writing notes and including code here for people to use if they would like. Since this is a work in progress, the writing may be a little rough. I will not be spending too much time editing my notes as I implement the application.

Before I go any further, I have to give credit to Jerry Pournelle for giving me the idea for this project. I am a long time reader of his columns in Byte (www.byte.com) and occassionally follow the notes he keeps at Chaos Manor (www.jerrypournelle.com).

The Problem

As a writer, I have a substantial number of articles in the works, completed, but not published, as well as completed and published. Recently I also was asked to write an article on NT whereas I have been primarily concerned with writing about SQL Server.

Over the last few weeks, I have spent nearly as much time trying to figure out what articles to submit for publication as I have writing articles each week. As with most applications, I decided to build a quick prototype to organize my thoughts.

Now the first prototype was not using Access, SQL Server, VB, or anything else. I started with a quick notepad document. Since I am also trying to keep my ASP skills sharp, I entered the data in notepad as an ASP document which I could view in a web page. This also allowed me to quickly format the data and reference it from any computer without having to transport a file around. This was my substitute for a paper prototype since my goal is a mostly paperless office.

The initial prototype resulted in the following information for each article:

  • Article Title - Title of the article (working)
  • Synopsis - Brief description
  • Submitted to - published I submitted it to
  • Submitted Date - date I submitted the article
  • Published Title - Actual title the article is published under
  • Publication Date - Date the article was published
  • Copyright Date - Date I mark the article as completed
  • Category - To classify each article (DTS, SoapBox, etc.)
  • Article Type - This was added as I write about NT as well as SQL Server
  • Payment - Not a big deal, but I was curious how I am doing.
Of course, doing this in ASP/HTML results in a really long list with embedded lists inside for other information. It did, however, give me a starting point to begin designing a database.

Analysis

Before creating and tables, I need to examine the list of information above. >From experience, I know that I have not thought of everything, and I know the design will evolve as I learn more about what I need to track and start to request reports from the application. Therefore, I am going to start with the list above and see where this takes me.

When I start to design a database, I look for the places where I can make the design flexible. I try to anticipate changes that might be asked by the end user. I also try to then "break" my design by testing some changes against it to see if it can "handle" them without modification.

How do I then convert the data above into a schema? First I will start at the top and design my "Product" table. I could call this table "Articles", but my products are articles and I am trying to get into the book writing business and I would also argue my book reviews are not really articles, so I will stick with Products. I also may one day decide to speak at some event, these speeches could also be stored in this table.

I will go through the fields one by one and explain how I look at this information. First, Article Title is the main identifier that differentiates each article. Now I tend to like identity fields even though a number of database designers feel this is a poor design. My feelings are that the identity represents a hash to me and whether I store the data in a table and update it each time, get the max(), or use an identity, the result is the same. The identity is also smaller (less storage) joins quicker, and is easy to manage (IMHO).So while Article Title may be unique and will probably be my PK, I will also add an ID field to my first table. This gives me

Product
	-----------
	ProductID
	ProductName (I like this better than title)

The next piece of information is the synopsis. This is a description of the item. My first instinct is to consider whether there will be multiple descriptions for each article. If there are then, this would likely by put into a child table. Since I do not expect my internal description to be a many for each article, I will keep it in the Product table. There may be a different description for each submission, however, so I will start another table for submissions that includes a description as well. Now we have:

Product
	-----------
	ProductID	
	ProductName 	(I like this better than title)

	Submissions
	------------
	ProductID	
	Description

Next we have the "Submitted To" data element. This is definitely not a part of the product. First I have submitted the same article to two different sources (at different times, of course), so this is definitely part of a child table. Second, an element (A) that is a multiple of another element (B) (many submissions per article) as well as having the the other element (B) be a multiple of the original element (A) (the same company may get multiple submissions) is a great candidate for its own table. The "submitted to" is in reality a publisher or a company. An example of this is "Swynk.com" where this article is published. I write many articles that are all submitted to Swynk.

Another way to look at "Submitted To" is to consider this as a lookup table. When I go to submit an article, I will really want a lookup against past companies as well as the chance to add a new company. The result is that this element results in a new table and a new field in the "Submissions" table. Now I have:

Product
	-----------
	ProductID	
	ProductName 	(I like this better than title)

	Submissions
	------------
	ProductID	
	Description
	CompanyID

	Companies
	------------
	CompanyID
	CompanyName
My middle table has become a many-many join table between essentially two lookup tables. What about "submitted date"? This is a one-one for each submission. Therefore we will add this to the Submissions table as another field. Dates are generally not good candidates for lookups or their own tables.

Next we have Published Title which is most likely a one-one for each submission as well. It is possible, however, for a publisher to re-release the article under another title at a later date. I will choose not to implement the many relationship here. My rational is that if an Article were to be re-released, I would add it as another Submission for the same article. I realize that this is a limitation of the design, but I understand the implications and since this will not likely be a design that serves thousands of clients (in my dreams), having a few rows of (mostly) duplicate data is acceptable. If this were something that happened for most of my articles, then I might feel differently. However it will be a relatively rare event, so it goes into Submissions. The same logic goes for the Publication Date. Now we have:

Product
	-----------
	ProductID	
	ProductName 	(I like this better than title)

	Submissions
	------------
	ProductID	
	Description
	CompanyID
	PublicationName
	PublicationDate

	Companies
	------------
	CompanyID
	CompanyName
The Copyright Date is a one-one date for the article. It is essentially the date that I assign to the article as being completed. Therefore this goes into the "Product" table.

Next we have category. This is a classification attribute of the product. Whenever I see something that is used to classify a data element I immediately think "lookup table". Since this is the first classification, I will create a new table to hold this and add a field to the product table. There may be the need to include this field in submissions, but I am not sure, so I will leave it out for now.

Product
	-----------
	ProductID	
	ProductName 	(I like this better than title)
	CopyrightDate
	CategoryID

	Submissions
	------------
	ProductID	
	Description
	CompanyID
	PublicationName
	PublicationDate

	Companies
	------------
	CompanyID
	CompanyName

	Category
	-----------
	CategoryID
	CategoryName

Article Type is the next data element that I have to consider. This is also a classifier for the product. Now, I have to consider the following things. One, If I add this as another category, I have two fields that have FKs to the category table. If I perform a join, I need to be sure to include category twice in the join so I get both items. This can be confusing since I would need something like:

select *
	 from product p, category c1, categoryc2
 	 where p.categoryid = c1.categoryid
	 and p.articletype = c2.articletype
I prefer to make another lookup table to hold article types. This makes for quite a few lookup tables, but I have not had a problem managing thirty or so lookup tables. As with Article above, I changed the name to ProductType.

The last field is payment. Since I get payed by submission, not article, I will include this in the submissions table. I suppose the possibility exists that there could be multiple payments for a submission (not that I have seen this one), but I do not want to track this for now. The final design looks like this:

Product
	-----------
	ProductID	
	ProductName 	(I like this better than title)
	CopyrightDate
	CategoryID
	ProductTypeID

	Submissions
	------------
	ProductID	
	Description
	CompanyID
	PublicationName
	PublicationDate
	Payment

	Companies
	------------
	CompanyID
	CompanyName

	Category
	-----------
	CategoryID
	CategoryName

	ProductType
	-------------
	ProductTypeID
	ProductTypeName

The Design

This has gone on long enough. I will tackle the design next time and include some images of the ERD that I build.

Conclusions

This is the start of a multi-part series that I will be releasing over the next few months. In reality, the entire project will take only a few days, but I need some time to organize the short, quick notes I take as I build this into something easier to read.

Now I realize that quite a few of my regular readers may ask "Where are the abbreviations?" They are coming! This is the analysis phase of this project and an initial table design. As I start to draw this up and build the tables, I will build a table of abbreviations and include them in the actual DDL code. But for the business presentation and analysis, I prefer to keep the longer names. It prevents confusion when explaining to non-technical people. Plus, I hate when non-technical people actually know the names in the DB. They do not use them enough to be familiar and quickly confuse everyone. This is one of the arguements to use long names all the time (see the alternative viewpoint in Abbreviations), but since I really hate typing and business rules and nomenclature can change, I will stick with my abbreviations for code.

As always, I welcome feedback and hope that this series will help some of the more junior designers understand how I approach a database design. This same approach would apply for enhancements to a schema as well.

Steve Jones
January 2001







The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers