Download the
code for this article.
This is part 15 of a series. If you have not read part 14,
you can find it here,
but this article does not require any knowledge of the previous work in the
series.
Distribution: Spreading Tables Across Multiple Databases
When you design a database from scratch, you probably put
all the related tables in the same database. This allows for such
database-level validations as foreign keys and simple two-part JOINs (schema.table).
However, there are often reasons that you have to deal with tables that are
related but spread across multiple databases.
History is often a factor, but there are other
possibilities. Deployment, manageability, and security concerns can affect
which data is on which database.
In this series’ current quest to understand costs of
different architectures, we will look at what cost is paid when tables are separated
into multiple databases on the same server. We will not look at separate-server
scenarios because the number of variables is too high. However, the scripts
that accompany this article could be modified to run separate-server scenarios
if someone were looking for results for a specific set of servers. We’re also
not recording the timing differences for the INSERT statements. These records
may be created separately, or they may be created as part of a single
transaction. Since all stored procedure calls are coming from outside of the
core OLTP databases (similar to an application making calls into a database),
the differences might be zero (where records are created one-at-a-time). If the
records are created as a group, the differences would vary widely depending on
the implementation.
Building the Tests
Test Introduction
In previous installments of this series, I introduced three
tables that I will reuse here: Card, Vendor, and Purchase. The DDL for the tables is the same for each layout except
-
the name is different (Card1 for layout1, Card2 for layout2,
etc.). - the database is different depending on the layout.
-
tables in the same database do have foreign keys where
separated tables do not have foreign keys.
CREATE TABLE dbo.Card1 (
CardID int NOT NULL IDENTITY,
CardNumber1 bigint NOT NULL,
CardNumber2 bigint NOT NULL,
CardNumber3 bigint NOT NULL,
CardNumber4 bigint NOT NULL,
SecurityCode1 smallint NOT NULL,
SecurityCode2 smallint NOT NULL,
SecurityCode3 smallint NOT NULL,
SecurityCode4 smallint NOT NULL,
SecureString1 varchar(36) NOT NULL,
SecureString2 varchar(36) NOT NULL,
SecureString3 varchar(36) NOT NULL,
SecureString4 varchar(36) NOT NULL,
CONSTRAINT Card1_PK PRIMARY KEY CLUSTERED (CardID)
)
CREATE TABLE dbo.Vendor1 (
VendorID int NOT NULL IDENTITY,
BusinessName varchar(40) NOT NULL,
Address1 varchar(40) NOT NULL,
Address2 varchar(40) NOT NULL,
City varchar(40) NOT NULL,
[State] char(2) NOT NULL,
ZipCode varchar(10) NOT NULL,
DateCreated datetime NOT NULL
CONSTRAINT Vendor1_DateCreated_DF DEFAULT GETUTCDATE(),
DateLastUpdated datetime NOT NULL
CONSTRAINT Vendor1_DateLastUpdated_DF DEFAULT GETUTCDATE(),
StatusCd tinyint NOT NULL
CONSTRAINT Vendor1_StatusCd_DF DEFAULT 1,
CONSTRAINT Vendor1_PK PRIMARY KEY CLUSTERED (VendorID)
)
CREATE TABLE dbo.Purchase1 (
PurchaseID int NOT NULL IDENTITY,
CardID int NOT NULL,
VendorID int NOT NULL,
Amount money NOT NULL,
DateCreated datetime NOT NULL
CONSTRAINT Purchase1_DateCreated_DF DEFAULT GETUTCDATE(),
CONSTRAINT Purchase1_PK PRIMARY KEY CLUSTERED (PurchaseID),
CONSTRAINT Purchase1_Card_FK FOREIGN KEY (CardID)
REFERENCES dbo.Card1 (CardID),
CONSTRAINT Purchase1_Vendor_FK FOREIGN KEY (VendorID)
REFERENCES dbo.Vendor1 (VendorID)
)
The basic idea of the test here is that there are three
“spreads” tested and compared:
- Layout 1: All three tables on database DB1
- Layout 2: Card and Purchase on DB1, Vendor on DBv
- Layout 3: Purchase on DB1, Vendor on DBv, Card on DBc
Pictures help significantly here.
The test script was run from the “_test” database and called
stored procedures in DB1, DBv, and DBc as appropriate.
Each run of the test script performs all of the following
steps:
-
Truncate or DELETE FROM all tables and reset identity seeds where
appropriate - Create 10,000 Card records
- 10,000 records in SqlCreditDB1.dbo.Card1
- 10,000 records in SqlCreditDB1.dbo.Card2
- 10,000 records in SqlCreditDBc.dbo.Card3
- Create 10,000 Vendor records
- 10,000 records in SqlCreditDB1.dbo.Vendor1
- 10,000 records in SqlCreditDBv.dbo.Vendor2
- 10,000 records in SqlCreditDBv.dbo.Vendor3
- Create 50,000 Purchase records
-
50,000 records in SqlCreditDB1.dbo.
Purchase1 -
50,000 records in SqlCreditDB1.dbo.
Purchase2 -
50,000 records in SqlCreditDB1.dbo.
Purchase3 - DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS
- Run UPDATE STATISTICS on all nine tables
-
Read 500,000 random Purchase1 records (JOINed to Card1 and Vendor1)
for Layout 1 (pass 1: ignored) -
Read 500,000 random Purchase1 records (JOINed to Card1 and Vendor1)
for Layout 1 (pass 2: elapsed time recorded) -
Read 500,000 random Purchase2 records (JOINed to Card2 and Vendor2)
for Layout 2 (pass 1: ignored) -
Read 500,000 random Purchase2 records (JOINed to Card2 and Vendor2)
for Layout 2 (pass 2: elapsed time recorded) -
Read 500,000 random Purchase3 records (JOINed to Card3 and Vendor3)
for Layout 3 (pass 1: ignored) -
Read 500,000 random Purchase3 records (JOINed to Card3 and Vendor3)
for Layout 3 (pass 2: elapsed time recorded)
Given a standard workload (SELECTs only), what will the
penalty be for moving the tables into separate databases? You probably have an
idea, so write down your guesses now. Layout 1 is the baseline (call it “+0%”).
Layout 2: +___%. Layout 3: +___%.
Results
Here are the results of four tests. The integers (times) are
in milliseconds.
Test |
Layout 1 |
Layout 2 |
Penalty |
Layout 3 |
Penalty |
1 |
21,470 |
22,780 |
5.75% |
23,983 |
10.48% |
2 |
21,550 |
22,923 |
5.99% |
24,030 |
10.32% |
3 |
21,580 |
22,860 |
5.60% |
24,080 |
10.38% |
4 |
21,580 |
22,923 |
5.86% |
24,030 |
10.20% |
Average |
21,545 |
22,871 |
5.80% |
24,031 |
10.35% |
The times and penalties are very consistent from test
to test.
Conclusion
Spreading tables across multiple databases is done for many
reasons, usually very specific to a particular environment and set of
applications. When tables are separated, there is a performance penalty, but
the numbers are relatively small. However, if your tables are currently spread,
consider the performance gain available if you were to pull the tables into a
common database.
What are your
thoughts here? What are the reasons that your particular
application has multiple databases? Take a minute and drop a comment in the
forum.
Next month: The Cost of Bloat.
Download the
code for this article.