This article contains common different methods of functional partitioning and common considerations for database setup and capacity. Company DBAs, database developers, engineers and architects should consider the pros and cons of any method of sharding or partitioning since compromises will have to be made given the pros and cons of a system setup.
This post contains common different methods of functional
partitioning and common considerations for database setup and capacity. In any
case, design or company DBAs, developers, engineers and architects should
consider the pros and cons of any method of sharding or partitioning. In the end,
compromises will have to be made given the pros and cons of a system setup.
Common Considerations
I have seen many different database system setups and overall
architectures that span the full scope of centralization and decentralization
from a database perspective. In the typical Web 2.0 startup, it can be hard to
determine what the systems will actually do when brought live in production;
that said, we have to start somewhere and base our decisions on many factors.
Some of those factors are budget, horizontal and vertical scaling, projected
growth and projected usage based on trends and other hypothetical information.
In any Web 2.0 company, you might face problems such as data
growth sprits, a massive influx of unpredicted usage either caused by the users
of the product or from an external source. In many cases, I have found that
good database and query design, proper indexing and horizontal functional
partitioning are the main keys for success as a DBA.
MySQL/Database Specific Considerations
As a MySQL professional I typically analyze the following at great
length before, during and after the system is in place:
- Schema
- DDL/DML
- Column type
selection - Indexing
- Compression of the
dataset not -
Referential
integrity - Within InnoDB
- Within the
application -
What
type of locking will you use -
Size
of a MySQL server - Number of users per
host - Number of users per
database - Amount of data per
host - Amount of data per
database -
MySQL
instance(s) - One MySQL server
per host - A number of MySQL
instances per host - Replication
- Statement based
- Row based
- Failover
- Automatic
- Manual
-
Data
Integrity -
Service
Level Agreements - Data loss
- Performance and
uptime - Migration of data
- Backups
- Connection limits and spikes
- Naming conventions for databases
- Multi-tenant, multi-database, multi-instance and or a mixture of the
three
For the sake of this article, we will look further into functional
partitioning and go through a few examples. This type of partitioning offers a
good way to separate numbers of users from each other in a horizontal fashion.
From here, you should have a better understanding of what is possible for your
installation(s) and what to consider when planning functional partitioning.
Single Instance – Single database – Multi-User per Database
Server Host ..............|_____MySQL Process Running on (3306) ........................|____MyDATABASE ................................|_______Table1 ................................|_______Table2 ................................|_______Table3 ................................|_______TableN
In this example, notice we have 1 host (server), 1 MySQL instance
(mysqld process), one database (schema) and multiple tables. In this case
please imagine multiple users sharing the same schema, so, all users will have
their own specific rows throughout each table in the schema.
Overall data size, connections and growth are what I would be
looking at specifically in this and all setups. This setup is beneficial when
you have many users with small datasets or row print within the database
(megabytes worth of data per user). The expected data growth per user in this
setup is expected to be minimal over time and the amount of users would be
expected to grow to the determined capacity and performance of your server.
Gauging the capacity is not a difficult task but it will differ based on your
data and performance metrics.
This setup can be detrimental when a specific user’s data grows
outside the average data needed to support that user. In the event one or a
select few users grow out of scope we will need to migrate the large user(s)
off or migrate all the smaller users to another MySQL server.
Single Instance – Mulitple Databases – One database per User
Server Host ..............|_____MySQL Process Running on (3306) ........................|____MyUSER1_DB ........................|.......|_______Table1 ........................|.......|_______Table2 ........................|.......|_______Table3 ........................|.......|_______TableN ........................|____MyUSER2_DB ................................|_______Table1 ................................|_______Table2 ................................|_______Table3 ................................|_______TableN
In this setup, we are running one MySQL instance on one host and
every user has its own database with its own schema. The flexibility of this
setup is somewhat optimal, operationally speaking, given that the migration of
users from one MySQL server to another would be relatively easy. On the other hand,
you might have to deal with hardware specific limitations such as open file
limitations and operational overhead such as a schema change.
With this setup, it is easy to identify specific users on the
server and gather more granular data on a per user basis. The migration process
for larger users is also simplified given that all of a user’s data is stored
in a unique database. This allows operations to migrate N networks on a 1:1
ratio to another server either for performance or space requirements.
Multi-Instance – Single database – Multi-User Per Database
Server Host ..............|_____MySQL Process Running on (3306) ........................|____MyUSER1_DB ................................|_______Table1 ................................|_______Table2 ................................|_______Table3 ................................|_______TableN ..............|_____MySQL Process Running on (3307) ........................|____MyUSER2_DB ................................|_______Table1 ................................|_______Table2 ................................|_______Table3 ................................|_______TableN ..............|_____MySQL Process Running on (3308) ........................|____MyUSER3_DB ................................|_______Table1 ................................|_______Table2 ................................|_______Table3 ................................|_______TableN
This setup has all the benefits of a single instance, multiple
database, one database per network with the added complexity of multiple
instances. Yep, I said it, added complexity. I have run this setup before on a
few installations and have to say that I would prefer NOT to do this but felt
it necessary to add it to this post.
You can think of each instance as its own mysqld process either
running on a bound IP address or on a different port. In the example above, I
am using multiple ports, 3306, 3307, and 3308.
In this case, we would be able to move N number of users in an
entire instance off to a new server. If you don’t really know how your system
and or users will grow this might be a viable option with the following
constraints:
1. You are planning on
your data and user base getting larger but don’t really have the hardware or
FTE recourses’ or trend analysis to make a good decision.
2. You are planning to
eventually to migrate full instances to their own MySQL server or cluster.
3. You have the
ability and are willing to functionally shard your user base on multiple levels.
Some Conclusions
I am not initially inclined to go with any of the preceding
examples until I have some data on which to base my decision. In any case,
you’ll need to figure out what way works best for you and your specific system.
More than likely, you’ll need to compromise on one or many levels based on your
specific requirements. Below are three of the top requirements I typically run
into before deciding on functional partitioning.
1. Identifying problem
user fast
2. Migration of one to
many databases/schemas/users to new MySQL server
3. System resource
usage and measurement
1. NOTE: we can
eventually tie this into a usage plan for all users
Please note that there are many other ways to shard or partition
your data. These are just examples but the list could go on and on.
Additional Resources
MySQL Partitioning