I am working on a SaaS application using AWS MySQL.
The application is for multiple organizations. We use the same database and tables to store multiple org data. We store
org_id
in most of the tables and use this in select, update and delete queries as where org_id = ?
.
### DB Model
* **Organization**
org_id
* **User**
user_id
org_id
* **client**
client_id
org_id
* **client_contact**
client_id
phone_number
phone_type
* **project**
project_id
org_id
* **user_project**
project_id
user_id
The data is growing rapidly and we need to shard the data. We have customers from all over the world.
If I use a different sharding id for different table, then a join might happen across the nodes. For example, for user, sharding can be on shard_id
and for client, sharding can be on client_id
and for project, sharding can be on project_id
. Here, the problem would be joining project and user. The user_id
and project_id
might be on different nodes and join will happen across nodes.
What is the best approach here? I am thinking of sharding based on org_id
as I store org_id
in most of the tables.
I see two problems here:
1. A few child tables don't store org_id
as the parent table is storing. Do I need to store the org_id
in all the tables?
2. Some orgs might have more data and load which might lead to a hot spot and more storage on a particular node. Is it possible scale a particular node alone with AWS RDS?
Please suggest the best approach.
**Note** Each org might have up to 500,000 records in any table with around 10 columns. We could reach more than 5,000 organizations in 6 months. 500,000 multiplied by 5,000 orgs will reach 2.5 billion records.
Asked by user1578872
(79 rep)
May 16, 2021, 03:37 PM
Last activity: Jun 21, 2025, 05:04 AM
Last activity: Jun 21, 2025, 05:04 AM