Scaling from a Multiple-Database to a Single-Database Architecture in SQL Server
1 vote · 2 answers · 184 views
My application is centered around self-contained "workspaces". For many really good reasons (everything from management to security), we have always had a one-database-per-workspace architecture. Each database has identical schema, stored procedures, triggers, etc. There is a "database of databases" that coordinates all of this. Works great.
The problem: scalability. It was recently proposed that a customer might want 100,000 workspaces, which is obviously a non-starter for a single SQL Server instance. On top of that, most workspaces would be rather small, but the size distribution would be very wide - the biggest workspace could be 100x the size of the _median_, and the top 1% of workspaces could easily account for 90+% of the rows across all workspaces.
I'm looking for options for rearchitecting things to support this scenario, and here are some things I've considered and the issues I see with each.
- Keep the multi-database architecture but spread it across multiple SQL Server instances. The problem is cost, both administrative and infrastructure. If we stick to a limit of 1,000 databases per instance, that's still 100 instances, spread across who knows how many actual VMs. And since so many of the workspaces will be small (much smaller than our current average), revenue won't scale anywhere near proportionally with the instance count. So I think this is probably out of the question, and I'm focusing now on single-database architectures.
- Every workspace shares the same tables, keyed by workspace ID (see the first sketch after this list). Every table would need a new WorkspaceID column, and every query would need to add the workspace condition to its WHERE clause (or, more likely, every real table would be wrapped in an inline table-valued function that takes the WorkspaceID; anyway...). The primary key of every table would also have to be redefined to include the workspace ID, since existing PKs are only unique within a workspace, not globally. Programming-wise this is all fine, but even with proper indexing and perfect query design (and no, not all our queries are perfect - the dreaded row scan still happens on occasion), is there any conceivable way this could perform as well - for everyone - as separate databases? More specifically, can we guarantee that small workspaces won't suffer from the presence of big workspaces taking up 100x more rows? And what specific steps would need to be taken - the type of index to use, how to write the queries - to guarantee that the optimizer always narrows things down by workspace ID before it does literally anything else?
- Partitioning - from what I've read, this doesn't help with query performance, and it appears MS recommends limiting tables or indexes to 1,000 partitions, so one partition per workspace won't cover 100,000 workspaces anyway.
- Create the same set of tables under a new schema for each workspace (see the second sketch after this list). I thought of this because there is no limit on the number of tables a database can have other than the overall ~2 billion object limit. But I haven't explored this idea much, and I'm wondering if there would be performance concerns with 100,000 schemas and millions of tables, views, stored procedures, etc.
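To make the shared-table option concrete, here is a minimal sketch of what I have in mind; the table, column, and function names (Document, DocumentID, fn_Document) are made up for illustration, not our real schema:

```sql
-- Hypothetical shared table: every row is tagged with its workspace,
-- and the primary key is widened to lead with WorkspaceID.
CREATE TABLE dbo.Document
(
    WorkspaceID int           NOT NULL,
    DocumentID  int           NOT NULL,  -- unique only within a workspace
    Title       nvarchar(200) NOT NULL,
    CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (WorkspaceID, DocumentID)
);
GO

-- Inline table-valued function that forces the workspace filter, so existing
-- queries can be pointed at the function instead of the base table.
CREATE FUNCTION dbo.fn_Document (@WorkspaceID int)
RETURNS TABLE
AS
RETURN
(
    SELECT WorkspaceID, DocumentID, Title
    FROM dbo.Document
    WHERE WorkspaceID = @WorkspaceID
);
GO

-- Usage: with WorkspaceID leading the clustered key, this can seek to one
-- workspace's range of rows rather than scanning everyone's data.
SELECT DocumentID, Title
FROM dbo.fn_Document(42)
WHERE Title LIKE N'Quarterly%';
```

The idea behind leading the clustered key with WorkspaceID is that every lookup can seek straight to a single workspace's rows, which is the property I'm hoping would keep small workspaces isolated from big ones.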
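And for comparison, a rough sketch of the schema-per-workspace option; the ws00042 schema and the Document table are again just placeholders:

```sql
-- One schema per workspace; every workspace gets its own copy of every object.
CREATE SCHEMA ws00042 AUTHORIZATION dbo;
GO

CREATE TABLE ws00042.Document
(
    DocumentID int           NOT NULL PRIMARY KEY,  -- PKs stay workspace-local
    Title      nvarchar(200) NOT NULL
);
GO
```

With 100,000 workspaces, every table, view, and procedure gets multiplied by 100,000, which is where the object-count and metadata-performance question in the last bullet comes from.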
With all that, here is the specific question -
What specific features of SQL Server, and/or general strategies, including but not limited to the things I've considered, would be most useful for maintaining a large number of self-contained data sets with identical schemas in a single giant database? To reiterate, keeping performance as close as possible to the multi-database architecture is the top priority.
And needless to say, if any part of my assessment above seems incorrect or misguided, I'd be glad to be corrected. Many thanks.
Asked by Peter Moore
(113 rep)
Aug 17, 2023, 05:30 PM
Last activity: Aug 20, 2023, 06:23 PM