How to efficiently manage a large amount of data in web apps
I am developing an application that needs to store a large amount of product data and display it in a web app. The data is generated every day, and each product will have approximately 100 rows per day, resulting in a total of approximately 1 million rows for the entire application per day.
One item looks like the following:
{
"ProductId": "P-1",
"ReportDate": "2023-12-12",
"Type": "T-1",
"Usage": 100
}
Requirements:
- store data going back 3 years from now
- queries over the last 3 months must be very fast
- the user can choose any date range between now and 3 years back, *e.g. 2023-12-01 - 2023-12-12 or 2020-12-12 - 2023-12-12.*
The data is displayed in a table. The table lists products with pagination; the page size defaults to 50 and can be increased to 100. Each product row is expandable, revealing an inner table with that product's data.
**Now, the problem I am facing is** that one day has 100 rows per product. So, for one page with 50 products and a single day, I need to load 50 * 100 = 5,000 rows. The date range makes it more complicated, because a longer range means proportionally more rows to load. The raw rows are not sent to the web app directly; they are aggregated in memory, so the web app still displays only 100 rows per product. Since the date range depends entirely on the user, I cannot pre-aggregate the data for every possible date range combination.
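To put rough numbers on this, here is a back-of-the-envelope sketch of how many raw rows sit behind one page for a given date range. It assumes exactly 100 rows per product per day and an inclusive date range; `rows_behind_page` is a hypothetical helper for illustration, not part of the application.

```python
from datetime import date

ROWS_PER_PRODUCT_PER_DAY = 100  # approximate figure from the description above


def rows_behind_page(start: date, end: date, page_size: int = 50) -> int:
    """Raw rows that must be aggregated to render one page (inclusive range)."""
    days = (end - start).days + 1
    return page_size * ROWS_PER_PRODUCT_PER_DAY * days


short_range = rows_behind_page(date(2023, 12, 1), date(2023, 12, 12))
full_range = rows_behind_page(date(2020, 12, 12), date(2023, 12, 12))
print(short_range)  # 60000
print(full_range)   # 5480000
```

So the two example ranges differ by roughly two orders of magnitude in the number of rows that have to be read for a single page.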
The main query looks like the following:
SELECT c.ProductId, c.Type, SUM(c.Usage) AS TotalUsage
FROM c
WHERE c.ProductId = 'P-1'
  AND c.ReportDate >= '2023-12-01'
  AND c.ReportDate <= '2023-12-12'
GROUP BY c.ProductId, c.Type
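For reference, the in-memory aggregation is equivalent to the following minimal sketch. It assumes rows shaped like the item above; `aggregate_usage` and the sample rows are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict
from datetime import date


def aggregate_usage(rows, product_id, start, end):
    """Sum Usage per (ProductId, Type) for one product over an inclusive date range."""
    totals = defaultdict(int)
    for row in rows:
        report_date = date.fromisoformat(row["ReportDate"])
        if row["ProductId"] == product_id and start <= report_date <= end:
            totals[(row["ProductId"], row["Type"])] += row["Usage"]
    return dict(totals)


# Hypothetical sample rows.
rows = [
    {"ProductId": "P-1", "ReportDate": "2023-12-12", "Type": "T-1", "Usage": 100},
    {"ProductId": "P-1", "ReportDate": "2023-12-11", "Type": "T-1", "Usage": 50},
    {"ProductId": "P-1", "ReportDate": "2023-12-11", "Type": "T-2", "Usage": 7},
    {"ProductId": "P-1", "ReportDate": "2023-11-30", "Type": "T-1", "Usage": 999},  # outside range
    {"ProductId": "P-2", "ReportDate": "2023-12-12", "Type": "T-1", "Usage": 1},    # other product
]

result = aggregate_usage(rows, "P-1", date(2023, 12, 1), date(2023, 12, 12))
print(result)  # {('P-1', 'T-1'): 150, ('P-1', 'T-2'): 7}
```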
Now I am exploring how to handle this. Where should I store the data? Our application uses Azure SQL, and some data is stored in CosmosDB.
Initially, I considered storing the data in CosmosDB. However, the data has dependencies on Azure SQL, such as the product name. Since the name can change, and these changes must be reflected in the product's data, I don't think it's a good idea to store them together. Additionally, the data is filtered by permissions stored in Azure SQL. For these reasons, I am leaning towards using Azure SQL. The potential issue is the size of the data; the table will be approximately 750 GB for three years of data.
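As a sanity check on that size, the arithmetic can be sketched as follows. It assumes a flat 1 million rows per day, three 365-day years, and 1 GB = 10^9 bytes; the constants are rough figures from this question, not measurements.

```python
ROWS_PER_DAY = 1_000_000        # approximate daily volume from the description
RETENTION_DAYS = 3 * 365        # three years, ignoring leap days
TABLE_SIZE_BYTES = 750 * 10**9  # the 750 GB estimate, taking 1 GB = 10^9 bytes

total_rows = ROWS_PER_DAY * RETENTION_DAYS
bytes_per_row = TABLE_SIZE_BYTES / total_rows

print(total_rows)            # 1095000000
print(round(bytes_per_row))  # 685
```

In other words, the table would hold about 1.1 billion rows, and the 750 GB figure corresponds to roughly 685 bytes per row.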
I am also considering storing the last 3 months of data in Azure SQL as 'hot data' and storing all data, including those 3 months, in a different storage, perhaps CosmosDB, as 'cold data'.
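With that split, each request would need to be routed to one of the two stores. A minimal sketch of that decision, assuming "3 months" is approximated as 90 days and that a requested range never ends in the future; `choose_store` and `HOT_WINDOW_DAYS` are hypothetical names.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90  # assumption: "3 months" approximated as 90 days


def choose_store(range_start: date, today: date) -> str:
    """Return "hot" if the whole requested range fits in the hot window, else "cold"."""
    hot_start = today - timedelta(days=HOT_WINDOW_DAYS)
    return "hot" if range_start >= hot_start else "cold"


today = date(2023, 12, 12)
print(choose_store(date(2023, 12, 1), today))   # hot
print(choose_store(date(2020, 12, 12), today))  # cold
```

Because the cold store would also contain the most recent 3 months, a range that starts before the hot window could be served from the cold store alone, without merging results from both.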
Moreover, I am considering sharding, but I'm unsure if it's suitable when I need access to some common data.
Do you have any advice on how to approach this?
Thanks for any advice.
Asked by Flow
(11 rep)
Dec 12, 2023, 06:21 PM