Database schema design for company time-series data
**Purpose**:
To monitor changes in the total capital and paid-in capital of various companies and identify trends across company categories. By determining whether the sum of total capital or paid-in capital for a given category is increasing or decreasing, we can see which categories of companies are growing and which are not.
**Data structure**:
I have data for several companies with the following structure:
* company_name: string
* company_id: string
* company_category: string
* date: timestamp
* total_capital: numeric (allows null values)
* paidin_capital: numeric (allows null values)
The "date" field represents the date on which the total capital and paid-in capital were recorded for this company. Different companies may not have the same recording dates.
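A minimal sketch of this record structure in Python, for illustration only. The class name `CapitalRecord`, the field name `recorded_on` (standing in for "date"), and the id value `"c-001"` are assumptions; the question does not give an id for "coop a". A plain `date` is used instead of a full timestamp to keep the example short.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class CapitalRecord:
    """One observation of a company's capital on a given recording date."""
    company_id: str
    company_name: str
    company_category: str
    recorded_on: date                  # the "date" field from the question
    total_capital: Optional[Decimal]   # None models a null value
    paidin_capital: Optional[Decimal]  # None models a null value


# The two raw records used in the example further down.
# "c-001" is a made-up id for illustration.
raw_records = [
    CapitalRecord("c-001", "coop a", "category A", date(2022, 3, 28), Decimal("1000"), None),
    CapitalRecord("c-001", "coop a", "category A", date(2022, 5, 12), Decimal("2000"), None),
]
```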
**Data challenge**:
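If I want to compare category trends by month, every company needs one record for every month so that the total capital can be summed per month. Because recording dates differ between companies, I have to create additional records that carry the last recorded value forward into months that have no recording of their own. For example: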
Raw record 1:
* company_name: coop a
* company_category: category A
* total_capital: 1000
* paidin_capital: null
* date: 2022-03-28
Raw record 2:
* company_name: coop a
* company_category: category A
* total_capital: 2000
* paidin_capital: null
* date: 2022-05-12
After expanding to one record per month:
New record 1 (edited from Raw record 1):
* company_name: coop a
* company_category: category A
* total_capital: 1000
* paidin_capital: null
* month: 2022-03
New record 2 (carried forward from Raw record 1, since nothing was recorded in 2022-04):
* company_name: coop a
* company_category: category A
* total_capital: 1000
* paidin_capital: null
* month: 2022-04
New record 3 (edited from Raw record 2):
* company_name: coop a
* company_category: category A
* total_capital: 2000
* paidin_capital: null
* month: 2022-05
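The expansion from raw records to one record per month can be sketched in Python as a carry-forward over a month range. This is only an illustration of the transformation described above, not a recommended design. The function names `month_range` and `expand_to_months` are hypothetical, and two behaviors are assumptions: the last observation within a month wins, and an explicitly recorded null replaces the previous value.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def expand_to_months(observations):
    """Expand one company's observations to one value per month.

    observations: list of (recorded_on, total_capital) tuples, in any order.
    Returns a list of ("YYYY-MM", total_capital) covering every month from the
    first to the last observation, carrying the latest value forward.
    """
    if not observations:
        return []
    latest = {}
    # Sort by date only, so None values are never compared with each other.
    for recorded_on, value in sorted(observations, key=lambda o: o[0]):
        # Assumption: the last observation within a month wins.
        latest[(recorded_on.year, recorded_on.month)] = value
    months = sorted(latest)
    result = []
    current = None
    for ym in month_range(months[0], months[-1]):
        if ym in latest:
            current = latest[ym]
        result.append((f"{ym[0]:04d}-{ym[1]:02d}", current))
    return result


monthly = expand_to_months([(date(2022, 3, 28), 1000), (date(2022, 5, 12), 2000)])
print(monthly)
# [('2022-03', 1000), ('2022-04', 1000), ('2022-05', 2000)]
```

The 2022-04 entry exists only because of the carry-forward, which is where the duplicated values come from.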
Aggregated table:
Aggregated record 1 (aggregated from New record 1):
* company_category: category A
* total_capital: 1000
* paidin_capital: null
* month: 2022-03
Aggregated record 2 (aggregated from New record 2):
* company_category: category A
* total_capital: 1000
* paidin_capital: null
* month: 2022-04
Aggregated record 3 (aggregated from New record 3):
* company_category: category A
* total_capital: 2000
* paidin_capital: null
* month: 2022-05
This arrangement seems very inefficient and will produce a large number of duplicated records. Is there a better schema that I should use, whether I continue to use a relational database management system (RDBMS) or switch to a time-series database engine?
Asked by Planetoid Hsu
(13 rep)
Jan 9, 2023, 03:41 AM
Last activity: Jan 31, 2023, 09:46 AM