Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1
vote
1
answer
175
views
reasonability of having a bridge table between three or more tables for roles; more efficient alternatives
One security officer can work at many offices. Depending on the office the security officer is working at that day, the officer can have a different role.
I modeled entities Officers, Offices, and Roles. I modeled the requirement with a bridge table Officers_Offices_Roles_Bridge that has the composite key (officer_id, office_id, role_id).
For example, an officer working in the Manhattan office on Mondays has the junior security specialist role, and the same officer working in the Bronx office on Tuesdays has the dispatcher role.
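For reference, a minimal sketch of such a bridge table in standard SQL (assuming integer surrogate keys; names follow the question):
CREATE TABLE Officers_Offices_Roles_Bridge
(
    officer_id INT NOT NULL REFERENCES Officers (officer_id),
    office_id INT NOT NULL REFERENCES Offices (office_id),
    role_id INT NOT NULL REFERENCES Roles (role_id),
    PRIMARY KEY (officer_id, office_id, role_id)
);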
What is reasonable and unreasonable about a bridge table between three or more tables?
What are more efficient alternatives?
dd_novice
(11 rep)
Apr 21, 2023, 02:00 PM
• Last activity: Jul 8, 2025, 10:58 AM
0
votes
1
answer
939
views
Periodic snapshot fact table with monthly grain - Question on dimensions
I am fairly new to data-warehousing but have read my fair share of books and online tutorials to gain an introductory understanding of the basic components that build up a data warehouse.
My end goal is to be able to do a headcount of employees at the end of the month, so I designed my fact table as a monthly snapshot that I populate at each end of month (one row per employee per month).
My dimensions include a DimEmployee table, which is an SCD2 dimension (any change to an employee's information introduces a new row in the data source, and I keep track of the most recent employee information via an is_current flag set to 'Y' or 'N').
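For concreteness, a minimal sketch of the end-of-month load under this design (FactMonthlyHeadcount and the key names are assumptions, not the actual schema):
-- Hypothetical end-of-month snapshot insert: one row per current employee.
INSERT INTO FactMonthlyHeadcount (MonthKey, EmployeeKey)
SELECT 20170430, e.EmployeeKey
FROM DimEmployee AS e
WHERE e.is_current = 'Y';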
My question is: do I need to update the dimensions daily and only insert whatever data I have at end-of-month into the fact table? Or do I update the dimensions end-of-month as well?
Thank you in advance!
Alistair
(141 rep)
Apr 16, 2017, 08:01 PM
• Last activity: Jun 3, 2025, 12:03 PM
2
votes
1
answer
1169
views
Dimension modelling for HR with Employee Dimension and multiple departments in a Data warehouse
What is the best way to configure a dimension model (preferably star schema) when we have the following requirements?
1. There is an Employees table (25 attributes) where we are required to make some of the attributes SCD2, e.g. Salary, LastSalaryIncreaseDate, LastBonusAmount, LastBonusDate, Designation. We don't have to maintain the reporting hierarchy.
2. There are different Departments. Every Department is headed by exactly one Department head (an Employee).
3. An Employee may belong to multiple Departments and vice versa.
4. Monthly payroll information is required to maintain for every Employee.
**Understanding and Questions**
1. Should we split the Employees entity into two, considering only 5 of the 25 attributes are SCD2-based (historical)?
2. I suppose a bridge table is required between Employee and Departments. So every employee must have an attribute (e.g. DepartmentGroupCode) pointing at its multiple departments in the bridge table (a sketch follows these questions). Correct?
3. There is a direct relationship between employees and the Department. So Department will have the attribute EmployeeKey in it. How do I deal with SCD2 changes of employees with respect to the Department entity?
4. The payroll periodic Fact entity will be linked only with the Employee and the date dimension. This should not be linked with the Department because it is already linked with the Employee entity...Please correct my understanding.
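For question 2, a minimal sketch of such a bridge (all names are illustrative):
-- DimEmployee carries a DepartmentGroupKey; the bridge expands each group
-- into its member departments.
CREATE TABLE BridgeDepartmentGroup
(
    DepartmentGroupKey INT NOT NULL,
    DepartmentKey INT NOT NULL,
    PRIMARY KEY (DepartmentGroupKey, DepartmentKey)
);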
Irfan Gowani
(21 rep)
Jul 8, 2021, 07:39 PM
• Last activity: Mar 10, 2025, 07:06 PM
0
votes
1
answer
80
views
Must fact tables be designed for aggregation?
I have been reading up on Microsoft SQL Server's columnstore indexes recently and I have become concerned that my experience does not match that of the industry. Columnstore indexes are designed to make aggregates easy. It is very frequently said that _for this reason_ they are an ideal fit for fact tables.
This makes me uneasy. **In my experience, fact tables are not aggregated**. The column list for every fact table I have ever made has been of the form "primary key, followed by several dimension columns, followed by every bit of info we store about the primary key". They then get dumped into some BI tool like Power BI or ThoughtSpot, so non-technical people can filter on the dimensions and find out whatever they want to know. They might aggregate from time to time, but this is certainly not the main use of the table. I expect such tables to have well over 100 columns.
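For contrast, the aggregate-heavy pattern columnstore indexes target looks something like this (a sketch; dbo.FactSales and its columns are hypothetical):
-- Columnstore shines when scanning a few columns of many rows and aggregating.
CREATE CLUSTERED COLUMNSTORE INDEX ccx_FactSales ON dbo.FactSales;

SELECT DateKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY DateKey;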
Is the table design and access pattern I am describing what is expected of a fact table? Or am I describing an entirely different type of table, the name of which I am ignorant?
J. Mini
(1237 rep)
Dec 1, 2024, 01:30 AM
• Last activity: Dec 1, 2024, 03:26 AM
0
votes
0
answers
28
views
Figuring out dimensional modeling requirements for OLAP in a database containing many pre-aggregated fact tables
I have a situation where an application uses a PostgreSQL database for OLAP reporting queries (less DWH/exploratory). I am trying to find an optimal database design for this workload.
The initial incoming transaction data is pre-aggregated by an ETL pipeline, and aggregations for different dimensions are streamed into 'per-aggregation' fact tables, each dealing with slightly different dimensions.
The requirement for this ETL pipeline is to be near-real-time, so something like a DWH with a nightly batch refresh of a single fact table in a star/snowflake schema was unfortunately not an option.
These pre-aggregated fact tables look as follows:
CREATE TABLE IF NOT EXISTS hourly_revenue
(
hour TIMESTAMP(6) WITH TIME ZONE,
outlet_id UUID,
gross_revenue NUMERIC(18,3),
net_revenue NUMERIC(18,3),
PRIMARY KEY (outlet_id, hour)
);
CREATE TABLE IF NOT EXISTS hourly_article_revenue
(
article_group_name character varying(30) NOT NULL,
article_name character varying(30) NOT NULL,
hour timestamp(6) with time zone NOT NULL,
plu character varying(255) NOT NULL,
outlet_id uuid NOT NULL,
base_price numeric(18,3),
line_totals numeric(18,3),
sum_quantity numeric(18,3),
CONSTRAINT hourly_article_revenue_pkey PRIMARY KEY (outlet_id, article_name, plu, article_group_name, hour)
);
CREATE TABLE IF NOT EXISTS hourly_waiter_revenue
(
hour timestamp(6) with time zone NOT NULL,
outlet_id uuid NOT NULL,
waiter_name character varying(30) NOT NULL,
waiter_number character varying(10) NOT NULL,
gross_revenue numeric(18,3),
net_revenue numeric(18,3),
CONSTRAINT hourly_waiter_revenue_pkey PRIMARY KEY (outlet_id, waiter_name, waiter_number, hour)
);
and others.
The application supports multi-tenancy, based on the outlet_id discriminator column. Currently there are no plans for partitioning, but this may change.
I am trying to benchmark certain types of planned queries, which take these pre-aggregated hourly facts and additionally perform ad-hoc aggregations over non-specific time frames (i.e. days to months). I have run into issues with the current design, namely the bloat of multiple fact tables (e.g. highly bloated composite primary keys), which screams that the dimensions have been denormalized into the facts.
The worst-case scenario is for there to be entries in each of these fact tables for each outlet and hour for up to 2 years; with around 50,000 outlets this yields between 700M and several billion rows per "fact" table, depending on any additional dimensions. The grain of these individual facts is always hourly.
To combat this, I believe I will need some degree of dimensional modeling to better support the types of queries that will be run (queries that touch <1 day of data are rather trivial).
Taking the hourly_article_revenue fact table as an example, I would have done something like the following:
1. Extract the article dimension into a separate dim_article table and give it a surrogate ID.
2. Add a unique constraint over the current business definition of uniqueness: (outlet_id, article_name, plu, article_group_name).
3. Reference the dim_article surrogate key as a foreign key in the relevant fact table.
Then repeat this process for the rest of the tables and their dimensions (e.g. waiters); a sketch of these steps follows.
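A minimal sketch of those three steps in PostgreSQL (the slimmed-down fact table name is an assumption):
-- Steps 1 and 2: the extracted dimension with its business-key uniqueness.
CREATE TABLE dim_article
(
    article_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    outlet_id uuid NOT NULL,
    article_name varchar(30) NOT NULL,
    plu varchar(255) NOT NULL,
    article_group_name varchar(30) NOT NULL,
    UNIQUE (outlet_id, article_name, plu, article_group_name)
);
-- Step 3: the fact table now carries only the surrogate key.
CREATE TABLE hourly_article_revenue_slim
(
    hour timestamp(6) with time zone NOT NULL,
    outlet_id uuid NOT NULL,
    article_id bigint NOT NULL REFERENCES dim_article (article_id),
    base_price numeric(18,3),
    line_totals numeric(18,3),
    sum_quantity numeric(18,3),
    PRIMARY KEY (outlet_id, article_id, hour)
);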
However, this approach smells somewhat... *off*, because theoretically I only require one fact table, sourced from the pre-aggregated data with the highest number of dimensions (and then just drilling up based on the specific requirements). For example, the following two queries, sourced from different fact tables:
SELECT outlet_id, date_trunc('month', hour)
FROM hourly_revenue
WHERE outlet_id = '...'
GROUP BY outlet_id, date_trunc('month', hour)
and:
SELECT outlet_id, date_trunc('month', hour)
FROM hourly_article_revenue
WHERE outlet_id = '...'
GROUP BY outlet_id, date_trunc('month', hour)
are essentially semantically equivalent, at which point I may as well have just used the raw transaction facts (notwithstanding the near-realtime requirement of data freshness).
The obvious drawback is that I cannot necessarily know all the required dimensions beforehand, so such a fact table would need to gain more dimensions as time goes on. That would be a monumental maintenance burden, as opposed to creating a new fact table for the dimensions that are relevant to the key insights given by a new ETL result set.
Is there a more general approach for this kind of modeling with multiple "fact" tables? Do those principles still hold if the different fact tables are - essentially - drilled up versions of other fact tables?
Lastly: is there any further reading I can do on this topic? I have read Kimball et al but I do not have it near my person at the moment and so cannot reference it at this time to verify...
user991710
(111 rep)
Nov 22, 2024, 05:49 PM
• Last activity: Nov 25, 2024, 09:34 AM
0
votes
1
answer
44
views
ideas to reduce association tables while building a data warehouse
I'm trying to create a data warehouse for my BI project. Initially I have a raw database that contains 100 data tables, most of them association tables. My goal is to create a normalized dimensional model in order to centralize my data as much as possible, but it seems impossible.
Here is an example use case: imagine having firmware, device, and many more tables with many-to-many relationships to a user table; association tables are created automatically to record every device or firmware created by users. I want to get rid of these association tables in order to create more centralized dimension/fact tables. Is there any proposal or solution for this specific use case?
Jalel MED
(1 rep)
Jun 20, 2024, 11:19 PM
• Last activity: Jun 24, 2024, 04:28 PM
0
votes
2
answers
643
views
Replace primary key with unique constraint and other clustered index to improve query performance without doubling table size
I've got a SQL Server 2019 instance with a large fact table (currently ~100 million rows, but it will most likely be >1 billion within a year and continue to grow for a long time).
The primary objective is to make it efficient to query the table by a date range (done via power bi).
The table definition (omitting a bunch of stuff not relevant for the question):
CREATE TABLE [fact].[dv](
[system_id] [int] NOT NULL,
[system_row_id] [bigint] NOT NULL,
[timestamp] [datetime2](3) NOT NULL,
[equipment_id] [int] NOT NULL,
[event_id] [int] NOT NULL,
[event_value] [int] NOT NULL,
[date] AS cast([timestamp] as date) PERSISTED,
constraint pk__fact_dv primary key clustered(system_id, system_row_id)
)
Example query:
select
x,y,z
from fact.dv
where [date] = '20231001'
and system_id = 20
and equipment_id in (1,2,3)
and event_id in (5,6,7)
The current design is useless for querying by date, so my first thought was to create a non-clustered compound index with the date as the leading column. Since I need to be able to query all columns, this would effectively double the size of my largest table and, in practice, double the size of the database.
My second idea (and this is where I need input from people more experienced than me) is to:
* Replace the current primary key with a unique constraint (and keep the NOT NULL constraints on the two columns).
* Use a clustered compound index with date as the leading column, roughly as sketched below.
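In T-SQL the swap would look roughly like this (index and constraint names are illustrative):
-- Sketch only: rebuilding a table this size needs careful planning.
ALTER TABLE [fact].[dv] DROP CONSTRAINT pk__fact_dv;
CREATE CLUSTERED INDEX cix__fact_dv ON [fact].[dv] ([date], system_id, system_row_id);
ALTER TABLE [fact].[dv] ADD CONSTRAINT uq__fact_dv
    UNIQUE NONCLUSTERED (system_id, system_row_id);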
If I'm correct this solution would let me:
* Keep the current data integrity (ensuring that neither system_id nor system_row_id is null and that their combination is unique).
* Make it efficient to query the table by date.
* Leave me with roughly the same table size (mainly adding the size of the index that supports the unique constraint)
Are there any drawbacks to this solution, apart from people reacting to not seeing a primary key on the table?
Are any of my assumptions wrong?
Any other suggestions?
Martin Riddar
(196 rep)
Oct 24, 2023, 08:40 AM
• Last activity: Nov 16, 2023, 11:38 AM
1
vote
1
answer
978
views
My Fact table has data weekly but it is not visible in SSAS cube
My fact table has data on a weekly basis, so I am mapping it to a Week dimension to pull records into a pivot table. My SSAS cube doesn't show all the data that is in the fact table. Should I add any specific scope to show it in the cube?
For example, my fact table has data for week 201839 as well as 201840 (September weeks), but in the cube I can see data only for week 201839, not for 201840.
Sandhya Balakrishnan
(41 rep)
Apr 3, 2019, 08:51 AM
• Last activity: Sep 26, 2023, 07:02 PM
1
vote
0
answers
1109
views
Dimensional Model for CRM Sales Opportunity and Sales Leads: Fact or Dimension?
I'm working for a business lender, and trying to come up with a basic dimensional model around the subject area of loan approvals.
We use a CRM, (Salesforce) and one of the core objects is a (Sales) "Opportunity".
Sales usually start off as a lead; if we get traction on a lead, we will create an account, at least one contact, and an "Opportunity", which is a potential loan (our business calls it a "Deal"). All of the loan approval process happens under a particular "Opportunity". I know that Account and Contact would be good dimension candidates; I am unsure of Lead and Opportunity, though.
I'm interested in opinions on which category Leads and Opportunities belong in: **Fact or Dimension**?
Here is an example of some opportunity fields that may be relevant to a data warehouse:
CREATE TABLE
dbo.Opportunity
(
OpportunitySK BIGINT NOT NULL
IDENTITY(1,1)
CONSTRAINT
PK_dbo_Opportunity
PRIMARY KEY CLUSTERED
, AccountId NCHAR(18) NULL
, ACTUAL_AMT_COLLECTED DECIMAL(18,2) NULL
, ACTUAL_AMT_OFFERED DECIMAL(18,2) NULL
, ActualDailyPayment DECIMAL(18,2) NULL
, ACTUAL_MCA_TERM NVARCHAR(10) NULL
, ActualPaymentFrequency NVARCHAR(255) NULL
, ApprovalAmount DECIMAL(18,2) NULL
, APPROVAL_DATE DATETIME NULL
, ApprovalExpirationDate DATETIME NULL
, BrokerAccount NCHAR(18) NULL
, CreatedDate DATETIME NOT NULL
, CreditAnalyst NCHAR(18) NULL
, CreditProcessor NCHAR(18) NULL
, CreditSubmittalDate DATETIME NULL
, DocsInDate DATETIME NULL
, DocsOutDate DATETIME NULL
, eNoahCompleteDate DATETIME NULL
, ExternalId NVARCHAR(30) NULL
, FundDate DATETIME NULL
, FundedAmount DECIMAL(18,2) NULL
, Funder NCHAR(18) NULL
, Id NCHAR(18) NOT NULL
, InternalCreditAnalyst NCHAR(18) NULL
, LenderAccount NCHAR(18) NULL
, MerchantNumber NVARCHAR(32) NULL
, OfferInDate DATETIME NULL
, OfferOutDate DATETIME NULL
, OwnerId NCHAR(18) NOT NULL
, PhysicalAddress1 NVARCHAR(1300) NULL
, PhysicalAddress2 NVARCHAR(1300) NULL
, PhysicalAddressCity NVARCHAR(1300) NULL
, PhysicalAddressState NVARCHAR(1300) NULL
, PhysicalAddressZip NVARCHAR(1300) NULL
, PreQualDate DATETIME NULL
, RecordTypeId NCHAR(18) NULL
, RequestedAmount DECIMAL(18,2) NULL
, StageName NVARCHAR(40) NOT NULL
, StatementsInDate DATETIME NULL
, Type NVARCHAR(40) NULL
, UpForFundingDate DATETIME NULL
, WC_CreditDecisionRecord NCHAR(18) NULL
, YearsInBusiness DECIMAL(18,0) NULL
, SystemModstamp DATETIME NOT NULL
, AccountExternalId NVARCHAR(100) NULL
, ACH VARCHAR(5) NULL
, ACH_Debit VARCHAR(5) NULL
, ACH_DebitAmount DECIMAL(18,2) NULL
, ACH_QUALIFIED VARCHAR(5) NULL
, ACH_StartDate DATETIME NULL
, Amount DECIMAL(18,2) NULL
, AmountOffered DECIMAL(18,2) NULL
, Approved VARCHAR(5) NULL
, BalancePaid100PercentDate DATETIME NULL
, BalancePaid50PercentDate DATETIME NULL
, BalancePaid60PercentDate DATETIME NULL
, BalancePaidatRenewal DECIMAL(18,2) NULL
, BalancePaidPercent DECIMAL(18,2) NULL
, BalanceRemaining DECIMAL(18,2) NULL
, BalanceRemainingPercent DECIMAL(18,2) NULL
, BOFI_Deal NVARCHAR(255) NULL
, BOFI_RestrictedState NVARCHAR(1300) NULL
, BookingStarted VARCHAR(5) NULL
, BrokerTier NVARCHAR(255) NULL
, CampaignID NCHAR(18) NULL
, ClosedLostDate DATETIME NULL
, CloseDate DATETIME NOT NULL
, CompanyDBA NVARCHAR(1300) NULL
, ContractNumber NVARCHAR(32) NULL
, DateEstablished DATETIME NULL
, DateonDocs DATETIME NULL
, DaysbwAccountActivationDatePrequal DECIMAL(18,0) NULL
, DaysbwActivationDateFundDate DECIMAL(18,0) NULL
, DaysbwApprovalDateFundDate DECIMAL(18,0) NULL
, DaysbwDocsinDateFundDate DECIMAL(18,0) NULL
, DaysInApprovalStage DECIMAL(18,0) NULL
, DaysInOfferOutStage DECIMAL(18,0) NULL
, DeclinedDate DATETIME NULL
, Deposit DECIMAL(18,2) NULL
, EqAmount DECIMAL(18,0) NULL
, EqAmtLeased DECIMAL(18,2) NULL
, EqExperience NVARCHAR(32) NULL
, EqHasQuote VARCHAR(5) NULL
, EqLeasedBefore VARCHAR(5) NULL
, EqNew VARCHAR(5) NULL
, EqNumVendors NVARCHAR(255) NULL
, EqTerm DECIMAL(18,0) NULL
, EqUsed VARCHAR(5) NULL
, EqWhat NVARCHAR(64) NULL
, EqWhen NVARCHAR(128) NULL
, EqWhy NVARCHAR(64) NULL
, EqWithWhom NVARCHAR(64) NULL
, EquipCostForActual DECIMAL(18,2) NULL
, EquipCostPaidByLenderForActual DECIMAL(18,2) NULL
, EquipCostPerDocRequest DECIMAL(18,2) NULL
, EquipmentCondition NVARCHAR(255) NULL
, EquipmentGrandTotalSummary DECIMAL(18,2) NULL
, EquipmentSeller NVARCHAR(255) NULL
, EquipmentStory NVARCHAR(255) NULL
, EquipmentYear NVARCHAR(255) NULL
, Fax NVARCHAR(1300) NULL
, FinalFundDate DATETIME NULL
, FiscalQuarter INT NULL
, FiscalYear INT NULL
, IsClosed VARCHAR(5) NOT NULL
, IsSplit VARCHAR(5) NOT NULL
, IsWon VARCHAR(5) NOT NULL
, LeadSource NVARCHAR(40) NULL
, LeaseNumber NVARCHAR(32) NULL
, LegalName NVARCHAR(1300) NULL
, LegalStatus NVARCHAR(1300) NULL
, LendersLeaseNumber NVARCHAR(32) NULL
, LendingTreeId NVARCHAR(9) NULL
, LendVantageCommission DECIMAL(18,2) NULL
, LendVantageId NVARCHAR(255) NULL
, LoanStartDate DATETIME NULL
, LoanStartDateVerified VARCHAR(5) NULL
, Locked VARCHAR(5) NULL
, MailingAddrDisplay NVARCHAR(1300) NULL
, MailingAddress1 NVARCHAR(1300) NULL
, MailingAddressCity NVARCHAR(1300) NULL
, MailingAddressState NVARCHAR(1300) NULL
, MailingAddressZip NVARCHAR(1300) NULL
, MarketingSubmissionDate DATETIME NULL
, MaturityDate DATETIME NULL
, MerchantID NVARCHAR(32) NULL
, MonthlyGrossCC_Volume DECIMAL(18,2) NULL
, MonthlySales DECIMAL(18,2) NULL
, MonthlyVisaMC_Volume DECIMAL(18,2) NULL
, MonthsInBusiness DECIMAL(18,0) NULL
, Name NVARCHAR(120) NOT NULL
, NF_DeclineDate DATETIME NULL
, OpportunityID18 NVARCHAR(18) NULL
, PartnerAccountId NCHAR(18) NULL
, PastDueBalance DECIMAL(18,2) NULL
, PaymentsBehind DECIMAL(18,0) NULL
, PaymentsBounced DECIMAL(18,0) NULL
, PaymentsCollected DECIMAL(18,0) NULL
, Phone NVARCHAR(1300) NULL
, Probability DECIMAL(18,0) NULL
, ProcessorUserID NVARCHAR(32) NULL
, RateReviewCompleted VARCHAR(5) NULL
, Ratio NVARCHAR(32) NULL
, RenewalNumber DECIMAL(18,0) NULL
, ResubmitDate DATETIME NULL
, SplitFunding NVARCHAR(255) NULL
, SplitOpportunity VARCHAR(5) NULL
, SplitOpportunitywith NCHAR(18) NULL
, Status NVARCHAR(255) NULL
, SubmissionDate DATETIME NULL
, SyndicationDate DATETIME NULL
)
Bee-Dub
(41 rep)
May 18, 2019, 12:32 AM
• Last activity: Feb 28, 2023, 01:50 PM
0
votes
1
answer
1781
views
Handling multiple dates on fact table with performance concerns
I am working on a simple data mart and have once again stumbled over handling date and time.
This is a design for vehicle maintenance form records, collecting status records/remarks for analytics/visualisation. There are several date and time columns in this fact table.
My questions are:
1. A fact table should be narrow and long, not wide. With this many date and time columns, the table becomes wide. If I query a LEFT JOIN between Vehicle.dim and VehMaintenance.fact to consolidate all details and output them in the frontend, wouldn't the performance be horrible? Now imagine I have even more dimensions to join; a LEFT JOIN onto the fact table will make it even wider. I think my design is horribly wrong.
2. Vehicle.dim is linked back to Date.dim. I read somewhere that a dimension table should not link back to another unless it is a snowflake schema with a parent/child relationship. Is it still fine to link the two dimension tables like this?
Take note that my fact table will be batch-updated on a 15-minute basis from a data warehouse. It will always be an INSERT query (there will be multiple otherwise-identical records whose dates differ). There will be NULLs in some of the date columns.
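For illustration, the join described in question 1 above has roughly this shape (all column names are assumptions):
SELECT v.VehicleName, f.StatusRemark, f.ReportedDate, f.RepairedDate
FROM VehMaintenance.fact AS f
LEFT JOIN Vehicle.dim AS v ON v.VehicleKey = f.VehicleKey;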
user3118602
(147 rep)
Sep 5, 2022, 02:05 PM
• Last activity: Sep 7, 2022, 12:30 PM
0
votes
1
answer
90
views
How do we assign FKs to newly loaded data in a fact table?
While reading up on SQL, I got confused about how data newly loaded into the fact table gets its FKs. Say we use the following example (a star schema image taken from Google: a factSales table surrounded by dimDate, dimCustomer, dimStore, and dimProduct dimensions).
These are my thoughts on how the data is loaded:
1. SaleID is auto-generated (e.g. incremental) to give a unique ID to the new row.
2. SalesQuantity, SalesPrice, SalesAmount, ReceiptID, and TimeStamp are loaded into factSales.
The question is: how does this newly loaded data know which FKs (DateKey, CustomerKey, StoreKey, ProductKey) it should get? My understanding of a dimension table is that it describes the data in factSales.
For example, dimDate will have pre-populated date data up to the year 2050. If a date value such as 2020-05-01 13:00 is loaded into the TimeStamp column of factSales, how can the FK DateKey be auto-assigned to the date 2020-05-01? The same goes for other dimensions such as CustomerKey, where some customer data will already exist.
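The usual ETL mechanism here is a business-key lookup against each dimension during the load. A minimal sketch, assuming a staging_sales table, a FullDate column on dimDate, and a CustomerID business key (StoreKey and ProductKey would be looked up the same way):
INSERT INTO factSales (DateKey, CustomerKey, SalesQuantity, SalesPrice, SalesAmount, ReceiptID, TimeStamp)
SELECT d.DateKey, c.CustomerKey,
       s.SalesQuantity, s.SalesPrice, s.SalesAmount, s.ReceiptID, s.TimeStamp
FROM staging_sales AS s
JOIN dimDate AS d ON d.FullDate = CAST(s.TimeStamp AS DATE)
JOIN dimCustomer AS c ON c.CustomerID = s.SourceCustomerID;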
Thank you.
user3118602
(147 rep)
Sep 1, 2022, 03:05 PM
• Last activity: Sep 1, 2022, 11:08 PM
1
vote
1
answer
1823
views
How to deal with fact table data that needs to be version controlled?
I have the following simplified sport_match 'fact' table:
| match_id | tournament_id | player_id_p1 | player_id_p2 | p1_final_score | p2_final_score |
|----------|---------------|--------------|--------------|----------------|----------------|
| 1 | 1 | 1 | 2 | 1 | 0 |
| 2 | 1 | 1 | 2 | 3 | 1 |
| 3 | 2 | 3 | 2 | 2 | 3 |
| 4 | 2 | 3 | 2 | 4 | 0 |
The table is updated from an API that issues INSERT, UPDATE and DELETE SQL instructions via text files.
Occasionally there is a mistake in the scores, and because I need to be able to run historical analyses from a specific point in time, I need to capture both the incorrect entry and the correct entry. For this reason I started to look at adopting a Slowly Changing Dimension Type 2 method and translating all the API instructions to INSERTs. This would give me a table that looks like this:
| match_key | match_id | tournament_id | player_id_p1 | player_id_p2 | p1_final_score | p2_final_score | start_date | current_flag |
|-----------|----------|---------------|--------------|--------------|----------------|----------------|------------------|--------------|
| 1 | 1 | 1 | 1 | 2 | 1 | 0 | 01/01/2000 00:00 | Y |
| 2 | 2 | 1 | 1 | 2 | 3 | 1 | 02/01/2000 00:00 | Y |
| 3 | 3 | 2 | 3 | 2 | 2 | 3 | 03/01/2000 00:00 | Y |
| 4 | 4 | 2 | 3 | 2 | 4 | 0 | 04/01/2000 00:00 | N |
| 5 | 4 | 2 | 3 | 2 | 4 | 1 | 04/01/2000 00:01 | Y |
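Under that design, recording a score correction would look roughly like this (a sketch; match_key is assumed to be auto-generated):
UPDATE sport_match
SET current_flag = 'N'
WHERE match_id = 4 AND current_flag = 'Y';

INSERT INTO sport_match (match_id, tournament_id, player_id_p1, player_id_p2, p1_final_score, p2_final_score, start_date, current_flag)
VALUES (4, 2, 3, 2, 4, 1, '2000-01-04 00:01', 'Y');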
However, I realised I was applying a 'dimension' principle to a 'fact' table.
Is this a viable approach or should I be looking at a different design?
Jossy
(83 rep)
Jun 6, 2022, 08:28 PM
• Last activity: Jun 6, 2022, 09:04 PM
1
vote
1
answer
518
views
How to handle NULL Dates in an "Accumulating fact table"
One type of fact table in dimensional modeling is the Accumulating Snapshot fact table. If you need to review the meaning and definition of this type, you can take a look at this article: https://www.holistics.io/blog/the-three-types-of-fact-tables/.
Imagine we have an accumulating fact table with three important date columns: Order_Datekey, Manufacturing_Datekey, and Ship_Datekey. When a row is inserted for the first time, there is no data for the Manufacturing_Datekey and Ship_Datekey columns; we only know the date on which the product was ordered (Order_Datekey). What I need to know is how we can handle NULL values for these date columns.
The important point I read in the Kimball book is that NULL should be avoided in a foreign key column: 'Referential integrity is violated if you put a null in a fact table column declared as a foreign key to a dimension table.' On the other hand, we don't have any value for these two date columns at first. What do you suggest?
Thanks in advance
Pantea
(1510 rep)
Jun 28, 2021, 10:04 AM
• Last activity: Jun 28, 2021, 08:00 PM
1
vote
1
answer
59
views
Type 2 Dimension for a fact measure
I have a need to store products along with their prices and quantities (and many other things) and track the history.
Now, given both price and quantity are stored as integers, it is my understanding these are fact measures. So we might have a table like:
f_products:
id
sku
price
quantity
As the price varies over time we can insert a new fact, and I can introduce effective dates to the table to allow for querying the price or quantity at a given time.
But with this design, if the quantity changes I must insert another row, repeating the unchanged price value again. And while this isn't too bad for 2 columns, I have in fact 10.
So, my design is:
f_products:
id
sku
price:
id
product_id
price
start_date
end_date
another "dimension":
id
product_id
value
start_date
end_date
and so on
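As DDL, that design might look like this (a sketch; types and names are assumptions):
CREATE TABLE f_products
(
    id BIGINT PRIMARY KEY,
    sku VARCHAR(50) NOT NULL
);
CREATE TABLE product_price
(
    id BIGINT PRIMARY KEY,
    product_id BIGINT NOT NULL REFERENCES f_products (id),
    price INT NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE -- NULL while this price is current
);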
Now the "fact" no longer needs to change when the price changes, and I can query for historical values across these tables.
So my question is: this doesn't seem to fit the standard fact/dimension designs, so I wonder how more experienced modellers would approach these requirements.
sksamuel
(113 rep)
Feb 23, 2021, 02:11 AM
• Last activity: Feb 23, 2021, 02:49 AM
0
votes
0
answers
53
views
Distinguishing between fact and dimension tables in data warehouse
Is it true that facts in a fact table of a data warehouse always have a relation to time? I.e. is it true that a fact must always be assignable to a specific point in time?
If this is correct, can we conclude that dimension elements (except for maybe a time-dimension) do not need to be assignable to a specific point in time?
Thanks for your help!
Kevin Südmersen
(101 rep)
Aug 11, 2020, 04:26 PM
1
vote
3
answers
854
views
The right way to model multiple FACTs schema
**Background**
I'm in the process of designing a database (using a STAR schema).
There are three tables to model: *products*, *tests*, *states*.
The database will be used to store results of *tests* conducted on *products* (in a great simplification). There can be many *tests* pointing out to a single product but each *test* is unique (they are not shared among *products*). Besides, I need to record the current *state* of the product, at the time when the *test* was conducted. Let's assume that the *state* of a *product* describes its current location and owner, which are changing very often. That will most likely involve SCD lvl 2 - to track the history of *state* changes and to be able to locate a *product* with all its *tests* as well as the *states* it had during these *tests*.
**Problem**
I'm not entirely sure how to model this problem. It seems obvious to store every *test* in a FACT table. This table would then consist of thousands of transactions. On the other hand, there will also be hundreds (and later thousands) of *products*, so I should probably keep them in a second FACT table. Then, there will also be thousands of *state* changes, so in order to record their entire history, I would need to keep them in a... FACT table as well? I've been told that FACT tables are typically used to store multiple-row data, but on the other hand, where are the DIMs in this model?
I also don't know how to model the relationships between these tables. *Products* - *states* is a 1:* relationship. *Products* - *tests* is 1:* as well. Finally, *states* - *tests* is 1:* too. I would then go with linking *products* to *states* and then *states* to *tests* (*products* 1<-* *states* 1<-* *tests*), which would allow me to find all *states* for a particular *product* and all *tests* (in all *states* or in a selected *state*). What do you think about that? The problem here is that, as I keep adding *states*, I have two options: either keep duplicating *products* in the *products* table (with an added "recorded_timestamp" column) or use SCD level 2 in the *states* table, pointing at the *products* table with a FK, but this would effectively make the *products* table a DIM!
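For concreteness, one possible reading of that chain as a schema (a sketch; all names are illustrative, with *states* as SCD2):
CREATE TABLE dim_product
(
    product_key BIGINT PRIMARY KEY,
    product_code VARCHAR(50) NOT NULL
);
CREATE TABLE dim_product_state
(
    state_key BIGINT PRIMARY KEY,
    product_key BIGINT NOT NULL REFERENCES dim_product (product_key),
    location VARCHAR(100),
    owner VARCHAR(100),
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP -- NULL marks the current state (SCD2)
);
CREATE TABLE fact_test
(
    test_key BIGINT PRIMARY KEY,
    state_key BIGINT NOT NULL REFERENCES dim_product_state (state_key),
    test_date DATE NOT NULL,
    result NUMERIC(18,3)
);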
Any help here would be very appreciated.
JustGiveMeData
(113 rep)
Jul 12, 2020, 10:14 PM
• Last activity: Jul 17, 2020, 01:12 PM
0
votes
1
answer
52
views
Deciding what values should be used in a data warehouse dimension?
I'm beginning to look at data warehousing and I'm coming across examples where, for instance, we have check payment transactions. I understand that this should be the "fact" and we would have dimensions around it that would describe the fact (account, product, etc.). However, it also contains things like "memo line" and string "transaction id" values which, as strings, don't really belong in the fact table.
My question is: should there be a separate payment "meta" dimension that contains these descriptors, or is it OK to say that something like the memo can't be reported on, since it's different every time, and so leave it out of the warehouse? I know it's somewhat of an abstract question, but I'm not finding much on this in my reading. Any suggestions would be greatly appreciated.
jordaniac89
(103 rep)
Apr 29, 2020, 02:49 PM
• Last activity: Apr 29, 2020, 04:52 PM
0
votes
0
answers
62
views
Trying to create a fact and dim table in SQL
So far this is what I have code-wise, but I'm getting error 1239. Is it not referencing something correctly?
CREATE TABLE CustomerDim
(
C_CustomerID_Cu INT(10) NOT NULL,
C_FacilityReservationNumber_Cu INT(10) NOT NULL,
I_EventCode_Cu INT(10) NOT NULL,
CONSTRAINT C_CustomerID_Cu FOREIGN KEY (C_CustomerID_Cu)
REFERENCES CustomerDIM
);
CREATE TABLE RoomDim
(
C_RoomNumber_Ro INT(10) NOT NULL,
C_RoomType_Ro INT(5) NOT NULL,
C_RegionID_Re INT(4) NOT NULL,
C__ResortCode_Fa INT(10) NOT NULL,
CONSTRAINT C_RoomNumber_Ro FOREIGN KEY (C_RoomType_Ro)
REFERENCES RoomDIM
);
CREATE TABLE ReservationFact
(
C_RoomReservationNumber_Rr INT (10) NOT NULL,
C_CustomerID_Cu INT (10) NOT NULL,
C__RoomType_Ro INT (10) NOT NULL,
C_D_ArrivalDate_Rr DATE,
C_EmployeeNumber_Em INT (5) NOT NULL,
C_ResortCode_Rs INT (10),
CONSTRAINT C_RoomReservationNumber_Rr_pk PRIMARY KEY
(C_RoomReservationNumber_Rr)
);
masongirl
(1 rep)
Mar 30, 2020, 07:31 PM
• Last activity: Mar 30, 2020, 09:35 PM
0
votes
0
answers
221
views
How to handle table that has an id which references multiple dimension tables?
I'm having a hard time figuring out how to create a lookup table, or a table that references multiple dimension tables. I've got a "tracking" table that tracks the number of impressions for an e-commerce website. Each type of impression (manufacturer, product, evaluation) is contained in a single objectid column, which references the corresponding dimension table for either manufacturer, product, or evaluation.
What I am struggling with is:
- manufacturer impressions = manufacturer + product + evaluation impressions
- product impressions = product + evaluation impressions
I need to be able to query all of the manufacturer impressions, which is a combination of the 3 types of impressions above. If I just query those that match on manufacturer, I will be leaving out the impressions that match on product and/or evaluation. The same thing goes for product impressions.
What is the best way to create a reference table or to be able to query all manufacturer impressions? I need to be able to write a query that combines objectid = manufacturerid + objectid = productid (which has a corresponding manufacturerid) + objectid = evaluationid (which has a corresponding productid and then a corresponding manufacturerid that links to the productid).
Here's a sample query I'm using, but it's not returning any rows when I add in the second join statement:
select * from tracking t
left join products p on t.objectid = p.productid and t.objecttype = 'Products'
left join evaluations e on t.objectid = e.productevaluationid and t.objecttype = 'Evaluations'
left join manufacturers m on t.objectid = m.manufacturerid and t.objecttype = 'Manufacturers'
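One sketch of rolling every tracking row up to a manufacturer (column names are guesses based on the description above):
select coalesce(m.manufacturerid, p.manufacturerid, pe.manufacturerid) as manufacturerid,
       count(*) as impressions
from tracking t
left join manufacturers m on t.objecttype = 'Manufacturers' and t.objectid = m.manufacturerid
left join products p on t.objecttype = 'Products' and t.objectid = p.productid
left join evaluations e on t.objecttype = 'Evaluations' and t.objectid = e.productevaluationid
left join products pe on pe.productid = e.productid
group by coalesce(m.manufacturerid, p.manufacturerid, pe.manufacturerid);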
Any advice is greatly appreciated, thanks!
danieltisdale
(1 rep)
Mar 2, 2020, 07:37 PM
• Last activity: Mar 2, 2020, 08:58 PM
1
vote
0
answers
174
views
How does one determine the number of FACT tables that are required in a data-warehouse?
When analysing data sources that need to be included in the data warehouse (Kimball), how does a developer determine whether a schema should be a star type or a constellation type?
Business processes can be reflected at one of three levels (transaction, periodic snapshot, or accumulating snapshot). However, there can be business processes where the captured data is effectively identical at any of these levels, for any number of these processes.
What are the remaining considerations? (Performance..)
GettingItDone
(109 rep)
Oct 17, 2019, 03:40 PM
• Last activity: Oct 17, 2019, 06:28 PM
Showing page 1 of 20 total questions