Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1
vote
1
answer
175
views
reasonability of having a bridge table between three or more tables for roles; more efficient alternatives
One security officer can work at many offices. Depending on the office the security officer is working at that day, the officer can have a different role.
I modeled entities Officers, Offices, and Roles. I modeled the requirement with a bridge table Officers_Offices_Roles_Bridge that has the composite key (officer_id, office_id, role_id).
For example, an officer working in the Manhattan office on Mondays has the junior security specialist role, and the same officer working in the Bronx office on Tuesdays has the dispatcher role.
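For reference, a minimal sketch of such a bridge table in standard SQL (assuming integer surrogate keys; names follow the question):
CREATE TABLE Officers_Offices_Roles_Bridge
(
    officer_id INT NOT NULL REFERENCES Officers (officer_id),
    office_id INT NOT NULL REFERENCES Offices (office_id),
    role_id INT NOT NULL REFERENCES Roles (role_id),
    PRIMARY KEY (officer_id, office_id, role_id)
);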
What is reasonable and unreasonable about a bridge table between three or more tables?
What are more efficient alternatives?
dd_novice
(11 rep)
Apr 21, 2023, 02:00 PM
• Last activity: Jul 8, 2025, 10:58 AM
0
votes
1
answer
939
views
Periodic snapshot fact table with monthly grain - Question on dimensions
I am fairly new to data-warehousing but have read my fair share of books and online tutorials to gain an introductory understanding of the basic components that build up a data warehouse.
My end goal is to be able to do a headcount of employees at the end of the month, so I designed my fact table as a monthly snapshot that I populate at each end of month (one row per employee per month).
My dimensions include a DimEmployee table, which is an SCD2 dimension (any change to an employee's information introduces a new row in the data source, and I keep track of the most recent employee information via an is_current flag set to 'Y' or 'N').
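For concreteness, a minimal sketch of the end-of-month load under this design (FactMonthlyHeadcount and the key names are assumptions, not the actual schema):
-- Hypothetical end-of-month snapshot insert: one row per current employee.
INSERT INTO FactMonthlyHeadcount (MonthKey, EmployeeKey)
SELECT 20170430, e.EmployeeKey
FROM DimEmployee AS e
WHERE e.is_current = 'Y';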
My question is: do I need to update the dimensions daily and only insert whatever data I have at end-of-month into the fact table? Or do I update the dimensions end-of-month as well?
Thank you in advance!
Alistair
(141 rep)
Apr 16, 2017, 08:01 PM
• Last activity: Jun 3, 2025, 12:03 PM
2
votes
1
answer
1169
views
Dimension modelling for HR with Employee Dimension and multiple departments in a Data warehouse
What is the best way to configure a dimension model (preferably star schema) when we have the following requirements?
1. There is an Employees table (25 attributes) where we are required to make some of the attributes SCD2, e.g. Salary, LastSalaryIncreaseDate, LastBonusAmount, LastBonusDate, Designation. We don't have to maintain the reporting hierarchy.
2. There are different Departments. Every Department is headed by exactly one Department head (an Employee).
3. An Employee may belong to multiple Departments and vice versa.
4. Monthly payroll information is required to maintain for every Employee.
**Understanding and Questions**
1. Should we split the Employees entity into two, considering only 5 of the 25 attributes are SCD2-based (historical)?
2. I suppose a bridge table is required between Employee and Departments. So every employee must have an attribute (e.g. DepartmentGroupCode) pointing at its multiple departments in the bridge table (a sketch follows these questions). Correct?
3. There is a direct relationship between employees and the Department. So Department will have the attribute EmployeeKey in it. How do I deal with SCD2 changes of employees with respect to the Department entity?
4. The payroll periodic Fact entity will be linked only with the Employee and the date dimension. This should not be linked with the Department because it is already linked with the Employee entity...Please correct my understanding.
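For question 2, a minimal sketch of such a bridge (all names are illustrative):
-- DimEmployee carries a DepartmentGroupKey; the bridge expands each group
-- into its member departments.
CREATE TABLE BridgeDepartmentGroup
(
    DepartmentGroupKey INT NOT NULL,
    DepartmentKey INT NOT NULL,
    PRIMARY KEY (DepartmentGroupKey, DepartmentKey)
);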
Irfan Gowani
(21 rep)
Jul 8, 2021, 07:39 PM
• Last activity: Mar 10, 2025, 07:06 PM
0
votes
1
answer
80
views
Must fact tables be designed for aggregation?
I have been reading up on Microsoft SQL Server's columnstore indexes recently and I have become concerned that my experience does not match that of the industry. Columnstore indexes are designed to make aggregates easy. It is very frequently said that _for this reason_ they are an ideal fit for fact tables.
This makes me uneasy. **In my experience, fact tables are not aggregated**. The column list for every fact table I have ever made has been of the form "primary key, followed by several dimension columns, followed by every bit of info we store about the primary key". They then get dumped into some BI tool like Power BI or ThoughtSpot, so non-technical people can filter on the dimensions and find out whatever they want to know. They might aggregate from time to time, but this is certainly not the main use of the table. I expect such tables to have well over 100 columns.
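For contrast, the aggregate-heavy pattern columnstore indexes target looks something like this (a sketch; dbo.FactSales and its columns are hypothetical):
-- Columnstore shines when scanning a few columns of many rows and aggregating.
CREATE CLUSTERED COLUMNSTORE INDEX ccx_FactSales ON dbo.FactSales;

SELECT DateKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY DateKey;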
Is the table design and access pattern I am describing what is expected of a fact table? Or am I describing an entirely different type of table, the name of which I am ignorant?
J. Mini
(1237 rep)
Dec 1, 2024, 01:30 AM
• Last activity: Dec 1, 2024, 03:26 AM
0
votes
0
answers
28
views
Figuring out dimensional modeling requirements for OLAP in a database containing many pre-aggregated fact tables
I have a situation where an application uses a PostgreSQL database for OLAP reporting queries (less DWH/exploratory). I am trying to find an optimal database design for this workload.
The initial incoming transaction data is pre-aggregated by an ETL pipeline, and aggregations for different dimensions are streamed into 'per-aggregation' fact tables, each dealing with slightly different dimensions.
The requirement for this ETL pipeline is to be near-real-time, so something like a DWH with a nightly batch refresh of a single fact table in a star/snowflake schema was unfortunately not an option.
These pre-aggregated fact tables look as follows:
CREATE TABLE IF NOT EXISTS hourly_revenue
(
hour TIMESTAMP(6) WITH TIME ZONE,
outlet_id UUID,
gross_revenue NUMERIC(18,3),
net_revenue NUMERIC(18,3),
PRIMARY KEY (outlet_id, hour)
);
CREATE TABLE IF NOT EXISTS hourly_article_revenue
(
article_group_name character varying(30) NOT NULL,
article_name character varying(30) NOT NULL,
hour timestamp(6) with time zone NOT NULL,
plu character varying(255) NOT NULL,
outlet_id uuid NOT NULL,
base_price numeric(18,3),
line_totals numeric(18,3),
sum_quantity numeric(18,3),
CONSTRAINT hourly_article_revenue_pkey PRIMARY KEY (outlet_id, article_name, plu, article_group_name, hour)
);
CREATE TABLE IF NOT EXISTS hourly_waiter_revenue
(
hour timestamp(6) with time zone NOT NULL,
outlet_id uuid NOT NULL,
waiter_name character varying(30) NOT NULL,
waiter_number character varying(10) NOT NULL,
gross_revenue numeric(18,3),
net_revenue numeric(18,3),
CONSTRAINT hourly_waiter_revenue_pkey PRIMARY KEY (outlet_id, waiter_name, waiter_number, hour)
);
and others.
The application supports multi-tenancy, based on the outlet_id discriminator column. Currently there are no plans for partitioning, but this may change.
I am trying to benchmark certain types of planned queries, which take these pre-aggregated hourly facts and additionally perform ad-hoc aggregations over non-specific time frames (i.e. days to months). I have run into issues with the current design, namely the bloat of multiple fact tables (e.g. highly bloated composite primary keys), which screams that the dimensions have been denormalized into the facts.
The worst-case scenario is for there to be entries in each of these fact tables for each outlet and hour for up to 2 years; with around 50,000 outlets this yields between 700M and several billion rows per "fact" table, depending on any additional dimensions. The grain of these individual facts is always hourly.
To combat this, I believe I will need some degree of dimensional modeling to better support the types of queries that will be run (queries that touch <1 day of data are rather trivial).
Taking the hourly_article_revenue fact table as an example, I would have done something like the following:
1. Extract the article dimension into a separate dim_article table and give it a surrogate ID.
2. Add a unique constraint over the current business definition of uniqueness: (outlet_id, article_name, plu, article_group_name).
3. Reference the dim_article surrogate key as a foreign key in the relevant fact table.
Then repeat this process for the rest of the tables and their dimensions (e.g. waiters); a sketch of these steps follows.
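A minimal sketch of those three steps in PostgreSQL (the slimmed-down fact table name is an assumption):
-- Steps 1 and 2: the extracted dimension with its business-key uniqueness.
CREATE TABLE dim_article
(
    article_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    outlet_id uuid NOT NULL,
    article_name varchar(30) NOT NULL,
    plu varchar(255) NOT NULL,
    article_group_name varchar(30) NOT NULL,
    UNIQUE (outlet_id, article_name, plu, article_group_name)
);
-- Step 3: the fact table now carries only the surrogate key.
CREATE TABLE hourly_article_revenue_slim
(
    hour timestamp(6) with time zone NOT NULL,
    outlet_id uuid NOT NULL,
    article_id bigint NOT NULL REFERENCES dim_article (article_id),
    base_price numeric(18,3),
    line_totals numeric(18,3),
    sum_quantity numeric(18,3),
    PRIMARY KEY (outlet_id, article_id, hour)
);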
However, this approach smells somewhat... *off*, because theoretically I only require one fact table, sourced from the pre-aggregated data with the highest number of dimensions (and then just drilling up based on the specific requirements). For example, the following two queries, sourced from different fact tables:
SELECT outlet_id, date_trunc('month', hour)
FROM hourly_revenue
WHERE outlet_id = '...'
GROUP BY outlet_id, date_trunc('month', hour)
and:
SELECT outlet_id, date_trunc('month', hour)
FROM hourly_article_revenue
WHERE outlet_id = '...'
GROUP BY outlet_id, date_trunc('month', hour)
are essentially semantically equivalent, at which point I may as well have just used the raw transaction facts (notwithstanding the near-realtime requirement of data freshness).
The obvious drawback is that I cannot necessarily know all the required dimensions beforehand, so such a fact table would need to gain more dimensions as time goes on. That would be a monumental maintenance burden, as opposed to creating a new fact table for the dimensions that are relevant to the key insights given by a new ETL result set.
Is there a more general approach for this kind of modeling with multiple "fact" tables? Do those principles still hold if the different fact tables are - essentially - drilled up versions of other fact tables?
Lastly: is there any further reading I can do on this topic? I have read Kimball et al but I do not have it near my person at the moment and so cannot reference it at this time to verify...
user991710
(111 rep)
Nov 22, 2024, 05:49 PM
• Last activity: Nov 25, 2024, 09:34 AM
0
votes
1
answer
44
views
ideas to reduce association tables while building a data warehouse
I'm trying to create a data warehouse for my BI project. Initially I have a raw database that contains 100 data tables, most of them association tables. My goal is to create a normalized dimensional model in order to centralize my data as much as possible, but it seems impossible.
Here is an example use case: imagine having firmware, device, and many more tables with many-to-many relationships to a user table; association tables are created automatically to record every device or firmware created by users. I want to get rid of these association tables in order to create more centralized dimension/fact tables. Is there any proposal or solution for this specific use case?
Jalel MED
(1 rep)
Jun 20, 2024, 11:19 PM
• Last activity: Jun 24, 2024, 04:28 PM
0
votes
2
answers
643
views
Replace primary key with unique constraint and other clustered index to improve query performance without doubling table size
I've got a SQL Server 2019 instance with a large fact table (currently ~100 million rows, but it will most likely be >1 billion within a year and continue to grow for a long time).
The primary objective is to make it efficient to query the table by a date range (done via power bi).
The table definition (omitting a bunch of stuff not relevant for the question):
CREATE TABLE [fact].[dv](
[system_id] [int] NOT NULL,
[system_row_id] [bigint] NOT NULL,
[timestamp] [datetime2](3) NOT NULL,
[equipment_id] [int] NOT NULL,
[event_id] [int] NOT NULL,
[event_value] [int] NOT NULL,
[date] AS cast([timestamp] as date) PERSISTED,
constraint pk__fact_dv primary key clustered(system_id, system_row_id)
)
Example query:
select
x,y,z
from fact.dv
where [date] = '20231001'
and system_id = 20
and equipment_id in (1,2,3)
and event_id in (5,6,7)
The current design is useless for querying by date, so my first thought was to create a non-clustered compound index with the date as the leading column. Since I need to be able to query all columns, this would effectively double the size of my largest table and, in practice, double the size of the database.
My second idea (and this is where I need input from people more experienced than me) is to:
* Replace the current primary key with a unique constraint (and keep the NOT NULL constraints on the two columns).
* Use a clustered compound index with date as the leading column, roughly as sketched below.
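In T-SQL the swap would look roughly like this (index and constraint names are illustrative):
-- Sketch only: rebuilding a table this size needs careful planning.
ALTER TABLE [fact].[dv] DROP CONSTRAINT pk__fact_dv;
CREATE CLUSTERED INDEX cix__fact_dv ON [fact].[dv] ([date], system_id, system_row_id);
ALTER TABLE [fact].[dv] ADD CONSTRAINT uq__fact_dv
    UNIQUE NONCLUSTERED (system_id, system_row_id);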
If I'm correct this solution would let me:
* Keep the current data integrity (ensuring that neither system_id nor system_row_id is null and that their combination is unique).
* Make it efficient to query the table by date.
* Leave me with roughly the same table size (mainly adding the size of the index that supports the unique constraint)
Are there any drawbacks to this solution, apart from people reacting to not seeing a primary key on the table?
Are any of my assumptions wrong?
Any other suggestions?
Martin Riddar
(196 rep)
Oct 24, 2023, 08:40 AM
• Last activity: Nov 16, 2023, 11:38 AM
1
vote
1
answer
978
views
My Fact table has data weekly but it is not visible in SSAS cube
My fact table has data on a weekly basis, so I am mapping it to a Week dimension to pull records into a pivot table. My SSAS cube doesn't show all the data that is in the fact table. Should I add any specific scope to show it in the cube?
For example, my fact table has data for week 201839 as well as 201840 (September weeks), but in the cube I can see data only for week 201839, not for 201840.
Sandhya Balakrishnan
(41 rep)
Apr 3, 2019, 08:51 AM
• Last activity: Sep 26, 2023, 07:02 PM
1
vote
0
answers
1109
views
Dimensional Model for CRM Sales Opportunity and Sales Leads: Fact or Dimension?
I'm working for a business lender, and trying to come up with a basic dimensional model around the subject area of loan approvals.
We use a CRM, (Salesforce) and one of the core objects is a (Sales) "Opportunity".
Sales usually start off as a lead; if we get traction on a lead, we will create an account, at least one contact, and an "Opportunity", which is a potential loan (our business calls it a "Deal"). All of the loan approval process happens under a particular "Opportunity". I know that Account and Contact would be good dimension candidates; I am unsure of Lead and Opportunity, though.
I'm interested in opinions on which category Leads and Opportunities belong in: **Fact or Dimension**?
Here is an example of some opportunity fields that may be relevant to a data warehouse:
CREATE TABLE
dbo.Opportunity
(
OpportunitySK BIGINT NOT NULL
IDENTITY(1,1)
CONSTRAINT
PK_dbo_Opportunity
PRIMARY KEY CLUSTERED
, AccountId NCHAR(18) NULL
, ACTUAL_AMT_COLLECTED DECIMAL(18,2) NULL
, ACTUAL_AMT_OFFERED DECIMAL(18,2) NULL
, ActualDailyPayment DECIMAL(18,2) NULL
, ACTUAL_MCA_TERM NVARCHAR(10) NULL
, ActualPaymentFrequency NVARCHAR(255) NULL
, ApprovalAmount DECIMAL(18,2) NULL
, APPROVAL_DATE DATETIME NULL
, ApprovalExpirationDate DATETIME NULL
, BrokerAccount NCHAR(18) NULL
, CreatedDate DATETIME NOT NULL
, CreditAnalyst NCHAR(18) NULL
, CreditProcessor NCHAR(18) NULL
, CreditSubmittalDate DATETIME NULL
, DocsInDate DATETIME NULL
, DocsOutDate DATETIME NULL
, eNoahCompleteDate DATETIME NULL
, ExternalId NVARCHAR(30) NULL
, FundDate DATETIME NULL
, FundedAmount DECIMAL(18,2) NULL
, Funder NCHAR(18) NULL
, Id NCHAR(18) NOT NULL
, InternalCreditAnalyst NCHAR(18) NULL
, LenderAccount NCHAR(18) NULL
, MerchantNumber NVARCHAR(32) NULL
, OfferInDate DATETIME NULL
, OfferOutDate DATETIME NULL
, OwnerId NCHAR(18) NOT NULL
, PhysicalAddress1 NVARCHAR(1300) NULL
, PhysicalAddress2 NVARCHAR(1300) NULL
, PhysicalAddressCity NVARCHAR(1300) NULL
, PhysicalAddressState NVARCHAR(1300) NULL
, PhysicalAddressZip NVARCHAR(1300) NULL
, PreQualDate DATETIME NULL
, RecordTypeId NCHAR(18) NULL
, RequestedAmount DECIMAL(18,2) NULL
, StageName NVARCHAR(40) NOT NULL
, StatementsInDate DATETIME NULL
, Type NVARCHAR(40) NULL
, UpForFundingDate DATETIME NULL
, WC_CreditDecisionRecord NCHAR(18) NULL
, YearsInBusiness DECIMAL(18,0) NULL
, SystemModstamp DATETIME NOT NULL
, AccountExternalId NVARCHAR(100) NULL
, ACH VARCHAR(5) NULL
, ACH_Debit VARCHAR(5) NULL
, ACH_DebitAmount DECIMAL(18,2) NULL
, ACH_QUALIFIED VARCHAR(5) NULL
, ACH_StartDate DATETIME NULL
, Amount DECIMAL(18,2) NULL
, AmountOffered DECIMAL(18,2) NULL
, Approved VARCHAR(5) NULL
, BalancePaid100PercentDate DATETIME NULL
, BalancePaid50PercentDate DATETIME NULL
, BalancePaid60PercentDate DATETIME NULL
, BalancePaidatRenewal DECIMAL(18,2) NULL
, BalancePaidPercent DECIMAL(18,2) NULL
, BalanceRemaining DECIMAL(18,2) NULL
, BalanceRemainingPercent DECIMAL(18,2) NULL
, BOFI_Deal NVARCHAR(255) NULL
, BOFI_RestrictedState NVARCHAR(1300) NULL
, BookingStarted VARCHAR(5) NULL
, BrokerTier NVARCHAR(255) NULL
, CampaignID NCHAR(18) NULL
, ClosedLostDate DATETIME NULL
, CloseDate DATETIME NOT NULL
, CompanyDBA NVARCHAR(1300) NULL
, ContractNumber NVARCHAR(32) NULL
, DateEstablished DATETIME NULL
, DateonDocs DATETIME NULL
, DaysbwAccountActivationDatePrequal DECIMAL(18,0) NULL
, DaysbwActivationDateFundDate DECIMAL(18,0) NULL
, DaysbwApprovalDateFundDate DECIMAL(18,0) NULL
, DaysbwDocsinDateFundDate DECIMAL(18,0) NULL
, DaysInApprovalStage DECIMAL(18,0) NULL
, DaysInOfferOutStage DECIMAL(18,0) NULL
, DeclinedDate DATETIME NULL
, Deposit DECIMAL(18,2) NULL
, EqAmount DECIMAL(18,0) NULL
, EqAmtLeased DECIMAL(18,2) NULL
, EqExperience NVARCHAR(32) NULL
, EqHasQuote VARCHAR(5) NULL
, EqLeasedBefore VARCHAR(5) NULL
, EqNew VARCHAR(5) NULL
, EqNumVendors NVARCHAR(255) NULL
, EqTerm DECIMAL(18,0) NULL
, EqUsed VARCHAR(5) NULL
, EqWhat NVARCHAR(64) NULL
, EqWhen NVARCHAR(128) NULL
, EqWhy NVARCHAR(64) NULL
, EqWithWhom NVARCHAR(64) NULL
, EquipCostForActual DECIMAL(18,2) NULL
, EquipCostPaidByLenderForActual DECIMAL(18,2) NULL
, EquipCostPerDocRequest DECIMAL(18,2) NULL
, EquipmentCondition NVARCHAR(255) NULL
, EquipmentGrandTotalSummary DECIMAL(18,2) NULL
, EquipmentSeller NVARCHAR(255) NULL
, EquipmentStory NVARCHAR(255) NULL
, EquipmentYear NVARCHAR(255) NULL
, Fax NVARCHAR(1300) NULL
, FinalFundDate DATETIME NULL
, FiscalQuarter INT NULL
, FiscalYear INT NULL
, IsClosed VARCHAR(5) NOT NULL
, IsSplit VARCHAR(5) NOT NULL
, IsWon VARCHAR(5) NOT NULL
, LeadSource NVARCHAR(40) NULL
, LeaseNumber NVARCHAR(32) NULL
, LegalName NVARCHAR(1300) NULL
, LegalStatus NVARCHAR(1300) NULL
, LendersLeaseNumber NVARCHAR(32) NULL
, LendingTreeId NVARCHAR(9) NULL
, LendVantageCommission DECIMAL(18,2) NULL
, LendVantageId NVARCHAR(255) NULL
, LoanStartDate DATETIME NULL
, LoanStartDateVerified VARCHAR(5) NULL
, Locked VARCHAR(5) NULL
, MailingAddrDisplay NVARCHAR(1300) NULL
, MailingAddress1 NVARCHAR(1300) NULL
, MailingAddressCity NVARCHAR(1300) NULL
, MailingAddressState NVARCHAR(1300) NULL
, MailingAddressZip NVARCHAR(1300) NULL
, MarketingSubmissionDate DATETIME NULL
, MaturityDate DATETIME NULL
, MerchantID NVARCHAR(32) NULL
, MonthlyGrossCC_Volume DECIMAL(18,2) NULL
, MonthlySales DECIMAL(18,2) NULL
, MonthlyVisaMC_Volume DECIMAL(18,2) NULL
, MonthsInBusiness DECIMAL(18,0) NULL
, Name NVARCHAR(120) NOT NULL
, NF_DeclineDate DATETIME NULL
, OpportunityID18 NVARCHAR(18) NULL
, PartnerAccountId NCHAR(18) NULL
, PastDueBalance DECIMAL(18,2) NULL
, PaymentsBehind DECIMAL(18,0) NULL
, PaymentsBounced DECIMAL(18,0) NULL
, PaymentsCollected DECIMAL(18,0) NULL
, Phone NVARCHAR(1300) NULL
, Probability DECIMAL(18,0) NULL
, ProcessorUserID NVARCHAR(32) NULL
, RateReviewCompleted VARCHAR(5) NULL
, Ratio NVARCHAR(32) NULL
, RenewalNumber DECIMAL(18,0) NULL
, ResubmitDate DATETIME NULL
, SplitFunding NVARCHAR(255) NULL
, SplitOpportunity VARCHAR(5) NULL
, SplitOpportunitywith NCHAR(18) NULL
, Status NVARCHAR(255) NULL
, SubmissionDate DATETIME NULL
, SyndicationDate DATETIME NULL
)
Bee-Dub
(41 rep)
May 18, 2019, 12:32 AM
• Last activity: Feb 28, 2023, 01:50 PM
0
votes
1
answer
1781
views
Handling multiple dates on fact table with performance concerns
I am working on a simple data mart and have once again stumbled over handling date and time.
This is a design for vehicle maintenance form records, collecting status records/remarks for analytics/visualisation. There are several date and time columns in this fact table.
My questions are:
1. A fact table should be narrow and long, not wide. With this many date and time columns, the table becomes wide. If I query a LEFT JOIN between Vehicle.dim and VehMaintenance.fact to consolidate all details and output them in the frontend, wouldn't the performance be horrible? Now imagine I have even more dimensions to join; a LEFT JOIN onto the fact table will make it even wider. I think my design is horribly wrong.
2. Vehicle.dim is linked back to Date.dim. I read somewhere that a dimension table should not link back to another unless it is a snowflake schema with a parent/child relationship. Is it still fine to link the two dimension tables like this?
Take note that my fact table will be batch-updated on a 15-minute basis from a data warehouse. It will always be an INSERT query (there will be multiple otherwise-identical records whose dates differ). There will be NULLs in some of the date columns.
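For illustration, the join described in question 1 above has roughly this shape (all column names are assumptions):
SELECT v.VehicleName, f.StatusRemark, f.ReportedDate, f.RepairedDate
FROM VehMaintenance.fact AS f
LEFT JOIN Vehicle.dim AS v ON v.VehicleKey = f.VehicleKey;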
user3118602
(147 rep)
Sep 5, 2022, 02:05 PM
• Last activity: Sep 7, 2022, 12:30 PM
0
votes
1
answer
90
views
How do we assign FKs to newly loaded data in a fact table?
While reading up on SQL, I got confused about how data newly loaded into the fact table gets its FKs. Say we use the following example (a star schema image taken from Google: a factSales table surrounded by dimDate, dimCustomer, dimStore, and dimProduct dimensions).
These are my thoughts on how the data is loaded:
1. SaleID is auto-generated (e.g. incremental) to give a unique ID to the new row.
2. SalesQuantity, SalesPrice, SalesAmount, ReceiptID, and TimeStamp are loaded into factSales.
The question is: how does this newly loaded data know which FKs (DateKey, CustomerKey, StoreKey, ProductKey) it should get? My understanding of a dimension table is that it describes the data in factSales.
For example, dimDate will have pre-populated date data up to the year 2050. If a date value such as 2020-05-01 13:00 is loaded into the TimeStamp column of factSales, how can the FK DateKey be auto-assigned to the date 2020-05-01? The same goes for other dimensions such as CustomerKey, where some customer data will already exist.
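The usual ETL mechanism here is a business-key lookup against each dimension during the load. A minimal sketch, assuming a staging_sales table, a FullDate column on dimDate, and a CustomerID business key (StoreKey and ProductKey would be looked up the same way):
INSERT INTO factSales (DateKey, CustomerKey, SalesQuantity, SalesPrice, SalesAmount, ReceiptID, TimeStamp)
SELECT d.DateKey, c.CustomerKey,
       s.SalesQuantity, s.SalesPrice, s.SalesAmount, s.ReceiptID, s.TimeStamp
FROM staging_sales AS s
JOIN dimDate AS d ON d.FullDate = CAST(s.TimeStamp AS DATE)
JOIN dimCustomer AS c ON c.CustomerID = s.SourceCustomerID;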
Thank you.
user3118602
(147 rep)
Sep 1, 2022, 03:05 PM
• Last activity: Sep 1, 2022, 11:08 PM
1
vote
1
answer
1823
views
How to deal with fact table data that needs to be version controlled?
I have the following simplified sport_match 'fact' table:
| match_id | tournament_id | player_id_p1 | player_id_p2 | p1_final_score | p2_final_score |
|----------|---------------|--------------|--------------|----------------|----------------|
| 1 | 1 | 1 | 2 | 1 | 0 |
| 2 | 1 | 1 | 2 | 3 | 1 |
| 3 | 2 | 3 | 2 | 2 | 3 |
| 4 | 2 | 3 | 2 | 4 | 0 |
The table is updated from an API that issues INSERT, UPDATE and DELETE SQL instructions via text files.
Occasionally there is a mistake in the scores, and because I need to be able to run historical analyses from a specific point in time, I need to capture both the incorrect entry and the correct entry. For this reason I started to look at adopting a Slowly Changing Dimension Type 2 method and translating all the API instructions to INSERTs. This would give me a table that looks like this:
| match_key | match_id | tournament_id | player_id_p1 | player_id_p2 | p1_final_score | p2_final_score | start_date | current_flag |
|-----------|----------|---------------|--------------|--------------|----------------|----------------|------------------|--------------|
| 1 | 1 | 1 | 1 | 2 | 1 | 0 | 01/01/2000 00:00 | Y |
| 2 | 2 | 1 | 1 | 2 | 3 | 1 | 02/01/2000 00:00 | Y |
| 3 | 3 | 2 | 3 | 2 | 2 | 3 | 03/01/2000 00:00 | Y |
| 4 | 4 | 2 | 3 | 2 | 4 | 0 | 04/01/2000 00:00 | N |
| 5 | 4 | 2 | 3 | 2 | 4 | 1 | 04/01/2000 00:01 | Y |
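Under that design, recording a score correction would look roughly like this (a sketch; match_key is assumed to be auto-generated):
UPDATE sport_match
SET current_flag = 'N'
WHERE match_id = 4 AND current_flag = 'Y';

INSERT INTO sport_match (match_id, tournament_id, player_id_p1, player_id_p2, p1_final_score, p2_final_score, start_date, current_flag)
VALUES (4, 2, 3, 2, 4, 1, '2000-01-04 00:01', 'Y');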
However, I realised I was applying a 'dimension' principle to a 'fact' table.
Is this a viable approach or should I be looking at a different design?
Jossy
(83 rep)
Jun 6, 2022, 08:28 PM
• Last activity: Jun 6, 2022, 09:04 PM
1
vote
1
answer
518
views
How to handle NULL Dates in an "Accumulating fact table"
One type of fact table in dimensional modeling is the Accumulating Snapshot fact table. If you need to review the meaning and definition of this type, you can take a look at this article: https://www.holistics.io/blog/the-three-types-of-fact-tables/.
Imagine we have an accumulating fact table with three important date columns: Order_Datekey, Manufacturing_Datekey, and Ship_Datekey. When a row is inserted for the first time, there is no data for the Manufacturing_Datekey and Ship_Datekey columns; we only know the date on which the product was ordered (Order_Datekey). What I need to know is how we can handle NULL values for these date columns.
The important point I read in the Kimball book is that NULL should be avoided in a foreign key column: 'Referential integrity is violated if you put a null in a fact table column declared as a foreign key to a dimension table.' On the other hand, we don't have any value for these two date columns at first. What do you suggest?
Thanks in advance
Pantea
(1510 rep)
Jun 28, 2021, 10:04 AM
• Last activity: Jun 28, 2021, 08:00 PM
1
vote
1
answer
59
views
Type 2 Dimension for a fact measure
I have a need to store products along with their prices and quantities (and many other things) and track the history.
Now, given both price and quantity are stored as integers, it is my understanding these are fact measures. So we might have a table like:
f_products:
id
sku
price
quantity
As the price varies over time we can insert a new fact, and I can introduce effective dates to the table to allow for querying the price or quantity at a given time.
But with this design, if the quantity changes I must insert another row, repeating the unchanged price value again. And while this isn't too bad for 2 columns, I have in fact 10.
So, my design is:
f_products:
id
sku
price:
id
product_id
price
start_date
end_date
another "dimension":
id
product_id
value
start_date
end_date
and so on
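As DDL, that design might look like this (a sketch; types and names are assumptions):
CREATE TABLE f_products
(
    id BIGINT PRIMARY KEY,
    sku VARCHAR(50) NOT NULL
);
CREATE TABLE product_price
(
    id BIGINT PRIMARY KEY,
    product_id BIGINT NOT NULL REFERENCES f_products (id),
    price INT NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE -- NULL while this price is current
);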
Now the "fact" no longer needs to change when the price changes, and I can query for historical values across these tables.
So my question is: this doesn't seem to fit the standard fact/dimension designs, so I wonder how more experienced modellers would approach these requirements.
sksamuel
(113 rep)
Feb 23, 2021, 02:11 AM
• Last activity: Feb 23, 2021, 02:49 AM
0
votes
0
answers
53
views
Distinguishing between fact and dimension tables in data warehouse
Is it true that facts in a fact table of a data warehouse always have a relation to time? I.e. is it true that a fact must always be assignable to a specific point in time?
If this is correct, can we conclude that dimension elements (except for maybe a time-dimension) do not need to be assignable to a specific point in time?
Thanks for your help!
Kevin Südmersen
(101 rep)
Aug 11, 2020, 04:26 PM
1
vote
3
answers
854
views
The right way to model multiple FACTs schema
**Background**
I'm in the process of designing a database (using a STAR schema).
There are three tables to model: *products*, *tests*, *states*.
The database will be used to store results of *tests* conducted on *products* (in a great simplification). There can be many *tests* pointing out to a single product but each *test* is unique (they are not shared among *products*). Besides, I need to record the current *state* of the product, at the time when the *test* was conducted. Let's assume that the *state* of a *product* describes its current location and owner, which are changing very often. That will most likely involve SCD lvl 2 - to track the history of *state* changes and to be able to locate a *product* with all its *tests* as well as the *states* it had during these *tests*.
**Problem**
I'm not entirely sure how to model this problem. It seems obvious to store every *test* in a FACT table. This table would then consist of thousands of transactions. On the other hand, there will also be hundreds (and later thousands) of *products*, so I should probably keep them in a second FACT table. Then, there will also be thousands of *state* changes, so in order to record their entire history, I would need to keep them in a... FACT table as well? I've been told that FACT tables are typically used to store multiple-row data, but on the other hand, where are the DIMs in this model?
I also don't know how to model the relationships between these tables. *Products* - *states* is a 1:* relationship. *Products* - *tests* is 1:* as well. Finally, *states* - *tests* is 1:* too. I would then go with linking *products* to *states* and then *states* to *tests* (*products* 1<-* *states* 1<-* *tests*), which would allow me to find all *states* for a particular *product* and all *tests* (in all *states* or in a selected *state*). What do you think about that? The problem here is that, as I keep adding *states*, I have two options: either keep duplicating *products* in the *products* table (with an added "recorded_timestamp" column) or use SCD level 2 in the *states* table, pointing at the *products* table with a FK, but this would effectively make the *products* table a DIM!
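For concreteness, one possible reading of that chain as a schema (a sketch; all names are illustrative, with *states* as SCD2):
CREATE TABLE dim_product
(
    product_key BIGINT PRIMARY KEY,
    product_code VARCHAR(50) NOT NULL
);
CREATE TABLE dim_product_state
(
    state_key BIGINT PRIMARY KEY,
    product_key BIGINT NOT NULL REFERENCES dim_product (product_key),
    location VARCHAR(100),
    owner VARCHAR(100),
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP -- NULL marks the current state (SCD2)
);
CREATE TABLE fact_test
(
    test_key BIGINT PRIMARY KEY,
    state_key BIGINT NOT NULL REFERENCES dim_product_state (state_key),
    test_date DATE NOT NULL,
    result NUMERIC(18,3)
);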
Any help here would be very appreciated.
JustGiveMeData
(113 rep)
Jul 12, 2020, 10:14 PM
• Last activity: Jul 17, 2020, 01:12 PM
0
votes
1
answer
52
views
Deciding what values should be used in a data warehouse dimension?
I'm beginning to look at data warehousing and I'm coming across examples where, for instance, we have check payment transactions. I understand that this should be the "fact" and we would have dimensions around it that would describe the fact (account, product, etc.). However, it also contains things like "memo line" and string "transaction id" values which, as strings, don't really belong in the fact table.
My question is: should there be a separate payment "meta" dimension that contains these descriptors, or is it OK to say that something like the memo can't be reported on, since it's different every time, and so leave it out of the warehouse? I know it's somewhat of an abstract question, but I'm not finding much on this in my reading. Any suggestions would be greatly appreciated.
jordaniac89
(103 rep)
Apr 29, 2020, 02:49 PM
• Last activity: Apr 29, 2020, 04:52 PM
0
votes
0
answers
62
views
Trying to create a fact and dim table in SQL
So far this is what I have code-wise, but I'm getting error 1239. Is it not referencing something correctly?
CREATE TABLE CustomerDim
(
C_CustomerID_Cu INT(10) NOT NULL,
C_FacilityReservationNumber_Cu INT(10) NOT NULL,
I_EventCode_Cu INT(10) NOT NULL,
CONSTRAINT C_CustomerID_Cu FOREIGN KEY (C_CustomerID_Cu)
REFERENCES CustomerDIM
);
CREATE TABLE RoomDim
(
C_RoomNumber_Ro INT(10) NOT NULL,
C_RoomType_Ro INT(5) NOT NULL,
C_RegionID_Re INT(4) NOT NULL,
C__ResortCode_Fa INT(10) NOT NULL,
CONSTRAINT C_RoomNumber_Ro FOREIGN KEY (C_RoomType_Ro)
REFERENCES RoomDIM
);
CREATE TABLE ReservationFact
(
C_RoomReservationNumber_Rr INT (10) NOT NULL,
C_CustomerID_Cu INT (10) NOT NULL,
C__RoomType_Ro INT (10) NOT NULL,
C_D_ArrivalDate_Rr DATE,
C_EmployeeNumber_Em INT (5) NOT NULL,
C_ResortCode_Rs INT (10),
CONSTRAINT C_RoomReservationNumber_Rr_pk PRIMARY KEY
(C_RoomReservationNumber_Rr)
);
masongirl
(1 rep)
Mar 30, 2020, 07:31 PM
• Last activity: Mar 30, 2020, 09:35 PM
0
votes
0
answers
221
views
How to handle table that has an id which references multiple dimension tables?
I'm having a hard time figuring out how to create a lookup table, or a table that references multiple dimension tables. I've got a "tracking" table that tracks the number of impressions for an e-commerce website. Each type of impression (manufacturer, product, evaluation) is contained in a single objectid column, which references the corresponding dimension table for either manufacturer, product, or evaluation.
What I am struggling with is:
- manufacturer impressions = manufacturer + product + evaluation impressions
- product impressions = product + evaluation impressions
I need to be able to query all of the manufacturer impressions, which is a combination of the 3 types of impressions above. If I just query those that match on manufacturer, I will be leaving out the impressions that match on product and/or evaluation. The same thing goes for product impressions.
What is the best way to create a reference table or to be able to query all manufacturer impressions? I need to be able to write a query that combines objectid = manufacturerid + objectid = productid (which has a corresponding manufacturerid) + objectid = evaluationid (which has a corresponding productid and then a corresponding manufacturerid that links to the productid).
Here's a sample query I'm using, but it's not returning any rows when I add in the second join statement:
select * from tracking t
left join products p on t.objectid = p.productid and t.objecttype = 'Products'
left join evaluations e on t.objectid = e.productevaluationid and t.objecttype = 'Evaluations'
left join manufacturers m on t.objectid = m.manufacturerid and t.objecttype = 'Manufacturers'
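One sketch of rolling every tracking row up to a manufacturer (column names are guesses based on the description above):
select coalesce(m.manufacturerid, p.manufacturerid, pe.manufacturerid) as manufacturerid,
       count(*) as impressions
from tracking t
left join manufacturers m on t.objecttype = 'Manufacturers' and t.objectid = m.manufacturerid
left join products p on t.objecttype = 'Products' and t.objectid = p.productid
left join evaluations e on t.objecttype = 'Evaluations' and t.objectid = e.productevaluationid
left join products pe on pe.productid = e.productid
group by coalesce(m.manufacturerid, p.manufacturerid, pe.manufacturerid);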
Any advice is greatly appreciated, thanks!
danieltisdale
(1 rep)
Mar 2, 2020, 07:37 PM
• Last activity: Mar 2, 2020, 08:58 PM
1
vote
0
answers
174
views
How does one determine the number of FACT tables that are required in a data-warehouse?
When analysing data sources that need to be included in the data warehouse (Kimball), how does a developer determine whether a schema should be a star type or a constellation type?
Business processes can be reflected at one of three levels (transaction, periodic snapshot, or accumulating snapshot). However, there can be business processes where the captured data is effectively identical at any of these levels, for any number of these processes.
What are the remaining considerations? (Performance..)
GettingItDone
(109 rep)
Oct 17, 2019, 03:40 PM
• Last activity: Oct 17, 2019, 06:28 PM
Showing page 1 of 20 total questions