
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
0 answers
16 views
Dimensional Modelling Standard Practices for Periodic Snapshot Table
Our company is relatively new to using dimensional models, but we need to view account balances at certain points in time. We have billions of customer accounts, so taking daily snapshots of these balances would mean millions of rows per day (excluding zero-dollar balances, because our business model closes accounts once they reach zero).

What I envisioned was a periodic snapshot fact table where the balance for each account comes from the end-of-day snapshot, but only rows for end of week, end of month, and yesterday are kept (to save storage and processing for days we are not interested in); a flag in the date dimension table would then filter to monthly dates, weekly dates, or current data. I know standard periodic snapshot tables have predefined intervals; to me this sounds like a daily snapshot table that uses the dimension table to filter to the dates you're interested in.

My leadership feels that this should be broken out into three different fact tables (current, weekly, monthly). I feel that this is excessive because it's the same calculation (all-time balance at end of day) and the tables could overlap (i.e. yesterday could be end of week and end of month). Since these are point-in-time balances at end of day and there are no aggregations needed to produce "weekly" or "monthly" data, what is standard practice here? Should we take leadership's advice, or does it make more sense the way I envisioned it? Either way, can someone point me to educational texts that support your opinion for this scenario?

I should also specify that there is already a traditional snapshot source table (not dimensionally modelled) that captures balances with a start and end date for each balance change.
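A minimal sketch of the single-fact-table approach described above, assuming hypothetical names (`fact_account_balance`, `dim_date` with flag columns) and PostgreSQL-style syntax:

```sql
-- Hypothetical date dimension with role flags maintained by the load process.
CREATE TABLE dim_date (
    date_key     INT PRIMARY KEY,       -- e.g. 20250131
    full_date    DATE NOT NULL,
    is_month_end BOOLEAN NOT NULL,
    is_week_end  BOOLEAN NOT NULL,      -- last day of the week
    is_yesterday BOOLEAN NOT NULL       -- flipped by the nightly load
);

-- One periodic-snapshot fact at daily grain, loaded only for the dates of interest.
CREATE TABLE fact_account_balance (
    date_key    INT    NOT NULL REFERENCES dim_date (date_key),
    account_key BIGINT NOT NULL,
    balance     NUMERIC(18,2) NOT NULL,
    PRIMARY KEY (date_key, account_key)
);

-- "Monthly" reporting is then a filter, not a separate fact table:
SELECT f.account_key, d.full_date, f.balance
FROM fact_account_balance f
JOIN dim_date d ON d.date_key = f.date_key
WHERE d.is_month_end;
```

Under this sketch, the three tables leadership wants could simply be three views over the one fact, each applying a different flag filter.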
Bolt (1 rep)
Jul 27, 2025, 04:56 PM • Last activity: Jul 28, 2025, 07:24 AM
0 votes
1 answer
154 views
How to model changing OrderLine rows in fact table
In our business, we have Orders that are made up of OrderItems. Over time, these Orders can change status (e.g. Received, Challenged, Planned, Completed). During this process, the OrderItems for an Order can be changed, added, or removed. For example, when an Order is finally fulfilled, its OrderItems might actually be for different Items and at different prices from the OrderItems on the original Order. I'm wondering how to model this. Any suggestions welcome!
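One common pattern (a sketch, not the only answer) is to record each revision of an order line as its own row with effective dates, so the fact table captures the line as it stood at any point in time. All names below are illustrative:

```sql
-- Each change to an order line closes the old version row and inserts a new one.
CREATE TABLE fact_order_line_version (
    order_line_version_key BIGINT PRIMARY KEY,
    order_id       BIGINT NOT NULL,
    order_line_no  INT    NOT NULL,
    item_key       BIGINT NOT NULL,            -- FK to the item dimension
    order_status   VARCHAR(20) NOT NULL,       -- Received, Challenged, Planned, Completed
    quantity       NUMERIC(18,3) NOT NULL,
    unit_price     NUMERIC(18,2) NOT NULL,
    effective_from TIMESTAMP NOT NULL,
    effective_to   TIMESTAMP NULL              -- NULL = current version
);

-- Current state of an order:
SELECT *
FROM fact_order_line_version
WHERE order_id = 42
  AND effective_to IS NULL;
```

Replacing the date filter with `effective_from <= :as_of AND (effective_to > :as_of OR effective_to IS NULL)` reconstructs the order as of any moment.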
Steve (21 rep)
Jul 18, 2019, 08:03 AM • Last activity: Jul 14, 2025, 08:03 PM
0 votes
1 answer
939 views
Periodic snapshot fact table with monthly grain - Question on dimensions
I am fairly new to data warehousing but have read my fair share of books and online tutorials to gain an introductory understanding of the basic components that make up a data warehouse. My end goal is a headcount of employees at the end of each month, so I designed my fact table as a monthly snapshot populated at each month end (one row per employee per month). My dimension tables include a DimEmployee table, which is SCD2 (any change to an employee's information introduces a new row in the data source, and I keep track of the most recent employee information via an is_current flag set to 'Y' or 'N'). My question is: do I update the dimensions daily and only insert whatever data I have at end of month into the fact table, or do I update the dimensions at end of month as well? Thank you in advance!
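Whichever cadence the dimension is refreshed on, the month-end fact load just picks the dimension rows that are current at snapshot time. A sketch, with hypothetical names `fact_headcount` and `dim_employee`:

```sql
-- One row per employee per month; month_key is illustrative (yyyymm).
CREATE TABLE fact_headcount (
    month_key    INT    NOT NULL,   -- e.g. 202501
    employee_key BIGINT NOT NULL,   -- FK to dim_employee surrogate key
    PRIMARY KEY (month_key, employee_key)
);

-- Month-end load: snapshot whichever dimension rows are current right now.
INSERT INTO fact_headcount (month_key, employee_key)
SELECT 202501, e.employee_key
FROM dim_employee e
WHERE e.is_current = 'Y';
```

The headcount is then `SELECT month_key, COUNT(*) FROM fact_headcount GROUP BY month_key`.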
Alistair (141 rep)
Apr 16, 2017, 08:01 PM • Last activity: Jun 3, 2025, 12:03 PM
1 votes
0 answers
29 views
appropriate schema modelling a dimensional model for questionnaires
I am designing a snowflake (star, constellation) schema to model a questionnaire, enabling aggregation of questions.

1) A template can have multiple questionnaires.
2) A questionnaire can have only one template.
3) A questionnaire can have multiple questions (multiple choice, rating, text).
4) A question can have multiple answers.

A (questionnaire) template can be altered. Answers do not change. What is an appropriate schema?
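One possible shape (a sketch under assumed, illustrative names) is an answer-grain fact with question and questionnaire dimensions, handling template alterations as SCD2-style version rows:

```sql
CREATE TABLE dim_template (
    template_key  INT PRIMARY KEY,
    template_name VARCHAR(100) NOT NULL,
    version       INT NOT NULL           -- new row per template alteration
);

CREATE TABLE dim_questionnaire (
    questionnaire_key INT PRIMARY KEY,
    template_key      INT NOT NULL REFERENCES dim_template (template_key)
);

CREATE TABLE dim_question (
    question_key  INT PRIMARY KEY,
    question_type VARCHAR(20)  NOT NULL,  -- 'multiple choice' | 'rating' | 'text'
    question_text VARCHAR(500) NOT NULL
);

-- One row per answer; answers never change, so the fact is insert-only.
CREATE TABLE fact_answer (
    answer_key        BIGINT PRIMARY KEY,
    questionnaire_key INT NOT NULL REFERENCES dim_questionnaire (questionnaire_key),
    question_key      INT NOT NULL REFERENCES dim_question (question_key),
    answer_text       VARCHAR(1000),
    answer_numeric    NUMERIC(9,2)        -- populated for rating questions
);
```

Aggregating questions is then a join from `fact_answer` up through `dim_question`, with template versions kept apart by `dim_template.version`.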
Bennimi (165 rep)
Apr 11, 2025, 12:28 PM • Last activity: Apr 16, 2025, 01:39 PM
3 votes
3 answers
5928 views
Should dates in dimensional tables use dimDate?
Assuming my dimDate has a surrogate key. Should all the date columns in the *dimensional tables* (not the fact tables) store the surrogate key of date dimension? Or just plain date? For example, in dimensional table dimCustomer, there may be birthday, join date, graduation date, .... Etc.
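For contrast, here is what the plain-date convention looks like (a sketch with hypothetical columns; whether to prefer this over date surrogate keys is exactly the question being asked):

```sql
-- Dates stored as plain DATE attributes inside the dimension,
-- rather than as foreign keys into dimDate.
CREATE TABLE dimCustomer (
    customer_key    BIGINT PRIMARY KEY,
    customer_name   VARCHAR(100) NOT NULL,
    birthday        DATE,   -- plain DATE, not a dimDate surrogate key
    join_date       DATE,
    graduation_date DATE
);
```

With surrogate keys instead, each of these columns would become an INT FK and every date lookup would require a join to dimDate.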
u23432534 (1565 rep)
Jun 5, 2017, 07:54 PM • Last activity: Apr 5, 2025, 11:08 PM
2 votes
1 answer
1168 views
Dimension modelling for HR with Employee Dimension and multiple departments in a Data warehouse
What is the best way to configure a dimension model (preferably a star schema) given the following requirements?

1. There is an Employees table (25 attributes) where some of the attributes must be SCD2, e.g. Salary, LastSalaryIncreaseDate, LastBonusAmount, LastBonuesDate, Designation. We don't have to maintain the reporting hierarchy.
2. There are different Departments. Every Department is headed by exactly one Department head (an Employee).
3. An Employee may belong to multiple Departments and vice versa.
4. Monthly payroll information must be maintained for every Employee.

**Understanding and Questions**

1. Should we split the Employees entity into two, considering only 5 of the 25 attributes are SCD2 (historical)?
2. I suppose a bridge table is required between Employee and Departments. So every employee must have an attribute (e.g. DepatementGroupCode) pointing to multiple departments in the bridge table. Correct?
3. There is a direct relationship between employees and the Department, so Department will have the attribute EmployeeKey in it. How do I deal with SCD2 changes of employees with respect to the Department entity?
4. The payroll periodic fact entity will be linked only with the Employee and date dimensions. It should not be linked with the Department because that is already reachable through the Employee entity... Please correct my understanding.
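The many-to-many in requirement 3 is usually resolved with a bridge table. A sketch with illustrative names (the SCD2 columns on the employee dimension are elided):

```sql
CREATE TABLE dim_employee (
    employee_key BIGINT PRIMARY KEY
    -- plus natural key, SCD2 attributes, effective-date columns, etc.
);

CREATE TABLE dim_department (
    department_key               INT PRIMARY KEY,
    department_head_employee_key BIGINT NOT NULL
        REFERENCES dim_employee (employee_key)   -- exactly one head per department
);

-- Bridge resolving the employee <-> department many-to-many.
CREATE TABLE bridge_employee_department (
    employee_key   BIGINT NOT NULL REFERENCES dim_employee (employee_key),
    department_key INT    NOT NULL REFERENCES dim_department (department_key),
    PRIMARY KEY (employee_key, department_key)
);

-- Payroll fact links only to employee and date; departments are reached via the bridge.
CREATE TABLE fact_payroll (
    month_key    INT    NOT NULL,
    employee_key BIGINT NOT NULL REFERENCES dim_employee (employee_key),
    gross_pay    NUMERIC(18,2) NOT NULL,
    PRIMARY KEY (month_key, employee_key)
);
```

Whether the bridge points at the SCD2 surrogate key or a stable durable key is a design choice worth deciding explicitly, since it determines what happens to bridge rows when an employee's SCD2 row versions.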
Irfan Gowani (21 rep)
Jul 8, 2021, 07:39 PM • Last activity: Mar 10, 2025, 07:06 PM
1 votes
2 answers
696 views
Dimension for Product Properties based on Product Type - Sparse Dimension?
I am creating a dimension of product properties for sales facts. The properties of a product depend on the product type. For example:

- Type = smartphone. Properties = model, OS, size
- Type = book. Properties = author, title

What should the dimension look like in this case? Should I create a dimension containing ALL properties? In that case the dimension content will be sparse; there will be many NULL values.

| DimKey | Type | Model | OS | Size | AUTHOR | TITLE |
|--------|------|-------|----|------|--------|-------|

Or should I create a dimension for each type? In that case the sales fact will have many FKs.

| FactKey | Quantity | Total | Book_FK | Smartphone_FK | .... |
|---------|----------|-------|---------|---------------|------|

Is there any other way to do this?
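The first option keeps the fact table simple at the cost of NULLs in the dimension. A sketch of that single sparse dimension, with illustrative names and sizes:

```sql
-- One product dimension; type-specific columns stay NULL for other types.
CREATE TABLE dim_product (
    product_key  BIGINT PRIMARY KEY,
    product_type VARCHAR(20) NOT NULL,  -- 'smartphone' | 'book'
    -- smartphone-only attributes:
    model VARCHAR(50),
    os    VARCHAR(30),
    size  VARCHAR(20),
    -- book-only attributes:
    author VARCHAR(100),
    title  VARCHAR(200)
);

-- The sales fact then carries a single product FK instead of one FK per type.
CREATE TABLE fact_sales (
    fact_key    BIGINT PRIMARY KEY,
    product_key BIGINT NOT NULL REFERENCES dim_product (product_key),
    quantity    NUMERIC(18,3) NOT NULL,
    total       NUMERIC(18,2) NOT NULL
);
```

Sparse NULL columns are cheap in most engines; the pain point is column count growth as new product types arrive, which is when splitting into per-type dimensions (or outriggers keyed off dim_product) starts to pay off.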
rendybjunior (259 rep)
Nov 4, 2014, 12:34 PM • Last activity: Mar 3, 2025, 10:02 PM
0 votes
0 answers
28 views
Figuring out dimensional modeling requirements for OLAP in a database containing many pre-aggregated fact tables
I have a situation where an application uses a PostgreSQL database for OLAP queries for reporting (less DWH/exploratory), and I am trying to find an optimal database design for this workload. The incoming transaction data is pre-aggregated by an ETL pipeline, and aggregations for different dimensions are streamed into 'per-aggregation' fact tables, each dealing with slightly different dimensions. The requirements for this ETL pipeline are to be near-realtime, so something like a DWH with a nightly batch refresh of a single fact table in a star/snowflake schema was unfortunately not an option. These pre-aggregated fact tables look as follows:
CREATE TABLE IF NOT EXISTS hourly_revenue
(
    hour TIMESTAMP(6) WITH TIME ZONE,
    outlet_id UUID,
    gross_revenue NUMERIC(18,3),
    net_revenue NUMERIC(18,3),
    PRIMARY KEY (outlet_id, hour)
);

CREATE TABLE IF NOT EXISTS hourly_article_revenue
(
    article_group_name character varying(30) NOT NULL,
    article_name character varying(30) NOT NULL,
    hour timestamp(6) with time zone NOT NULL,
    plu character varying(255) NOT NULL,
    outlet_id uuid NOT NULL,
    base_price numeric(18,3),
    line_totals numeric(18,3),
    sum_quantity numeric(18,3),
    CONSTRAINT hourly_article_revenue_pkey PRIMARY KEY (outlet_id, article_name, plu, article_group_name, hour)
);

CREATE TABLE IF NOT EXISTS hourly_waiter_revenue
(
    hour timestamp(6) with time zone NOT NULL,
    outlet_id uuid NOT NULL,
    waiter_name character varying(30) NOT NULL,
    waiter_number character varying(10) NOT NULL,
    gross_revenue numeric(18,3),
    net_revenue numeric(18,3),
    CONSTRAINT hourly_waiter_revenue_pkey PRIMARY KEY (outlet_id, waiter_name, waiter_number, hour)
);
and others. The application supports multi-tenancy based on the outlet_id discriminator column. Currently there are no plans for partitioning, but this may change.

I am benchmarking certain types of planned queries that take these pre-aggregated hourly facts and additionally perform ad-hoc aggregations over non-specific time frames (i.e. days to months), and I have run into issues with the current design regarding the bloat of multiple fact tables (e.g. highly bloated primary keys), which screams for denormalization of dimensions. The worst-case scenario is entries in each of these fact tables for each hour over up to 2 years; with around 50,000 outlets this yields between 700M and several billion rows per "fact" table, depending on any additional dimensions. The grain of these individual facts is always hourly.

To combat this, I believe I will need some degree of dimensional modeling to better support the types of queries that will be run (queries that touch <1 day of data are rather trivial). Taking the hourly_article_revenue fact table as an example, I would have done something like the following:

1. Extract the article dimension into a separate dim_article table and give it a surrogate ID.
2. Add a unique constraint over the current business definition of uniqueness: (outlet_id, article_name, plu, article_group_name).
3. Reference the dim_article surrogate key as a foreign key in the relevant fact table.

I would repeat this process for the rest of the tables and their dimensions (e.g. waiters). However, this approach smells somewhat... *off*, because theoretically I only require one fact table, sourced from the pre-aggregated data with the highest number of dimensions (and then just drill up based on the specific requirements). For example, the following two queries sourced from different fact tables:
SELECT outlet_id, date_trunc('month', hour) 
FROM hourly_revenue
WHERE outlet_id = '...'
and:
SELECT outlet_id, date_trunc('month', hour)
FROM hourly_article_revenue
WHERE outlet_id = '...'
GROUP BY outlet_id, date_trunc('month', hour)
are essentially semantically equivalent, at which point I may as well have just used the raw transaction facts (notwithstanding the near-realtime requirement for data freshness).

The obvious drawback is that I cannot necessarily know all the required dimensions beforehand, so the fact table would need to gain more dimensions as time goes on, which would be a monumental maintenance burden compared to creating a new fact table for the dimensions relevant to the key insights of a new ETL result set.

Is there a more general approach for this kind of modeling with multiple "fact" tables? Do those principles still hold if the different fact tables are, essentially, drilled-up versions of other fact tables? Lastly: is there any further reading I can do on this topic? I have read Kimball et al but I do not have it near my person at the moment and so cannot reference it at this time to verify...
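Steps 1-3 above, applied to hourly_article_revenue, might look like this in PostgreSQL (the slimmed fact table name is hypothetical):

```sql
-- Step 1 + 2: extract the article dimension with a surrogate key
-- and a unique constraint over the business key.
CREATE TABLE dim_article (
    article_id         BIGSERIAL PRIMARY KEY,     -- surrogate key
    outlet_id          UUID         NOT NULL,
    article_name       VARCHAR(30)  NOT NULL,
    plu                VARCHAR(255) NOT NULL,
    article_group_name VARCHAR(30)  NOT NULL,
    UNIQUE (outlet_id, article_name, plu, article_group_name)
);

-- Step 3: the fact references the surrogate key, shrinking the PK
-- from five columns to two.
CREATE TABLE hourly_article_revenue_slim (
    hour         TIMESTAMP(6) WITH TIME ZONE NOT NULL,
    article_id   BIGINT NOT NULL REFERENCES dim_article (article_id),
    base_price   NUMERIC(18,3),
    line_totals  NUMERIC(18,3),
    sum_quantity NUMERIC(18,3),
    PRIMARY KEY (article_id, hour)
);
```

This keeps the multi-tenancy discriminator reachable through dim_article while removing the wide varchar columns from every fact row and index entry.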
user991710 (111 rep)
Nov 22, 2024, 05:49 PM • Last activity: Nov 25, 2024, 09:34 AM
0 votes
1 answer
44 views
ideas to reduce association tables while building a data warehouse
I'm trying to create a data warehouse for my BI project. I start from a raw database containing 100 tables, most of which are association tables. My goal is to create a normalized dimensional model that centralizes my data as much as possible, but it seems impossible. Here is an example use case: imagine having firmware, device, and many more tables, each with a many-to-many relationship with a user table; association tables are created automatically to record every device or firmware created by users. I want to get rid of these association tables in order to create more centralized dimension/fact tables. Is there a proposed solution for this specific use case?
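If each device or firmware is in fact created by exactly one user, the association collapses into a plain foreign key on the dimension row; only a genuine many-to-many still needs a bridge. A sketch under that assumption (all names illustrative):

```sql
CREATE TABLE dim_user (
    user_key  BIGINT PRIMARY KEY,
    user_name VARCHAR(100) NOT NULL
);

-- The user_device association table disappears: the creating user
-- becomes an attribute (FK) of the device dimension itself.
CREATE TABLE dim_device (
    device_key          BIGINT PRIMARY KEY,
    device_name         VARCHAR(100) NOT NULL,
    created_by_user_key BIGINT NOT NULL REFERENCES dim_user (user_key)
);
```

Associations that carry their own facts (timestamps, quantities) are better promoted to fact tables than flattened away.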
Jalel MED (1 rep)
Jun 20, 2024, 11:19 PM • Last activity: Jun 24, 2024, 04:28 PM
6 votes
1 answer
410 views
Designing a dimensional DB off a normalized source that already implements SCD's
I have built an SSIS ETL to integrate various data sources (one from MySQL, two from SQL Server) into a single SQL Server relational and normalized database, which I've called [NDS]. The SSIS ETL handles type-2 updates, so the [NDS] generates surrogate keys, SCD tables include an [_EffectiveFrom] timestamp and a nullable [_EffectiveTo] column, and there are constraints for the natural keys and beautiful foreign keys linking all the data together.

Now, I wanted to build an SSAS dimensional database off of it, and it didn't take too long before I realized I was setting myself up for a snowflake schema (yUML diagram depicting a sample of the table relationships).

So I'm thinking of adding a new [DDS] (relational) database to create the *actual* dimension and fact tables that will feed the DSVs for the SSAS db. This [DDS] database would be as denormalized as humanly possible, so as to "flatten" the facts and dimensions (like [OrderHeaders]+[OrderDetails] into an [Orders] fact table, and [CustomerStores]+[Customers]+[SalesReps] into some [Customers] dimension table). Doing this should not only make it easier for me to build the dimension hierarchies in SSAS, it should also make it easier to come up with an actual star schema.

I have a few questions though:

- Can I reuse a subset of my existing surrogate keys? I'm thinking of taking the existing key for the most granular level and making that the dimension key. Is that a good approach, or should I ignore the [NDS] surrogate keys and have the [DDS] (relational db) generate a new set of surrogate keys?
- How do I handle SCDs? For example, "Materials" and "Suppliers" generate new records in [NDS] when specific fields change in the source system... I think I'll have to design the SSIS ETL to load only the "last image" records into the [DDS] db, and then re-implement the type-2 updates in that process, i.e. treat the [NDS] as a "source system" that keeps history, while duplicating everything in this [DDS] database. But then, why would I need to keep history in the [NDS] *and* the [DDS]? Clearly something's not right.

Am I setting myself up for a Big Mess™, or am I on the right track?
Mathieu Guindon (914 rep)
Dec 1, 2015, 06:31 PM • Last activity: Mar 26, 2024, 05:28 PM
1 votes
0 answers
24 views
PowerBI Data Modeling: Modeling with sub-dimension and sub-facts
We collect test data from network tests. There are common dimensions and facts for all the different tests, but there are also some facets of the dimensions and facts that are unique. If this were an OODB, I would create a super-class with all the common properties and sub-classes for the unique ones. But PowerBI uses a traditional relational model. I have the following notional model that I want feedback on. The key terms are:

- DNS = DNS lookup tests
- OCSP = the protocol to get an SSL certificate
- Data Throughput = a full session (does a DNS lookup to an IP address, gets OCSP, and does a data transfer)

This looks nice and simple, but I wonder if a sub-dimension such as OCSP should be linked through the common dimension table or point directly to the Common Facts. The link between OCSP and the COMMON dimension is 1:1. But I can see this requiring two JOINs to slice by the OCSP facet to get to the fact in Common, and three JOINs to get to the OCSP facts. So maybe I should flatten this model; it would not look as nice, but it might be more performant. BTW, there are already about 150M rows in the fact table (the dimension tables are about 1K rows, so they are not a big concern).
Dr.YSG (409 rep)
Dec 20, 2023, 10:44 PM
0 votes
0 answers
105 views
How Do I create history table that allows me to retrieve the times when a row active by year & month?
I want to be able to retrieve the in-between months when a client was active, from a historical table.

Table: DimensionEpisode

| episode_id | client_id | episode_status | discharge_date | admission_date | program_id | length_of_stay | subprogram | ...other attributes |
|------------|-----------|----------------|---------------------|---------------------|------------|----------------|------------|---------------------|
| 63 | 61362 | CLOSED | 2018-09-06T00:00:00 | 2014-10-01T00:00:00 | 72 | 1436 | Union PC | ... |
| 64 | 61343 | OPEN | 2019-03-15T00:00:00 | 2015-05-20T00:00:00 | 80 | 1000 | Rehab | ... |
| 65 | 61344 | OPEN | 2020-01-10T00:00:00 | 2018-11-12T00:00:00 | 65 | 750 | Psychiatry | ... |
| 66 | 61345 | CLOSED | 2021-07-02T00:00:00 | 2019-09-28T00:00:00 | 72 | 1250 | Union PC | ... |
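One way to expand each episode into its in-between active months is a row-generating join against a month series. A sketch assuming PostgreSQL (`generate_series`); other engines would use a calendar/date-spine table instead:

```sql
-- One output row per episode per calendar month between admission and discharge.
SELECT e.episode_id,
       e.client_id,
       m::date AS active_month
FROM DimensionEpisode e
CROSS JOIN LATERAL generate_series(
        date_trunc('month', e.admission_date),
        date_trunc('month', e.discharge_date),
        interval '1 month') AS m;
```

Episode 63 above would produce one row for each month from 2014-10 through 2018-09; filtering on `active_month` then answers "who was active in month X".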
Sam Johnson (21 rep)
Dec 6, 2023, 03:33 PM
0 votes
1 answer
109 views
Architecture for "Overlapping Dimensions"?
Take a SQL RDBMS data warehouse with a typical facts-and-dimensions layout. Say you want Orders x Country: maybe a date field, an orders field, a country field. And if you want a report/software that slices by Continent? Easy: a country x continent dimension table, right?

But what's the proper architecture for overlapping dimensions? For instance, maybe you want a "Greater Country" grouping that includes the UK, the British Isles, and the Island of Ireland; these larger groups contain overlapping smaller pieces. Northern Ireland, for instance, is a component of both the UK and the Island of Ireland. What's the best architecture so that if an end user selects "UK" or "Island of Ireland", Northern Ireland pops up, without duplication? (I don't actually care about the UK; it's just an example.)

In general: you have a small component, Dimension A, and a larger grouping, Dimension B, but an A is not unique to a single B. Every "cube"-based reporting system is antithetical to this, but there are use cases, and you don't want to create tons of dimensions. Is the best method simply a dimension table that records full membership, i.e. Northern Ireland - UK, Northern Ireland - Island of Ireland, then a join returning DISTINCT rows? That doesn't seem efficient, but maybe it's the best way.
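The membership-table idea described above can be sketched as a many-to-many bridge plus a DISTINCT join (table and column names are illustrative):

```sql
-- Bridge: a country may belong to several overlapping groups.
CREATE TABLE bridge_group_country (
    group_name   VARCHAR(50) NOT NULL,  -- 'UK', 'Island of Ireland', ...
    country_name VARCHAR(50) NOT NULL,  -- 'Northern Ireland', ...
    PRIMARY KEY (group_name, country_name)
);

-- Orders for the selected group(s), deduplicated across overlaps:
SELECT DISTINCT f.order_id, f.order_date
FROM fact_orders f
JOIN bridge_group_country b ON b.country_name = f.country
WHERE b.group_name IN ('UK', 'Island of Ireland');
```

When the measures are summed rather than listed, the usual refinement is to aggregate per order first (or use `COUNT(DISTINCT ...)`) so an order in an overlapping country is not double-counted.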
user45867 (1739 rep)
Aug 15, 2023, 08:24 PM • Last activity: Aug 31, 2023, 02:53 PM
1 votes
0 answers
1109 views
Dimensional Model for CRM Sales Opportunity and Sales Leads: Fact or Dimension?
I'm working for a business lender and trying to come up with a basic dimensional model around the subject area of loan approvals. We use a CRM (Salesforce), and one of the core objects is a (Sales) "Opportunity". Sales usually start off as a Lead; if we get traction on a lead, we create an Account, at least one Contact, and an "Opportunity", which is a potential loan (our business calls it a "Deal"). All of the loan approval process happens under a particular "Opportunity". I know that Account and Contact would be good dimension candidates, but I am unsure about Lead and Opportunity. I'm interested in opinions on which category Leads and Opportunities belong in: **Fact or Dimension**? Here is an example of some opportunity fields that may be relevant to a data warehouse:
    CREATE TABLE
      dbo.Opportunity
        (
            OpportunitySK                       BIGINT        NOT NULL
                                                IDENTITY(1,1)
                                                CONSTRAINT 
                                                  PK_dbo_Opportunity
                                                PRIMARY KEY CLUSTERED      
          , AccountId                           NCHAR(18)      NULL                              
          , ACTUAL_AMT_COLLECTED                DECIMAL(18,2)  NULL
          , ACTUAL_AMT_OFFERED                  DECIMAL(18,2)  NULL
          , ActualDailyPayment                  DECIMAL(18,2)  NULL
          , ACTUAL_MCA_TERM                     NVARCHAR(10)   NULL
          , ActualPaymentFrequency              NVARCHAR(255)  NULL
          , ApprovalAmount                      DECIMAL(18,2)  NULL
          , APPROVAL_DATE                       DATETIME       NULL
          , ApprovalExpirationDate              DATETIME       NULL
          , BrokerAccount                       NCHAR(18)      NULL
          , CreatedDate                         DATETIME       NOT NULL
          , CreditAnalyst                       NCHAR(18)      NULL
          , CreditProcessor                     NCHAR(18)      NULL
          , CreditSubmittalDate                 DATETIME       NULL
          , DocsInDate                          DATETIME       NULL
          , DocsOutDate                         DATETIME       NULL
          , eNoahCompleteDate                   DATETIME       NULL
          , ExternalId                          NVARCHAR(30)   NULL
          , FundDate                            DATETIME       NULL
          , FundedAmount                        DECIMAL(18,2)  NULL
          , Funder                              NCHAR(18)      NULL
          , Id                                  NCHAR(18)      NOT NULL
          , InternalCreditAnalyst               NCHAR(18)      NULL
          , LenderAccount                       NCHAR(18)      NULL
          , MerchantNumber                      NVARCHAR(32)   NULL
          , OfferInDate                         DATETIME       NULL
          , OfferOutDate                        DATETIME       NULL
          , OwnerId                             NCHAR(18)      NOT NULL
          , PhysicalAddress1                    NVARCHAR(1300) NULL
          , PhysicalAddress2                    NVARCHAR(1300) NULL
          , PhysicalAddressCity                 NVARCHAR(1300) NULL
          , PhysicalAddressState                NVARCHAR(1300) NULL
          , PhysicalAddressZip                  NVARCHAR(1300) NULL
          , PreQualDate                         DATETIME       NULL
          , RecordTypeId                        NCHAR(18)      NULL
          , RequestedAmount                     DECIMAL(18,2)  NULL
          , StageName                           NVARCHAR(40)   NOT NULL
          , StatementsInDate                    DATETIME       NULL
          , Type                                NVARCHAR(40)   NULL
          , UpForFundingDate                    DATETIME       NULL
          , WC_CreditDecisionRecord             NCHAR(18)      NULL
          , YearsInBusiness                     DECIMAL(18,0)  NULL
          , SystemModstamp                      DATETIME       NOT NULL
          , AccountExternalId                   NVARCHAR(100)  NULL
          , ACH                                 VARCHAR(5)     NULL
          , ACH_Debit                           VARCHAR(5)     NULL
          , ACH_DebitAmount                     DECIMAL(18,2)  NULL
          , ACH_QUALIFIED                       VARCHAR(5)     NULL
          , ACH_StartDate                       DATETIME       NULL
          , Amount                              DECIMAL(18,2)  NULL
          , AmountOffered                       DECIMAL(18,2)  NULL
          , Approved                            VARCHAR(5)     NULL
          , BalancePaid100PercentDate           DATETIME       NULL
          , BalancePaid50PercentDate            DATETIME       NULL
          , BalancePaid60PercentDate            DATETIME       NULL
          , BalancePaidatRenewal                DECIMAL(18,2)  NULL
          , BalancePaidPercent                  DECIMAL(18,2)  NULL
          , BalanceRemaining                    DECIMAL(18,2)  NULL
          , BalanceRemainingPercent             DECIMAL(18,2)  NULL
          , BOFI_Deal                           NVARCHAR(255)  NULL
          , BOFI_RestrictedState                NVARCHAR(1300) NULL
          , BookingStarted                      VARCHAR(5)     NULL
          , BrokerTier                          NVARCHAR(255)  NULL
          , CampaignID                          NCHAR(18)      NULL
          , ClosedLostDate                      DATETIME       NULL
          , CloseDate                           DATETIME       NOT NULL
          , CompanyDBA                          NVARCHAR(1300) NULL
          , ContractNumber                      NVARCHAR(32)   NULL
          , DateEstablished                     DATETIME       NULL
          , DateonDocs                          DATETIME       NULL
          , DaysbwAccountActivationDatePrequal  DECIMAL(18,0)  NULL
          , DaysbwActivationDateFundDate        DECIMAL(18,0)  NULL
          , DaysbwApprovalDateFundDate          DECIMAL(18,0)  NULL
          , DaysbwDocsinDateFundDate            DECIMAL(18,0)  NULL
          , DaysInApprovalStage                 DECIMAL(18,0)  NULL
          , DaysInOfferOutStage                 DECIMAL(18,0)  NULL
          , DeclinedDate                        DATETIME       NULL
          , Deposit                             DECIMAL(18,2)  NULL
          , EqAmount                            DECIMAL(18,0)  NULL
          , EqAmtLeased                         DECIMAL(18,2)  NULL
          , EqExperience                        NVARCHAR(32)   NULL
          , EqHasQuote                          VARCHAR(5)     NULL
          , EqLeasedBefore                      VARCHAR(5)     NULL
          , EqNew                               VARCHAR(5)     NULL
          , EqNumVendors                        NVARCHAR(255)  NULL
          , EqTerm                              DECIMAL(18,0)  NULL
          , EqUsed                              VARCHAR(5)     NULL
          , EqWhat                              NVARCHAR(64)   NULL
          , EqWhen                              NVARCHAR(128)  NULL
          , EqWhy                               NVARCHAR(64)   NULL
          , EqWithWhom                          NVARCHAR(64)   NULL
          , EquipCostForActual                  DECIMAL(18,2)  NULL
          , EquipCostPaidByLenderForActual      DECIMAL(18,2)  NULL
          , EquipCostPerDocRequest              DECIMAL(18,2)  NULL
          , EquipmentCondition                  NVARCHAR(255)  NULL
          , EquipmentGrandTotalSummary          DECIMAL(18,2)  NULL
          , EquipmentSeller                     NVARCHAR(255)  NULL
          , EquipmentStory                      NVARCHAR(255)  NULL
          , EquipmentYear                       NVARCHAR(255)  NULL
          , Fax                                 NVARCHAR(1300) NULL
          , FinalFundDate                       DATETIME       NULL
          , FiscalQuarter                       INT            NULL
          , FiscalYear                          INT            NULL
          , IsClosed                            VARCHAR(5)     NOT NULL
          , IsSplit                             VARCHAR(5)     NOT NULL
          , IsWon                               VARCHAR(5)     NOT NULL
          , LeadSource                          NVARCHAR(40)   NULL
          , LeaseNumber                         NVARCHAR(32)   NULL
          , LegalName                           NVARCHAR(1300) NULL
          , LegalStatus                         NVARCHAR(1300) NULL
          , LendersLeaseNumber                  NVARCHAR(32)   NULL
          , LendingTreeId                       NVARCHAR(9)    NULL
          , LendVantageCommission               DECIMAL(18,2)  NULL
          , LendVantageId                       NVARCHAR(255)  NULL
          , LoanStartDate                       DATETIME       NULL
          , LoanStartDateVerified               VARCHAR(5)     NULL
          , Locked                              VARCHAR(5)     NULL
          , MailingAddrDisplay                  NVARCHAR(1300) NULL
          , MailingAddress1                     NVARCHAR(1300) NULL
          , MailingAddressCity                  NVARCHAR(1300) NULL
          , MailingAddressState                 NVARCHAR(1300) NULL
          , MailingAddressZip                   NVARCHAR(1300) NULL
          , MarketingSubmissionDate             DATETIME       NULL
          , MaturityDate                        DATETIME       NULL
          , MerchantID                          NVARCHAR(32)   NULL
          , MonthlyGrossCC_Volume               DECIMAL(18,2)  NULL
          , MonthlySales                        DECIMAL(18,2)  NULL
          , MonthlyVisaMC_Volume                DECIMAL(18,2)  NULL
          , MonthsInBusiness                    DECIMAL(18,0)  NULL
          , Name                                NVARCHAR(120)  NOT NULL
          , NF_DeclineDate                      DATETIME       NULL
          , OpportunityID18                     NVARCHAR(18)   NULL
          , PartnerAccountId                    NCHAR(18)      NULL
          , PastDueBalance                      DECIMAL(18,2)  NULL
          , PaymentsBehind                      DECIMAL(18,0)  NULL
          , PaymentsBounced                     DECIMAL(18,0)  NULL
          , PaymentsCollected                   DECIMAL(18,0)  NULL
          , Phone                               NVARCHAR(1300) NULL
          , Probability                         DECIMAL(18,0)  NULL
          , ProcessorUserID                     NVARCHAR(32)   NULL
          , RateReviewCompleted                 VARCHAR(5)     NULL
          , Ratio                               NVARCHAR(32)   NULL
          , RenewalNumber                       DECIMAL(18,0)  NULL
          , ResubmitDate                        DATETIME       NULL
          , SplitFunding                        NVARCHAR(255)  NULL
          , SplitOpportunity                    VARCHAR(5)     NULL
          , SplitOpportunitywith                NCHAR(18)      NULL
          , Status                              NVARCHAR(255)  NULL
          , SubmissionDate                      DATETIME       NULL
          , SyndicationDate                     DATETIME       NULL              
        )
Bee-Dub (41 rep)
May 18, 2019, 12:32 AM • Last activity: Feb 28, 2023, 01:50 PM
-1 votes
1 answers
483 views
Understanding OLTP and OLAP
I know what OLAP and OLTP mean and who specializes in each. At the moment I'm only studying data modelling theoretically, so I don't understand the practical side of the differences. I wanted an example that would highlight them, so I cooked one up (with ChatGPT), but I can't see whether any differences really exist. Say I want to model an online order, so I have three entities: Customer, Order, Product. In a transactional database, I would have these tables and columns:
Customer - {CustomerID, CustomerName}
Order - {CustomerID, OrderID}
Order_Details - {OrderID, ProductID, Location}
Product - {ProductID, Name, Category}
In an analytical database, I would have these tables and columns:
Customer - {CustomerID, CustomerName}
Order - {OrderID, CustomerID, ProductID, Location}
Product - {ProductID, Name, Category}
The only difference I see is that the transactional database has one more table due to normalization. If possible, can someone explain what exactly the benefit is, or suggest a better example? I would also like to know about any differences in the ER diagrams for the two models (beyond naming changes, such as entities becoming facts and dimensions): where exactly do things change in the ER diagram as well?
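One way to make the difference concrete is to compare the typical statement each model serves rather than the table count. A hedged sketch, reusing the question's OLTP tables and hypothetical star-schema names of my own (fact_sales, dim_date, dim_product are not from the question):

```sql
-- OLTP: small, frequent, row-oriented writes against normalized tables;
-- normalization prevents update anomalies when one order has many lines
INSERT INTO Order_Details (OrderID, ProductID, Location)
VALUES (1001, 42, 'Berlin');

-- OLAP: large scans and aggregations over a star schema, where the fact
-- table is pre-joined/denormalized around integer dimension keys
SELECT d.CalendarYear,
       p.Category,
       SUM(f.SalesAmount) AS TotalSales
FROM   fact_sales  f
JOIN   dim_date    d ON d.DateKey    = f.DateKey
JOIN   dim_product p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.Category;
```

The point of the split is workload shape: the OLTP design optimizes many tiny consistent writes, while the OLAP design optimizes a few huge reads, which is why a warehouse tolerates the redundancy that a transactional model avoids.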
Dhruv (103 rep)
Feb 21, 2023, 07:18 PM • Last activity: Feb 23, 2023, 11:56 AM
0 votes
1 answers
1777 views
Handling multiple dates on fact table with performance concerns
I am working on a simple data mart and have once again stumbled over the handling of date and time. [image: fact/dimension schema diagram] This is a design for vehicle maintenance form records, collecting status records/remarks for analytics/visualisation. There are several date and time columns in this fact table. My questions are:
1. A fact table should be narrow and long, not wide. With this many date and time columns, the table becomes wide. If I query a LEFT JOIN between Vehicle.dim and VehMaintenance.fact to consolidate all details for output in the frontend, won't the performance be horrible? Now imagine I have even more dimensions to join; LEFT JOINing them onto the fact table makes it wider still. I think my design is horribly wrong.
2. Vehicle.dim links back to Date.dim. I read somewhere that a dimension table should not link to another unless it is a snowflake schema with a parent/child relationship. Is it still fine to link the two dimension tables like this?
Note that my fact table will be batch-updated every 15 minutes from a data warehouse. It will always be an INSERT query (there will be multiple otherwise-identical records that differ only in their dates). Some of the date columns will be NULL.
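The usual pattern for several dates on one fact row is a single "role-playing" date dimension joined once per role via table aliases. A minimal sketch, with hypothetical key and column names (OpenDateKey, DoneDateKey, FullDate are assumptions, not taken from the diagram):

```sql
-- One conformed date dimension, aliased once per date role on the fact row;
-- each join is an integer-key lookup, so extra date columns stay cheap
SELECT f.VehicleKey,
       d_open.FullDate AS MaintenanceOpenDate,
       d_done.FullDate AS MaintenanceDoneDate
FROM   VehMaintenance_fact f
LEFT JOIN Date_dim d_open ON d_open.DateKey = f.OpenDateKey
LEFT JOIN Date_dim d_done ON d_done.DateKey = f.DoneDateKey;
```

"Wide" in the fact-table guidance usually refers to textual descriptors that belong in dimensions, not to a handful of surrogate date keys, so multiple date roles on one row are normal in this style of design.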
user3118602 (147 rep)
Sep 5, 2022, 02:05 PM • Last activity: Sep 7, 2022, 12:30 PM
2 votes
1 answers
558 views
Data marts - to create multiple database or consolidate in one?
Take this as an example (image taken from Google): [image: https://i.sstatic.net/2H0DN.png] All data pipeline diagrams show multiple data marts, one per business unit or specific scope (in this case Purchasing, Sales, Inventory). Given the two options below:

Option A:

    car-db
    |_ Databases
       |_ carDataMarts
          |_ dbo.fact_purchase_x
          |_ dbo.dim_purchase_y
          |_ dbo.dim_purchase_z
          |_ dbo.fact_sales_x
          |_ dbo.dim_sales_y
          |_ dbo.dim_sales_z
          |_ dbo.fact_inventory_x
          |_ dbo.dim_inventory_y
          |_ dbo.dim_inventory_z

Option B:

    car-db
    |_ Databases
       |_ car_purchase_DataMart
          |_ dbo.fact_purchase_x
          |_ dbo.dim_purchase_y
          |_ dbo.dim_purchase_z
       |_ car_sales_DataMart
          |_ dbo.fact_sales_x
          |_ dbo.dim_sales_y
          |_ dbo.dim_sales_z
       |_ car_inventory_DataMart
          |_ dbo.fact_inventory_x
          |_ dbo.dim_inventory_y
          |_ dbo.dim_inventory_z

1. How does the diagram translate to the actual implementation of data marts (option A or B)?
2. There should be some differences in performance or usability between options A and B; if so, what are they?
P.S.: Using MS SQL
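A common middle ground between the two options in SQL Server is one database with one schema per mart instead of putting everything under dbo, which keeps single-database backups and cross-mart joins while still grouping objects by business unit. A hedged sketch (the schema names and the PurchaseAmount column are my own illustration, not from the diagram):

```sql
-- One database, one schema per mart; CREATE SCHEMA must be alone in its batch
CREATE SCHEMA purchasing;
GO
CREATE SCHEMA sales;
GO
-- Conformed dimensions can live in a shared schema so every mart joins the same copy
CREATE TABLE dbo.dim_date
(     DateKey INT NOT NULL PRIMARY KEY
);
CREATE TABLE purchasing.fact_purchase_x
(     DateKey         INT            NOT NULL REFERENCES dbo.dim_date (DateKey)
    , PurchaseAmount  DECIMAL(18,2)  NULL
);
```

Separate databases (option B) mainly buy independent backup/restore and security boundaries, at the cost of cross-database queries for any conformed dimensions shared between marts.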
user3118602 (147 rep)
Sep 2, 2022, 03:00 AM • Last activity: Sep 2, 2022, 12:11 PM
0 votes
1 answers
90 views
How do we assign FK to a newly loaded data in fact table?
While reading up on SQL, I got confused about how newly loaded data in the fact table gets its foreign keys. Take the example below (image taken from Google): [image: https://i.sstatic.net/vaODI.png] These are my thoughts on how the data is loaded:
1. SaleID is auto-generated (e.g. incremental) to give the new row a unique ID.
2. SalesQuantity, SalesPrice, SalesAmount, ReceiptID, TimeStamp are loaded into factSales.
The question is: how does this newly loaded data know which FKs (DateKey, CustomerKey, StoreKey, ProductKey) it should get? My understanding of a dimension table is that it describes the data in factSales. For example, dimDate will have pre-populated date data up to the year 2050. If a date is loaded into factSales, say a TimeStamp value such as 2020-05-01 13:00, how can the FK1 DateKey be automatically assigned to the date 2020-05-01? The same goes for other dimensions such as CustomerKey, where some customer data will already exist. Thank you.
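The short answer in most warehouse designs is that the ETL performs the lookup: incoming rows carry natural values (a timestamp, a customer number), and the fact load joins those against the dimensions to fetch the surrogate keys. A hedged sketch; staging_sales, the *BK business-key columns, and FullDate are hypothetical names under the usual star-schema convention, not from the diagram:

```sql
-- The staging row carries natural/business values; the load resolves each
-- one to the matching dimension's surrogate key before inserting the fact
INSERT INTO factSales (DateKey, CustomerKey, StoreKey, ProductKey,
                       SalesQuantity, SalesPrice, SalesAmount, ReceiptID)
SELECT d.DateKey,
       c.CustomerKey,
       st.StoreKey,
       p.ProductKey,
       s.SalesQuantity,
       s.SalesPrice,
       s.SalesAmount,
       s.ReceiptID
FROM   staging_sales s
JOIN   dimDate     d  ON d.FullDate   = CAST(s.SalesTimeStamp AS DATE)  -- 2020-05-01 13:00 -> 2020-05-01
JOIN   dimCustomer c  ON c.CustomerBK = s.CustomerNumber
JOIN   dimStore    st ON st.StoreBK   = s.StoreNumber
JOIN   dimProduct  p  ON p.ProductBK  = s.ProductNumber;
```

Nothing "auto-assigns" the keys at the table level; the join conditions in the load query (truncating the timestamp to a date, matching on business keys) are what connect each new fact row to its dimensions.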
user3118602 (147 rep)
Sep 1, 2022, 03:05 PM • Last activity: Sep 1, 2022, 11:08 PM
0 votes
1 answers
775 views
Boolean flag in fact table
Imagine we have received the results of a health survey on the daily consumption habits of 3 different items, like the following:

|Id|Date|Age|Country|CigarettesPerDay|CoffeesPerDay|BeersPerDay|
|---|---|---|---|---|---|---|
|1|2021-12-31|35|US|0|3|0|
|2|2021-12-31|22|US|5|5|1|
|3|2021-12-31|53|US|3|4|0|
|...|...|...|...|...|...|...|
|11276|2021-12-31|44|France|3|4|0|

I want to model this in a star schema. In the fact table, I create foreign key relationships to date and item dimensions, as well as a demographics dimension with country and age. I then sum up the number of respondents per demographic group. If the number of respondents is above 100, I mark the group as representative of the population. Finally I calculate the total and average consumption for each group.

|DateId|ItemId|DemographicId|NumberOfRespondents|IsRepresentative|TotalConsumption|AverageConsumption|
|---|---|---|---|---|---|---|
|20211231|1|1|70|No|280|4|
|20211231|1|2|150|Yes|750|5|
|20211231|1|3|220|Yes|660|3|
|...|...|...|...|...|...|...|
|20211231|3|1000|1|No|0|0|

For instance, there were 70 respondents from demographic 1 (e.g. country = US, age = 18). They have on average consumed 4 of item 1 (e.g. cigarettes). Generally we should strive to hold only facts and foreign keys in the fact table. However, I personally don't think a separate dimension for the boolean flag provides any value. Can this flag be considered a degenerate dimension, or is it considered bad design to have it in the fact table?
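Since the flag is fully derivable from a measure already on the row (NumberOfRespondents), one option is simply to compute it during the aggregation step of the load rather than model it at all. A minimal sketch under assumed names (staging_survey and ConsumptionPerDay are hypothetical, standing in for the unpivoted survey rows):

```sql
-- Derive the flag from the respondent count at load time; no extra
-- dimension table is needed because the flag adds no attributes of its own
SELECT DateId,
       ItemId,
       DemographicId,
       COUNT(*)                                           AS NumberOfRespondents,
       CASE WHEN COUNT(*) > 100 THEN 'Yes' ELSE 'No' END  AS IsRepresentative,
       SUM(ConsumptionPerDay)                             AS TotalConsumption,
       AVG(CAST(ConsumptionPerDay AS DECIMAL(9,2)))       AS AverageConsumption
FROM   staging_survey
GROUP BY DateId, ItemId, DemographicId;
```

If more such flags accumulate later, the conventional alternative is to fold them into a single small "junk" dimension of flag combinations; a lone boolean rarely justifies one on its own.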
Mads (31 rep)
May 14, 2022, 11:11 AM • Last activity: Jul 27, 2022, 09:32 PM
1 votes
2 answers
498 views
What is the disadvantage of not creating surrogate key in DW?
I want to create a data warehouse from one OLTP database. The tables in the OLTP db have integer primary keys, so these are the business keys. The tables are: Client, Customer, Products and Sales, with primary key and foreign key relationships between them. I am writing an ETL to model this into dimensions and facts. My manager insists that I create surrogate keys. I know that to achieve this I will have to load the dimension tables first (so that they get their surrogate keys), and then load the fact table, using the business key to look up the corresponding surrogate key for the fact rows. I know this is appropriate when the business keys are alphanumeric or large values, but in my case the business keys are auto-incrementing integers. In my situation, what is the disadvantage of not creating surrogate keys?
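The two-step load described above can be sketched as follows. A hedged illustration only: dimCustomer, factSales, staging_sales and the *BK column are assumed names, not from the question's source system:

```sql
-- Step 1: the dimension mints its own surrogate key while keeping the
-- OLTP integer id as a plain (non-key) business-key column
CREATE TABLE dimCustomer
(     CustomerKey   INT IDENTITY(1,1) PRIMARY KEY   -- surrogate, owned by the DW
    , CustomerBK    INT            NOT NULL         -- business key from OLTP
    , CustomerName  NVARCHAR(120)  NOT NULL
);

-- Step 2: the fact load swaps the business key for the surrogate
INSERT INTO factSales (CustomerKey, SalesAmount)
SELECT d.CustomerKey,
       s.SalesAmount
FROM   staging_sales s
JOIN   dimCustomer d ON d.CustomerBK = s.CustomerID;
```

The cost of skipping surrogates is less about key width than about coupling: without them, a source system re-keying or merging accounts rewrites fact history, and slowly changing dimensions (one business key, many historical rows) become impossible to represent.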
variable (3590 rep)
Dec 14, 2021, 11:44 AM • Last activity: Dec 15, 2021, 09:35 AM
Showing page 1 of 20 total questions