SSAS Processing Sanity check: Am I using partitions effectively?
0 votes · 0 answers · 33 views
I have been rebuilding a data warehouse and lately have been focusing on decreasing processing times. The environment is MS SQL Server 2022. I have been implementing partitions on some of the larger fact tables where appropriate, and am wondering whether the way I am approaching partition processing is effective.
Here's my situation:
I am running an SSIS job daily to do the following:
1) Truncate fact and dimension tables that will be reloaded with fresh data
2) Load Fact and Dimension tables with fresh data from sources
3) Rebuild indexes (>30% fragmentation, >1000 pages)
4) Reorganize indexes (>15% fragmentation, >1000 pages)
5) Update Statistics
6) SSAS Process task - Process Full on the model
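For steps 3–5, the threshold logic can be sketched with the standard `sys.dm_db_index_physical_stats` DMV. This is only an illustration of the rules described above; the actual maintenance script may differ:

```sql
-- Sketch of the rebuild/reorganize thresholds from steps 3 and 4.
-- Uses the built-in fragmentation DMV; run in the warehouse database.
SELECT
    OBJECT_NAME(ips.object_id)          AS table_name,
    i.name                              AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count,
    CASE
        WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
        WHEN ips.avg_fragmentation_in_percent > 15 THEN 'REORGANIZE'
        ELSE 'SKIP'
    END AS recommended_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
WHERE ips.page_count > 1000
  AND ips.avg_fragmentation_in_percent > 15;
```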
This job has been taking longer than desired. I have been working on decreasing time by taking a more targeted approach to model processing. Rather than perform a "Process Full" against the model, I am trying the following in order:
Fully process the dimension tables, then fully process the fact tables, then fully process the measure table, and finally recalculate the model.
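For a tabular model, that targeted sequence could be expressed as a single TMSL script. This is a sketch, not my exact script: the database and table names here are placeholders, and the final `calculate` refresh corresponds to the "recalculate model" step:

```json
{
  "sequence": {
    "operations": [
      {
        "refresh": {
          "type": "full",
          "objects": [
            { "database": "MyWarehouseModel", "table": "DimDate" },
            { "database": "MyWarehouseModel", "table": "DimCustomer" },
            { "database": "MyWarehouseModel", "table": "FactSales" }
          ]
        }
      },
      {
        "refresh": {
          "type": "calculate",
          "objects": [ { "database": "MyWarehouseModel" } ]
        }
      }
    ]
  }
}
```

The `calculate` refresh rebuilds calculated columns, relationships, and hierarchies without re-reading source data, which is what makes the targeted approach cheaper than a model-wide Process Full.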
Two of my fact tables are not rebuilt daily; instead, a daily snapshot dataset is appended to each. I have decided to partition these tables because they are burdensome to process. My partitioning strategy is what I'm mostly hoping for critique of, though I'd welcome thoughts on any part of my approach.
To partition these snapshot tables, I have created two database views per table: Latest and Older. The views derive their date filter from the last successful run date of this SSIS job, which a stored procedure determines. For instance, if my daily SSIS job has been failing since March 8th, the "Latest" views would contain all snapshot data >= March 8th, and the "Older" views all snapshot data < March 8th. My goal is to avoid unnecessary processing of these snapshot tables, given their size. Each snapshot table in the model accordingly has two partitions: Latest and Older.
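A minimal sketch of what such a view pair might look like. Note that a T-SQL view cannot execute a stored procedure directly, so this sketch assumes the last successful run date lands in a control table (populated by the procedure); all object names here (`etl.JobControl`, `dbo.FactSnapshot`, the view names) are hypothetical:

```sql
-- Hypothetical control table the SSIS job updates on success:
-- CREATE TABLE etl.JobControl (JobName sysname PRIMARY KEY, LastSuccessfulRun date);

CREATE VIEW dbo.vFactSnapshot_Latest
AS
SELECT f.*
FROM dbo.FactSnapshot AS f
WHERE f.SnapshotDate >= (SELECT LastSuccessfulRun
                         FROM etl.JobControl
                         WHERE JobName = 'DailyWarehouseLoad');
GO

CREATE VIEW dbo.vFactSnapshot_Older
AS
SELECT f.*
FROM dbo.FactSnapshot AS f
WHERE f.SnapshotDate < (SELECT LastSuccessfulRun
                        FROM etl.JobControl
                        WHERE JobName = 'DailyWarehouseLoad');
```

Because the two views split on the same boundary date, every row lands in exactly one of the model's two partitions, which is the invariant SSAS partitioning requires.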
I have configured my processing task to fully process only the "Latest" partition of these snapshot tables. Once a week I run a Process Full on the entire model to cover my bases.
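Processing only the "Latest" partition can also be scripted in TMSL by naming the partition in the refresh object. Again a sketch with placeholder names, assuming the partitions are defined over the two views above:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MyWarehouseModel",
        "table": "FactSnapshot",
        "partition": "Latest"
      }
    ]
  }
}
```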
I am concerned about whether this will effectively keep my model up to date. Timewise it is performing admirably, and I will be doing extensive testing to verify data consistency, but I'm also interested in whether this is a coherent approach or fundamentally flawed.
Thank you for any thoughts or advice you can provide!
Asked by Kad
(1 rep)
Mar 11, 2025, 02:36 PM