Updating a big replicated Dimension (SQL Server PDW)
8 votes, 3 answers, 1690 views
We use a SQL Server PDW appliance for our data warehouse. One of the tables in the warehouse is a replicated dimension with about 20 million rows. As part of our ETL process we need to expire old records from this dimension; however, we are seeing that updating a handful of records (fewer than 100) takes over an hour to complete. This is what I would like to improve if I can.
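For context, the expiry step is essentially a small UPDATE against the replicated table. A minimal sketch of the kind of statement involved (the table, columns, and key values here are placeholders, not our real schema):

```
-- Hypothetical SCD2-style expiry; DimCustomer, IsCurrent and RowEndDate
-- are assumed names for illustration only.
UPDATE dbo.DimCustomer
SET    IsCurrent  = 0,
       RowEndDate = '2013-08-12'
WHERE  IsCurrent  = 1
  AND  CustomerKey IN (101, 102, 103);   -- the handful (<100) of rows to expire
```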
Naturally, one option I considered was changing this dimension from replicated to distributed. My testing shows that this would fix the slow ETL step (the 1.5-hour update came down to 30 seconds), but every join against the distributed version of the dimension would be affected, since the joins are almost never on the same distribution column. When I look at the execution plans of those queries I usually see either a *ShuffleMove* or a *BroadcastMove* operation.
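The distributed variant I tested was built with a CTAS along these lines (again, the distribution column and names are just placeholders):

```
-- Rebuild the dimension as a hash-distributed copy for testing.
CREATE TABLE dbo.DimCustomer_Dist
WITH (DISTRIBUTION = HASH(CustomerKey))
AS
SELECT * FROM dbo.DimCustomer;
```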
So my question to the PDW gurus here is:
Is there anything else that can be done in order to improve the performance of updating records in the **replicated** version of this Dimension?
Again, moving to a distributed table doesn't seem to be the best solution, since it would affect hundreds of existing SQL queries and reports developed by other people.
Asked by Icarus
(337 rep)
Aug 12, 2013, 09:03 PM
Last activity: Oct 19, 2014, 08:28 PM