Updating a big replicated Dimension (SQL Server PDW)
8 votes, 3 answers, 1690 views
We use a SQL Server PDW appliance for our data warehouse. One of the tables in the warehouse is a replicated dimension with about 20 million rows. As part of our ETL process we need to expire old records from this dimension; however, we are seeing that updating a handful of records (fewer than 100) takes over an hour to complete. This is what I would like to improve if I can.
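For context, the expiry step is essentially a small UPDATE against the replicated table. A minimal sketch of the kind of statement involved (the table, columns, and key values here are placeholders, not our real schema):

```
-- Hypothetical SCD2-style expiry; DimCustomer, IsCurrent and RowEndDate
-- are assumed names for illustration only.
UPDATE dbo.DimCustomer
SET    IsCurrent  = 0,
       RowEndDate = '2013-08-12'
WHERE  IsCurrent  = 1
  AND  CustomerKey IN (101, 102, 103);   -- the handful (<100) of rows to expire
```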
Naturally, one option I considered was changing this dimension from replicated to distributed. My testing shows that this would fix the slow ETL step (the 1.5-hour update came down to 30 seconds), but every join against the distributed version of the dimension would be affected, since the joins are almost never on the same distribution column. When I look at the execution plans of those queries I usually see either a *ShuffleMove* or a *BroadcastMove* operation.
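The distributed variant I tested was built with a CTAS along these lines (again, the distribution column and names are just placeholders):

```
-- Rebuild the dimension as a hash-distributed copy for testing.
CREATE TABLE dbo.DimCustomer_Dist
WITH (DISTRIBUTION = HASH(CustomerKey))
AS
SELECT * FROM dbo.DimCustomer;
```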
So my question to the PDW gurus here is:
Is there anything else that can be done in order to improve the performance of updating records in the **replicated** version of this Dimension?
Again, moving to a distributed table doesn't seem to be the best solution, since it would affect hundreds of existing SQL queries and reports developed by other people.
Asked by Icarus
(337 rep)
Aug 12, 2013, 09:03 PM
Last activity: Oct 19, 2014, 08:28 PM