
Handling new fields when merging in data using hashbytes?

2 votes
1 answer
690 views
We load data from stage into our ODS and check for differences using HASHBYTES. We calculate the hash from the stage tables, insert/update the destination table, and also store the hash value in the destination. The problem arises when there's a new field we need to bring in from a source system. We add a new column to the ODS table with a default value, but the calculated hash now differs because of this new field. As a result, everything gets updated even if nothing has changed. We'd have to update the HashBytes column in the large table (300M rows, but it could reach 1B) whenever we add a new field, which isn't often but is enough to be cumbersome.

What are the best approaches to handle this? I'm thinking of removing the hashbytes column from the ODS table and just calculating it in the proc for both the stage and ODS values. I inherited this code, so I don't know why the hash is stored in the ODS table. Is that a best practice?

**Approach #1**

```sql
UPDATE b
SET    Field1 = a.Field1
      ,Field2 = a.Field2
FROM   source a
INNER JOIN destination b
        ON a.PrimaryKeyField = b.PrimaryKeyField
WHERE  CAST(HASHBYTES('SHA1', CONCAT(a.[Field1], '|', a.[Field2])) AS VARBINARY(20))
    <> CAST(HASHBYTES('SHA1', CONCAT(b.[Field1], '|', b.[Field2])) AS VARBINARY(20));
```

**Approach #2**

```sql
UPDATE b
SET    Field1 = a.Field1
      ,Field2 = a.Field2
FROM   source a
INNER JOIN destination b
        ON a.PrimaryKeyField = b.PrimaryKeyField
WHERE  a.Field1 <> b.Field1
   OR  ISNULL(a.Field2, '') <> ISNULL(b.Field2, '');
```
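If the stored hash column is kept, adding a field means refreshing that column once so it matches the new formula. On a 300M+ row table, a batched rewrite keeps each transaction small and the log manageable. This is a minimal sketch under assumed names (`dbo.OdsTable`, hash column `RowHash`, new column `Field3` already added with its default), not the poster's actual schema:

```sql
-- One-time batched refresh of the stored hash after adding Field3.
-- Small batches keep the transaction log from growing unboundedly.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) t
    SET    RowHash = HASHBYTES('SHA1',
                     CONCAT(t.[Field1], '|', t.[Field2], '|', t.[Field3]))
    FROM   dbo.OdsTable t
    WHERE  t.RowHash IS NULL
       OR  t.RowHash <> HASHBYTES('SHA1',
                     CONCAT(t.[Field1], '|', t.[Field2], '|', t.[Field3]));

    SET @rows = @@ROWCOUNT;
    -- Optionally CHECKPOINT or take a log backup between batches.
END;
```

The loop terminates because rows updated in one pass no longer match the `WHERE` predicate on the next; already-correct rows are never touched, so rerunning after a failure is safe.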
Asked by Gabe (1396 rep)
Dec 28, 2018, 08:15 PM
Last activity: Jul 2, 2025, 11:03 AM