
Handling new fields when merging in data using hashbytes?

2 votes
1 answer
690 views
We load data from stage into our ODS and check for differences using HASHBYTES. We calculate the hash from the stage tables, insert/update the destination table, and also store the hash value in the destination. The problem arises when there's a new field we need to bring in from a source system. We add a new column to the ODS table with a default value, but the calculated hash now differs because of this new field. As a result, everything gets updated even if nothing has changed. We'd have to update the HashBytes column in the large table (300M rows, but it could reach 1B) whenever we add a new field, which isn't often but is enough to be cumbersome.

What are the best approaches to handle this? I'm thinking of removing the hashbytes column from the ODS table and just calculating it in the proc for both the stage and ODS values. I inherited this code, so I don't know why the hash is stored in the ODS table. Is that a best practice?

**Approach #1**

```sql
UPDATE b
SET    Field1 = a.Field1
      ,Field2 = a.Field2
FROM   source a
INNER JOIN destination b
        ON a.PrimaryKeyField = b.PrimaryKeyField
WHERE  CAST(HASHBYTES('SHA1', CONCAT(a.[Field1], '|', a.[Field2])) AS VARBINARY(20))
    <> CAST(HASHBYTES('SHA1', CONCAT(b.[Field1], '|', b.[Field2])) AS VARBINARY(20));
```

**Approach #2**

```sql
UPDATE b
SET    Field1 = a.Field1
      ,Field2 = a.Field2
FROM   source a
INNER JOIN destination b
        ON a.PrimaryKeyField = b.PrimaryKeyField
WHERE  a.Field1 <> b.Field1
   OR  ISNULL(a.Field2, '') <> ISNULL(b.Field2, '');
```
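If the stored hash column is kept, adding a field means refreshing that column once so it matches the new formula. On a 300M+ row table, a batched rewrite keeps each transaction small and the log manageable. This is a minimal sketch under assumed names (`dbo.OdsTable`, hash column `RowHash`, new column `Field3` already added with its default), not the poster's actual schema:

```sql
-- One-time batched refresh of the stored hash after adding Field3.
-- Small batches keep the transaction log from growing unboundedly.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) t
    SET    RowHash = HASHBYTES('SHA1',
                     CONCAT(t.[Field1], '|', t.[Field2], '|', t.[Field3]))
    FROM   dbo.OdsTable t
    WHERE  t.RowHash IS NULL
       OR  t.RowHash <> HASHBYTES('SHA1',
                     CONCAT(t.[Field1], '|', t.[Field2], '|', t.[Field3]));

    SET @rows = @@ROWCOUNT;
    -- Optionally CHECKPOINT or take a log backup between batches.
END;
```

The loop terminates because rows updated in one pass no longer match the `WHERE` predicate on the next; already-correct rows are never touched, so rerunning after a failure is safe.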
Asked by Gabe (1396 rep)
Dec 28, 2018, 08:15 PM
Last activity: Jul 2, 2025, 11:03 AM