
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
542 views
Azure Dedicated SQL pool - group has db_datareader access but cannot login
We have created a Test AD Group that should have read-only access to the database (schemas, tables, and views) in the Azure SQL dedicated pool. Our DBA team set this up, but the users in the Test AD Group cannot log in unless they select the database as the default database in the connection dialog in SSMS. Is there a way to allow them to log in while only having read-only access to the database (including future schemas and tables that will be created)?
xmlapi (11 rep)
Dec 6, 2022, 06:22 PM • Last activity: Aug 2, 2025, 11:05 PM
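A minimal T-SQL sketch of the setup described in the question above, assuming the Azure AD group is named [Test AD Group]; it is run in the user database, not master, since the contained user created this way is what the dedicated pool authenticates against.

-- Create a contained user for the AAD group and give it read access to all current and
-- future schemas/tables/views via the built-in db_datareader role.
CREATE USER [Test AD Group] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember 'db_datareader', 'Test AD Group';

-- Because the group has no user mapped in master, connections that default to master fail.
-- Either have users pick the target database in the SSMS connection dialog (Options >
-- Connection Properties > Connect to database) or also create the user, with no roles, in master.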
1 vote
1 answer
1684 views
How to see what's using TempDb space in Azure SQL DW?
I'm having problems with TempDb being full. Are there any system views I can use to diagnose what's using up TempDb and/or how much space is still available in it? It would be awesome to also see the TempDb transaction size limit, if possible.
Neil P (1294 rep)
Dec 31, 2018, 10:45 AM • Last activity: Jul 29, 2025, 01:07 PM
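A hedged sketch for a dedicated SQL pool (the question's Azure SQL DW): per-session tempdb page allocations can be read from the node-level space-usage DMV, assuming sys.dm_pdw_nodes_db_session_space_usage is available on the instance.

-- Sessions currently holding the most tempdb space, per compute node (page counts are 8 KB pages).
SELECT  ssu.pdw_node_id,
        ssu.session_id,
        (ssu.user_objects_alloc_page_count + ssu.internal_objects_alloc_page_count) * 8 AS allocated_kb,
        (ssu.user_objects_dealloc_page_count + ssu.internal_objects_dealloc_page_count) * 8 AS deallocated_kb
FROM    sys.dm_pdw_nodes_db_session_space_usage AS ssu
WHERE   DB_NAME(ssu.database_id) = 'tempdb'
ORDER BY allocated_kb DESC;

The tempdb size limit itself scales with the service level (DWU) per the capacity documentation and, as far as I can tell, is not exposed through a DMV.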
0 votes
1 answer
157 views
SQL DB Role to get View Definition permission
In Azure Synapse serverless SQL pool, I am looking for a built-in DB role that can be assigned to an SPN so it can read the definition of all views. I know there is the "GRANT VIEW DEFINITION" permission option, but I am looking for a DB role option.
Prakash (101 rep)
Jan 3, 2024, 12:10 AM • Last activity: Jul 13, 2025, 06:04 AM
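A hedged alternative sketch: there may be no built-in role that carries only VIEW DEFINITION, but a user-defined role wrapping that permission keeps the assignment role-based. The names view_metadata_reader and [my-spn] are placeholders.

CREATE ROLE view_metadata_reader;
GRANT VIEW DEFINITION TO view_metadata_reader;          -- database-wide, so it covers future views too
ALTER ROLE view_metadata_reader ADD MEMBER [my-spn];    -- the service principal's database user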
0 votes
1 answer
715 views
Database Scoped Credential with User-Assigned Managed Identity possible?
We have a data lake (Azure Data Lake Gen 2) that contains individual folders with data for specific use cases in the 'gold' container, e.g. for a financial dashboard (folder 1) or a production dashboard (folder 2). For each use case, I want to create a Synapse SQL serverless database and grant access to the respective folder in the data lake via a database scoped credential. I want to make sure that database 1 only has access to folder 1 and database 2 only has access to folder 2; I would control this via ACLs in the data lake. For authentication I first thought of two service principals. However, these are apparently not supported if the data lake is protected by a firewall: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-storage-access-control?tabs=service-principal#firewall-protected-storage Can I solve my problem via user-assigned managed identities, for example? Are user-assigned managed identities supported for creating database scoped credentials? Or is there a completely different approach?
alexsilviu05 (15 rep)
Jun 13, 2023, 07:53 AM • Last activity: Jul 11, 2025, 03:08 AM
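A hedged sketch of the form the documentation does cover, the workspace (system-assigned) managed identity; whether a user-assigned identity can be referenced in the credential is exactly the open question above. Object names and the storage path are placeholders.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';   -- once per database

CREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity
WITH IDENTITY = 'Managed Identity';

CREATE EXTERNAL DATA SOURCE GoldFolder1
WITH (
    LOCATION   = 'https://<storageaccount>.dfs.core.windows.net/gold/folder1',
    CREDENTIAL = WorkspaceIdentity
);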
1 vote
1 answer
188 views
Azure Dedicated SQL Pool - How to checksum a table?
We're moving all our on-premises tables into the Azure dedicated SQL pool, using Synapse workspace pipelines to import the data. Is there a way to checksum a table so we can compare it against the on-premises table and make sure everything was imported correctly? I know in SQL Server there is a checksum function for a table, but apparently it is not available in the dedicated SQL pool.
xmlapi (11 rep)
Dec 9, 2022, 01:39 PM • Last activity: Jun 24, 2025, 09:04 AM
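A hedged sketch of one way to compare, assuming CHECKSUM and CHECKSUM_AGG are available in the pool's T-SQL surface; CHECKSUM_AGG is order-insensitive, so the same statement can be run against the on-premises table and the dedicated pool table and the outputs compared. Table and column names are placeholders; list every column explicitly and keep the order and types identical on both sides.

SELECT  COUNT_BIG(*)                        AS row_count,
        CHECKSUM_AGG(CHECKSUM(col1, col2))  AS table_checksum
FROM    dbo.my_table;

CHECKSUM is a weak hash, so for stronger verification a per-row HASHBYTES over consistently formatted, COALESCE'd columns (as in the MD5-utility question further down this page) can be compared instead.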
0 votes
1 answer
284 views
Hash, Replication, and Round Robin Distributions - Need more clarification
I will give a few examples of the tables we have:

Table 1 - Most queries are by State Code and Year, has about 1,000 rows, will grow by 1,000 rows a year, used Round Robin
Table 2 - Most queries are by State Code, Year, and Column 1 (string), has about 1,000 rows, will grow by 1,000 rows a year, used Round Robin
Table 3 - Most queries are by State Code and Year, has about 100,000 rows, will grow by about 25,000 rows a year, used Hash
Table 4 - Most queries are by Year, has about 100,000 rows, will grow by about 25,000 rows a year, plan to use Hash
Lookup Table 5 - Most queries are by State Code and Lookup Id, has about 10 rows, used Replication
Lookup Table 6 - Most queries are by State Code and Lookup Id, has about 500 rows, used Replication

Did we use the correct types of distribution? Can someone give a more concrete example or clearer guidance on when and why you should use each type of distribution? Microsoft's documentation wasn't very helpful.
xmlapi (11 rep)
Dec 20, 2022, 09:26 PM • Last activity: May 11, 2025, 04:06 PM
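A hedged illustration of the three options with placeholder tables, following the general guidance: REPLICATE small dimension/lookup tables, HASH large fact tables on a high-cardinality column that is frequently joined or aggregated on, and ROUND_ROBIN for staging tables or when no good hash key exists. A two-letter state code would be a poor hash key, since its few distinct values skew rows across the 60 distributions.

CREATE TABLE dbo.StateLookup (LookupId INT, StateCode CHAR(2), Description VARCHAR(100))
WITH (DISTRIBUTION = REPLICATE, CLUSTERED INDEX (LookupId));

CREATE TABLE dbo.LargeFact (RecordId BIGINT, StateCode CHAR(2), [Year] INT, Measure DECIMAL(18,2))
WITH (DISTRIBUTION = HASH(RecordId), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.StagingLoad (RecordId BIGINT, StateCode CHAR(2), [Year] INT, Measure DECIMAL(18,2))
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

At the row counts described above (a few thousand to roughly 100,000 rows), the tables are small enough that distribution choice matters much less than it would for multi-million-row fact tables.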
2 votes
1 answer
2004 views
Azure Synapse SQL Pool: Issue in MERGE INSERT UPDATE statement
I have a table in an Azure Synapse dedicated SQL pool:

CREATE TABLE table_A (
    ID int IDENTITY(1,1),
    ClientID varchar(10),
    fitrstname varchar(20),
    lastname varchar(30),
    phone varchar(20),
    address varchar(100),
    milageRun decimal(18,8)
    CONSTRAINT (PK_table_A) PRIMARY NONCLUSTERED ( ClientID ASC )
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP )
GO

Now when I try a MERGE-INSERT-UPDATE against the above table using the syntax below:

MERGE table_A AS TARGET
USING table_B AS SOURCE
ON table_A.ClientID = table_B.ClientID
WHEN MATCHED THEN
    UPDATE SET TARGET.milageRun = SOURCE.odometerCount
IF NOT MATCHED BY TARGET THEN
    INSERT INTO (milageRun) VALUES (SOURCE.odometerCount)

the script throws two different errors:

1. Msg 50000, Level 16, State 1, Line 1: Merge statements with a WHEN NOT MATCHED [BY TARGET] clause must target a hash distributed table.
2. Can not update ID column

But the same query works fine against a normal on-premises SQL Server or an Azure SQL Database. In this example table_A is a small table containing fewer than 1,000 rows, and per Microsoft's recommendation we should use hash distribution only for large tables. So what would be the best approach? Secondly, even though I am not updating the ID column, why does the second error appear? Any expert advice would be helpful.
pythondumb (129 rep)
Jul 16, 2022, 01:58 AM • Last activity: May 4, 2025, 07:07 AM
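A hedged two-statement alternative to the MERGE above that sidesteps both errors: table_A can stay ROUND_ROBIN, and the IDENTITY column is never listed, so it populates itself. The UPDATE correlates through the WHERE clause because ANSI joins in UPDATE ... FROM have historically been restricted in dedicated SQL pools.

UPDATE  table_A
SET     milageRun = b.odometerCount
FROM    table_B AS b
WHERE   table_A.ClientID = b.ClientID;

INSERT INTO table_A (ClientID, milageRun)
SELECT  b.ClientID, b.odometerCount
FROM    table_B AS b
WHERE   NOT EXISTS (SELECT 1 FROM table_A AS a WHERE a.ClientID = b.ClientID);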
0 votes
0 answers
21 views
Failover group for dedicated SQL pool
We're looking at DR options for our Azure SQL dedicated pool. Most documentation refers to backup-and-restore solutions, but it looks like I can add it to a failover group; however, I can't find any docs that mention that as an option. The storage costs for our SQL pool aren't huge, so replicating it would be acceptable from a cost point of view. Has anyone else used a failover group with a dedicated pool? Does it work? Cheers, Alex
AlexP012 (53 rep)
Apr 14, 2025, 10:34 AM
0 votes
1 answer
1235 views
Accessing Secured Storage account from Synapse Notebook
We have a firewalled storage account with a few files that we need to access via a Synapse notebook. We do that with the code below:

abfss_path = f'abfss://container@storageaccount.dfs.core.windows.net/data.csv'
df = spark.read.load(abfss_path, format='csv', header=True)
display(df.limit(10))

The managed identity and my user (object id) for the Synapse workspace have the 'Storage Blob Data Contributor' role on the storage account. The following have been enabled:

- Allow Azure services on the trusted services list to access this storage account.
- Specify resource instances that will have access to your storage account based on their system-assigned managed identity.

Running the above code works from a pipeline with a Synapse notebook task but **fails when run from the studio through the notebook cell**.

Error: ***Operation failed: "This request is not authorized to perform this operation.", 403***

I have tried everything but can't get it to work.

Via pipeline: [screenshot]
Via Studio notebook cell: [screenshot]

I can see AAD (pass-through) takes precedence when running from the studio, so I tried running the session as 'managed identity'. This doesn't change anything and I keep getting the same error. How can I get this working? Note: I am not using the managed VNet in my Synapse workspace.
Ramakant Dadhichi (2338 rep)
Oct 20, 2022, 03:52 PM • Last activity: Apr 11, 2025, 09:08 AM
4 votes
1 answer
642 views
Revision Tracking & Source Control for Azure SQL Data Warehouse
What is a good approach for tracking incremental changes to database tables, stored procedures, etc. for Azure SQL Data Warehouse? I am in the process of moving a large database over to Azure SQL Data Warehouse. The prior approach for change tracking was a 'Database Project' in Visual Studio 2015, which allows easy source control integration with TFS, Git, or whatever. When you want to publish, you just target the destination database and it generates a change script. This functionality does not work at all for Azure SQL Data Warehouse; Visual Studio (and the latest SSDT) simply can't target SQL DW. This means the process of publishing is extremely tedious, entirely manual, and extremely error prone. Is there a comparable approach you are using for this type of project?
John Hargrove (149 rep)
Oct 20, 2017, 05:10 AM • Last activity: Apr 10, 2025, 01:08 PM
1 vote
3 answers
7047 views
How to get queries executed in Synapse SQL?
I need to find all the queries executed on my DB in the last hour in a Synapse workspace with a SQL pool; I will apply my logic on top of the result set. What's the right table/view to look at?

1. sys.dm_exec_requests
2. sys.dm_exec_requests_history
3. sys.dm_exec_query_stats
Santosh Hegde (131 rep)
Mar 28, 2022, 09:19 AM • Last activity: Jul 1, 2024, 11:31 PM
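For the dedicated SQL pool case, a hedged sketch using sys.dm_pdw_exec_requests, which retains a rolling window of recent requests; filtering on submit_time gives the last hour.

SELECT  request_id, session_id, status, submit_time, total_elapsed_time, command
FROM    sys.dm_pdw_exec_requests
WHERE   submit_time >= DATEADD(HOUR, -1, GETDATE())
ORDER BY submit_time DESC;

The dm_pdw_* family is specific to the dedicated pool; sys.dm_exec_requests_history, as far as I know, is a serverless SQL pool view.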
0 votes
1 answer
51 views
Error when using sys.query_store_plan table in Azure Synapse Dedicated SQL pool
I am getting the error below when executing this query in my dedicated SQL pool:

select * from sys.query_store_plan

I do not reference any columns in my selection, so I am not sure why this happens. Error message:

Msg 207, Level 16, State 1, Line 102
Invalid column name 'has_compile_replay_script'.
Invalid column name 'is_optimized_plan_forcing_disabled'.
Invalid column name 'plan_type'.
Invalid column name 'plan_type_desc'.

All the columns in the error message are not applicable to Azure Synapse, and my query has no column selection. https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-query-store-plan-transact-sql?view=sql-server-ver16
SOUser (31 rep)
Apr 16, 2024, 06:51 AM • Last activity: Apr 16, 2024, 08:28 AM
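A hedged workaround sketch: list the columns the pool's metadata actually exposes for the view and name only those in the SELECT, instead of SELECT * (which appears to be getting expanded against the newer SQL Server column list somewhere in the toolchain).

SELECT  c.name, c.column_id
FROM    sys.all_columns AS c
WHERE   c.object_id = OBJECT_ID('sys.query_store_plan')
ORDER BY c.column_id;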
0 votes
1 answer
97 views
Divide by Zero error in Materialized Views in Azure Synapse Dedicated SQL Pool
I have a materialized view in an Azure Synapse Analytics dedicated SQL pool which has a calculated column X / denominator. When I run **select * from myview**, it does not give me a divide-by-zero error. But when I run **select * from myview where denominator = 0**, I get a divide-by-zero error. When I use **select * from myview order by denominator**, I again do not get a divide-by-zero error. How can this be explained? I know about the ARITHABORT setting but am not sure how to check its value in a Synapse dedicated SQL pool. Any help would be appreciated.
SOUser (31 rep)
Mar 25, 2024, 05:17 AM • Last activity: Mar 25, 2024, 12:25 PM
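A hedged, defensive sketch: wherever the expression can be changed (in the view definition if the materialized-view restrictions allow it, otherwise in the consuming query), NULLIF turns the zero denominator into NULL so no plan shape can raise the divide-by-zero error. Column and table names are placeholders.

SELECT  X / NULLIF(denominator, 0) AS ratio    -- yields NULL instead of raising error 8134 when denominator = 0
FROM    dbo.base_table;

The differing behaviour between the three queries is consistent with the division only being evaluated for the rows, and at the plan stages, that each particular execution plan actually touches, which is why adding a predicate or an ORDER BY changes whether the error surfaces.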
1 vote
1 answer
101 views
Increment a field in runtime for Synapse SQL dedicated SQL Pool
I have a task to increment the expected flag column by looking at the existing values. I solved this using a WHILE loop and can get the expected value updated that way in Azure Synapse SQL Data Warehouse, but I would like to know a direct way to do this using ranking/window functions. My sample data:

CREATE TABLE [dbo].[work_table](
    [work_order_key] [int] NULL,
    [modification_date] [datetime2](7) NULL,
    [from_status] [varchar](1000) NULL,
    [to_status] [varchar](100) NULL,
    [count] [int] NULL,
    [sequence] [int] NULL,
    [count_VCR] [int] NULL,
    [flag] [int] NULL,
    [flag_expected] [int] NULL
)

INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-14T11:53:05.1630000' AS DateTime2), N'Not Reviewed', N'Planned', 1, 1, 1, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-19T03:49:25.7370000' AS DateTime2), N'Planned', N'In Progress', 1, 2, 1, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-19T04:22:33.0630000' AS DateTime2), N'In Progress', N'Awaiting Parts', 1, 3, 1, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-27T02:58:54.7570000' AS DateTime2), N'Awaiting Parts', N'Parts Ordered', 1, 4, 0, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-27T02:59:00.8530000' AS DateTime2), N'Parts Ordered', N'Parts Received', 1, 5, 0, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-27T02:59:09.9000000' AS DateTime2), N'Parts Received', N'Planned', 1, 6, 0, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-28T07:59:58.1130000' AS DateTime2), N'Planned', N'In Progress', 1, 7, 1, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-04-29T07:53:30.9030000' AS DateTime2), N'In Progress', N'Work Completed', 1, 8, 1, 1, NULL)
INSERT [dbo].[work_table] ([work_order_key], [modification_date], [from_status], [to_status], [count], [sequence], [count_VCR], [flag], [flag_expected]) VALUES (1002586, CAST(N'2022-05-05T06:15:34.4300000' AS DateTime2), N'Work Completed', N'Completed', 1, 9, 0, 1, NULL)

My expected result is: [screenshot of expected output]

The partition column is [work_order_key], plus whichever other column is suitable. A WHILE loop can produce this answer; I used the work_order_key and count_VCR columns in the loop. But I want to ask the group for a direct query using SQL window or ranking functions. I cannot post my solution as it is a production one; this is a sample data set only. Thanks for the support.
SOUser (31 rep)
Feb 9, 2024, 08:18 AM • Last activity: Feb 10, 2024, 07:15 PM
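A heavily hedged sketch of the set-based shape that usually replaces such a WHILE loop; the expected output in the original post is only shown as an image, so the rule is assumed here to be "increment the flag each time count_VCR changes value within a work order". The general pattern (LAG to detect the change, then a running SUM) is the point; adjust the CASE logic to the real rule.

WITH marked AS (
    SELECT  *,
            CASE WHEN count_VCR <> LAG(count_VCR, 1, count_VCR)
                                   OVER (PARTITION BY work_order_key ORDER BY [sequence])
                 THEN 1 ELSE 0 END AS is_change
    FROM    dbo.work_table
)
SELECT  *,
        flag + SUM(is_change) OVER (PARTITION BY work_order_key
                                    ORDER BY [sequence]
                                    ROWS UNBOUNDED PRECEDING) AS flag_expected_calc
FROM    marked;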
0 votes
0 answers
20 views
Distributions when only one physical node
Working on Azure Synapse, we currently have around 30 tables in a dev environment. I want to optimize the tables before replicating them in the qal and prod environments. As far as I understand, we only have one physical node: the subscription is DW100c, and when running SELECT DISTINCT(pdw_node_id) FROM sys.pdw_nodes_indexes I get a single id. At the moment everything is round-robin, and because there is only one node, I assume that transitioning some tables to hash won't lead to any improvement in performance and will instead add loading time for the Power BI semantic models that refresh every day. Could you please confirm?
Gregoire_M (1 rep)
Feb 5, 2024, 05:05 PM • Last activity: Feb 5, 2024, 05:13 PM
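A hedged note with a quick check: even at DW100c the data is still spread across 60 distributions behind that single compute node, so hash distribution can still reduce data movement for large joins and aggregations, though for around 30 modest tables the effect may well be negligible, as suspected. The catalog query below shows what each table currently uses.

SELECT  t.name AS table_name,
        dp.distribution_policy_desc
FROM    sys.tables AS t
JOIN    sys.pdw_table_distribution_properties AS dp
        ON t.object_id = dp.object_id
ORDER BY t.name;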
0 votes
1 answer
792 views
Generating MD5 Hash for NVARCHAR and VARCHAR columns across different databases
I am building a utility that generates a single MD5 hash across all concatenated column values for any table row. To make this work, I have to eliminate NULL values using COALESCE and must CAST NUMBER and DATE types to VARCHAR using consistent patterns. The purpose of this utility is to compare all the data across two different databases as part of a data migration: we wanted to make sure all of our converted ETL code (stored procedures) produced the same results with 100% accuracy after porting it from Netezza to Azure Synapse Analytics. For the most part this works extremely well and we were able to identify numerous bugs with it, but the process isn't perfect. We cannot use it with BLOB columns and we sometimes get slight differences with FLOAT types. The real headache, though, is how NVARCHAR is hashed differently between Netezza and Azure Synapse Analytics. Here is an example. First, what Netezza gives me:

show nz_encoding -- NOTICE: UTF8
show server_encoding -- Current server encoding is LATIN9

CREATE TABLE ENCODING_TEST (
    VALUE1 CHARACTER VARYING(255),
    VALUE2 NATIONAL CHARACTER VARYING(255)
);
-- The command completed successfully

INSERT INTO ENCODING_TEST (VALUE1, VALUE2) VALUES('très bien', 'très bien');
-- 1 rows affected

SELECT RAWTOHEX(HASH('très bien', 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558
SELECT RAWTOHEX(HASH(VALUE1, 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558
SELECT RAWTOHEX(HASH(VALUE2, 0)) FROM ENCODING_TEST
-- A54489E883CE7705CDBE1FDAA3AA8DF4
SELECT RAWTOHEX(HASH(CAST(VALUE2 AS VARCHAR(255)), 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558

And here is what Azure Synapse Analytics gives me:

CREATE TABLE WZ_BI.ENCODING_TEST (
    VALUE1 CHARACTER VARYING(255),
    VALUE2 NATIONAL CHARACTER VARYING(255)
);
-- Commands completed successfully.

INSERT INTO WZ_BI.ENCODING_TEST (VALUE1, VALUE2) VALUES('très bien', 'très bien');
-- (1 row affected)

SELECT HASHBYTES ('MD5', 'très bien') FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558
SELECT HASHBYTES ('MD5', VALUE1) FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558
SELECT HASHBYTES ('MD5', VALUE2) FROM WZ_BI.ENCODING_TEST
-- 0xC43534A6812499038457EDF545834866
SELECT HASHBYTES ('MD5', CAST(VALUE2 AS VARCHAR(255))) FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558

The question I have is: why is the MD5 hash for the NVARCHAR column in Netezza different from the MD5 hash of the same type and value in Azure Synapse Analytics, when the VARCHAR types hash the same? I really do not want to have to explicitly CAST all NVARCHAR types to VARCHAR to get these to work, but I have not found any other way to make them equivalent. What am I missing here?
Lauren_G (69 rep)
Jun 29, 2023, 02:38 AM • Last activity: Jan 12, 2024, 02:05 AM
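A hedged illustration of why the digests differ: HASHBYTES hashes exactly the bytes it receives, and an NVARCHAR value in SQL is UTF-16LE (two bytes per character), whereas Netezza with nz_encoding UTF8 presumably hashes the UTF-8 bytes of the NCHAR value. The MD5 algorithm is the same; the input bytes are not. Casting to VARBINARY makes the difference visible.

SELECT  CAST('très bien'  AS VARBINARY(40)) AS varchar_bytes,   -- one byte per character in the column's code page
        CAST(N'très bien' AS VARBINARY(40)) AS nvarchar_bytes;  -- UTF-16LE, two bytes per character

So the explicit CAST to VARCHAR (or, on the Netezza side, hashing a matching byte representation) is hard to avoid if the byte streams, and therefore the digests, are to match.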
1 vote
2 answers
534 views
SQL Server tempdb files growth increment size
In our Azure SQL Managed Instance, we have 12 tempdb data files, each 11 GB in size. The log file is set to the default maximum size of 2 TB, and the maximum size of each data file is 32 GB. **Question**: Based on the above numbers, what should the growth increment of these 12 tempdb data files be? NOTE: The question is specific to the tempdb data files. **Remark**: As a best practice recommended by the Azure documentation, one should set the growth increment of these files to a reasonable size to prevent the tempdb database files from growing by too small a value. If the file growth is too small compared to the amount of data being written to tempdb, tempdb might have to constantly expand, which will affect performance. NOTE: Ours is largely a data warehouse project where a large amount of data is imported and then used for data analytics.
nam (515 rep)
Sep 7, 2022, 11:19 PM • Last activity: Nov 17, 2023, 02:15 PM
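A hedged sketch for checking and changing the increment on a managed instance; the right value depends on write volume, but a fixed size (roughly a few hundred MB to a few GB per file) is generally preferred over percentage growth. The logical file name 'tempdev' is an assumption; take the real names from the first query.

SELECT  name, type_desc, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM    tempdb.sys.database_files;

ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', FILEGROWTH = 1024MB);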
1 vote
0 answers
241 views
Schema Drift & Delta Table update - cannot resolve `target.new_column` in UPDATE clause given columns [list of columns]
Using Azure Synapse Analytics, I have configured a pipeline and a Mapping Data Flow that were working great until I recently added a Derived Column to my Mapping Data Flow, which created a column that **was not originally present when I created the table in the Database Designer.** The column just adds a timestamp using () to help me track when rows have been updated. I assumed that the Mapping Data Flow would create a new column, as Schema Drift is enabled. **It didn't.** So I created the new column in the Database Designer, but that didn't help either; I receive the same error as in the post title:

>cannot resolve target.new_column in UPDATE clause given columns [Column List].

It then repeats the same error and provides this, which I have included in hopes that it would be helpful:

>\n\tat org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)\n\tat org.apache.spark.sql.catalyst.plans.logical.DeltaMergeInto$.$anonfun$resolveReferences$3(deltaMerge.scala:303)\n\tat scala.collection.immutable.List.foreach(List.scala:392)\n\tat org.apache.spark.sql.catalyst.plans.logical.DeltaMergeInto$.resolveExprsOrFail$1(deltaMerge.scala:299)\n\tat org.apache.spark.sql.catalyst.plans.logical.DeltaMergeInto$.resolveOrFail$1(deltaMerge.scala:312)\n\tat org.apache.spark.sql.catalyst.plans.l","failureType":"UserError","target":"lhApplicationUsers to Delta DF","errorCode":"DFExecutorUserError"}

Some hopefully helpful facts:

- The sink is a Delta table in an ADLS2 container
- The source is a different ADLS2 container with CSV
- Confirmed that the Database Designer column format is correct

What I have determined is that, when I import the schema from the connection, the new column is **not** present. However, in the database view and in a SELECT statement, the new column does appear and it is null (as expected, since the Mapping Data Flow has not run). I have tried recreating the table in the Database Designer, and I have tried deleting and re-adding the sink in the Mapping Data Flow. I have tried changing the format to Date and using () (to eliminate column formatting errors). What I have not done is delete the underlying data, as I really don't want to do that unless I absolutely must. Any suggestions on how to address this problem?
leleusi (11 rep)
Sep 6, 2023, 05:56 PM • Last activity: Sep 6, 2023, 07:48 PM