Bulk SQL INSERT into Azure SQL Database using Spark causes blocking/contention?

sql-server azure apache-spark query-performance

I am running the following code, using Microsoft's SQL Spark connector, to write a DataFrame of 1-2 billion rows into Azure SQL Database.

    df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("append") \
    .option("url", secrets.db.url) \
    .option("dbtable", 'tableName') \
    .option("user", secrets.db.user) \
    .option("password", secrets.db.password) \
    .option("batchsize", 1048576) \
    .option("schemaCheckEnabled", "false") \
    .option("BulkCopyTimeout", 3600) \
    .save()

To look for blocking on the database side, I ran sp_WhoIsActive:

    EXEC sp_WhoIsActive
    @find_block_leaders = 1,
    @sort_order = '[blocked_session_count] DESC'

The wait_info column shows RESOURCE_SEMAPHORE.



Configs:
My DataFrame has 2100 partitions and runs on a cluster with 900 cores.
My database has 14 vCores in the General Purpose tier on Azure.
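
To put those numbers side by side, here is a rough back-of-the-envelope sketch. It assumes one Spark task per partition, one task per core at a time (the default `spark.task.cpus = 1`), and that every running task holds its own bulk-copy session. The function name `write_concurrency` is made up for illustration.

```python
import math


def write_concurrency(partitions: int, cluster_cores: int, db_vcores: int) -> dict:
    """Estimate how many bulk-copy sessions can reach the database at once."""
    # At most one task per core runs at a time, and there is one task per partition.
    concurrent_sessions = min(partitions, cluster_cores)
    # Number of scheduling rounds needed to get through all partitions,
    # assuming tasks take roughly the same time (an idealisation).
    waves = math.ceil(partitions / concurrent_sessions)
    return {
        "concurrent_sessions": concurrent_sessions,
        "waves": waves,
        "sessions_per_vcore": round(concurrent_sessions / db_vcores, 1),
    }


stats = write_concurrency(partitions=2100, cluster_cores=900, db_vcores=14)
print(stats)
# {'concurrent_sessions': 900, 'waves': 3, 'sessions_per_vcore': 64.3}
```

Under these assumptions, up to 900 sessions compete for a 14 vCore database at the same time, roughly 64 per vCore.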

The write is incredibly slow because of this blocking. It is almost as if the cluster is running one bulk insert at a time. Any suggestions on what to change to speed it up, or any insight into why it is blocking?
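
RESOURCE_SEMAPHORE waits mean sessions are queuing for a query memory grant, so one experiment worth trying is to cap the number of concurrent writers before the write. The sketch below is a possible starting point, not a verified fix. The helper names and the `max_writers` parameter are hypothetical, and `tableLock` is an option listed in the connector's documentation that may serialize writers on indexed tables, so it is left at its default of `false` here.

```python
from typing import Dict

CONNECTOR_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def bulk_write_options(url: str, table: str, user: str, password: str,
                       batch_size: int = 1048576,
                       table_lock: bool = False) -> Dict[str, str]:
    """Collect the connector options from the question in one dictionary."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batch_size),
        # Connector option for a TABLOCK insert; test before relying on it.
        "tableLock": "true" if table_lock else "false",
        "schemaCheckEnabled": "false",
        "BulkCopyTimeout": "3600",
    }


def write_with_capped_parallelism(df, options: Dict[str, str], max_writers: int) -> None:
    """Append df to the target table using at most max_writers partitions."""
    # coalesce() merges partitions without a full shuffle, so no more than
    # max_writers tasks (and therefore sessions) write at the same time.
    (df.coalesce(max_writers)
       .write
       .format(CONNECTOR_FORMAT)
       .mode("append")
       .options(**options)
       .save())
```

A call would look like `write_with_capped_parallelism(df, bulk_write_options(secrets.db.url, "tableName", secrets.db.user, secrets.db.password), max_writers=28)`, where 28 is an arbitrary value to tune against the database, not a recommendation.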
                        

Asked by Youssef Fares

Dec 22, 2020, 03:35 PM
Last activity: Mar 11, 2025, 07:09 PM
