Do Azure SQL and SQL Server still call-out into the CLR for FORMAT and PARSE? If so, which CLR version and how does marshalling affect performance?
0
votes
1
answer
681
views
The documentation for SQL Server's T-SQL
FORMAT
function implies calls are forwarded to .NET's ToString
methods on .NET types corresponding to T-SQL types, and TRY_PARSE
and PARSE
's documentation is quite explicit about it using the CLR:
> https://learn.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql
> ## FORMAT( value, format, culture )
> **format
**: The format
argument must contain a valid .NET Framework format string, either as a standard format string (for example, "C"
or "D"
), or as a pattern of custom characters for dates and numeric values (for example, "MMMM DD, yyyy (dddd)"
). Composite formatting is not supported.
>
> **culture
**: culture
accepts any culture supported by the .NET Framework as an argument; it is not limited to the languages explicitly supported by SQL Server. If the culture argument is not valid, FORMAT
raises an error.
and:
> https://learn.microsoft.com/en-us/sql/t-sql/functions/parse-transact-sql?view=sql-server-ver16
> ## PARSE( string_value AS data_type [ USING culture ] )
> ### Remarks
> * Keep in mind that there is a certain performance overhead in parsing the string value.
> * PARSE
relies on the presence of the .NET Framework Common Language Runtime (CLR).
> * This function will not be remoted since it depends on the presence of the CLR. Remoting a function that requires the CLR would cause an error on the remote server.
So using FORMAT( intValue, 'N2', 'en-US' )
and PARSE( '2023-02-21' AS date )
in a T-SQL SELECT
projection will cause SQL Server to somehow invoke .NET's Int32.ToString( "N2", CultureInfo.GetCultureInfo( "en-US" ) )
and DateTime.Parse( "2023-02-21" )
methods at runtime.
...which strikes me as introducing considerable overhead, as SQL Server would need to use .NET's runtime type/value marshalling to convert from SQL Server's own internal representation of varchar(n)
and nvarchar
values to .NET's UTF-16-based strings, invoking those .NET methods (which, in many cases, are forwarded-out to non-CIL native code within the CLR... thus making the CLR part pointless, no?)
But more profoundly, Azure SQL notably disabled SQL-CLR shortly after launch for security reasons and there's no indication Microsoft will bring the feature back, except [in their _Managed Instance_ service](https://learn.microsoft.com/en-us/shows/data-exposed/its-just-sql-clr-in-azure-sql-database-managed-instance) , so if Azure SQL has the SQL-CLR disabled, how is it that our T-SQL scripts have no problem with using .NET-based FORMAT
/PARSE
functions?
-----
Based on this, I have some questions, which all basically boil-down to _"What's going on, eh?"_:
* Computers can have multiple versions of .NET installed side-by-side, so exactly what version of the CLR and BCL are being used when you use FORMAT
/PARSE
/TRY_PARSE
?
* e.g. .NET Framework 4.8 x64 vs. .NET Core 2.1 vs. .NET 5 vs. .NET 7+
* [The official documentation for SQL-CLR doesn't seem to have been updated](https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/sql-server-common-language-runtime-integration) since .NET 5+ effectively replaced Framework 4.x, so my reading implies it always uses .NET Framework 4.x...
* How does it work on SQL Server-for-Linux where only _post-.NET Core 3.1_ versions of .NET are available?
* The ("_classic_") .NET Framework 4.x is very Windows-specific, and I know SQL Server-for-Linux is essentially its own operating-system, but that seems like a lot of engineering-effort to keep using .NET Framework 4.x in this situation. It boggles the mind.
* Whereas if SQL Server is using .NET Core, or .NET 5+ instead of the .NET Framework 4.8 runtime, then how can we use .NET 5+ in our own SQL-CLR projects?
* How does this impact parallelisation of queries and query-plan generation?
* The article about SQL-CLR performance does not make any mention of scalar parallelisation nor UDF inlining at all, beyond saying that T-SQL built-ins are generally better than SQL-CLR, but that SQL-CLR is still far better than old-world CURSOR
-based approaches.
* ...but I assume that using a scalar SQL-CLR function in a query breaks parallelisation (as .NET doesn't expose a way to explicitly mark a method as being reentrant or thread-safe, so SQL Server would have no way of knowing that, so it would _have to_ disable parallelisation) - which kills performance, on top of the type-marshalling overhead - it's very odd that this factor isn't mentioned in the documentation.
* And if Azure SQL has SQL-CLR disabled, then how do FORMAT
and PARSE
work? Does Azure SQL have a _native_ built-in reimplementation of these functions that avoid the CLR entirely, or does it _actually_ have SQL-CLR enabled but only allows a small set of _blessed_ .NET BCL functions to be called? If so, then why _only_ those functions when there exist _many_ other useful functions we'd also like to call from T-SQL (composite string-formatting, Calendar
, Regex
, etc) - the whole thing seems _very arbitrarily limited_.
* And if Azure SQL _does_ have SQL-CLR running internally but only for first-party code, why does [the clr enabled
option](https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/clr-enabled-server-configuration-option) return 0
(and many other CLR-related properties/functions seem broken) unless Azure SQL's T-SQL sandboxing is doing some convoluted hiding-of-functionality?
-----
## Update
(I hate it when my initial research doesn't find anything helpful, then I post to SO/SE, and then _right afterwards_ I search Google again with slightly different keywords and find out what I wanted to know... so here's some information I just found, but I don't want to post it as an answer just yet, maybe in a couple of weeks if no-one else can post a succint, accurate, recent and relevant answer)
* According to https://github.com/MicrosoftDocs/sql-docs/issues/1594 and my own quick research:
* See also: https://stackoverflow.com/a/41114654/159145
* As of Q1 2023, all versions of SQL Server, up-to-and-including SQL Server 2022, including SQL Server for Linux, use _some version_ of the .NET Framework 4.x.
* On Windows Server, it uses whatever the latest version of the OS-provided .NET Framework 4.x is, which should be .NET 4.8.1.
* On Linux, it uses a private build of .NET Framework 4.x that runs within SQL Server's Windows-abstraction layer "SQLPAL" which is also involved with sandboxing SQL-CLR user-code.
* On Linux, the EXTERNAL_ACCESS
option (which would allow SQL-CLR code to break-out of the sandbox) is not supported, ditto on _Azure SQL Server Managed Instance_.
* However I haven't been able to find out how/why PARSE
/FORMAT
/etc works in Azure SQL despite all the indications that SQL-CLR is entirely disabled - or the performance/overhead/query-plan implications of the native-to-managed-and-back process involved in invoking .NET code from T-SQL.
Asked by Dai
(632 rep)
Feb 21, 2023, 07:18 PM
Last activity: Feb 21, 2023, 11:43 PM
Last activity: Feb 21, 2023, 11:43 PM