Data servers: Do query optimisers re-write queries removing redundant columns during the plan creation?
Although I'm using SQL Server, this is a question of whether an optimiser will rewrite a query to remove redundant columns, so I'm pitching it at all data servers: RDBMS, NoSQL, MPP, anything capable of holding and querying stored data using SQL that optimises a query before running it.
I feel that they would, or at least it seems logical that they would, especially as it would be crazy to fill a cache unnecessarily, but I can't find any evidence that they do.
I don't want to get bogged down in how environment, network, server, table and cache loads, or table size and performance, will alter the selected plan; this is just a very high-level question: would the server rewrite a query to remove redundant columns and/or joins (but mostly columns) that are not used in any way to generate the result?
On my isolated dev server, I have a test query (reproduced in full at the end of this post) running against a tiny 290k-row test table. The table has a PK on an identity field and a composite index; two of the fields in the derived table are covered by that index, but the primary field under test is not.
The derived table in this instance has 7 redundant columns, and I execute three DBCC commands (also listed at the end of this post) before each run so as to start with a cold cache.
From the profiler for that first query: CPU: 92, Reads: 23703
Then, having re-run the three DBCC commands to return the cache to cold, running this re-expressed query:
Select
provider_type
,Count(1) As count_of_provider_type
From adhoc..datacentre
Group By provider_type;
Gives me this actual plan:
And from the profiler: CPU: 78, Reads: 23476
Notice any similarities?
Given the batch count, IO and CPU, this leads me to wonder whether the optimiser did rewrite the first query to remove the derived table and the redundant columns.
But how can I prove it?
I can't find anything at learn.microsoft under the Query Processing Architecture to suggest that the optimiser would rewrite the query, nor can I find a way of seeing what was transferred to cache.
Does anything exist that can definitively say exactly what was read and cached?
Remember: although I'm using SQL Server, I'd be interested to know how other RDBMS / MPP platforms such as GBQ, Redshift, Athena, Snowflake etc. would handle this.
And finally, the why.
What nutjob would write the first query without having realised it could be re-expressed?
The answer is twofold: firstly views, and secondly, and more prominently, SQL from visualisation and reporting tools capable of accepting an SQL script, which is often functionally equivalent to a non-materialised view.
As we all know, views can be abused. They shouldn't be, and in an ideal world users would create views as isolated models that spit out a result set ready for ingestion by the tool the model was designed for; the same goes for visualisation and reporting tools.
But we all know this never happens.
Just like a doctor who has a cream for that, so too has engineering built a view that includes "what you're after" in its output, "so you don't need to go and create a new query, just query that view".
The view may be basic: maybe it's being used to replace a table but includes SCD type 2 logic, and maybe it's selecting non-engineering and/or PII data; or maybe it is a basic model with a couple of non-complex joins.
But if a user were to query this view for only a couple of fields, would the optimiser rewrite the query to remove redundant columns of a single-table view, and possibly remove redundant joins from a multi-table view?
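To make the single-table view case concrete, here is a minimal sketch of the scenario. It is illustrative only: the view name vw_datacentre_sessions is hypothetical (it does not exist on my server), and the column list simply mirrors the derived table from my first test query.

```sql
-- Illustrative sketch only: vw_datacentre_sessions is a hypothetical view name.
-- The column list mirrors the derived table in the first test query.
Use adhoc;
Go
-- "Go" is the SSMS / sqlcmd batch separator; Create View must be the
-- first statement in its batch.
Create View vw_datacentre_sessions
As
Select
     customer_id
    ,access_plan
    ,provider_type
    ,ap_postcode
    ,browser
    ,session_start_date
    ,session_end_date
    ,payment_method
From adhoc..datacentre;
Go

-- A user then asks the view for just one of its eight columns.
Select
     v.provider_type
    ,Count(1) As count_of_provider_type
From adhoc..vw_datacentre_sessions v
Group By
     v.provider_type;
```

The question is whether the plan for that last query is the same one I would get from querying adhoc..datacentre directly for provider_type alone.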
As I said at the beginning, I feel the optimiser would, but I need to be able to evidence this beyond conjecture or theory.
The three DBCC commands run before each test:
DBCC FreeProcCache; DBCC DropCleanBuffers; DBCC FreeSystemCache('sql plans');
The first test query, with the derived table:
Select
a.provider_type
,Count(1) As count_of_provider_type
From (
Select
customer_id
,access_plan
,provider_type
,ap_postcode
,browser
,session_start_date
,session_end_date
,payment_method
From adhoc..datacentre
)a
Group By
a.provider_type;
Returns this actual plan:


Asked by Steve Martin
(9 rep)
Mar 3, 2024, 08:49 AM
Last activity: Mar 3, 2024, 09:15 AM