Why does adding an empty columnstore index to my tables consistently make execution plans ignore bitmaps?
5
votes
1
answer
178
views
### Problem
I am using a Standard Edition SQL Server 2019 box. Given the edition, compatibility levels should not be relevant (the level-dependent batch mode features are disallowed on Standard). On this box, I have been experimenting with the hack of adding an empty non-clustered filtered columnstore index to a temp table before joining it on to other tables, thus allowing the queries with said join to use batch mode despite not using any columnstore indexes. When I do this, I have been regularly finding the following behaviour in the queries that previously did not use batch mode:
- The parts of the execution plans that now use batch mode run much faster. In particular, window functions become incredible compared to what they were in the plans that had no batch mode parts.
- The plans with batch mode parts will always have a much lower cost than the queries that do not, even when the they take longer to run.
- Gather Streams sometimes becomes much slower, even though every operator is running evenly on two threads in the plans that have batch mode steps.
- Queries that previously built a bitmap on one table in order to cheaply scan and then hash join another table by passing the bitmap's hash probe in the the scan become **much** slower. Instead of a bitmap inside an index scan, they have a very expensive Filter operator after the scan.
Is there any explanation for the final observation in my list? More importantly, **how can I prevent the introduction of batch mode removing excellent bitmaps?** That bold point is what I'm wanting answered here. If I could have both the bitmaps and the excellent benefits of batch mode elsewhere in the query (e.g. in the window functions), then my queries would be much faster. I accept that I could probably index the tables better, but I was already happy with the original hash joins.
### Reproducible Example
Erik Darling quite rightly requested that I paste the plan, but I thought I'd go one better and give a reproducible example. The following uses the [StackOverflow2010 database](https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/) . You can [load this in a docker container](https://sqlblog.org/2020/03/15/attaching-restoring-databases-inside-a-container) if you wish.
SET STATISTICS XML OFF;
USE StackOverflow2010;
DROP TABLE IF EXISTS #IDs;
SELECT TOP (10000)
Id
INTO
#IDs
FROM
Comments;
ALTER TABLE #IDs ADD CONSTRAINT RowStorePK
PRIMARY KEY CLUSTERED (Id);
SET STATISTICS XML ON;
;WITH [NumberedPosts] AS
(
SELECT
PostID
,ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY Score) RN
FROM
Comments
WHERE
PostID IN (SELECT Id FROM #IDs)
)
SELECT
PostID
FROM
NumberedPosts
WHERE
RN = 1;
/*
Hack to get batch mode for these queries despite
my real machine being on Standard Edition.
*/
CREATE NONCLUSTERED COLUMNSTORE INDEX ColStoreNonClust ON #IDs (ID)
WHERE ID = 1 AND ID = 2;
-- Same query again.
;WITH [NumberedPosts] AS
(
SELECT
PostID
,ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY Score) RN
FROM
Comments
WHERE
PostID IN (SELECT Id FROM #IDs)
)
SELECT
PostID
FROM
NumberedPosts
WHERE
RN = 1;
[This plan](https://www.brentozar.com/pastetheplan/?id=S17c3PgwR) is from before making the columnstore index. It uses a bitmap. [This plan](https://www.brentozar.com/pastetheplan/?id=SyMo2DgPC) is from after making the columnstore index. It does not use a bitmap and runs much slower then the bitmap-using query.
I have seen this so many times that I've become convinced that it's a pattern that must have some general explanation or prevention steps.
Asked by J. Mini
(1237 rep)
Jun 26, 2024, 08:15 PM
Last activity: Jul 3, 2024, 11:56 AM
Last activity: Jul 3, 2024, 11:56 AM