Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
2
votes
1
answers
141
views
Does the auto_stats Extended Event misreport the sample percentage from temporal tables or columnstore?
Start up the auto_stats Extended Event, filter out some noise, and prepare yourself to monitor it.
```sql
CREATE EVENT SESSION [AutoStatsTest] ON SERVER
ADD EVENT sqlserver.auto_stats(
    WHERE ([duration] > 1 AND [object_id] > 0))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [AutoStatsTest] ON SERVER
STATE = START;
GO
```
Find a temporal table. I had a copy of the StackOverflow2010 database to hand, so I just made Votes
system versioned.
```sql
ALTER TABLE Votes ADD
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT DF_InsurancePolicy_ValidFrom DEFAULT SYSUTCDATETIME(),
    ValidTo DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT DF_InsurancePolicy_ValidTo DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME(ValidFrom, ValidTo);
GO
ALTER TABLE Votes
SET (SYSTEM_VERSIONING = ON);
GO
```
With your temporal table, do something that will trigger an automatic update of stats. This worked for me:
```sql
SELECT TOP (1000) [BountyAmount]
FROM [StackOverflow2010].[dbo].[Votes]
WHERE [PostId] < 100;
```
The event reported a sample percentage of 100. Given the duration (800 microseconds on my pathetic machine), I knew that this was nonsense. Upon consulting sys.dm_db_stats_properties, I found that the actual percentage was much less than 100.
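For reference, the DMV check looks something like this (a minimal sketch of the kind of query I ran, assuming the dbo.Votes table from above):
```sql
-- Compare rows sampled against total rows for each statistic on dbo.Votes
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       100.0 * sp.rows_sampled / sp.rows AS actual_sample_percent
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Votes');
```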
```sql
/*
    Save yourself the pain of writing
    the stats query
    and just use sp_BlitzIndex from GitHub
*/
EXEC sp_BlitzIndex @DatabaseName = 'StackOverflow2010', @TableName = 'Votes';
```
In summary, **it appears that the auto_stats Extended Event reports an incorrect sample_percentage for temporal tables**. Is this a SQL Server bug, or is it my misunderstanding? If it is my misunderstanding, then where can I read further? Individual Extended Events are, as far as I know, poorly documented.
I have only seen this happen with temporal tables. I have found it on both SQL Server 2022 on a real box (the table was multi-terabyte, so I screamed when I saw a 100% sample rate for the primary key's statistic) and on my local 2019 test box.
Adding a columnstore index seems to help reproduce this, but I am not 100% sure.
```sql
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_ColStore
ON dbo.Votes
(
    Id,
    PostId
);
```
I am absolutely sure that there is a real bug here. However, my attempts to reproduce it have given inconsistent results. I suspect that you need a TB-scale table.
**Final update:** I give up on explaining this one. The table's statistics say that 90 times more rows were used to update the statistics than appeared in the actual execution plan (caught live as it was running in production, scanning the nonclustered columnstore index). The row count in that actual plan is 10,000 times less than the table's row count, and yet the auto_stats Extended Event says that 100% of the rows in the table were read. It is impossible for all of these to be true at the same time.
I suspect that it might actually be the deleted bitmap that causes all of this.
J. Mini
(1225 rep)
Mar 7, 2025, 10:24 PM
• Last activity: Jul 17, 2025, 05:21 PM
-5
votes
1
answers
280
views
UPDATE STATISTICS best practice
In the following scenario, what do you think is the best option for updating statistics?
Database around 1 TB
One table has 4 billion records
Right now we run daily UPDATE STATISTICS (without specifying any SAMPLE PERCENT)
Should we use WITH SAMPLE ?? PERCENT?
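For concreteness, the two variants being weighed would look roughly like this (a hedged sketch; the table name and the 1 percent figure are placeholders, not from the post):
```sql
-- Current approach: let SQL Server pick its default sample rate
UPDATE STATISTICS dbo.BigTable;

-- Alternative under consideration: an explicit sample
UPDATE STATISTICS dbo.BigTable WITH SAMPLE 1 PERCENT;
```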
dana
Dec 3, 2024, 06:09 PM
• Last activity: Jul 9, 2025, 12:02 PM
2
votes
2
answers
455
views
Does the Query Optimizer use Incremental Statistics yet?
I was considering using incremental statistics in my data warehouse, but I read an article by Erin Stellato that says the query optimizer doesn't use incremental statistics. The article was written in May of 2015, but I haven't seen anything recanting her position in the following 6 years. There are a number of articles in different communities that show how to set it up, but if it's not useful, why bother? Does anyone know if subsequent versions of the query optimizer in 2016, 2017, or 2019 support the use of incremental statistics? If not, should we even use them? If they won't help the engine make a good decision about how to query a table with 10 billion records, what good are they? Thanks for any help!
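For context, setting up incremental statistics looks roughly like this (a sketch under assumed names; dbo.FactSales, its column, and the partition number are hypothetical):
```sql
-- Allow auto-created statistics to be incremental in this database
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);

-- Create an incremental statistic on a partitioned table
CREATE STATISTICS stat_SaleDate ON dbo.FactSales (SaleDate)
WITH INCREMENTAL = ON;

-- Refresh only one partition's portion of the statistic
UPDATE STATISTICS dbo.FactSales (stat_SaleDate)
WITH RESAMPLE ON PARTITIONS (3);
```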
Mike Lemay
(57 rep)
Oct 22, 2021, 04:32 PM
• Last activity: Jun 8, 2025, 06:49 PM
0
votes
1
answers
256
views
Using Postgres to return a selection based on frequency distribution
I am not even sure if this is a possibility so my apologies if I'm barking up the wrong tree; I am ignorant of Postgres's limitations as I am self-taught.
This is hypothetical, but if I had a table like so:
**dice_rolls**
| roll_1 | roll_2 | sum |
| -------- | ------- |------|
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 1 | 4 | 5 |
| 1 | 5 | 6 |
| 1 | 6 | 7 |
| 2 | 1 | 3 |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 6 |
| 2 | 5 | 7 |
| 2 | 6 | 8 |
| 3 | 1 | 4 |
| 3 | 2 | 5 |
| 3 | 3 | 6 |
| 3 | 4 | 7 |
| 3 | 5 | 8 |
| 3 | 6 | 9 |
| 4 | 1 | 5 |
| 4 | 2 | 6 |
| 4 | 3 | 7 |
| 4 | 4 | 8 |
| 4 | 5 | 9 |
| 4 | 6 | 10 |
| 5 | 1 | 6 |
| 5 | 2 | 7 |
| 5 | 3 | 8 |
| 5 | 4 | 9 |
| 5 | 5 | 10 |
| 5 | 6 | 11 |
| 6 | 1 | 7 |
| 6 | 2 | 8 |
| 6 | 3 | 9 |
| 6 | 4 | 10 |
| 6 | 5 | 11 |
| 6 | 6 | 12 |
we can see that the value '7' for sum appears most frequently. If we plotted the frequency distribution of the sum, it would look something like this:
```
          x
        x x x
      x x x x x
    x x x x x x x
  x x x x x x x x x
x x x x x x x x x x x
2 3 4 5 6 7 8 9 10 11 12
```
Therefore, we can assume that a truly random selection of this dataset will follow this distribution pattern. What I'm interested in is pseudo-random DISTINCT selection of N rows that attempts to normalize the distribution based on the sum column.
E.g., this is what I have now:
```sql
SELECT * FROM dice_rolls ORDER BY RANDOM() LIMIT 5;
```
This query is most likely to return rows where the sum is 5, 6, 7, 8, or 9. How would I go about using a custom ORDER BY function that would be significantly more likely (or even *strictly* guaranteed) to return rows where the sum has an even chance of being 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12?
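One shape of approach I could imagine (a rough, untested sketch of two-stage sampling, not something I have working): pick N sums uniformly at random first, then one random row per chosen sum:
```sql
-- Stage 1: choose 5 distinct sums, each equally likely;
-- Stage 2: take one random row for each chosen sum
SELECT DISTINCT ON (d.sum) d.roll_1, d.roll_2, d.sum
FROM (
    SELECT DISTINCT sum FROM dice_rolls
    ORDER BY random()
    LIMIT 5
) AS s
JOIN dice_rolls AS d ON d.sum = s.sum
ORDER BY d.sum, random();
```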
PoorBob
(101 rep)
Jan 20, 2021, 07:51 PM
• Last activity: May 29, 2025, 09:04 PM
0
votes
1
answers
266
views
Bad query plan only on first execution, not in SSMS
We have a very customizable report query system where the user can pick and choose which columns to include in the results, as well as a lot of filters.
I'll be mapping what we're doing onto an example orders system. Customers, orders, line items of orders and the like. For purposes of the example problem also assume we've got a very free-form chunk of customer information like "wife's name", "anniversary", "names of pets", etc (stuff you might want around to personalize interactions with the customer)
To avoid massive denormalization of the order details in our orders report (when they select that output), we're populating a DataSet with 2 queries - the first query is for the orders that meet the filter criteria; the second is for the order details that match the output from the first query.
Our QA dept recently started running into some problems in the C# middleware because the output of the 2 queries is disjoint, even though the same filter criteria are used in both queries. Both queries start by asking for the top 20 orders that match the filter criteria.
Further complicating things, the QA automation suite does a lot of database restores, setting a few data conditions, running a test and then doing that again for the next test case. After a lot of back-and-forth, we found that they only hit the problem the first time this query is run after a restore. Every execution *after* the first one (with the same query) produces the correct results.
I even got their seed database and started doing the restores myself. If I took the SQL code and ran it in SSMS immediately after the restore, I couldn't reproduce the problem. I can only reproduce it if I run through the app code and ADO.NET to run the query.
I used Profiler to capture the query plan on that first execution and, for reasons I can't fathom, only on that first execution does it completely upend the query, starting from that free-form blob of pet names as the seed kernel of the results. E.g.
```sql
SELECT TOP (20) bunch of configurable columns, xml blob of customer preferences, ...
FROM ORDERS
INNER JOIN CUSTOMERS ON CUSTOMERS.ID = ORDERS.CUSTOMERID
INNER JOIN
    (SELECT sums, avgs, etc FROM ORDERDETAILS
     GROUP BY ORDERDETAILS.ORDERID) orderavg ON orderavg.ORDERID = ORDERS.ORDERID
... whatever other joins needed based on configuration ...
INNER JOIN CUST_PREFS ON CUSTOMERS.ID = CUST_PREFS.CUSTOMERID
WHERE ORDERS.DATE BETWEEN @from AND @to
ORDER BY ORDERS.DATE
```

```sql
SELECT ORDERID, orderdetails
FROM
    (SELECT TOP (20) ORDERID
     FROM ORDERS
     WHERE ORDERS.DATE BETWEEN @from AND @to
     ORDER BY ORDERS.DATE) ORDERS
INNER JOIN ORDERDETAILS ON ORDERDETAILS.ORDERID = ORDERS.ORDERID
```
Seemed straightforward enough - start from the orders table and work your way down.
But for this case QA ran into, *only* on the first execution after a db restore, and *only* through ADO.NET (not SSMS), the query plan comes out **starting** the execution with the CUST_PREFS "find me the random customer notes blob" join (the 7th join in the query). Re-run the same query in the app again, and it goes back to starting with the root table (ORDERS).
A) I'm at a loss to figure out why it decides the 7th join is the right place to start immediately after a restore. The stats don't bear it out and it goes back to something more understandable on the 2nd execution.
B) I could try to encapsulate the core filter query in a subselect (like the 2nd query has) to make the filter evaluation more predictable and only join for detail after the right subset is selected, but that seems a lot of reshuffling.
C) I'm debating adding an OPTION (FORCE ORDER) hint to the first query to try to make this more predictable (a sketch follows below), but I thought I'd toss this out amongst a group with deeper knowledge than I before pulling on that thread.
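For clarity, the hint in (C) would be appended to the first query, roughly like this (a sketch over the simplified example schema above, assuming @from/@to are declared; not the production query):
```sql
-- Pin the join order to the written order, so evaluation starts from ORDERS
SELECT TOP (20) ORDERS.ORDERID
FROM ORDERS
INNER JOIN CUSTOMERS ON CUSTOMERS.ID = ORDERS.CUSTOMERID
INNER JOIN CUST_PREFS ON CUSTOMERS.ID = CUST_PREFS.CUSTOMERID
WHERE ORDERS.DATE BETWEEN @from AND @to
ORDER BY ORDERS.DATE
OPTION (FORCE ORDER);
```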
Any pointers from those wiser than me would be appreciated.
user1664043
(379 rep)
Jan 30, 2019, 09:14 PM
• Last activity: May 20, 2025, 04:08 PM
0
votes
1
answers
290
views
Oracle gather stats autotask being interrupted before maintenance window ends
A few months ago I managed to enlarge the maintenance windows for autotask on a pluggable database, this way:
```sql
BEGIN
  dbms_scheduler.disable(name => 'WINDOW_NAME');
  dbms_scheduler.set_attribute(
      name      => 'WINDOW_NAME',
      attribute => 'DURATION',
      value     => numtodsinterval(6, 'hour'));
  dbms_scheduler.enable(name => 'WINDOW_NAME');
END;
/
```
This was what I got:
```
WINDOW_NAME       START_TIME                            DURATION
----------------  ------------------------------------  -------------
THURSDAY_WINDOW   08-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
FRIDAY_WINDOW     09-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
SATURDAY_WINDOW   10-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
SUNDAY_WINDOW     11-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
MONDAY_WINDOW     12-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
TUESDAY_WINDOW    13-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
WEDNESDAY_WINDOW  14-JUL-21 10.00.00.241279 PM +02:00   +000 06:00:00
```
But now, on that same pluggable db, I'm noticing that since August 24th stats jobs are being stopped due to the end of the window:
```
TASK_NAME                 STATUS   CURRENT_JOB_NAME  LAST_TRY_DATE                         LAST_TRY_RESULT
------------------------- -------- ----------------- ------------------------------------- ------------------------------------
gather_stats_prog         ENABLED                    12-SEP-21 03.00.09.858615 AM +02:00   STOPPED AT END OF MAINTENANCE WINDOW
auto_space_advisor_prog   ENABLED                    12-SEP-21 12.33.31.490404 AM +02:00   SUCCEEDED
AUTO_SQL_TUNING_PROG      ENABLED                    11-SEP-21 11.00.22.422336 PM +02:00   SUCCEEDED
```
At first I thought that it might be caused by some performance problem, but then I noticed that the jobs were still running for 4 hours (the default window duration):
```
CLIENT_NAME                      JOB_NAME               JOB_STATUS  JOB_START_TIME                               JOB_DURATION
-------------------------------- ---------------------- ----------- -------------------------------------------- --------------
auto optimizer stats collection  ORA$AT_OS_OPT_SY_9601  STOPPED     07-SEP-21 10.00.19.825083 PM EUROPE/VIENNA   +000 03:59:44
auto optimizer stats collection  ORA$AT_OS_OPT_SY_9621  STOPPED     08-SEP-21 10.00.11.786102 PM EUROPE/VIENNA   +000 03:59:53
```
I then noticed that the maintenance window duration was still set to 4 hours on the container database:
```
WINDOW_NAME       START_TIME                            DURATION
----------------  ------------------------------------  -------------
SUNDAY_WINDOW     12-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
MONDAY_WINDOW     13-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
TUESDAY_WINDOW    14-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
WEDNESDAY_WINDOW  15-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
THURSDAY_WINDOW   16-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
FRIDAY_WINDOW     17-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
SATURDAY_WINDOW   18-SEP-21 10.00.00.235575 PM +02:00   +000 04:00:00
```
Should I modify the duration on the container too? Thank you.
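If the answer is yes, I assume it would be the same resize run once per window from the root container (an untested sketch; MONDAY_WINDOW stands in for each day's window):
```sql
-- Untested: switch to the root and repeat the DBMS_SCHEDULER change there
ALTER SESSION SET CONTAINER = CDB$ROOT;
BEGIN
  dbms_scheduler.disable(name => 'MONDAY_WINDOW');
  dbms_scheduler.set_attribute(
      name      => 'MONDAY_WINDOW',
      attribute => 'DURATION',
      value     => numtodsinterval(6, 'hour'));
  dbms_scheduler.enable(name => 'MONDAY_WINDOW');
END;
/
```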
trustno1
(9 rep)
Sep 12, 2021, 03:13 PM
• Last activity: May 8, 2025, 06:02 AM
1
votes
0
answers
77
views
Are there exactly two possible causes of statistics updates on a readable replica?
I have a table that is modified regularly. Due to these frequent modifications, statistics update on it very frequently. To my surprise, I even see queries running on a SQL Server 2022 readable secondary AG that are trying to update statistics. I have been able to distinguish two types of statistics updates.
I believe the first type to be the replica updating stats in response to being queried. In other words, it's an automatic stats update just like on the primary. I know this because `EXEC sp_WhoIsActive @get_plans = 2` on the replica shows me execution plans that are clearly updating statistics (e.g. they have the blatant `STATMAN` stuff in their XML), despite the query text just being a normal user-written query of the replica. They also appear in the auto_stats Extended Event, usually as completing successfully.
With `sp_WhoIsActive @show_system_spids = 1, @get_locks = 1`, I can also see a second type of statistics update. This second type is the redo thread trying to take an `Sch-M` lock on a resource_type of `METADATA.STATS`. The blocked process report confirms this and gives me both the object_id and stats_id of the object in question.
My question is this: **if I see a statistics update on a readable AG replica, is it always one of the two cases described above?**
All of the documentation that I've read, particularly [this](https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/active-secondaries-readable-secondary-replicas-always-on-availability-groups?view=sql-server-ver16#Read-OnlyStats), suggests that my first type can only be the creation of _temporary_ statistics on the readable replica. I have five objections to this idea, listed below, so I am led to believe that the two cases that I have listed above cannot be all of the possible ways that a statistics update can happen on an AG replica. My objections are these:
1. The workload on the secondary is much less than the primary despite the read-only queries being exactly the same, so there is no good reason why the statistics on the primary should be stale.
2. I have never seen statistics from this table in the output of `SELECT * FROM [MyDb].sys.stats WHERE is_temporary = 1`. `tempdb` doesn't have it either.
3. The auto_stats Extended Event tells me which statistic it is updating. The suffix `_readonly_database_statistic` **never** occurs. I either get the names of real statistics or blanks (for failures).
4. The statistics are created too frequently. On a bad day, auto_stats reports them being made every 15 minutes. Temporary statistics surely can't be that short-lived?
5. [This blog](https://techcommunity.microsoft.com/blog/sqlserver/alwayson-making-latest-statistics-available-on-readable-secondary-read-only-data/383987) suggests that there is a distinction between temporary statistics explicitly created to help a readable replica and statistics that already exist on the primary but are stale when the replica is queried. However, the documentation I linked earlier strongly suggests that all temporary statistics have the `_readonly_database_statistic` suffix that I am not seeing.
J. Mini
(1225 rep)
Apr 17, 2025, 09:37 PM
• Last activity: Apr 28, 2025, 08:42 PM
1
votes
1
answers
592
views
Mysql.innodb_table_stats & mysql.innodb_index_stats not updating
I believe I have persistent stats configured correctly, and there are records in the INFORMATION_SCHEMA.STATISTICS table, but the following tables have no data for many of the tables/indexes: mysql.innodb_table_stats and mysql.innodb_index_stats.
Has anyone ever seen this before?
I tried running ANALYZE but still there were no records added.
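For reference, this is roughly what I tried (a sketch; the schema and table names are placeholders, not from the post):
```sql
-- Hypothetical names; ANALYZE should trigger a persistent stats recalculation
ANALYZE TABLE mydb.mytable;

-- ...yet these still come back empty for many tables
SELECT * FROM mysql.innodb_table_stats WHERE database_name = 'mydb';
SELECT * FROM mysql.innodb_index_stats WHERE database_name = 'mydb';
```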
```
+--------------------------------------------------------+---------------+
| Variable_name                                          | Value         |
+--------------------------------------------------------+---------------+
| innodb_stats_auto_recalc                               | ON            |
| innodb_stats_include_delete_marked                     | OFF           |
| innodb_stats_persistent                                | ON            |
| innodb_stats_persistent_sample_pages                   | 20            |
| innodb_stats_sample_pages                              | 8             |
| innodb_stats_transient_sample_pages                    | 8             |
+--------------------------------------------------------+---------------+
```
8adger
(51 rep)
Apr 29, 2022, 09:23 AM
• Last activity: Apr 23, 2025, 08:06 AM
2
votes
2
answers
93
views
When reading from a columnstore index, is PERSIST_SAMPLE_PERCENT ignored in UPDATE STATISTICS?
I have a partitioned table with a rowstore primary key and a nonclustered columnstore index. The leading key of both (not that key order matters for columnstore) is the same. When updating statistics, SQL Server regularly chooses to scan the columnstore index. This takes far too long for my taste, so I tried
```sql
UPDATE STATISTICS My_Table
WITH PERSIST_SAMPLE_PERCENT = ON, SAMPLE 3 PERCENT
```
To my surprise, the statistics updates still took a very long time. The execution plans very much suggest that the columnstore index is being scanned in the same way as it previously was.
So, I have my question: when reading from a columnstore index, is PERSIST_SAMPLE_PERCENT ignored in UPDATE STATISTICS? Automatic statistics updates seem to share this issue.
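One check worth showing (a minimal sketch, reusing the My_Table name from above): whether the sample percentage actually persisted on the statistics objects:
```sql
-- persisted_sample_percent should read 3 if PERSIST_SAMPLE_PERCENT took effect
SELECT s.name,
       sp.persisted_sample_percent,
       sp.rows_sampled,
       sp.rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.My_Table');
```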
Update: I suspect that it might actually be the deleted bitmap that causes all of this to go wrong.
J. Mini
(1225 rep)
Mar 29, 2025, 12:33 AM
• Last activity: Apr 20, 2025, 06:39 PM
3
votes
1
answers
1716
views
Autotask 'auto optimizer stats collection' and scheduler jobs logging
We have a DB hosting a Peoplesoft schema, with more than 33,000 tables. Since December, Oracle has left more than 18 thousand of them with no statistics renewed:
```sql
SQL> select to_char(last_analyzed,'YYYYMM'), count(*) from dba_tables where
  2  owner='SYSADM' group by to_char(last_analyzed,'YYYYMM') order by 1;

TO_CHA   COUNT(*)
------ ----------
202012      18369
202101         23
202103         14
202104        832
            14490
```
Yet if I check one of these 18369 tables (as an example), I find it has been modified well above the 10% default threshold for stats recomputation; selecting from dba_tab_modifications shows:
```sql
TABLE_NAME              INSERTS    UPDATES    DELETES TIMESTAMP
-------------------- ---------- ---------- ---------- ---------
PS_JOB                     8063     181375        395 12-APR-21

SQL> select t.table_name, ts.stale_stats,
  2  t.monitoring, t.last_analyzed, t.num_rows from dba_tables t, dba_tab_statistics
  3  ts where t.owner=ts.owner and t.table_name='PS_JOB' and
  4  ts.table_name='PS_JOB';

TABLE_NAME           STALE_S MON LAST_ANAL   NUM_ROWS
-------------------- ------- --- --------- ----------
PS_JOB               YES     YES 08-DEC-20     162039

SQL> select count(*) from sysadm.ps_job;

  COUNT(*)
----------
    180794
```
In DBA_AUTOTASK_JOB_HISTORY or DBA_SCHEDULER_JOB_RUN_DETAILS views the
analyze-table jobs "ORA$AT_OS_OPT_SY_nnnnn" never take more than 35 to 40
minutes.
Yet this table is skipped from all stats recomputations that occur every day...
I have 2 questions:
* Does anyone know if the internal process doing the stats-recomputing job (i.e. procedure DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC, which is not documented) is logged somewhere with relevant details? (I'm very surprised that, given the amount of tables this schema contains, it only takes a little more than half an hour to complete... one partial lead is sketched after this list.)
* What are the conditions, or tests, performed by DBMS_STATS in the background to yield the list of tables whose statistics should be recalculated? (Why hasn't this PS_JOB table been recomputed since Dec 8th, in spite of its growth?)
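For what it's worth, the one lead I know of for the first question (hedged; I don't know whether it records enough detail about the autotask run) is the stats operation history:
```sql
-- DBA_OPTSTAT_OPERATIONS keeps a history of DBMS_STATS operations,
-- including the nightly automatic run
SELECT operation, target, start_time, end_time
FROM   dba_optstat_operations
ORDER  BY start_time DESC;
```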
In advance - thanks a lot.
Regards,
Sebino
Seb
(31 rep)
Apr 13, 2021, 07:39 AM
• Last activity: Feb 26, 2025, 06:07 PM
0
votes
1
answers
363
views
Diagnose the exact cause of WAIT_ON_SYNC_STATISTICS_REFRESH
We are currently facing query timeout issues with our online API . The related request is an ad hoc select query referencing two tables. In Redgate, we see that each time the request is blocked, it's due to "WAIT_ON_SYNC_STATISTICS_REFRESH". I tried to match occurrence of blocking with ongoing stati...
We are currently facing query timeout issues with our online API. The related request is an ad hoc select query referencing two tables. In Redgate, we see that each time the request is blocked, it's due to "WAIT_ON_SYNC_STATISTICS_REFRESH".
I tried to match occurrences of blocking with ongoing statistics refreshes using requests on `sys.stats` or `sys.dm_tran_locks`, but I can't find any clear relation. I can't find any updated statistic related to the request during the occurrence of blocking, nor can I find CPU or I/O consumption issues.
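For live troubleshooting, this is the shape of check I'm running (a minimal sketch using only standard DMVs):
```sql
-- Catch sessions currently waiting on a synchronous statistics refresh
SELECT r.session_id, r.wait_type, r.wait_time, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.wait_type = N'WAIT_ON_SYNC_STATISTICS_REFRESH';
```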
I know we could try to use the asynchronous statistics update strategy on the database, but I would like to understand the reasons for the problem first.
Any idea would be welcome.
Nicolas Gicquel
(1 rep)
Jun 24, 2022, 01:54 PM
• Last activity: Feb 5, 2025, 03:06 PM
3
votes
1
answers
418
views
Select query slow in the big table because of expired stats in Microsoft SQL Server
1. I have an application to select/search records from the database (application and database are on separate servers).
2. In the database, only one table is fast-growing (more than 2 million rows per day).
3. No updates happen in this table; it is insert-only.
4. I created an index for this table and everything was working fine.
5. I tried installing another SQL Server instance on the database server.
6. After that, I realized that the select was very slow on the big table.
7. I uninstalled the other instance of SQL Server.
8. The select was still very slow on that particular table.
9. Searching some forums, they mentioned the need to update stats (EXEC sp_updatestats).
10. After executing **EXEC sp_updatestats**, the select query was very fast.
11. But after 10 days, the same issue (slowness in the select query) occurred again.
12. I created a daily job to update the stats; after 10 days, the issue was observed again.
After updating the stats manually, it is very fast (a sketch of a more targeted update follows below).
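A targeted nightly update might also be cheaper than sp_updatestats across the whole database (a hedged sketch; the post doesn't name the table, so dbo.FastGrowingTable is a placeholder):
```sql
-- Refresh only the hot table's statistics, with a full scan for accuracy
UPDATE STATISTICS dbo.FastGrowingTable WITH FULLSCAN;
```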
The whole issue was observed after I installed the second instance of SQL Server on the same server. Is my observation correct? Even though I uninstalled the second instance, the slowness issue is observed frequently. Kindly give me your valuable suggestion. Thanks in advance.
Noah
(73 rep)
Aug 10, 2020, 02:41 AM
• Last activity: Jan 17, 2025, 05:03 PM
0
votes
1
answers
81
views
An explanation of the process of creating statistics
During the process of creating automatic statistics, SQL Server internally executes a statement like the following (based on the AdventureWorks database):
```sql
SELECT StatMan([SC0], [SB0000])
FROM (
    SELECT TOP 100 PERCENT [SC0]
        ,step_direction([SC0]) OVER (
            ORDER BY NULL
            ) AS [SB0000]
    FROM (
        SELECT [OrderDate] AS [SC0]
        FROM [Sales].[SalesOrderHeaderEnlarged] TABLESAMPLE
        SYSTEM (5.485069e+00 PERCENT) WITH (READUNCOMMITTED)
        ) AS _MS_UPDSTATS_TBL_HELPER
    ORDER BY [SC0]
        ,[SB0000]
    ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 6)
```
Paul White has already explained here how the percentage value is calculated.
I noticed that if I repeatedly execute the innermost statement, the one with `TABLESAMPLE`, it returns a different number of rows each time, and the reason is that the `REPEATABLE` clause is not used. As a consequence, dropping and re-creating the statistic several times, I expected to get different statistics each time, in terms of Rows Sampled and Histogram; instead the statistic is always the same. How can this behaviour be explained?
I noticed the following facts:
1. When an auto-statistic is created, a statement like the one posted above is executed, and it is based on the innermost query, which uses TABLESAMPLE SYSTEM.
2. If you execute that query several times, you always get a different set of rows (because the REPEATABLE clause is not used).
3. If you drop and re-create the auto-statistic, you always get the same result.
I suspect that the statement grabbed with Profiler/XE is not the real one that the Optimizer utilizes; if it used TABLESAMPLE without the REPEATABLE clause, we should get different statistics each time.
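For contrast, a repeatable variant of the inner sample (a hedged sketch mirroring the captured query; SQL Server is not claimed to run this) returns the same rows on every execution:
```sql
SELECT [OrderDate] AS [SC0]
FROM [Sales].[SalesOrderHeaderEnlarged] TABLESAMPLE
SYSTEM (5.485069e+00 PERCENT) REPEATABLE (12345) WITH (READUNCOMMITTED);
```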
Alessandro Mortola
(83 rep)
Nov 17, 2024, 02:43 PM
• Last activity: Dec 25, 2024, 05:13 PM
0
votes
0
answers
120
views
How to manage ORA$_ATSK_AUTOSTS
There are 2 pluggable DBs in the same container DB running on Oracle 21c SE2 21.0.0.0.0. One PDB runs `ORA$_ATSK_AUTOSTS` every 15 minutes and it runs for less than a minute, as seen from `dba_scheduler_job_run_details`. On the other PDB it runs once a day for 30-50 minutes.
I tried disabling it yesterday:
```sql
exec DBMS_AUTO_TASK_ADMIN.DISABLE(client_name => 'auto optimizer stats collection', operation => NULL, window_name => NULL);
```
But it still ran again this morning.
Logs in `dba_scheduler_job_log` imply that it was run manually, but I can't see how it was triggered:
```
dba_scheduler_job_log.ADDITIONAL_INFO = 'REASON="manual slave run"'
```
My questions are as follows:
- How can I find out what is triggering the job?
- Why doesn't the disable proc work?
- Why is it running outside of the maintenance windows in `dba_autotask_schedule`?
- Is there a way to configure the sample size so that it runs for a shorter time? (One guess is sketched below.)
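My current guess (unconfirmed) is that `ORA$_ATSK_AUTOSTS` is the newer high-frequency statistics collection task, which is governed by DBMS_STATS preferences rather than DBMS_AUTO_TASK_ADMIN; if so, something like this should show and control it:
```sql
-- Check whether the high-frequency stats task is on, and how often it runs
SELECT dbms_stats.get_prefs('AUTO_TASK_STATUS')   AS task_status   FROM dual;
SELECT dbms_stats.get_prefs('AUTO_TASK_INTERVAL') AS task_interval FROM dual;

-- Turn it off (or tune AUTO_TASK_INTERVAL / AUTO_TASK_MAX_RUN_TIME instead)
EXEC dbms_stats.set_global_prefs('AUTO_TASK_STATUS', 'OFF');
```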
crowne
(125 rep)
Dec 17, 2024, 12:31 PM
0
votes
0
answers
49
views
Extended events to track autoupdate statistics for Azure SQL Database
I would like to monitor what stats are being updated and when. It seems that the auto_stats extended event is exactly what I am looking for, but it looks like it's only available for regular SQL Server and not Azure SQL Database (which is what I am using). Is there any equivalent for Azure SQL Database, or am I out of luck?
Thanks for any help!
Andy
(151 rep)
Nov 21, 2024, 11:59 AM
• Last activity: Nov 21, 2024, 07:29 PM
-1
votes
2
answers
1649
views
Why does IO Statistics show a lot of LOB Logical Reads on the Worktable when none of the tables in my query use any LOB datatype fields?
I have a procedure that runs a query dynamically across a few hundred databases that all have the same exact schema and aggregates the results to a temp table.
The query only involves 3 tables (each table is in the 10s to 100s of millions of rows but I'm only pulling about 50,000 rows of data back total).
When looking at the aggregated IO Statistics (via StatisticsParser.com - shout out Richie Rump) for the query, it's showing that a Worktable is created and there are about 550,000 LOB Logical Reads just on the Worktable. The total number of regular Logical Reads summed for all tables in the query is just below that at about 400,000 Logical Reads.
None of the actual tables in the query use any LOB data types, so what the heck does this mean and is it a potential source of a bottleneck in my query?
(An aside, the 50,000 rows of data I'm pulling back only equate to about 3 MB of data, but my query is taking about 10 seconds to run on first run (e.g. when the tables' data pages are still being loaded into memory) and then is less than half that time on subsequent runs, so trying to see if I can consistently get closer to that faster time. The date filter involved is T2.Date >= @StartDate AND T2.Date < @EndDate.)
**Update 2:** Execution Plan of main query that is run dynamically
**Update 3:** Compute Scalar Operator Properties


J.D.
(40893 rep)
Apr 24, 2020, 04:06 PM
• Last activity: Nov 5, 2024, 08:58 PM
2
votes
1
answers
147
views
Auto Update Stats - Bad Query Plan
I have a query that hits a 1.4 billion row table.
Everything runs fine with this query, runs sub-second.
Then once every 5 days or so, auto-update stats kicks in and the query starts taking almost 60 seconds - it takes on a different plan.
I checked and when the auto-update stats kicks in, the sample rate is less than 1% of the table.
Now, my question is - Can an auto-update stats cause the optimizer to choose a plan that is worse than before?
I can see that the estimations of the bad plan are wildly different to the good plan.
**Good Plan:**
**Bad Plan:**
The estimations are clearly way off for the bad plan and yet the amount of rows being returned by the query is the same.
At this point, I am trying to confirm whether an auto-stats updates can do this?
What I am attempting now is to do a stats update with a sample of 10% and see what happens.
**Update:**
I have completed a stats update with 10% sampling rate and queries are doing well again.
Have made this a job to run regularly.
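For anyone else hitting this, the scheduled command presumably looks something like the following (a hedged sketch; the post doesn't give table or statistic names):
```sql
-- Pin the 10 percent sample so later automatic updates reuse it
UPDATE STATISTICS dbo.BigTable
WITH SAMPLE 10 PERCENT, PERSIST_SAMPLE_PERCENT = ON;
```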


Sicilian-Najdorf
(381 rep)
Oct 9, 2024, 09:09 AM
• Last activity: Oct 30, 2024, 02:29 PM
5
votes
2
answers
8369
views
dbms_datapump API exclude statistics during import
We are using `dbms_datapump` to copy data between different servers or to set up new schemas quickly. However, importing statistics usually takes ages and has the additional disadvantage that schema statistics are locked after the import. So I'm trying to find a way to skip importing the statistics. According to Metalink Doc ID 1392981.1 this should be possible using:
```sql
dbms_datapump.metadata_filter(handle => l_job_handle,
                              name   => 'EXCLUDE_PATH_LIST',
                              value  => '''STATISTICS''');
```
However when I try that, I get an "ORA-39001: invalid argument value" error.
I also tried various other formats found in different places:
```sql
dbms_datapump.metadata_filter(handle => l_job_handle,
                              name   => 'EXCLUDE_PATH_LIST',
                              value  => 'STATISTICS');

dbms_datapump.metadata_filter(handle => l_job_handle,
                              name   => 'EXCLUDE_PATH_EXPR',
                              value  => 'like ''%/TABLE/STATISTICS/TABLE_STATISTICS''');

dbms_datapump.metadata_filter(handle => l_job_handle,
                              name   => 'EXCLUDE_PATH_EXPR',
                              value  => 'like ''%STATISTICS%''');
```
But all of them return with an ORA-39001.
The version I'm using is:
> Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
Operating system is a Windows Server, but this also occurs on a Linux installation.
Any ideas how I can skip the statistics during import (or export) through the `DBMS_DATAPUMP` API?
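As a fallback, the post-import cleanup below is the only workaround I can vouch for (a hedged sketch addressing the locked-stats symptom, not the import-time exclusion being asked about; the schema name is a placeholder):
```sql
-- Unlock the imported schema's statistics, then regather them fresh
EXEC dbms_stats.unlock_schema_stats('TARGET_SCHEMA');
EXEC dbms_stats.gather_schema_stats('TARGET_SCHEMA');
```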
user1822
Apr 15, 2015, 10:01 AM
• Last activity: Oct 16, 2024, 11:59 AM
2
votes
2
answers
173
views
How to predict whether a query plan will involve the use of an index?
I am using the latest PostgreSQL docker image to create a local DB (on an Apple M1 Pro - MacOS Sonoma 14.5 machine). I create a table `table0` with a single column `col0` within it, and fill it with random strings of 2 to 16 characters. I create a trigram-index on `col0` and `vacuum (analyze)`. Exact steps:
```sql
create table public.table0 (
    col0 varchar(25)
);

select setseed(0.12343);

insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);

create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);
vacuum (analyze) table0;
```
I examine the query plan for selecting 200 rows containing the string `abc`:
```sql
explain analyze
select * from table0 where col0 like '%abc%' limit 200;
```
Output, confirming that the trigram index is not used, but a sequential scan is:
```
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..351.78 rows=200 width=10) (actual time=0.313..15.640 rows=200 loops=1)
   ->  Seq Scan on table0  (cost=0.00..216614.79 rows=123154 width=10) (actual time=0.312..15.620 rows=200 loops=1)
         Filter: ((col0)::text ~~ '%abc%'::text)
         Rows Removed by Filter: 115643
 Planning Time: 4.401 ms
 Execution Time: 15.841 ms
(6 rows)
```
However, if instead of searching for rows containing `abc` I search for rows containing `bcd`:
```sql
explain analyze
select * from table0 where col0 like '%bcd%' limit 200;
```
Then I get a different query plan, which now includes an index scan:
```
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.34..764.83 rows=200 width=10) (actual time=7.032..7.230 rows=200 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.94 rows=1219 width=10) (actual time=7.031..7.220 rows=200 loops=1)
         Recheck Cond: ((col0)::text ~~ '%bcd%'::text)
         Heap Blocks: exact=169
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=5.100..5.100 rows=21264 loops=1)
               Index Cond: ((col0)::text ~~ '%bcd%'::text)
 Planning Time: 0.521 ms
 Execution Time: 7.366 ms
(8 rows)
```
This setup might not be entirely reproducible on the first try, even though `setseed(0.12343);` is used, since `analyze` "collects statistics based on its own random selection of rows" (see the second paragraph here). I recreated the above situation several times, and I never had to attempt the setup steps more than 4 times, so I hope it's easily reproducible, even though the code I provide is not entirely deterministic. (I removed and relaunched the docker container between attempts.)
This answer gives a basic explanation of why sometimes the sequential scan and sometimes the index scan is used. It also has 2 suggestions on how to discourage sequential scans: modifying the `random_page_cost` and `STATISTICS` values.
I set `random_page_cost` to 1.1 (via `ALTER DATABASE postgres SET random_page_cost = 1.1;`). I also "raise the amount of statistics collected" by `analyze` (via `ALTER TABLE table0 ALTER COLUMN col0 SET STATISTICS 1000;`). After another `vacuum (analyze) table0;` I rerun:
```sql
explain analyze
select * from table0 where col0 like '%abc%' limit 200;
```
and this time the `col0_gin_trgm_idx` trigram index is indeed used. After this, I recreated the scenario above, and without modifying `random_page_cost` or `STATISTICS` I reran `vacuum (analyze) table0;` - this also modified the behaviour and caused a switch from sequential scan to index scan. I believe this is due to the non-deterministic nature of the statistics collected by `analyze`.
This time, **instead of being able to trigger the use of the index (which I am able to do now, mainly thanks to the mentioned answer), I would like to understand the details of how the decision is made between a sequential scan and an index scan**. Ideally, I would like to be able to predict whether the query:
```sql
explain analyze
select * from table0 where col0 like '%xyz%' limit 200;
```
will trigger an index scan or a sequential scan, knowing `xyz` and anything related to database statistics or settings. Previously, in the context of a similar question, I was advised to check:
```sql
SELECT name, setting FROM pg_settings
WHERE name = ANY ('{shared_buffers, effective_cache_size, random_page_cost, effective_io_concurrency, work_mem}'::text[]);
```
It returns (before modifying default settings):
```
           name            | setting
---------------------------+---------
 effective_cache_size      | 524288
 effective_io_concurrency  | 1
 random_page_cost          | 4
 shared_buffers            | 16384
 work_mem                  | 4096
(5 rows)
```
I think these values have an effect on the sequential scan vs index scan decision. I hope a function *f* with two possible outputs, *sequential scan* or *index scan*, exists. I imagine *f* takes `xyz`, `random_page_cost`, the statistics collected by `analyze`, etc. as inputs. I would like to understand the list of inputs (i.e. what is "etc"?) and what *f* does with them. (A way to at least observe *f*'s verdicts is sketched below.)
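One hedged, indirect way to observe the verdicts (a sketch using standard planner settings; it exposes the cost estimates *f* compares, not *f* itself):
```sql
BEGIN;
-- Discourage one strategy at a time so both cost estimates become visible
SET LOCAL enable_seqscan = off;
EXPLAIN select * from table0 where col0 like '%xyz%' limit 200;

SET LOCAL enable_seqscan = on;
SET LOCAL enable_bitmapscan = off;
EXPLAIN select * from table0 where col0 like '%xyz%' limit 200;
ROLLBACK;
```
To a first approximation, the plan with the lower total estimated cost is the one the planner picks when nothing is disabled.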
**How can I predict whether a query plan will involve the use of an index?**
zabop
(169 rep)
Oct 15, 2024, 03:01 PM
• Last activity: Oct 16, 2024, 04:35 AM
2
votes
1
answers
136
views
Query plan seems to depend on time interval between query and table setup?
I am working with PostgreSQL 17. As an example, I have a table `table0` with a column `col0`, with randomly generated strings as GIN-indexed values. I use the following `setup.sql` to create such a table:
```sql
create table public.table0 (
    col0 varchar(25)
);

select setseed(0.12345);

insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);

create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);
vacuum (full, analyze) table0;
```
Using `select setseed(0.12345);` ensures the created table is the same at every execution of `setup.sql`. I would like to observe the execution plan for a simple partial string match query. To do so, I use `query.sql`:
```sql
explain (analyze, buffers)
select * from table0 where col0 like '%abc%' limit 500;
```
To my surprise, the execution plan of the query is not constant, and it seemingly depends on the time between table creation and query execution. To demonstrate this, I create the following bash script:
~~~bash
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
rm -f records.txt
for i in {1..9}; do
docker run \
--name postgres-db \
--env POSTGRES_DB=postgres \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=mysecretpassword \
--publish 5432:5432 \
--detach postgres
sleep 2 # wait for docker container to start
PGPASSWORD=mysecretpassword psql \
--host=localhost \
--port=5432 \
--username=postgres \
--dbname=postgres \
--set=ON_ERROR_STOP=1 \
--file=setup.sql
sleep $((RANDOM % 10))
PGPASSWORD=mysecretpassword psql \
--host=localhost \
--port=5432 \
--username=postgres \
--dbname=postgres \
--set=ON_ERROR_STOP=1 \
--file=query.sql >> records.txt
docker rm --force postgres-db
done
~~~
This script sets up the table, waits a random amount of time up to 10 seconds, and then executes the above `query.sql`. It does this several times, saving the `query.sql` outputs to `records.txt`. I looked at `records.txt` and found that **sometimes a sequential scan, sometimes an index scan is used to perform the query**. A filtered (via `cat records.txt | grep "\->"`) version of `records.txt`:
```
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
-> Seq Scan on table0 (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
-> Bitmap Heap Scan on table0 (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
-> Seq Scan on table0 (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
```
The complete `records.txt` is:
```
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=10) (actual time=11.527..15.249 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 4.666 ms
Execution Time: 15.657 ms
(13 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..882.39 rows=500 width=9) (actual time=0.068..17.831 rows=500 loops=1)
Buffers: shared read=1439
-> Seq Scan on table0 (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
Filter: ((col0)::text ~~ '%abc%'::text)
Rows Removed by Filter: 282649
Buffers: shared read=1439
Planning:
Buffers: shared hit=55 read=9 dirtied=1
Planning Time: 2.427 ms
Execution Time: 17.918 ms
(10 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=9) (actual time=5.965..8.988 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 2.427 ms
Execution Time: 9.234 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.32..1833.89 rows=500 width=9) (actual time=7.448..11.454 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=81 read=13 dirtied=1
Planning Time: 2.835 ms
Execution Time: 11.721 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.33..1833.75 rows=500 width=9) (actual time=6.662..12.059 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 3.332 ms
Execution Time: 12.511 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.34..1833.55 rows=500 width=9) (actual time=9.154..13.621 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 4.113 ms
Execution Time: 14.018 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.31..1833.95 rows=500 width=10) (actual time=10.079..13.249 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 3.596 ms
Execution Time: 13.682 ms
(13 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.33..1833.75 rows=500 width=10) (actual time=7.323..12.126 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 5.907 ms
Execution Time: 12.485 ms
(13 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..441.64 rows=500 width=9) (actual time=0.074..23.096 rows=500 loops=1)
Buffers: shared read=1439
-> Seq Scan on table0 (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
Filter: ((col0)::text ~~ '%abc%'::text)
Rows Removed by Filter: 282649
Buffers: shared read=1439
Planning:
Buffers: shared hit=51 read=13 dirtied=1
Planning Time: 2.087 ms
Execution Time: 23.218 ms
(10 rows)
```
I used dockerised Postgres 17 to provide easy reproducibility.
Restarting the docker container, running `setup.sql`, and then running:
```sql
explain (analyze, buffers, settings)
select * from table0 where col0 like '%abc%' limit 500;
```
returns:
```
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52.32..1833.89 rows=500 width=9) (actual time=9.476..17.923 rows=500 loops=1)
Buffers: shared hit=4 read=435
-> Bitmap Heap Scan on table0 (cost=52.32..4377.97 rows=1214 width=9) (actual time=9.475..17.858 rows=500 loops=1)
Recheck Cond: ((col0)::text ~~ '%abc%'::text)
Heap Blocks: exact=428
Buffers: shared hit=4 read=435
-> Bitmap Index Scan on col0_gin_trgm_idx (cost=0.00..52.01 rows=1214 width=0) (actual time=7.662..7.662 rows=20852 loops=1)
Index Cond: ((col0)::text ~~ '%abc%'::text)
Buffers: shared hit=4 read=7
Planning:
Buffers: shared hit=55 read=9 dirtied=1
Planning Time: 5.597 ms
Execution Time: 18.334 ms
(13 rows)
```
Yet, I am not able to understand why sometimes a sequential scan, sometimes an index scan is used.
Why does a query plan seemingly depend on the time interval between the query and table setup?
----------
Below I address some questions which came up in the comments.
*Is there any other write activity in the DB?*
I did not initiate any, and since it's just a container I run on my local machine, unless there is some automated process I am not aware of, there isn't any write activity.
*How busy is your server otherwise?*
Not busy at all, as far as I can tell. It's an Apple M1 Pro, running MacOS Sonoma 14.5.
zabop
(169 rep)
Oct 11, 2024, 09:02 PM
• Last activity: Oct 15, 2024, 01:42 PM