
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

3 votes
4 answers
285 views
Performance issues of a queue implemented on top of InnoDB
# Brief problem statement

First things first: _after_ posting this question originally and working more with our DBAs, I learned that our DB actually runs in a container, instead of being installed natively. From what I've read, it's discouraged to do that in prod, since databases do all sorts of micro-optimizations with memory, storage and CPU, and the container might obscure some things, so the behavior may not be optimal. So, if after reading this question you think it could actually be caused by the DB running in a container, definitely let me know.

Alright, so now to the actual question:

-------

There will be a detailed explanation below, but this initial problem statement is just to give you a feel of what we're dealing with here.

We have a feature called "Tasks", which is just a generic task runner. A task is represented by a single row in an InnoDB table (using MariaDB 10.6.14); anyone can create a task, and then one of the workers will pick it up, run it, potentially retry a few times if there are errors, and once the task is finished (successfully or not), delete the row from the table.

The issue is that, with plenty of tasks to run (even just 200K runnable tasks which we need to run ASAP is enough), the performance alternates between these two modes:

- Performance is great (the query to pick the next task takes 20-30 ms, and every worker is able to run about 200 tasks per second or more)
- Performance is terrible (the query quickly jumps to hundreds of milliseconds and keeps slowly growing, and we're running 1-3 tasks per second per worker).

The "great" phase usually lasts for 5-10 minutes, followed by a few hours of the "terrible" phase; then it fixes itself and goes back to "great", etc. Here's how it looks on the charts:

[chart: performance alternates between great and terrible]

A few quick points to highlight:

- On this chart, no new tasks are being inserted in the table; we already have a few hundred K of tasks in the table that we need to run ASAP, and as we run them (and DELETE the rows from the table), the performance alternates like that.
- It doesn't seem to have anything to do with the amount of data in the table: as mentioned, even just 200K rows is enough, which is nothing for modern hardware (and yes, the hardware specs used here are very good). It's rather the velocity of changes that seems to be causing this: those 200K rows represent the tasks that we need to run ASAP, and we do. When we have a much lower rate of tasks that we delete/update, those performance issues don't happen.
- It doesn't seem to be caused by a wrong index: if there wasn't a proper index, the performance would always be bad. But here it switches back and forth on its own, without us adding any new data to the table, so it seems to be some internal MariaDB issue.

This was just to give you a feel of what the issue is like. I have more to say on this, but before I do that, I feel I need to share more implementation details, so let's get to it.

# Background info and implementation details

As mentioned above, we have a generic task runner. A task is represented by a single row in an InnoDB table; anyone can create a task, and then one of the workers will pick it up, take care of it, and eventually delete the row from the table. There are two additional features worth highlighting:

- A task can be scheduled either to run ASAP or after a specific time in the future; the only time-related guarantee here is that the task will not run before its scheduled time. We say that a task is runnable if we don't need to wait any longer and can run it whenever we can;
- Every task has a priority: from 0 (highest) to 4 (lowest); when there are multiple runnable tasks on different priorities, the higher-priority tasks will always be picked first.

Btw, those features are a big part of why this was implemented on top of MariaDB, and not, say, Kafka or similar: it's not a textbook use case for Kafka.

The usage pattern can be very bursty: most of the time we might have 10-20 tasks per second being created and run ASAP, and it doesn't cause problems; but then as part of some batch job we might create a few million low-priority tasks, and the workers will run them at a rate of 500 tasks per second in total.

This is what the table looks like (there are a few more fields, but they are not relevant to the problem, so they are omitted):
CREATE TABLE tasks (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,

  -- Identifies which handler to actually run for this task. Think of it
  -- as the name of a function to call.
  name varchar(127) NOT NULL,

  -- Priority of the task; the highest priority is 0, and as the number
  -- increases, priority lowers. Currently, it can be from 0 to 4.
  priority tinyint(3) unsigned NOT NULL DEFAULT 2,

  -- Status of the task, one of those:
  -- - 0: PENDING: the task is ready to run asap or after next_attempt time.
  --      It can be the initial status that the task is created with, but the
  --      task could also reenter this status again later if it failed and the
  --      scheduler decided to retry it;
  -- - 1: PROCESSING: the task is currently being executed;
  --
  -- Note: there are no states for completed or failed tasks, because such
  -- tasks are deleted from this table.
  status tinyint(3) unsigned NOT NULL DEFAULT 0,

  -- Specifies the earliest time when the task needs to run next time.
  -- Used for tasks scheduled for the future, as well as for implementing
  -- retries with backoff. If NULL, the task should run ASAP.
  next_attempt timestamp(6) NULL,

  PRIMARY KEY (id),

  -- See details on this particular index ordering below.
  INDEX tasks_next_attempt_id_idx (priority, next_attempt, id)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
There are a few workers (not too many: something like 3 or 6) polling this table (see the exact query below); when a worker gets no task to run, it backs off and polls only once per second, but when it successfully gets a task to run, it tries to get the next one right away, so that when there are many runnable tasks, we run them as fast as we can.

The query to pick the next task does the following, atomically:

- Find the id of the next task to run (taking into account the scheduled time and priorities as described above)
- Set its status to 1 (which means "processing")
- Set next_attempt to the current time plus 15 seconds (which is the interval after which the task will rerun if it appears dead; it's not really relevant to the problem, but for context: while the task is running, every 10 seconds its next_attempt will again be updated to 15 seconds into the future, so as long as the worker is functioning and the task keeps running, next_attempt will always be in the future)
- Return the task id

Here's the exact query (where 2024-04-11 08:00:00 is used as the current time: we actually pass specific timestamps like that instead of using NOW() for better testability, but it's not related to the issue):
UPDATE tasks
INNER JOIN (
    SELECT id FROM tasks
    WHERE
            priority IN (0, 1, 2, 3, 4)
        AND (next_attempt IS NULL OR next_attempt <= '2024-04-11 08:00:00')
    ORDER BY priority, next_attempt, id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) AS next_task ON tasks.id = next_task.id
SET
    tasks.status = 1,
    tasks.next_attempt = '2024-04-11 08:00:15';

A few observations we made while debugging this:

## 1. The table is empty, but the queries are still slow

mysql> select count(*) from tasks;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.3 sec)

mysql> SELECT id FROM tasks WHERE priority IN (0, 1, 2, 3, 4) AND (next_attempt IS NULL OR next_attempt <= NOW()) ORDER BY priority, next_attempt, id LIMIT 1 FOR UPDATE SKIP LOCKED;

mysql> EXPLAIN SELECT id FROM tasks WHERE priority IN (0, 1, 2, 3, 4) AND (next_attempt IS NULL OR next_attempt <= NOW()) ORDER BY priority, next_attempt, id LIMIT 1 FOR UPDATE SKIP LOCKED;
+------+-------------+----------+-------+---------------------------------------+---------------------------------------+---------+------+--------+--------------------------+
| id   | select_type | table    | type  | possible_keys                         | key                                   | key_len | ref  | rows   | Extra                    |
+------+-------------+----------+-------+---------------------------------------+---------------------------------------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | tasks    | range | tasks_priority_next_attempt_id_idx    | tasks_priority_next_attempt_id_idx    | 9       | NULL | 748746 | Using where; Using index |
+------+-------------+----------+-------+---------------------------------------+---------------------------------------+---------+------+--------+--------------------------+
1 row in set (0.26 sec)
I also tried to SHOW TABLE STATUS, the important bits from there are:
Data_length:    45 092 864
Index_length:    7 393 280
Data_free:     827 326 464
After a few hours of being in that degraded state, it recovered on its own (without having any data being inserted), and those stats became:
Data_length:        4 096
Index_length:       4 096
Data_free:    879 755 264
I'm not a DBA and not sure what exactly these numbers mean, but I thought it's worth sharing them here. I'm guessing they tell me that the index is full of garbage (7 MB of index on an empty table can't contain anything but garbage), and they must also be telling me that the index is not getting rebuilt when it should have been.

## 2. If we don't DELETE the tasks, but only UPDATE them to make them non-runnable, it doesn't help with the performance

I suspected that fragmentation (which is supposedly caused by deleting a lot of data from the table) might have something to do with it, so I changed the logic to UPDATE the task after completion (set it to the lowest priority, 5, which is never runnable) instead of DELETEing the row. It didn't change anything about the performance; the patterns remained the same.

## 3. Doing a bunch of INSERTs helps

As another observation, when we not only run tasks (and therefore DELETE them from the table), but also INSERT new tasks at about the same rate, then even though MariaDB still switches to poor performance periodically, it recovers very quickly. Check it out:

[charts: performance while tasks are also being inserted]

- While we're inserting tasks, it doesn't stay in the degraded state for long;
- Instead, there is a distinctive pattern: seemingly every 4 minutes, the performance drops, and then immediately recovers;
- When we stopped creating tasks, it worked for a few minutes, and again fell into a degraded state for an extended period of time.

I also confirmed this by just waiting for it to degrade, and then inserting a bunch of tasks, to check whether it would help it recover. It helped every time. I empirically found that inserting 10K or 20K tasks is not enough, but 30K is usually enough; the performance recovers right after inserting enough tasks:

[chart: INSERTing about 30K tasks helps MariaDB recover]

I guess this tells me that frequent INSERTs cause MariaDB to do something useful with the index, while frequent DELETEs and UPDATEs unfortunately do not.

## 4. Rebuilding an index helps

If we just build a new index and drop the old one:
ALTER TABLE tasks ADD INDEX tasks_next_attempt_id_idx2 (priority,next_attempt,id), ALGORITHM=INPLACE, LOCK=NONE;

ALTER TABLE tasks DROP INDEX tasks_next_attempt_id_idx;
This fixes the issue immediately, and doesn't lock the table while building the index. This is actually the most viable workaround we're considering at the moment: if we can't find a way to make it happen automatically, we can add some app logic like "if the next-task queries become slower than 100 ms and stay that way for 5 seconds, recreate the index", or even the lazy way: "rebuild the index every few minutes" (see the sketch at the end of this question). It sucks, and it's generally a weird design to run ALTER TABLE from app code, but practically it's still much better than sitting in the degraded state for hours.

# Question

I'd appreciate any thoughts and feedback you have based on the explanation above, but the questions I'm actually asking are:

## 1. What is happening?

I'm really curious to learn the MariaDB implementation details which would explain this behavior, since it doesn't make sense to me and I wasn't able to find an explanation on my own.

## 2. How to make it work fast without having to rebuild the index manually?

As mentioned before, doing a bunch of INSERTs, like 30K or more, usually helps it recover from the degraded state, so it looks like MariaDB does some maintenance on the index behind the scenes, and this maintenance doesn't happen for DELETEs or UPDATEs. I wonder if there is some knob in MariaDB that we can tune to enable more aggressive index maintenance, without having to rebuild the index manually.
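For illustration only, and not something the original post tried: the "rebuild the index every few minutes" fallback could be scheduled inside MariaDB itself instead of in app code. A minimal sketch, assuming the event scheduler may be enabled and that a periodic online rebuild of the whole table (which also rebuilds its indexes) is acceptable; the event name is made up:

SET GLOBAL event_scheduler = ON;

CREATE EVENT IF NOT EXISTS rebuild_tasks_periodically
ON SCHEDULE EVERY 10 MINUTE
DO
  -- Online, in-place rebuild of the table and all of its indexes;
  -- brief metadata locks are still taken at the start and end.
  ALTER TABLE tasks ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;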
Dmitry Frank (131 rep)
Apr 12, 2024, 09:35 AM • Last activity: May 31, 2025, 04:05 PM
0 votes
2 answers
92 views
Best Database specifications for my IOT solution
I'm working on an IoT environment that is basically a sensor that can **measure more than 300 times a second**. We receive these measurements remotely using the **MQTT protocol**. I want to set up **a panel** where I can **watch in real time the measurements** my sensor is taking; I can do this with MQTT with no problem at all. But has anyone ever done this using some database feature? I know there is something called a **"queue watch"** for databases that can do something similar.

The problem starts when I need to **store those measurements**, because I need to run analysis on them, draw graphs, and things like that. As I'm not familiar with databases, and **there is a lot of data to store** (I imagine), **what would be the best database environment** for this kind of thing?

------------------------

**UPDATE 1**

*How many do you need to store? Last hour? Last day? Last decade? – **Basil Bourque***

As it's a service for clients, I would say that the last month of measurements would be "hot data": I want it easily accessible and as fast as possible. Anything older than that I can store as "cold data", just for occasional analysis or requests for old measurements from clients.
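Purely as an illustration of the hot/cold split described above (one possible layout among many, not a recommendation of a specific product), here is a sketch using PostgreSQL declarative partitioning; all table, column, and partition names are hypothetical:

CREATE TABLE measurements (
    sensor_id   int              NOT NULL,
    measured_at timestamptz      NOT NULL,
    value       double precision NOT NULL
) PARTITION BY RANGE (measured_at);

-- "Hot" partition: the current month, kept indexed for fast dashboard queries.
CREATE TABLE measurements_2018_05 PARTITION OF measurements
    FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');
CREATE INDEX ON measurements_2018_05 (sensor_id, measured_at);

-- Older partitions become "cold" data: they can sit on cheaper storage, or be
-- detached and archived once they are no longer queried interactively.

Time-series-oriented systems (TimescaleDB, InfluxDB, and the like) automate this kind of time-based layout, but the underlying idea is the same.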
Bruno Cerk (103 rep)
May 20, 2018, 05:22 AM • Last activity: Mar 30, 2025, 04:47 AM
0 votes
0 answers
73 views
Using WHILE EXISTS (SELECT 1 FROM dbo.TargetQueue) causes Service Broker to keep calling the activation function even when there is no message on the target queue
I have the following SQL service broker set up:
USE  master
-- Create database
ALTER DATABASE ServiceBrokerDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE ServiceBrokerDemo
CREATE DATABASE ServiceBrokerDemo;

-- Create a log
USE ServiceBrokerDemo
GO 
DROP TABLE IF EXISTS DemoServiceBrokerLog;
CREATE TABLE DemoServiceBrokerLog (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Date DATETIME,
	MessageTypeName sysname,
	ActivationFunction sysname, 
	MessageBody xml
);

-- CREATE TWO MESSAGE QUEUES: one for the sender and one for the receiver (who processes messages from the sender)
USE ServiceBrokerDemo
GO 

CREATE QUEUE dbo.InitiatorQueue WITH STATUS=ON ON [PRIMARY]
GO

CREATE QUEUE dbo.TargetQueue WITH STATUS=ON ON [PRIMARY]
GO


-- CREATE TWO ACTIVATION FUNCTIONS: for the two queues
-- This is for the target queue
CREATE OR ALTER PROCEDURE [dbo].[TargetQueueProcess]
AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @conversationHandle UNIQUEIDENTIFIER
	DECLARE @messagetypename NVARCHAR(256)
	DECLARE @messagebody XML

	DECLARE @errorNumber AS int;
	DECLARE @errorMessage AS nvarchar(3000);

	WHILE EXISTS (SELECT 1 FROM dbo.TargetQueue)
	BEGIN
		;RECEIVE TOP(1) @conversationHandle = conversation_handle
			 , @messagetypename = message_type_name
			 , @messagebody = CASE WHEN validation = 'X' THEN CAST(message_body AS XML) ELSE NULL END
		  FROM [dbo].[TargetQueue]

		IF @conversationHandle IS NOT NULL
		BEGIN
			-- Log
			INSERT INTO DemoServiceBrokerLog VALUES (
						getdate(),
						@messageTypeName,
						'TargetQueueProcess', 
						@messagebody
				);


			IF @messageTypeName = 'http://my.dba.aventure/dbo/DemoServiceBroker/endofconversation'
			BEGIN
				-- End the conversation 
				-- The initiator will receive an EndDialog message afterwards
				END CONVERSATION @conversationHandle;
			END
			ELSE IF @messageTypeName = 'http://my.dba.aventure/dbo/DemoServiceBroker/message'
			BEGIN
				BEGIN TRY 
					-- Process the message
					PRINT ('Do the actual task in here')
				END TRY 
				BEGIN CATCH 
					-- Capture the error
					SELECT @errorNumber = ERROR_NUMBER()
					, @errorMessage = ERROR_MESSAGE()

					-- End the conversation with an error
					-- The initiator will receive an Error message afterwards
					END CONVERSATION @conversationHandle WITH ERROR = @errorNumber DESCRIPTION = @errorMessage;
				END CATCH 
			END
		END
	END
END
GO

-- This procedure mainly to end the conversation at the initiator side and perform action upon errors
USE ServiceBrokerDemo
GO
CREATE OR ALTER PROCEDURE [dbo].[InitiatorQueueProcess]
AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @conversationHandle UNIQUEIDENTIFIER
	DECLARE @messageTypeName SYSNAME
	DECLARE @messagebody XML

	DECLARE @errorCode int;
	DECLARE @errorMessage nvarchar(3000);

	--BEGIN TRANSACTION
		;RECEIVE TOP(1) @conversationHandle = conversation_handle
			 , @messageTypeName = message_type_name
			 , @messagebody = CASE WHEN validation = 'X' THEN CAST(message_body AS XML) ELSE NULL END
		  FROM [dbo].[InitiatorQueue]

		IF @conversationHandle IS NOT NULL
		BEGIN
			INSERT INTO DemoServiceBrokerLog VALUES (
						getdate(),
						@messageTypeName,
						'InitiatorQueueProcess', 
						@messagebody
			)

			IF @messageTypeName = 'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
			BEGIN
				;SEND ON CONVERSATION @conversationHandle 
					MESSAGE TYPE [http://my.dba.aventure/dbo/DemoServiceBroker/endofconversation] ;
			END
			ELSE IF @messageTypeName = 'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
			BEGIN
				END CONVERSATION @conversationHandle;
			END
			ELSE IF @messageTypeName = 'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
			BEGIN 
				-- Log the error before ending the conversation
				SET @errorCode = 
				(SELECT @messagebody.value(
				N'declare namespace 
				brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error "; 
				(/brokerns:Error/brokerns:Code)', 'int'));

				SET @errorMessage = (SELECT @messagebody.value(
				'declare namespace
				brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error ";
				(/brokerns:Error/brokerns:Description)', 'nvarchar(3000)'));

				-- do something with the error 
				PRINT('log the errors to some table')

				-- End conversation on the initiator side
				END CONVERSATION @conversationHandle;	
			END
		END
	--COMMIT TRANSACTION;
END
GO


-- Alter queue to associate them to the activation functions
USE ServiceBrokerDemo
GO
ALTER QUEUE dbo.TargetQueue WITH STATUS = ON, RETENTION = OFF, ACTIVATION (STATUS = ON, PROCEDURE_NAME = dbo.TargetQueueProcess, MAX_QUEUE_READERS = 1, EXECUTE AS SELF) , POISON_MESSAGE_HANDLING (STATUS = OFF) 
GO 
ALTER QUEUE dbo.InitiatorQueue WITH STATUS = ON, RETENTION = OFF, ACTIVATION (STATUS = ON, PROCEDURE_NAME = dbo.InitiatorQueueProcess, MAX_QUEUE_READERS = 1, EXECUTE AS SELF) , POISON_MESSAGE_HANDLING (STATUS = OFF) 
GO 

-- Create message type 
CREATE MESSAGE TYPE [http://my.dba.aventure/dbo/DemoServiceBroker/message] 
CREATE MESSAGE TYPE [http://my.dba.aventure/dbo/DemoServiceBroker/endofconversation] 

-- Create contract
CREATE CONTRACT [http://my.dba.aventure/dbo/DemoServiceBroker/contract]  
(
	[http://my.dba.aventure/dbo/DemoServiceBroker/message]  SENT BY INITIATOR, 
	[http://my.dba.aventure/dbo/DemoServiceBroker/endofconversation]  SENT BY INITIATOR
)

-- CREATE Service
CREATE SERVICE [http://my.dba.aventure/dbo/DemoServiceBroker/InitiatorService]   ON QUEUE [dbo].[InitiatorQueue] ([http://my.dba.aventure/dbo/DemoServiceBroker/contract]) 
CREATE SERVICE [http://my.dba.aventure/dbo/DemoServiceBroker/TargetService]   ON QUEUE [dbo].[TargetQueue] ([http://my.dba.aventure/dbo/DemoServiceBroker/contract]) 
I sent a message from the initiator to the target using the following code:
USE  ServiceBrokerDemo

DECLARE @conversationHandle UNIQUEIDENTIFIER
DECLARE @fromService SYSNAME = 'http://my.dba.aventure/dbo/DemoServiceBroker/InitiatorService'
DECLARE @toService SYSNAME = 'http://my.dba.aventure/dbo/DemoServiceBroker/TargetService'
DECLARE @contract SYSNAME = 'http://my.dba.aventure/dbo/DemoServiceBroker/contract'

-- Craft a contrived message body
DECLARE @messageBody XML = 'some message'


BEGIN DIALOG CONVERSATION @conversationHandle
FROM SERVICE @fromService
TO SERVICE @toService
ON CONTRACT @contract
WITH ENCRYPTION = OFF;

BEGIN CONVERSATION TIMER (@conversationHandle) TIMEOUT = 5

;SEND ON CONVERSATION @conversationHandle
	MESSAGE TYPE [http://my.dba.aventure/dbo/DemoServiceBroker/message]  
	(@messageBody)
I ran the following code to check the status of the queue
SELECT *
FROM sys.dm_broker_queue_monitors AS m
INNER JOIN sys.service_queues q
ON m.queue_id = q.object_id
The status of the queue stays RECEIVES_OCCURRING even hours after the last time I ran the code that sends a message. I would expect the state to be INACTIVE, since all the messages on the target have already been processed, right? And the target activation function keeps executing even though there is nothing in the queue. However, as soon as I fixed the while-loop part, the state of the queue correctly changes to INACTIVE as soon as the messages have been processed.
...
	WHILE (1=1)
	BEGIN
		;RECEIVE TOP(1) @conversationHandle = conversation_handle
			 , @messagetypename = message_type_name
			 , @messagebody = CASE WHEN validation = 'X' THEN CAST(message_body AS XML) ELSE NULL END
		  FROM [dbo].[TargetQueue]
		
		IF (@@ROWCOUNT=0)
		BEGIN
			BREAK
		END

		IF @conversationHandle IS NOT NULL
		BEGIN
...
Could you help me understand why WHILE EXISTS (SELECT 1 FROM dbo.TargetQueue) causes Service Broker to keep calling the activation function to receive messages even though there are no messages on the target queue?
Tuyen Nguyen (343 rep)
Oct 18, 2024, 06:20 PM • Last activity: Oct 30, 2024, 02:28 PM
152 votes
1 answers
199406 views
Postgres UPDATE ... LIMIT 1
I have a Postgres database which contains details on clusters of servers, such as server status ('active', 'standby', etc.). Active servers may need to fail over to a standby at any time, and I don't care which standby is used in particular. I want a database query to change the status of a standby - JUST ONE - and return the server IP that is to be used. The pick can be arbitrary: since the status of the server changes with the query, it doesn't matter which standby is selected. Is it possible to limit my query to just one update? Here is what I have so far:

UPDATE server_info
SET status = 'active'
WHERE status = 'standby'
[[LIMIT 1???]]
RETURNING server_ip;

Postgres doesn't like this. What could I do differently?
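Not part of the original post, but as an illustration of one commonly used workaround: drive the UPDATE from a subquery that selects a single row, assuming server_ip uniquely identifies a row (SKIP LOCKED requires PostgreSQL 9.5+ and is only needed if several sessions may race for a standby):

UPDATE server_info
SET status = 'active'
WHERE server_ip = (
    SELECT server_ip
    FROM server_info
    WHERE status = 'standby'
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- avoid two sessions grabbing the same standby
)
RETURNING server_ip;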
vastlysuperiorman (1695 rep)
Jul 1, 2014, 04:54 PM • Last activity: May 3, 2024, 09:23 PM
7 votes
4 answers
1108 views
Is my queue table implementation race condition safe?
Hello people smarter than me! I've created a sort-of-a-queue table system, but it seems too simple to be safe from race conditions. Am I missing something, or is the following race-condition safe?

The Schema -

I have a table, let's call it ProductQueue:

CREATE TABLE dbo.ProductQueue
(
    SerialId BIGINT PRIMARY KEY,
    QueuedDateTime DATETIME NOT NULL -- Only using this for reference, no functionality is tied to it
);

I have a procedure for adding to the queue called AddToProductQueue:

CREATE PROCEDURE dbo.AddToProductQueue (@SerialId BIGINT)
AS
BEGIN
    INSERT INTO dbo.ProductQueue (SerialId, QueuedDateTime)
    OUTPUT Inserted.SerialId
    SELECT @SerialId, GETDATE();
END

I also have a procedure for removing from the queue called RemoveFromProductQueue:

CREATE PROCEDURE dbo.RemoveFromProductQueue (@SerialId BIGINT)
AS
BEGIN
    DELETE FROM dbo.ProductQueue
    OUTPUT Deleted.SerialId
    WHERE SerialId = @SerialId;
END

Note, SerialId is globally unique for a Product in the source database / system. I.e. no two instances of a Product can ever have the same SerialId. That's the extent of it on the database side.

The Workflow - (a call sketch follows at the end of this question)

- I have an application process that runs hourly.
- That process gets a variable list of SerialIds from the source system.
- It iteratively calls the AddToProductQueue procedure on each SerialId in its list.
- If the procedure tries to insert a SerialId that already exists in the ProductQueue table, it throws a primary key violation error, and the application process catches that error and skips that SerialId.
- Otherwise, the procedure successfully adds that SerialId to the ProductQueue table and returns it back to the application process.
- The application process then adds that successfully queued SerialId to a separate list.
- After the application process finishes iterating its list of all candidate SerialIds to enqueue, it then iterates its new list of successfully queued SerialIds and does external work on them, in a **separate thread** per SerialId. (This work is all unrelated to the database.)
- Finally, as each thread finishes its external work, the last step in that asynchronous thread is to remove that SerialId from the ProductQueue table by calling the RemoveFromProductQueue procedure. (Note that a new database context object is instantiated and a new connection is created for each asynchronous call to this procedure, so that it is thread-safe on the application side.)

Additional Information -

- There aren't any indexes on the ProductQueue table, and it'll never have more than 1,000 rows in it at one time. (Actually, most times it'll literally only have a couple of rows.)
- The same SerialId can become a candidate again to be re-added to the queue table on a future execution of the application process.
- There are no safeguards preventing a second instance of the application process from running concurrently, either by accident or if the first instance took more than 1 hour to run, etc. (This is the concurrent part I'm most concerned about.)
- The transaction isolation level of the database (and connection being made) where the queue table and procedures live is the default isolation level of Read Committed.

Potential Problems -

- The running instance of the application process crashes in an unhandled way, leaving SerialIds stuck in the queue table. This is acceptable for the business needs, and we plan to have exception reports to help us manually remediate this case.
- The application process gets executed multiple times concurrently and grabs some of the same SerialIds between instances in their initial source lists. I can't think of any negative ramifications of this case yet, since the enqueuing procedure is atomic, and the actual list of SerialIds that the application process will work on should be self-contained due to that atomic enqueuing procedure. We don't care which instance of the application process actually processes each SerialId as long as the same SerialId isn't processed concurrently by both process instances.
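For reference, here is the per-SerialId call pattern the workflow above boils down to; the parameter value is made up for illustration:

-- Enqueue attempt; a duplicate SerialId raises a primary key violation,
-- which the application catches and treats as "skip this SerialId".
EXEC dbo.AddToProductQueue @SerialId = 12345;

-- ... external, non-database work happens on a separate thread ...

-- Dequeue once the external work for that SerialId has finished.
EXEC dbo.RemoveFromProductQueue @SerialId = 12345;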
J.D. (40893 rep)
Mar 13, 2024, 09:46 PM • Last activity: Mar 16, 2024, 03:03 PM
25 votes
3 answers
31458 views
Postgres Listen/Notify As Message Queue
Is there any way to use the Postgres Listen/Notify feature to deliver a message to a channel and have only one listener consume this message? The purpose of this is that I have multiple 'worker' apps all listening to the same Postgres channel, but I only want the work done once per message received through the notification channel. If Listen/Notify is not the correct feature in Postgres, is there a separate feature I should be using? Ideally I would like to do this without using any additional extensions.
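For context, a minimal sketch of the behavior in question (channel name and payload are made up): a NOTIFY is delivered to every session currently listening on the channel, not to a single one.

-- worker session A and worker session B both run:
LISTEN work_channel;

-- producer session:
NOTIFY work_channel, 'job 42';

-- Both A and B receive the 'job 42' notification: LISTEN/NOTIFY is a
-- broadcast, so on its own it cannot hand a message to exactly one consumer.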
moesef (351 rep)
Apr 13, 2017, 11:13 PM • Last activity: Aug 4, 2023, 04:54 PM
2 votes
2 answers
274 views
Message queue per user
I am using the same user to connect 4 servers to a database. Although the database is not overloaded, I think that if a heavy query is delayed on one server, the other servers are affected, especially for new connections. The MySQL server accepts up to 4030 simultaneous connections and has no per-user query limits. Does MySQL queue the queries of the same user separately, or all queries together? If I create a MySQL user for each server, will that better optimize user queries and queues? Thank you so much.
Albert Arbós Corbalán (23 rep)
Jun 19, 2023, 05:02 PM • Last activity: Jun 20, 2023, 10:00 AM
3 votes
3 answers
1030 views
Impact of index on a "status" field with one (guaranteed) change
## Introduction

I have a PostgreSQL table set up as a queue/event-source. I would very much like to keep the "order" of the events (even after the queue item has been processed) as a source for e2e testing. I'm starting to run into query performance slow-downs (probably because of table bloat) and I don't know how to effectively query a table on a changing key.

## Initial Setup

**Postgres: v15**

### Table DDL
CREATE TABLE eventsource.events (
	id serial4 NOT NULL,
	message jsonb NOT NULL,
	status varchar(50) NOT NULL,
	createdOn timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
	CONSTRAINT events_pkey PRIMARY KEY (id)
);
CREATE INDEX ON eventsource.events (createdOn)
### Scrape Query (Pseudo Code)
BEGIN;  -- Start transaction

SELECT message, status
FROM eventsource.events ee
WHERE status = 'PENDING'
ORDER BY ee.createdOn ASC
FOR UPDATE SKIP LOCKED
LIMIT 10;  -- Get the OLDEST 10 events that are pending
-- I found that having a batch of work items was more performant than taking 1 at a time.

...
-- The application then uses the entries as tickets for doing work as in "I am working on these 10 items, no one else touch"
...
UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_1
UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_2
UPDATE ONLY eventsource.events SET status = 'FAIL' WHERE id = $id_3
UPDATE ONLY eventsource.events SET status = 'DONE' WHERE id = $id_n
...
END; -- finish transaction
### Rough Worker outline

Multiple workers take batches of work items from the queue, then action them and report their statuses. I want to have as little overlap as possible.

[diagram: rough worker-to-queue sketch]

## Assessment

When looking at the execution plan it looks like the query has to traverse the entire table to get the records that are in 'PENDING' status. I thought this might be because of the ORDER BY ee.createdOn ASC at first. But after reviewing the execution plan I saw that the query was traversing the entire table searching for the status, and only THEN ordering it.

## Attempt

I saw partial indexes and hoped they could reduce the search space of the queries.
CREATE INDEX ON eventsource.events (status)
WHERE status = 'PENDING'
But I think I made it worse...

Records are being inserted with the 'PENDING' status and then almost immediately changed to 'DONE' (or 'FAIL') as the application consumes the queue. I think this might be destroying the index every time and then recreating it from scratch after the update to the status field (probably very expensive).

## Question

What is the effect of updating a partial index's key / predicate, and (if significant) how do I effectively filter a big table on a changing key?

### Index Approach

Is my index approach sound?

> My first thought was indexes, but maybe partitions would be better suited here? What happens if the partition key gets changed? Is it just as destructive as destroying the index?

### Index type

I know the default index type is a B-Tree; would a HASH index (or another type) be better in this situation?

> Under the hood, would changing the index key of a HASH index result in destroying/recreating the index table the same way it does with a B-Tree?

### Index creation

I am unsure what the effect is of the partial index's key vs predicate. What is the effective difference in indexing between:
CREATE INDEX ON eventsource.events (status)
WHERE status = 'PENDING'
and
CREATE INDEX ON eventsource.events (createdOn)
WHERE status = 'PENDING'
Here I am using createdOn because it is in my scrape query, but I think id would work too.

> Would moving the index key to a different field affect the index creation/recreation? In this instance I moved it from the status field (which will change) to the createdOn field, which won't.

I don't quite understand what this SO implies. And the Postgres docs are a little unclear to me about this type of partial index.
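One way to see the practical difference between the two definitions is to create each one in turn and run the scrape query under EXPLAIN, comparing which index the planner picks. A sketch based on the query shown earlier (the locking clause is left out because EXPLAIN ANALYZE actually executes the statement):

EXPLAIN (ANALYZE, BUFFERS)
SELECT message, status
FROM eventsource.events ee
WHERE status = 'PENDING'
ORDER BY ee.createdOn ASC
LIMIT 10;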
WesAtWork (143 rep)
May 9, 2023, 08:45 PM • Last activity: Jun 3, 2023, 04:29 PM
0 votes
2 answers
2219 views
Transactions are not completing when locking and updating one row at a time
I have written a database function to make a simple table update atomic. My system is status-driven, so I want to take the next row with the desired status, update it to a new status, and return the ID.
CREATE OR REPLACE FUNCTION public.processnextid(prevstatus text, nextstatus text)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
 declare PacketID int4;
BEGIN
  SELECT p.id INTO PacketID FROM packet p WHERE p.status = prevStatus limit 1 for update;
  UPDATE packet p1 set status = nextStatus WHERE p1.id = PacketID;
  RETURN PacketID;
  commit;
END;
$function$;
I have several (16) Python processes that call this function at random. The system runs in the background so it's a little hard to get an exact play-by-play, but after several hours of processing I find that some of my processes have stalled. When I query the database for the transactional status of the table, I find transactions with status "idle in transaction". I assume they are waiting for something? I tried to make this function as simple and as quick as possible. Any ideas what the problem might be? How should I proceed? I've looked around the Internet; the examples of SELECT ... FOR UPDATE I have found are almost exactly the same. So I'm kind of baffled.
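For reference, a sketch of the kind of session check described above, using the standard pg_stat_activity view (filter further as needed):

-- Which sessions are idle in transaction, since when, and what they last ran
SELECT pid, state, xact_start, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;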
Sherman (1 rep)
Apr 18, 2023, 03:59 PM • Last activity: Apr 20, 2023, 04:02 PM
7 votes
2 answers
580 views
Using a Table as a Queue without sp_getapplock/sp_releaseapplock
I have a list of commands I need to execute, all of which are contained within a table I've named myQueue. This table is a little unique in that some commands should be *grouped together* such that their execution is performed sequentially, rather than concurrently, as executing them concurrently causes unwanted data artifacts and errors. Because of this, the queue cannot be classified in a typical **FIFO**/**LIFO** fashion, as the dequeue order is determined at run-time. To summarize:

1. A table named myQueue will act as a command queue (where dequeue order is determined at run-time)
2. Commands are added to the table in a random way
3. Commands may fall into *groups*, and if so, must be executed by a single worker thread in an ordered, sequential manner
4. Any number of worker threads can be running when commands are being dequeued
5. Dequeuing is performed via an UPDATE rather than a DELETE, as this table is used for historical performance reporting for said commands

My current approach is to iterate over this table using explicit mutex logic via [sp_getapplock](https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-getapplock-transact-sql?view=sql-server-ver15)/[sp_releaseapplock](https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-releaseapplock-transact-sql?view=sql-server-ver15) calls. While this works as expected, the approach generates enough locking that running a high number of worker threads over the queue isn't feasible at any given time.

After reading through Remus Rusanu's [excellent blog post on the topic](https://rusanu.com/2010/03/26/using-tables-as-queues/), I decided to try utilizing table hints in hopes I could further optimize my approach. I'll include the test code below, but to summarize my results, using table hints and eliminating calls to sp_getapplock/sp_releaseapplock results in up to three undesirable behaviors, as follows:

1. Deadlocking
2. Multiple threads execute commands that are contained within a single *group*
3. Thread assignments are missing within a *group* of commands

On a positive note though, when the code accommodates the deadlocking (e.g. retrying the offending operation, as is currently included), the methods not using sp_getapplock/sp_releaseapplock and which don't exhibit undesirable behaviors 2 & 3 perform at least twice as fast, if not faster.

**What I'm hoping for is that someone will point out how I'm not structuring my dequeuing statements correctly so I can still move forward with using table hints exclusively.** If that doesn't work, so be it, but I wanted to see if it could be done just the same.

The tests can be set up with the following code. The myQueue table creation and population with *commands* that are similar enough to my workload:
CREATE TABLE myQueue
(
	ID INT	IDENTITY (1,1) PRIMARY KEY CLUSTERED,
	Main	INT,
	Sub		INT,
	Detail	INT,
	Command	VARCHAR(MAX),
	Thread	INT,
	StartDT	DATETIME2,
	EndDT	DATETIME2
)
GO
INSERT INTO myQueue WITH (TABLOCKX) (Main, Sub, Detail, Command)
SELECT	ABS(CHECKSUM(NEWID()) % 200),
		ABS(CHECKSUM(NEWID()) % 1280),
		ABS(CHECKSUM(NEWID())),
		'WAITFOR DELAY ''00:00:00.01'''
FROM sys.types t1 CROSS JOIN 
	 sys.types t2 CROSS JOIN
	 sys.types t3 CROSS JOIN
	 (VALUES (1), (2)) t4(x)
GO

CREATE NONCLUSTERED INDEX [IX_myQueue_Update]
ON [dbo].[myQueue] ([Main],[Sub])
INCLUDE (Thread, EndDT)
GO
Worker Threads all follow the same logic. I recommend that if you run this locally, you just copy this code into separate query windows and run each query at the same time, making sure all Worker Threads adhere to the same locking method (there are 7 buried in the comments and surrounded by comment blocks):
SET NOCOUNT ON
DECLARE @updOUT TABLE
(
	Main	INT,
	Sub		INT
)
-- @CurrentThread should be a unique ID, which I'm assigning as @@SPID
DECLARE @CurrentThread INT = @@SPID, 
		@main INT, @sub INT,
		@id INT, @command VARCHAR(MAX), 
		@ErrorMessage NVARCHAR(4000)
WHILE	EXISTS(SELECT TOP 1 ID FROM myQueue WHERE EndDT IS NULL)
BEGIN
	BEGIN TRY

		--/*
		-- Method 1: Top 1 WITH TIES within CTE, direct update against CTE, Contained with sp_getapplock/sp_releaseapplock
		-- works
		-- high volume of xp_userlock waits
		BEGIN TRY
			BEGIN TRAN

				EXEC sp_getapplock @Resource = 'myQueue', @LockMode = 'Update'

				;WITH dequeue AS
				(
					SELECT TOP 1 WITH TIES
						Main, Sub, Thread
					FROM	myQueue
					WHERE	EndDT IS NULL
						AND	(Thread IS NULL OR Thread = @CurrentThread)
					ORDER BY Main, Sub
				)
				UPDATE	dequeue
				SET	Thread = @CurrentThread
				OUTPUT	DELETED.Main,
						DELETED.Sub
				INTO @updOUT

				EXEC sp_releaseapplock @Resource = 'myQueue'
			COMMIT
		END TRY
		BEGIN CATCH
			EXEC sp_releaseapplock @Resource = 'myQueue'
			ROLLBACK TRAN
		END CATCH
		--*/

		/*
		-- Method 2: Top 1 WITH TIES within CTE, direct update against CTE
		-- does not work
		-- some groupings contain multiple worker threads 
		-- missing thread assignments (e.g. NULL value in Thread Column)
		-- deadlocking experienced
		;WITH dequeue AS
		(
			SELECT TOP 1 WITH TIES
				Main, Sub, Thread
			FROM	myQueue WITH (ROWLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	dequeue
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT
		*/

		/*
		-- Method 3: Top 1 WITH TIES within CTE, join to myQueue table
		-- does not work
		-- some groupings contain multiple worker threads 
		-- missing thread assignments (e.g. NULL value in Thread Column)
		-- deadlocking experienced
		;WITH dequeue AS
		(
			SELECT TOP 1 WITH TIES
				Main, Sub, Thread
			FROM	myQueue WITH (ROWLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	myQ
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT
		FROM	myQueue myQ WITH (ROWLOCK, UPDLOCK, READPAST)
					INNER JOIN dequeue
						ON myQ.Main = dequeue.Main
						AND myQ.Sub = dequeue.Sub 
		*/

		/*
		-- Method 4: Top 1 within CTE, join to myQueue table
		-- does not work
		-- some groupings contain multiple worker threads
		;WITH dequeue AS
		(
			SELECT TOP 1
				Main, Sub, Thread
			FROM	myQueue WITH (ROWLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	myQ
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT
		FROM	myQueue myQ WITH (ROWLOCK, UPDLOCK, READPAST)
					INNER JOIN dequeue
						ON myQ.Main = dequeue.Main
						AND myQ.Sub = dequeue.Sub 
		*/

		/*
		-- Method 5: Top 1 WITH TIES within CTE, join to myQueue table, PAGLOCK hint instead of ROWLOCK
		-- works*
		-- deadlocking experienced
		;WITH dequeue AS
		(
			SELECT TOP 1 WITH TIES
				Main, Sub, Thread
			FROM	myQueue WITH (PAGLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	myQ
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT
		FROM	myQueue myQ WITH (PAGLOCK, UPDLOCK, READPAST)
					INNER JOIN dequeue
						ON myQ.Main = dequeue.Main
						AND myQ.Sub = dequeue.Sub 
		*/

		/*
		-- Method 6: Top 1 WITH TIES within CTE, direct update against CTE, PAGLOCK hint instead of ROWLOCK
		-- works*
		-- deadlocking experienced
		;WITH dequeue AS
		(
			SELECT TOP 1 WITH TIES
				Main, Sub, Thread
			FROM	myQueue WITH (PAGLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	dequeue
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT

		*/

		/*
		-- Method 7: Top 1 within CTE, join to myQueue table, PAGLOCK hint instead of ROWLOCK
		-- works*
		-- deadlocking experienced
		;WITH dequeue AS
		(
			SELECT TOP 1
				Main, Sub, Thread
			FROM	myQueue WITH (PAGLOCK, UPDLOCK, READPAST)
			WHERE	EndDT IS NULL
				AND	(Thread IS NULL OR Thread = @CurrentThread)
			ORDER BY Main, Sub
		)
		UPDATE	myQ
		SET	Thread = @CurrentThread
		OUTPUT	DELETED.Main,
				DELETED.Sub
		INTO @updOUT
		FROM	myQueue myQ WITH (PAGLOCK, UPDLOCK, READPAST)
					INNER JOIN dequeue
						ON myQ.Main = dequeue.Main
						AND myQ.Sub = dequeue.Sub 
		*/

		SELECT	TOP 1 
			  @main = Main
			, @sub = Sub
		FROM @updOUT

		END TRY
		BEGIN CATCH
			SELECT @ErrorMessage = 'Msg ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ', Level ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) 
			+ ', State ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ', Line ' + CAST(ERROR_LINE() AS VARCHAR(10))
			+ CHAR(13) + CHAR(10) + ERROR_MESSAGE()

			RAISERROR(@ErrorMessage, 1, 1) WITH NOWAIT

			-- Set to useless values so cursor doesn't fire
			SELECT @main = -1, @sub = -1
		END CATCH

		DELETE FROM @updOUT

		DECLARE WorkQueueCur INSENSITIVE CURSOR
		FOR
			SELECT	ID, Command
			FROM	myQueue
			WHERE	Main = @main
				AND Sub = @sub
			ORDER BY Detail

		OPEN WorkQueueCur

		FETCH NEXT FROM WorkQueueCur
		INTO @id, @command

		WHILE @@FETCH_STATUS = 0
		BEGIN

			RETRY1:

			BEGIN TRY
				UPDATE	myQueue
				SET StartDT = GETDATE()
				WHERE ID = @id
			END TRY
			BEGIN CATCH
				SELECT @ErrorMessage = 'Retry1: Msg ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ', Level ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) 
				+ ', State ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ', Line ' + CAST(ERROR_LINE() AS VARCHAR(10))
				+ CHAR(13) + CHAR(10) + ERROR_MESSAGE()

				RAISERROR(@ErrorMessage, 1, 1) WITH NOWAIT

				GOTO RETRY1
			END CATCH

			EXEC(@command)

			RETRY2:
			
			BEGIN TRY
				UPDATE	myQueue
				Set	EndDT = GETDATE()
				WHERE ID = @id
			END TRY
			BEGIN CATCH
				SELECT @ErrorMessage = 'Retry2: Msg ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ', Level ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) 
				+ ', State ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ', Line ' + CAST(ERROR_LINE() AS VARCHAR(10))
				+ CHAR(13) + CHAR(10) + ERROR_MESSAGE()

				RAISERROR(@ErrorMessage, 1, 1) WITH NOWAIT

				GOTO RETRY2
			END CATCH

			FETCH NEXT FROM WorkQueueCur
			INTO @id, @command
		END

		CLOSE WorkQueueCur
		DEALLOCATE WorkQueueCur


END
Confirmation of undesirable behaviors 2 and 3 above (or lack thereof) can be determined by running the following statement:
;WITH invalidMThread AS (
	SELECT	*, DENSE_RANK() OVER (PARTITION BY Main, Sub ORDER BY Thread) AS ThreadCount
	FROM	dbo.myQueue WITH (NOLOCK)
	WHERE	StartDT IS NOT NULL
), invalidNThread AS (
	SELECT	*
	FROM	dbo.myQueue WITH (NOLOCK)
	WHERE	Thread IS NULL
			AND StartDT IS NOT NULL
)
SELECT	t1.*, 'Multiple Threads' AS Issue
FROM	dbo.myQueue t1 WITH (NOLOCK) 
		INNER JOIN invalidMThread i1
			ON i1.Main = t1.Main
			AND i1.Sub = t1.Sub
WHERE	i1.ThreadCount > 1

UNION

SELECT	t1.*, 'Unassigned Thread(s)' AS Issue
FROM	dbo.myQueue t1 WITH (NOLOCK) 
		INNER JOIN invalidNThread i2
			ON i2.Main = t1.Main
			AND i2.Sub = t1.Sub

ORDER BY t1.Main, t1.Sub
Again, I fully anticipate I missed some critical point Remus made in the blog post, so any help in pointing that out would be very much appreciated.
John Eisbrener (9547 rep)
Feb 6, 2020, 02:58 PM • Last activity: Jan 26, 2023, 04:34 PM
31 votes
2 answers
68609 views
Automatic aging-out (deletion) of old records in Postgres
Does Postgres have any features to support aging-out old records? I want to use Postgres for logging, as a sort of queue, where records (log events) older than two weeks are automatically deleted.
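For context: PostgreSQL has no built-in per-row TTL, so the usual approach is a periodic DELETE driven by a scheduler (cron, pg_cron, or similar). A minimal sketch, with a hypothetical table and column name:

-- Run this on a schedule (e.g. hourly); rows older than two weeks disappear.
DELETE FROM log_events
WHERE logged_at < now() - interval '14 days';

Another common pattern is to partition the table by time and simply drop old partitions, which avoids the bloat that a large DELETE leaves behind.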
Basil Bourque (11188 rep)
Jul 14, 2015, 12:17 AM • Last activity: Jul 27, 2022, 12:27 PM
1 votes
1 answers
36 views
The approach to take when creating/editing many interrelated records in a transactional way?
So I have a fairly complex system, I would think, that is starting to come about. It is too involved to write out all the tables, but here is a brief description. Basically I am creating a badging system like StackOverflow's, for posts. You have these sorts of tables:

- users
- posts
- events (saved to the database so you know when each important event happened)
- user_statistics (rollup of badge counts and such)
- post_statistics (rollup of operation counts on the post, "it has been edited 20 times")
- user_badges (the awarding of a badge to a user)
- badge_types

Then let's say you "update a post". Here is what happens:

- post record is updated
- event record is created next, which says "update action", and which only gets created if it doesn't exist; otherwise it reuses the same update event (so the system can't be spammed). It is associated to the post id and user id.
- user statistics are updated to count the new event if it was created. There could be multiple statistics to update, as the stats may be scoped to certain categories (like all posts for a specific language)
- if the statistics reach a threshold, then check if we need to create a badge or potentially multiple badges, then create the badges.
- potentially create a notification record.
- potentially a few other things, such as escalating privileges on the site now that they have more reputation, etc.
- all of this needs to succeed, so nothing is left undone (all counts are correct and badges are awarded properly).

How do you appropriately accomplish this in PostgreSQL? In my specific case there appear to be about 10 tables which are queried, and at least 5 tables which are modified (records created or updated). All of this should in theory be atomic, in a single transaction, but it seems like a lot to pack into a transaction, especially if you have these "events" coming in multiple times a second.

The only initial way I can think of to _maybe_ _potentially_ solve this is using a queue and background jobs. Each step above would be done sequentially, _outside of a transaction_, with potential time gaps between steps. So there would be an intermediate state where things are inconsistent. But eventually (it seems, in theory), the queue would run and retry until success, and get to the correct state.

Is that how this should be done? If not, is it okay to have this complex of a single transaction on every event? I can't tell; I didn't think the solution of implementing badges and these counters would turn out so complex, but there is a lot to consider and do on each event.

Any pointer in the general right direction is all I am looking for, based on your expertise building scalable database systems. Assume that this system must be this complex, because I am really asking about how, in theory, to handle complex transaction requirements. That is, if you know of an ideal way of modeling a badging system, that would be nice to know, but it wouldn't really address the main part of the question. Thank you for the help! For now, for my purposes, everything can be considered to fit on a single machine, not distributed across multiple databases.
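To make the single-transaction variant concrete, here is a minimal sketch under stated assumptions: every table and column name is a hypothetical placeholder for the schema described above, and the ON CONFLICT clauses assume suitable unique constraints exist.

BEGIN;

-- 1. Update the post itself
UPDATE posts SET body = $1, updated_at = now() WHERE id = $2;

-- 2. Record the "update" event, reusing it if it already exists
INSERT INTO events (user_id, post_id, action)
VALUES ($3, $2, 'update')
ON CONFLICT (user_id, post_id, action) DO NOTHING;

-- 3. Roll the counters up
UPDATE user_statistics SET edit_count = edit_count + 1 WHERE user_id = $3;
UPDATE post_statistics SET edit_count = edit_count + 1 WHERE post_id = $2;

-- 4. Award any badges whose threshold has now been crossed
INSERT INTO user_badges (user_id, badge_type_id)
SELECT us.user_id, bt.id
FROM user_statistics us
JOIN badge_types bt ON bt.threshold <= us.edit_count
WHERE us.user_id = $3
ON CONFLICT (user_id, badge_type_id) DO NOTHING;

COMMIT;

If any statement fails, the whole block rolls back, which is exactly the all-or-nothing behavior described above; the queue-and-retry alternative trades that atomicity for shorter transactions and temporary inconsistency.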
Lance Pollard (221 rep)
Jun 28, 2022, 06:29 PM • Last activity: Jun 29, 2022, 12:11 PM
4 votes
3 answers
940 views
Why is our query suddenly returning rows it should not (using READPAST and UPDLOCK options)?
We have a job table that looks like this:

CREATE TABLE [dbo].[Clearing](
    [Skey] [decimal](19, 0) IDENTITY(1,1) NOT NULL,
    [BsAcctId] [int] NULL,
    [Status] [varchar](20) NULL,
    CONSTRAINT [csPk_Clearing] PRIMARY KEY CLUSTERED ( [Skey] ASC )
)

with a covering index like this:

CREATE NONCLUSTERED INDEX [IX_Status] ON [dbo].[Clearing]
( [Status] ASC )
INCLUDE ( [Skey], [BsAcctId])

and we use this query to pick the next job:

select top (1) Skey, BsAcctId, Status
from Clearing with ( readpast, updlock )
where (Clearing.Status = 'NEW')
order by Clearing.Skey

(The real table has about 10 columns. They are all in the index include() clause and the select column list.)

The execution plan is very simple: it does an index seek using IX_Status, then a top operator. Since the index is sorted on (status, skey), the plan does not need a sort.

The table is in a database in an AlwaysOn Availability Group. The group has 2 DB servers. (It is a test system.) Normally this table and query work great.

So we go to apply Windows updates, and do the usual:

1. Fail over the primary to the secondary
2. Apply Windows updates on the former primary
3. Fail back over to the original primary
4. Apply Windows updates on the secondary

After the second failover, once all the worker processes get new connections to the new primary, the query starts failing in the sense that multiple processes start getting the same jobs. The problem is load related: with 4 worker processes running it did not happen, but with 10 workers it happens consistently.

This is using SQL Server 2016 Enterprise. We do not have the Query Store enabled to see if the execution plan was weird at some point.

Any suggestions on why the query would start failing after two failovers? Since the query is only using the index and not touching the table, is UPDLOCK reliable?

Update 1 - We changed the process to list the locks held by the spid (using sp_lock @@spid) just after doing the select. For the same skey, we are seeing different KEY locks held on the IX_Status index (indid=9):

KEY (aad9d6e672f9) U
KEY (154698b9131c) U

Update 2 - Using an index hint did not help.

Update 3 - Removing the order by clause from the query avoided the problem. But we have a second table with the same problem that needs the order by.

Update 4 - Our worker processes maintain a DB connection pool. ODBC does not tell us when a failover happens, so connections to an old primary stay in the pool until we try to use them and they fail. We suspect that after we fail over DB1 -> DB2 -> DB1, old connections to DB1 might not fail like they should. We made a change to close all pooled connections after any one connection is lost, and this seems to have avoided the problem. (SQL Server ODBC added a "Connection Resiliency" feature that is fueling this suspicion.)
brian beuning (151 rep)
Jun 21, 2019, 05:21 PM • Last activity: Nov 28, 2021, 07:07 AM
1 votes
2 answers
1127 views
Correct way to continuously receive from a queue
I have to receive messages from a queue continuously. My current code looks like this:
DECLARE @status INT=1
    WHILE @status IS NOT NULL
    BEGIN
    WAITFOR (RECEIVE  @status=status,
    		@message = message_body 
    
    		FROM dbo.MyQueue) 
...
Of course this code never ends; I have to stop it manually, but it does what's needed. I'm thinking of putting this code in a stored procedure invoked by a job, run on a schedule, so that the call resumes in case of error. What I would like to ask is:

- Is a job the correct way of starting the "infinite" listener?
- If the job is scheduled, let's say, every ten seconds, does it create "overlapped" processes?
- If I kill the job, is the invoked SP killed as well?
- If not, how can I kill the running SP?
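Not from the original post, but for reference: RECEIVE can be combined with a TIMEOUT so the loop ends on its own when the queue stays empty for a while, which avoids a never-ending call. A minimal sketch:

DECLARE @status INT, @message VARBINARY(MAX);

WHILE 1 = 1
BEGIN
    WAITFOR (
        RECEIVE TOP (1)
            @status  = status,
            @message = message_body
        FROM dbo.MyQueue
    ), TIMEOUT 5000;      -- wait at most 5 seconds for a message

    IF @@ROWCOUNT = 0
        BREAK;            -- queue stayed empty: let the procedure finish

    -- process @message here
END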
Felice Pollano (133 rep)
Aug 12, 2021, 07:08 AM • Last activity: Sep 3, 2021, 11:59 PM
1 votes
1 answers
302 views
Optimize UPDATE query for single "next" row
We had information from new relic that one our query has a bad performance (almost 16 seconds for execution). We use PostgreSQL 11.8 on Debian Linux. Table: ```sql postgres=# SELECT table_name, column_name, data_type, character_maximum_length, character_octet_length, numeric_precision, numeric_preci...
New Relic reported that one of our queries performs badly (almost 16 seconds per execution). We use PostgreSQL 11.8 on Debian Linux. Table:
postgres=# SELECT table_name, column_name, data_type, character_maximum_length, character_octet_length, numeric_precision, numeric_precision_radix, numeric_scale, is_nullable, is_updatable FROM information_schema.columns WHERE table_name = 'store_codeinventory';
     table_name      |    column_name    |        data_type         | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | is_nullable | is_updatable 
---------------------+-------------------+--------------------------+--------------------------+------------------------+-------------------+-------------------------+---------------+-------------+--------------
 store_codeinventory | id                | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | code              | character varying        |                      255 |                   1020 |                   |                         |               | NO          | YES
 store_codeinventory | limit_days        | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | deactivation_date | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | cost              | numeric                  |                          |                        |                 8 |                      10 |             2 | NO          | YES
 store_codeinventory | price             | numeric                  |                          |                        |                 8 |                      10 |             2 | NO          | YES
 store_codeinventory | created_date      | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | claimed_date      | timestamp with time zone |                          |                        |                   |                         |               | YES         | YES
 store_codeinventory | is_active         | boolean                  |                          |                        |                   |                         |               | NO          | YES
 store_codeinventory | book_id           | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | code_import_id    | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | creator_id        | integer                  |                          |                        |                32 |                       2 |             0 | NO          | YES
 store_codeinventory | inv_id            | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | label_id          | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | recipient_id      | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
 store_codeinventory | purchase_id       | integer                  |                          |                        |                32 |                       2 |             0 | YES         | YES
(16 rows)
I don't have access to production, but I tried to populate a local DB and reproduce the problem. Query:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = ANY((select array(select id from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true
        ORDER BY ID ASC LIMIT 1 FOR UPDATE)) ::integer[]) and recipient_id is NULL;
Query plan:
QUERY PLAN                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on store_codeinventory  (cost=0.64..47.43 rows=10 width=92) (actual time=0.291..0.291 rows=0 loops=1)
   InitPlan 2 (returns $3)
     ->  Result  (cost=0.34..0.35 rows=1 width=32) (actual time=0.060..0.060 rows=1 loops=1)
           InitPlan 1 (returns $2)
             ->  Limit  (cost=0.29..0.34 rows=1 width=10) (actual time=0.058..0.058 rows=1 loops=1)
                   ->  LockRows  (cost=0.29..1031.77 rows=19963 width=10) (actual time=0.057..0.057 rows=1 loops=1)
                         ->  Index Scan using store_codeinventory_pkey on store_codeinventory store_codeinventory_1  (cost=0.29..832.14 rows=19963 width=10) (actual time=0.053..0.053 rows=1 loops=1)
                               Filter: ((recipient_id IS NULL) AND is_active AND (inv_id = 72))
                               Rows Removed by Filter: 94
   ->  Index Scan using store_codeinventory_pkey on store_codeinventory  (cost=0.29..47.08 rows=10 width=92) (actual time=0.065..0.066 rows=1 loops=1)
         Index Cond: (id = ANY ($3))
         Filter: (recipient_id IS NULL)
 Planning Time: 0.365 ms
 Execution Time: 0.327 ms
(14 rows)
I am not so good at writing SQL. Could someone give me advice on where the bottleneck is? As I understand it, there might be two places: LockRows and Index Scan, right? I think the first one is caused by FOR UPDATE and the second one by the SELECT with ORDER BY ID ASC LIMIT. Is there a way to optimize them?

UPD. Information about indexes (I filtered out indexes on columns not used in the query):
postgres=# SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' and tablename = 'store_codeinventory' AND (indexdef LIKE '%inv_id%' OR indexdef LIKE '%pkey%' OR indexdef LIKE '%recipient_id%');
                           indexname                            |                                                                        indexdef                                                                         
----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------
 store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx | CREATE INDEX store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx ON public.store_codeinventory USING btree (inv_id, recipient_id, is_active)
 store_codeinventory_recipient_id_e32fbb18                      | CREATE INDEX store_codeinventory_recipient_id_e32fbb18 ON public.store_codeinventory USING btree (recipient_id)
 store_codeinventory_inv_id_21e20eb7                            | CREATE INDEX store_codeinventory_inv_id_21e20eb7 ON public.store_codeinventory USING btree (inv_id)
 store_codeinventory_pkey                                       | CREATE UNIQUE INDEX store_codeinventory_pkey ON public.store_codeinventory USING btree (id)
(4 rows)
*UPD2* I am not so good at SQL, but I decided to analyse the query and concluded that we can simplify it. What do you think about this variant:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = (select id from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true 
        ORDER BY ID ASC LIMIT 1 FOR UPDATE) and recipient_id is NULL;
I thought that we don't need the array, because the sub-select returns only one id or none. Then I thought that LIMIT 1 with ORDER BY ID ASC is the same as taking the minimal id. What if we write:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW()
    where id = (select MIN(id) from store_codeinventory
        where recipient_id is NULL and inv_id = 72 and is_active=true);
Is it the same or not?
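For context, the two forms behave differently under concurrency: the ORDER BY ... LIMIT 1 FOR UPDATE form locks the chosen row inside the subquery, while SELECT MIN(id) takes no row locks at all, so two sessions can compute the same id and one of them then blocks or updates zero rows. The pattern commonly used for "claim the next row" adds SKIP LOCKED so concurrent workers pick different rows; a minimal sketch reusing the column names from the table above:

UPDATE store_codeinventory
SET    recipient_id = 1168,
       claimed_date = NOW()
WHERE  id = (
        SELECT id
        FROM   store_codeinventory
        WHERE  recipient_id IS NULL
          AND  inv_id = 72
          AND  is_active = true
        ORDER  BY id
        LIMIT  1
        FOR UPDATE SKIP LOCKED   -- skip rows already claimed by another worker
       )
RETURNING id;

The RETURNING clause lets the caller see which row (if any) it actually claimed.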
Pavel Mikhadziuk (13 rep)
Aug 12, 2021, 12:22 PM • Last activity: Aug 13, 2021, 04:03 PM
1 votes
0 answers
144 views
Service Broker's SEND statement causes security context switch
We have triggers in multiple tables that will send messages to a Service Broker service upon INSERT, UPDATE and DELETE operations. These operations may be nested as parts of bigger, transactional operations in various different processes, with additional operations coming up after sending the messag...
We have triggers on multiple tables that send messages to a Service Broker service upon INSERT, UPDATE and DELETE operations. These operations may be nested inside bigger transactional operations in various different processes, with additional work happening after the messages are sent but before the transaction actually gets committed.

Our users authenticate using SSPI (Windows Active Directory). Some legacy stored procedures have been set up with the WITH EXECUTE AS OWNER clause so we don't have to worry about possible permission issues for whatever these procedures do - including writing to the tables described above.

Yesterday, following an update to one of our apps, it started logging a strange permission error on a very basic SELECT statement. I looked up the procedure (it happens to be a SQLCLR procedure) and it should indeed be running as its owning schema (dbo in this case, which happens to be mapped to the sa user) and therefore have every permission it could possibly need.

I ran a SQL Profiler session while trying to reproduce the issue and noticed something strange: the SEND command caused LoginName to switch back to the domain user.

(SQL Profiler screenshot)

From this point on, it's as if the EXECUTE AS OWNER setting on the head procedure was being reverted. Just to make sure, I disabled Service Broker on the database and performed the exact same action. Sure enough, there was no more security context switch.

What could possibly be the cause for this? Is this by design? Is it somehow because it's invoked from within SQLCLR?

As a reference, here is the declaring code for the head procedure:
CREATE PROCEDURE [dbo].[ps_SQLCLR_RunTicket]
	@ticket [nvarchar](128)
WITH EXECUTE AS OWNER
AS
EXTERNAL NAME [SomeAssembly].[SomeStaticClass].[SomeMethod]
GO
And here is a slimmed-down version of the nested code that issues the SEND:
-- This code normally is in a stored procedure that's invoked from a trigger

DECLARE @msg VARCHAR(MAX) = ''
DECLARE @handle UNIQUEIDENTIFIER   -- declared here so the slimmed-down snippet is self-contained

SET @handle = NEWID();

BEGIN DIALOG CONVERSATION @handle
FROM SERVICE auditService
TO SERVICE 'receivingService'
ON CONTRACT auditServiceContract
WITH ENCRYPTION = OFF, LIFETIME = 600;

SEND ON CONVERSATION @handle
MESSAGE TYPE auditMessageType (@msg);

END CONVERSATION @handle;
Crono (111 rep)
Aug 12, 2021, 08:44 PM
2 votes
0 answers
283 views
Exactly-once FIFO queue in Synapse
Creating a queue table in SQL Server is a [much][1]-[studied][2] [problem][3]. However, I would like to implement one in Azure Synapse where many of the building blocks do not exists. Specifically * no table hints (READPAST etc.) * no OUTPUT clause * sp_getapplock is not available Our Synapse instan...
Creating a queue table in SQL Server is a much-studied problem. However, I would like to implement one in Azure Synapse, where many of the building blocks do not exist. Specifically:

* no table hints (READPAST etc.)
* no OUTPUT clause
* sp_getapplock is not available

Our Synapse instance is configured READ UNCOMMITTED.

Each evening a batch job will run. Fewer than 100 items will be placed in the queue, in the desired order. These will then be processed by between 4 and 10 concurrent consumers until the queue is drained. The cycle repeats the following day. The number of items and consumers can change from day to day. The elapsed time for items will be quite skewed, from under a minute to over an hour.

The design of the queue and the consumer code is completely open. The process is orchestrated by Azure Data Factory (ADF); the solution can rely on ADF if needed. We would rather avoid additional Azure services to limit costs.

When a run fails we do not want to re-process completed items, but in-flight items can be abandoned and started from scratch, i.e. checkpoint/restart at the item level is desired.
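One direction that stays within plain T-SQL is optimistic claiming: each consumer tries to stamp its own identifier onto the oldest unclaimed row and only proceeds if its UPDATE actually affected a row. The sketch below is only an illustration under stated assumptions: the queue_item table, the status values and the @worker parameter are all hypothetical, and the statement would still need to be validated against Synapse's T-SQL surface and concurrency behaviour.

-- Hypothetical queue table: queue_item(item_id, payload, status, claimed_by)
UPDATE dbo.queue_item
SET    status     = 'RUNNING',
       claimed_by = @worker
WHERE  status = 'READY'
  AND  item_id = (SELECT TOP (1) item_id
                  FROM   dbo.queue_item
                  WHERE  status = 'READY'
                  ORDER  BY item_id);

-- Only the consumer whose UPDATE reports one affected row owns the item.
IF @@ROWCOUNT = 1
BEGIN
    -- process the claimed item, then mark it DONE
    UPDATE dbo.queue_item
    SET    status = 'DONE'
    WHERE  claimed_by = @worker AND status = 'RUNNING';
END

Because completed items keep their DONE status, a failed run can be restarted and will only pick up READY and abandoned RUNNING rows, which matches the checkpoint/restart requirement at the item level.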
Michael Green (25265 rep)
Aug 2, 2021, 01:12 PM
0 votes
1 answers
79 views
Queueing MySQL record inserts to avoid over-subscription of a related resource ... table locking?
Given a simplified hypothetical of seats in a lifeboat, if I have the following setup with a lifeboats table and a seats table where each record is one occupied seat in the given lifeboat: ``` CREATE TABLE lifeboats ( id INT UNSIGNED NOT NULL, total_seats TINYINT UNSIGNED NOT NULL, PRIMARY KEY (id))...
Given a simplified hypothetical of seats in a lifeboat, if I have the following setup with a lifeboats table and a seats table where each record is one occupied seat in the given lifeboat:
CREATE TABLE lifeboats (
  id INT UNSIGNED NOT NULL,
  total_seats TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (id));

INSERT INTO lifeboats (id, total_seats) VALUES (1, 3);
INSERT INTO lifeboats (id, total_seats) VALUES (2, 5);

CREATE TABLE seats (
  lifeboat_id INT UNSIGNED NOT NULL);

INSERT INTO seats (lifeboat_id) VALUES (1);
INSERT INTO seats (lifeboat_id) VALUES (1);
INSERT INTO seats (lifeboat_id) VALUES (1);
INSERT INTO seats (lifeboat_id) VALUES (2);
I can find lifeboats with available seats by querying:
SELECT 
    l.id, l.total_seats, COUNT(s.lifeboat_id) AS seats_taken
FROM
    lifeboats AS l
        LEFT JOIN
    seats AS s ON s.lifeboat_id = l.id
GROUP BY l.id
HAVING COUNT(s.lifeboat_id) < l.total_seats
What is the best way to ensure 2 clients do not grab the last seat in a lifeboat, without implementing some coordinating process queue? My only idea (assuming I'm trying to grab a seat in lifeboat 2) is to go LOCK TABLE rambo, like this:
LOCK TABLE seats WRITE, lifeboats AS l READ, seats AS s READ;

INSERT INTO seats (lifeboat_id)
SELECT 
    id
FROM
    (SELECT 
        l.id, l.total_seats, COUNT(s.lifeboat_id) AS seats_taken
    FROM
        lifeboats AS l
    LEFT JOIN seats AS s ON s.lifeboat_id = l.id
    WHERE l.id = 2
    GROUP BY l.id
    HAVING COUNT(s.lifeboat_id) < l.total_seats) AS still_available;

UNLOCK TABLES;
but this is not very elegant, needless to say. (My environment is MySQL 8/InnoDB.)

**UPDATE ... Another go:**

I've been called out for giving a bad example. The question is really just: for a given table, how would you best limit (to X) the number of records inserted with a given value Y? The process receives the limit X and the value Y, you query the existing records where value = Y to see whether you are under the limit X, and if so you insert the record. But obviously you risk 2 people grabbing the "last" record unless you ... do something ... but what? (I thought the lifeboat analogy was actually a good one!)

Idea one: write-lock the table before beginning the process, so other processes are forced to wait. But this stops everybody, including others with a different value Y.

Idea/Question 2: if I have a second table t2 with the unique set of all Y values, and my "count of Y" SELECT references t2 plus a "FOR UPDATE OF t2", will the write lock placed on the Y row in t2 effectively force other processes with value = Y to wait until the current one has completed?
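A minimal sketch of that second idea, using the lifeboats row itself as the per-Y lock (lifeboat 2 here); this is an illustration of the pattern rather than a tested recipe for this schema:

START TRANSACTION;

-- Lock the parent row for lifeboat 2; concurrent bookers of the same lifeboat
-- block here, while bookers of other lifeboats are unaffected.
SELECT total_seats
FROM lifeboats
WHERE id = 2
FOR UPDATE;

-- Re-count the occupied seats while holding the lock, and insert only if a
-- seat is still free.
INSERT INTO seats (lifeboat_id)
SELECT l.id
FROM lifeboats AS l
LEFT JOIN seats AS s ON s.lifeboat_id = l.id
WHERE l.id = 2
GROUP BY l.id, l.total_seats
HAVING COUNT(s.lifeboat_id) < l.total_seats;

COMMIT;

Because every booking for a given lifeboat serializes on that lifeboat's parent row, two clients cannot both pass the seat-count check for the last seat, while bookings for other lifeboats proceed concurrently.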
Jeff N (3 rep)
Feb 19, 2021, 07:04 PM • Last activity: Feb 20, 2021, 02:00 PM
2 votes
1 answers
457 views
Duplicate key errors in Postgres queue-like table when rows should be locked by SELECT FOR UPDATE SKIP LOCKED
I have a table called `queue` with items that need to be processed: ```sql CREATE TABLE public.queue ( id serial NOT NULL CONSTRAINT queue_pkey PRIMARY KEY ); ``` Another table `process` represents processed items from the queue (e.g. a report has been generated). In reality, there are more such tab...
I have a table called queue with items that need to be processed:
CREATE TABLE public.queue (
    id serial NOT NULL
        CONSTRAINT queue_pkey
            PRIMARY KEY
);
Another table, process, represents processed items from the queue (e.g. a report has been generated). In reality there are more such tables (more processes need to be performed on each item). There is a one-to-one relation between queue and process – each item can be processed just once.
CREATE TABLE public.process (
    id            serial  NOT NULL
        CONSTRAINT process_pkey
            PRIMARY KEY,
    queue_item_id integer NOT NULL
        CONSTRAINT process_queue_item_id_key
            UNIQUE
        CONSTRAINT process_queue_item_id_8953ec7b_fk_datastore
            REFERENCES public.queue
            DEFERRABLE INITIALLY DEFERRED
);
Here are some test data:
BEGIN;
TRUNCATE TABLE queue, process
    RESTART IDENTITY CASCADE;
INSERT INTO queue
SELECT GENERATE_SERIES(1, 10000);
COMMIT;
The worker that processes the items is implemented as follows. The code is in Django (Python framework), but I am convinced that my error is not caused by Django or its ORM. (For simplicity, there is no termination condition.)
while True:
    with transaction.atomic():
        queue_items = Queue.objects \
                          .filter(process=None) \
                          .order_by() \
                          .select_for_update(skip_locked=True, of=('self',))[:8]

        print('Generating report...')
        time.sleep(0.1)

        Process.objects.bulk_create(
            (Process(queue_item=q)
             for q in queue_items)
        )
Here is a transcript of the SQL queries that are sent to the database:
-- while True:

BEGIN;

SELECT queue."id"
FROM queue
         LEFT OUTER JOIN "process"
                         ON (queue."id" = "process"."queue_item_id")
WHERE "process"."id" IS NULL
LIMIT 8 FOR UPDATE OF queue SKIP LOCKED;

-- print('Generating report...')
-- time.sleep(0.5)

INSERT INTO "process" ("queue_item_id")
VALUES (1),
       (2),
       (3),
       (4),
       (5),
       (6),
       (7),
       (8)
RETURNING "process"."id";

COMMIT;
If I start one worker, the queue is processed perfectly fine. If I run two or more workers, I start getting this error:
duplicate key value violates unique constraint "process_queue_item_id_key"
DETAIL:  Key (queue_item_id)=(**) already exists.
**How could another transaction create rows in process for queue items whose rows are locked?**

What I tried:

1. I tried to rewrite the SELECT query with EXISTS:
SELECT "queue"."id"
  FROM "queue"
 WHERE NOT (EXISTS(SELECT U0."id", U0."queue_item_id" FROM "process" U0 WHERE U0."queue_item_id" = "queue"."id"))
 LIMIT 8
   FOR UPDATE OF "queue" SKIP LOCKED
without success, the same error occurs.

2. If I order the rows randomly, the error occurs much later (almost at the end of the queue).
SELECT "queue"."id"
  FROM "queue"
  LEFT OUTER JOIN "process"
    ON ("queue"."id" = "process"."queue_item_id")
 WHERE "process"."id" IS NULL
 ORDER BY RANDOM()
 LIMIT 8
   FOR UPDATE OF "queue" SKIP LOCKED
3. I put a breakpoint in the middle of the transaction, and from another transaction I checked that the row locking appears to work correctly:
SELECT id
FROM queue
WHERE id NOT IN (
    SELECT id
    FROM queue
        FOR UPDATE SKIP LOCKED
);
* My Postgres version: PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit.
* Each worker has its own connection to Postgres with default isolation level (read committed).
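For what it's worth, a variant sometimes used to narrow this window is to do the pick and the insert in a single statement, so the rows are selected, locked and recorded under one snapshot. A minimal sketch (this is not the Django-generated SQL above, and the ON CONFLICT guard assumes the existing unique constraint on queue_item_id):

WITH picked AS (
    SELECT q.id
    FROM   queue AS q
    WHERE  NOT EXISTS (SELECT 1 FROM process AS p WHERE p.queue_item_id = q.id)
    ORDER  BY q.id
    LIMIT  8
    FOR UPDATE OF q SKIP LOCKED      -- lock the chosen queue rows, skipping claimed ones
)
INSERT INTO process (queue_item_id)
SELECT id FROM picked
ON CONFLICT (queue_item_id) DO NOTHING   -- belt-and-braces guard against double processing
RETURNING queue_item_id;

Collapsing the two steps removes the gap between the worker's SELECT and its later INSERT, which is where a second worker could otherwise re-read rows whose locks were just released by a commit it cannot yet see in its statement snapshot.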
illagrenan (123 rep)
Feb 8, 2021, 03:20 PM • Last activity: Feb 9, 2021, 03:58 PM
0 votes
1 answers
420 views
Are recent versions of Oracle able to implement a queue with SKIP LOCKED?
My question: what is the very latest status on whether or not recent versions of Oracle actually make it possible to implement a queue in a straightforward way with SKIP LOCKED and limiting to a single row, without resorting to indirect solutions/fragile solutions? There seems to be alot of history...
My question: what is the very latest status on whether or not recent versions of Oracle actually make it possible to implement a queue in a straightforward way with SKIP LOCKED and limiting to a single row, without resorting to indirect or fragile solutions? There seems to be a lot of history going back a long time showing SKIP LOCKED queueing being problematic with Oracle - I'm trying to determine whether the newest Oracle versions have cleared these problems up.

I have implemented queue-style functionality using Postgres with SKIP LOCKED, and I now wish to do the same with Oracle. I'm happy to use any Oracle version that makes it possible. So before I head down the path of trying to implement this for Oracle, I wanted to first ask whether it is impossible to do so.

I've been reading a lot of documentation on the web to try to determine if it can be done ... older information seems to indicate that Oracle is not able to truly limit results returned to only one single row, which is a big problem when using "SKIP LOCKED" in a queue, because for queue processing you want only one row.

Previous information indicating that row limiting in Oracle depends on fragile/indirect solutions:

https://stackoverflow.com/questions/16299663/select-for-update-skip-locked-with-row-limit
https://stackoverflow.com/questions/6117254/force-oracle-to-return-top-n-rows-with-skip-locked
https://stackoverflow.com/questions/54766489/oracle-how-to-limit-number-of-rows-in-select-for-update-skip-locked
https://stackoverflow.com/questions/470542/how-do-i-limit-the-number-of-rows-returned-by-an-oracle-query-after-ordering
https://stackoverflow.com/questions/50390146/query-limit-in-oracle-database

There appears to be a FETCH clause in recent Oracle versions, but again it is not clear if this truly restricts access to a single row. Does FETCH make it possible to implement a queue with SKIP LOCKED in a direct and robust manner?

Please note I am aware Oracle has Advanced Queuing functionality built in - I do not want to use that.

Here's what I wrote for Postgres - it's pretty straightforward - note I am aware it lacks needed transaction handling:

import psycopg2
import psycopg2.extras
import random

db_params = {
    'database': 'jobs',
    'user': 'jobsuser',
    'password': 'superSecret',
    'host': '127.0.0.1',
    'port': '5432',
}

conn = psycopg2.connect(**db_params)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def do_some_work(job_data):
    if random.choice([True, False]):
        print('do_some_work FAILED')
        raise Exception
    else:
        print('do_some_work SUCCESS')

def process_job():
    sql = """DELETE FROM message_queue
    WHERE id = (
        SELECT id
        FROM message_queue
        WHERE status = 'new'
        ORDER BY created ASC
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING *;
    """
    cur.execute(sql)
    queue_item = cur.fetchone()
    print('message_queue says to process job id: ', queue_item['target_id'])

    sql = """SELECT * FROM jobs WHERE id =%s AND status='new_waiting' AND attempts <= 3 FOR UPDATE;"""
    cur.execute(sql, (queue_item['target_id'],))
    job_data = cur.fetchone()

    if job_data:
        try:
            do_some_work(job_data)
            sql = """UPDATE jobs SET status = 'complete' WHERE id =%s;"""
            cur.execute(sql, (queue_item['target_id'],))
        except Exception as e:
            sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id =%s;"""
            # if we want the job to run again, insert a new item to the message queue with this job id
            cur.execute(sql, (queue_item['target_id'],))
    else:
        print('no job found, did not get job id: ', queue_item['target_id'])

    conn.commit()

process_job()

cur.close()
conn.close()
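For comparison, the workaround usually described in the linked threads is to open a cursor with FOR UPDATE SKIP LOCKED and fetch only as many rows as you want to claim, relying on the rows being locked as they are fetched. A rough PL/SQL sketch under the assumption of a jobs table with id, status and created columns (hypothetical names, not from the question):

-- Cursor-based claim of a single row, as discussed in the linked threads.
DECLARE
    CURSOR c_next_job IS
        SELECT id
        FROM   jobs
        WHERE  status = 'new'
        ORDER  BY created
        FOR UPDATE SKIP LOCKED;
    l_id jobs.id%TYPE;
BEGIN
    OPEN c_next_job;
    FETCH c_next_job INTO l_id;   -- fetch (and lock) just one unlocked row
    IF c_next_job%FOUND THEN
        UPDATE jobs
        SET    status = 'processing'
        WHERE  id = l_id;
    END IF;
    CLOSE c_next_job;
    COMMIT;
END;
/

The row limit here comes from fetching only once from the cursor rather than from a FETCH FIRST / ROWNUM clause, which is exactly the indirect style the question is hoping newer Oracle releases make unnecessary.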
Duke Dougal (179 rep)
Sep 14, 2020, 11:18 PM • Last activity: Sep 15, 2020, 02:08 AM