Sample Header Ad - 728x90

PostgreSQL locking problem

1 vote
0 answers
80 views
TL;DR We have a postgres stored procedure running as a job, which calls another stored procedure, which calls yet another stored procedure, and this internal stored procedure acquires ROW EXCLUSIVE lock on a table. This whole thing runs for over a minute, and the lock is kept until the end. This blocks another application from acquiring an EXCLUSIVE lock on the same table. Full story We have PostgreSQL 16.2 installed on RedHat 8.9. We have two separate databases on it, each database has multiple schemas and users with appropriate rights. We have a locking problem between two schemas. One of the schemas has a table called EMAIL_MESSAGE, which stores emails to be sent. We have two instances of a mass email sender application, which first try to assign a specific amount of emails to themselves as step 1, and in step 2 they read the assigned email data and process them. In order for the two email sender applications to not overwrite each other's assigned emails, the assignment starts by locking the table. It is handled in a stored procedure like so: 1. Stored procedure starts with a BEGIN, so it starts a transaction 2. Verify some input parameters, exit in case of error 3. Another BEGIN, starting a subtransaction 4. Get EXCLUSIVE lock on the EMAIL_MESSAGE table, waiting for 15 seconds 5. If unsuccessful, we get 55P03 error, which is caught in the EXCEPTION block of the subtransaction 6. If successful, UPDATE a number of records with the instance id of the email sender application (stored procedure input parameter), then end the subtransaction, and end the procedure as well Then, the email senders will SELECT all data with their instance id, which was previously assigned to them. Some excerpts from the procedure:
CREATE OR REPLACE PROCEDURE assign_emails(
    IN pi_instance TEXT,
    ... other params)
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
    -- verify params, exit if wrong params
    BEGIN
        SET LOCAL lock_timeout = '15s';
        LOCK TABLE email_message IN EXCLUSIVE MODE;

        -- further processing after lock is acquired
        UPDATE email_message SET processing_instance = pi_instance WHERE ...;
    EXCEPTION
        IF SQLSTATE = '55P03' THEN
            -- logging lock error into a log table
        ELSE
            -- logging generic error into a log table
        END IF;
    END;

    COMMIT; -- not sure if needed here? doesn't seem to do anything, but no error also
EXCEPTION
    -- logging error into a log table
END;
$BODY$;
This on its own works well, the two instances don't bother each other, the above detailed UPDATE assignment takes very small time only, 15 seconds of wait time for the lock is more than enough. However, we added another schema for another development, which contains a table, and a REST API is receiving data and saving it into this table. There is a job executing every minute (via PgAgent) which takes the new records on this table based on status, as well as some of the old records. For the new records we have to send an email, for the old records, based on time difference and status, we need to send a reminder email. The problem is, that this job can sometimes take a lot of time, sometimes even more than a minute, and it seems like it is preventing the above detailed EXCLUSIVE lock on the EMAIL_MESSAGE table. Not sure why though. The job's stored procedure also starts with a BEGIN so it starts a transaction, and it SELECTs from the EMAIL_MESSAGE table, so it acquires ACCESS SHARE lock, but this should be fine. But in order to insert the emails to be sent, we invoke stored procedures in the original email-related schema. These stored procedures will of course INSERT and UPDATE on the EMAIL_MESSAGE table, acquiring ROW EXCLUSIVE lock, which is a problem because it will prevent the EXCLUSIVE lock we need from the email senders. And when this ROW_EXCLUSIVE lock is acquired, relatively soon when this job starts, it is never released until the end. Which again can take a minute, so the table is in ROW EXCLUSIVE lock for a minute, which is not good for me. Parts from this offending job stored procedure:
CREATE OR REPLACE PROCEDURE process_minutely()
LANGUAGE 'plpgsql'
SECURITY DEFINER
AS $BODY$
DECLARE
    v_cur CURSOR FOR SELECT ...;
    v_rec RECORD;
BEGIN
    -- loop through the appropriate records
    OPEN v_cur;

    LOOP
        FETCH v_cur INTO v_rec;
        EXIT WHEN NOT FOUND;
        -- process v_rec, very complex processing logic here depending on status, time difference, related table records, etc.

        SELECT col INTO var FROM email_message; -- this will acquire ACCESS SHARE lock

        CALL register_email(params...); -- we call the email sending here
    END LOOP;

    CLOSE v_cur;
EXCEPTION
    -- logging error into a log table
END;
$BODY$;
The mentioned register_email procedure, this is invoked multiple times from the above procedure, hence it is separated, this will invoke the real internal email sending procedures:
CREATE OR REPLACE PROCEDURE register_email(params...)
LANGUAGE 'plpgsql'
SECURITY DEFINER
AS $BODY$
BEGIN
    -- preprocessing, parameter checks
    BEGIN
        -- these procedures will internally acquire the ROW EXCLUSIVE lock by INSERT-ing and UPDATE-ing the email_message table
        CALL original_create_email(params...);
        CALL original_prepare_email(params...);

        -- COMMIT; -- tried putting commit here, results in error
    EXCEPTION
        -- logging error into a log table
    END;

    -- COMMIT; -- tried putting commit here, results in error
EXCEPTION
    -- logging error into a log table
END;
$BODY$;
So to add some images, maybe it helps. This is the correct behaviour, EXCLUSIVE lock is acquired by one of the application instances, it does its thing quickly, then releases it, and the other can proceed: enter image description here This is the incorrect behaviour, where this new job stored procedure is blocking the application with the automatic ROW EXCLUSIVE lock for over a minute: enter image description here I tried to put the part of this new stored procudedure which acquires the ROW EXCLUSIVE lock into its own transaction by putting a BEGIN END around it, it doesn't help. I tried to put a COMMIT to various places, into the subtransaction, and after the subtransaction, it is not allowed with "invalid transaction termination" error. I admit that PostgreSQL transaction handling is a bit mysterious to me, I was more used to Oracle (pragma autonomous_transaction...), but trying to learn. As far as I understand, when I enter into a BEGIN EXCEPTION END block, it is a subtransaction automatically, commited when it exits, and rolled back in the EXCEPTION block. Doesn't it mean that when it exits from the block inside the
procedure, it should release the ROW EXCLUSIVE lock which was acquired inside it, and allow the EXCLUSIVE lock which is already waiting? How could I achieve correct functionality? Is it even achievable? What I would like to see is that the ROW EXCLUSIVE lock is released when it's not needed in the long running job, and reacquired when needed. I understand this will result in performance issues, but this is not necessarily a problem, it is a background processing job, and it is not too important if the email is sent now or 2 minutes later. But now, due to the blocking, the emails are never sent, because the email sender application is not able to proceed. As a workaround, I will make this job run every 5 minutes only, hopefully that will give some time to the email senders to work between the job executions. ---------------------------------- Update: After making some changes based on the comments, the situation is better now significantly, although not perfect. I guess the main problem is that I misunderstand the lock handling. So what I did now is: 1. The long running procedure is still mostly the same, but now it inserts the emails to be sent into a new table, not directly into the EMAIL_MESSAGE table 2. This way the lock is not kept on the EMAIL_MESSAGE table for a long time, so the email sender app can work 3. I added a new procedure which runs every 5 minutes, and it has 3 BEGIN EXCEPTION END blocks internally: 1. To mark the emails that I want to process (due to parallel inserts from the other procedure) 2. To calculate the email bodies and email sending time (email body is a template with some parameters being replaced) 3. To batch insert everything into the EMAIL_MESSAGE table as a single INSERT INTO ... SELECT ... FROM ... statement 4. Then, and this is not in its own BEGIN EXCEPTION END block, we update the original input table, the same table which the long running procedure is also working on 5. Then finally we delete from this new table everything that we processed So far so good, this new middle-layer procedure is quite fast, normally it finishes within a second. But sometimes I notice that it hangs in step number 3.4 above, updating the input table, and that is because it is waiting for the main long running procedure, which holds the lock on the same table. I can understand why this is. But, at the same time, the email sending application also hangs, because it cannot access the EMAIL_MESSAGE table. It is locked in step number 3.3 above in the middle-layer stored procedure, even though at the same time the procedure is already in step 3.4, so it is out of the BEGIN EXCEPTION END block, the changes are already commited (I googled that I can do that by embedding BEGIN EXCEPTION END blocks to make a subtransaction), so why doesn't it release the lock?
Asked by Gábor Major (163 rep)
Sep 11, 2024, 04:23 PM
Last activity: Sep 20, 2024, 07:05 AM