
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
516 views
What is the impact of DROP SCHEMA on concurrent read only workloads in PostgreSQL?
I have an ELT process that deals with a pretty small amount of data. I would like to have 1 schema called `prod` with the results of the previous ELT run and 1 schema called `dev` where I will perform the transformations of the current ELT run. Once the current ELT run is complete I would like to make the switch by running something along the lines of:

    start transaction;
    drop schema prod;
    alter schema dev rename to prod;
    end transaction;

All queries issued outside of the ELT process are read only. How would MVCC handle this? What would happen to read only queries that were running when the drop schema command is issued? What would happen to read only queries issued after the drop schema command? What about after the end of the transaction?
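A minimal sketch of the swap described above, assuming the old schema's objects are dropped with CASCADE and a fresh `dev` schema is recreated for the next run (names as in the question):

    begin;
    drop schema prod cascade;           -- remove the previous run's results
    alter schema dev rename to prod;    -- promote the freshly built schema
    create schema dev;                  -- empty workspace for the next run
    commit;

Note that DROP SCHEMA ... CASCADE takes ACCESS EXCLUSIVE locks on the dropped objects, so it waits for read-only queries that are already using them, and new readers block until the transaction commits.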
foobar0100 (641 rep)
May 2, 2016, 03:19 AM • Last activity: Jul 28, 2025, 04:05 PM
3 votes
1 answer
48 views
Postgres: limiting serialization errors on serializable transactions
Recently I learned that the serializable isolation level in Postgres is based on optimistic locking, hence theoretically transactions can be performed fully concurrently as long as they do not interfere with each other (e.g., not doing read-then-write operations on the same rows). However, in practice, the algorithm for detecting such interference may produce false positives. As written in the docs, row-level locks can be promoted to page-level locks if that is preferred from a resource-usage point of view. This increases the chance of getting a serialization error. For example, when I try to update two different rows concurrently, and it turns out that those two rows are stored on the same page and that both transactions acquired a page-level lock, the one that commits later will get a serialization error. I was trying to address this by increasing max_pred_locks_per_transaction, max_pred_locks_per_relation and max_pred_locks_per_page, but no matter how big those values are, I still get the error. For instance, let's take a look at an example that simulates 1k concurrent, independent money transfer operations. With the following config:
enable_seqscan = off
max_locks_per_transaction = 4192
max_pred_locks_per_transaction = 4192
max_pred_locks_per_relation = 4192
max_pred_locks_per_page = 4192
Having the following table:
create table if not exists accounts (
    id bigserial primary key,
    balance numeric(9, 2) not null
    );
When I execute the following queries:
session #1> begin isolation level serializable;
    session #2> begin isolation level serializable;
    session #1> select id, balance from accounts where id in (select n from generate_series(1,1000,4) n); -- select from accts with id=1, 5, 9, ..., 997
    session #1> select id, balance from accounts where id in (select n+1 from generate_series(1,1000,4) n); -- select from accts with id=2, 6, 10, ..., 998
    session #2> select id, balance from accounts where id in (select n from generate_series(3,1000,4) n); -- select from accts with id=3, 7, 11, ..., 999
    session #2> select id, balance from accounts where id in (select n+1 from generate_series(3,1000,4) n); -- select from accts with id=4, 8, 12, ..., 1000
    session #3> select locktype, count(*) from pg_locks group by locktype;
    session #1>  update accounts set balance = 50 where id in (select n from generate_series(1,1000,4) n);
    session #1>  update accounts set balance = 50 where id in (select n+1 from generate_series(1,1000,4) n);
    session #2>  update accounts set balance = 50 where id in (select n from generate_series(3,1000,4) n);
    session #2>  update accounts set balance = 50 where id in (select n+1 from generate_series(3,1000,4) n);
    session #1> commit;
    session #2> commit;
Commit of transaction in session #2 gets rejected:
ERROR:  could not serialize access due to read/write dependencies among transactions
    DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
    HINT:  The transaction might succeed if retried.
And the select on pg_locks (executed in the middle, in session #3) returns the following results:
developer=# select locktype, count(*) from pg_locks group by locktype;
       locktype    | count 
    ---------------+-------
     page          |    14
     virtualxid    |     3
     tuple         |  1750
     transactionid |     1
     relation      |     7
    (5 rows)
There are 14 page predicate locks, even though only 1750 tuple predicate locks were acquired, meaning there was still room to allocate more tuple-level locks. I understand that in certain cases a tuple lock gets promoted to a page lock, and that as a database user I must be prepared to retry such transactions. Nonetheless, it increases response time, and I'm wondering if it's possible to somehow set up the DB so that, for instance, in the case of 1k concurrent updates, it would still use tuple-level locks and not go for page-level locks. Is it required to adjust some other configuration to achieve that? Thanks in advance!
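As a side note for anyone reproducing this, predicate (SIRead) locks can be separated from the ordinary heavyweight locks taken by the UPDATEs by filtering pg_locks on the lock mode; a small sketch:

    select locktype, count(*)
    from pg_locks
    where mode = 'SIReadLock'
    group by locktype
    order by count(*) desc;

That makes it easier to see how many of the page entries are actually promoted predicate locks rather than, say, page locks some other operation might take.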
Dawid Kałuża (33 rep)
Jun 16, 2025, 03:11 PM • Last activity: Jun 17, 2025, 06:22 AM
3 votes
1 answer
192 views
How does row versioning impact the size of a non-clustered columnstore index?
I have a table with a non-clustered columnstore index. The database has a readable secondary AG replica ([filled with problems...](https://dba.stackexchange.com/q/346216/277982)) and the table has triggers. Neither snapshot isolation nor RCSI are enabled. I infer that this means the table must be paying the 14-bytes-per-row penalty for row versioning. However, I generally regard compressed columnstore segments as immutable. This gives me my question: **how does row versioning impact the size of a non-clustered columnstore index?** I've consulted the [documentation for row versioning](https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver16) but it only mentions columnstore once and it is in an irrelevant way. I have also read [Niko Neugebauer's entire 131 part series](https://www.nikoport.com/columnstore/) , but I do not recall it ever being mentioned there even in [this very relevant part](https://www.nikoport.com/2015/09/22/columnstore-indexes-part-67-clustered-columstore-isolation-levels-transactional-locking/) . This [blog post](https://sql-sasquatch.blogspot.com/2021/06/sqlserver-snapshot-isolation-level.html) suggests that snapshot isolation inflicts a massive performance penalty on columnstore, but does not explain the internals. You may assume that I do not care about the in-memory OLTP version of columnstore.
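Not an answer, but one way to observe the effect empirically in SQL Server 2016 or later (dbo.MyTable is a placeholder name): compare the columnstore rowgroup sizes, where compressed segments stay immutable, with the record size of the rowstore structures, which is where a 14-byte versioning tag would live.

    -- compressed vs. delta-store rowgroups of the columnstore index
    SELECT state_desc,
           COUNT(*)           AS row_groups,
           SUM(total_rows)    AS rows_total,
           SUM(size_in_bytes) AS size_bytes
    FROM sys.dm_db_column_store_row_group_physical_stats
    WHERE object_id = OBJECT_ID(N'dbo.MyTable')
    GROUP BY state_desc;

    -- average record size of the base rowstore (heap / clustered index)
    SELECT index_id, avg_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'),
                                        NULL, NULL, 'DETAILED')
    WHERE index_level = 0;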
J. Mini (1225 rep)
Apr 18, 2025, 10:14 PM • Last activity: May 13, 2025, 05:41 PM
3 votes
2 answers
513 views
Separate small column updated a lot from large jsonb columns to optimize performance?
A table has the following structure:

- a: id
- b: large jsonb
- c: large jsonb
- d: timestamp

Looking at two weeks of stats, I see many updates:

- 90k requests update the d field alone (total time: 25 min, measured from the app)
- 90k requests update the "large jsonb" fields (total time: 6 h 30 min, measured from the app)

Since Postgres follows the MVCC pattern, which rewrites rows on update, would it be worthwhile to change the table structure and move d to a separate table?
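A minimal sketch of the split being considered, with illustrative table names and d held in a 1:1 side table keyed by the same id:

    create table docs (
        a bigint primary key,
        b jsonb,
        c jsonb
    );

    create table docs_ts (
        a bigint primary key references docs (a) on delete cascade,
        d timestamptz not null
    );

    -- the frequent, cheap update now rewrites only a tiny tuple
    update docs_ts set d = now() where a = 42;

One thing to weigh against it: large jsonb values are usually TOASTed out of line, so an UPDATE that only changes d does not rewrite the out-of-line data anyway, only the main heap tuple; the split mostly pays off if the in-line part of the row is still wide or if updating d currently prevents HOT updates.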
Slim (291 rep)
Sep 29, 2023, 08:40 AM • Last activity: Nov 9, 2024, 01:14 AM
0 votes
1 answer
117 views
Why MySQL repeatable read doesn't raise serialization error
I'm looking at MySQL MVCC, and it has strange behavior at Repeatable Read isolation:

1. A regular SELECT returns the version from the snapshot
2. SELECT FOR SHARE returns the *latest* version of the record

Which is probably the least expected behavior. I'm curious about the underlying reason. Postgres has a very simple fix for this: raise an error. So:

* If you never run an operation that requires exclusive locks (SELECT FOR SHARE or FOR UPDATE, UPDATE, DELETE), then you always see the snapshot version (like in MySQL).
* If you do run one of the operations that can't use the older version, then you get an error, and we have to repeat the whole transaction.

Is there some technical difficulty that didn't allow MySQL to check that there's a new version? I suppose not, so there must be some other reasoning.
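For comparison, a sketch of the PostgreSQL behaviour described above under REPEATABLE READ (table t and column v are illustrative):

    -- session A                                 -- session B
    begin isolation level repeatable read;
    select v from t where id = 1;                update t set v = v + 1 where id = 1;
                                                 commit;
    select v from t where id = 1;            -- still the snapshot's value
    select v from t where id = 1 for share;
    -- ERROR: could not serialize access due to concurrent update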
Stanislav Bashkyrtsev (587 rep)
Aug 5, 2024, 09:08 AM • Last activity: Aug 5, 2024, 07:07 PM
1 vote
1 answer
418 views
Why readers (SELECT) can be blocked by writers (DML) in SQL Server?
Could anyone explain to me why in SQL Server a SELECT can be blocked by an UPDATE statement? I thought that MVCC (implemented in SQL Server by copying modified data to tempdb) prevents readers from being blocked in all cases. Am I wrong? Why do I need to change the ISOLATION LEVEL to SNAPSHOT to release the lock? That's how it works in Oracle:

> **A writer never blocks a reader.** When a row is being changed by a writer, the database uses undo data to provide readers with a consistent view of the row.

and PostgreSQL:

> The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so **reading never blocks writing and writing never blocks reading**.
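Under SQL Server's default (locking) READ COMMITTED that blocking is expected; the per-database switch below turns on statement-level row versioning so plain SELECTs read the last committed version instead of waiting on the writer's lock. A sketch, with MyDb as a placeholder name:

    ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
    -- optionally also allow explicit SNAPSHOT isolation for transaction-level consistency
    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

WITH ROLLBACK IMMEDIATE kicks out other sessions so the option can be applied; the version store lives in tempdb, which is the copying the question mentions.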
drk
Jan 19, 2024, 10:21 AM • Last activity: Jan 19, 2024, 10:50 AM
1 vote
1 answer
127 views
Impact of Adding a New Column on Table Bloat in PostgreSQL
I'm working with PostgreSQL and have a concern regarding table bloat when adding a new column to an existing table. I understand that PostgreSQL uses Multi-Version Concurrency Control (MVCC) and autovacuum to reclaim space from old rows. Here's the scenario: let's say I have a table named example_table with existing data, and I decide to add a new column, for example new_column, of a larger datatype:

    ALTER TABLE example_table ADD COLUMN new_column VARCHAR(255);

Additionally, I will be updating the new column with new values:

    UPDATE example_table SET new_column = 'some_value';

As far as I understand, after this operation the old rows are marked as dead, and autovacuum will eventually reclaim the space. However, the old rows were of a smaller size compared to the new rows with the added column. I have the following questions:

- How does PostgreSQL (specifically, autovacuum) handle the space reclaimed from old rows, considering the size difference between old and new rows?
- Will this process potentially lead to table bloat, since the dead tuples cannot be reused due to the size difference?
- Is it necessary to perform a VACUUM FULL to efficiently reclaim space in such a scenario, or will autovacuum handle it adequately?
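A sketch of how to watch the dead tuples and, if the backfill is the worry, how to run it in batches so each batch's freed space can be reused by the next one (assumes an id primary key for batching):

    -- dead-tuple count and last autovacuum for the table
    select n_live_tup, n_dead_tup, last_autovacuum
    from pg_stat_user_tables
    where relname = 'example_table';

    -- batched backfill; run repeatedly until 0 rows are updated
    update example_table
    set    new_column = 'some_value'
    where  id in (select id
                  from example_table
                  where new_column is null
                  limit 10000);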
pramod (25 rep)
Nov 13, 2023, 10:07 AM • Last activity: Nov 13, 2023, 04:07 PM
0 votes
1 answer
574 views
How does MVCC work under read committed isolation level in Postgres
Let's say I have two transactions A and B. Transaction A starts first; then transaction B starts, inserts some data, and commits. So from transaction A (not committed yet), I can see data that was committed by transaction B. Since transaction A starts before transaction B, its XID will be smaller than the XID of transaction B. From my understanding, transaction A can only see rows that are modified or inserted by transactions whose XID is smaller than transaction A's. In this example (read committed isolation level), the XID of the current transaction is 741, so why can it see data committed by the transaction whose XID is 742? Sorry, I think my example wasn't clear enough. I started two transactions A and B concurrently, I let transaction B update and insert some data, and transaction A could see that data after transaction B committed. So, I wonder how the MVCC rules work under the read committed isolation level.
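A small two-session sketch of the per-statement snapshot behaviour of READ COMMITTED, which is why the xid comparison alone doesn't decide visibility (table t is illustrative):

    -- session A                               -- session B
    begin;
    select count(*) from t;   -- e.g. 10
                                               insert into t values (1);
                                               commit;
    select count(*) from t;   -- 11: this statement took a fresh snapshot
    commit;

Under READ COMMITTED each statement gets its own snapshot, and any transaction that committed before that snapshot was taken is visible, no matter whether its xid is higher or lower than the reader's; the "only smaller xids" rule isn't how the visibility check works.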
Quốc Khánh Bùi (25 rep)
Aug 20, 2023, 02:35 PM • Last activity: Aug 21, 2023, 02:35 PM
0 votes
0 answers
283 views
Unknown column od.id_warehouse in where clause
    SELECT LEFT(invoice_date, 10) as fix_date,
           COUNT(*) as countOrders,
           SUM((SELECT SUM(od.product_quantity)
                FROM ps_order_detail od
                WHERE o.id_order = od.id_order)) as countProducts,
           SUM(o.total_paid_tax_excl / o.conversion_rate) as totalSales
    FROM ps_orders o
    WHERE o.valid = 1
      AND od.id_warehouse = 5
      AND o.invoice_date BETWEEN '2022-09-01 00:00:00' AND '2022-09-30 23:59:59'
      AND o.id_shop IN (1)
    GROUP BY LEFT(invoice_date, 10)

I want to connect the order and order_detail tables with a LEFT JOIN, but when I add the LEFT JOIN I get the error in the title. What is the solution?
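The error happens because od only exists inside the correlated subquery, so it isn't visible to the outer WHERE clause. One hedged rewrite, assuming the intent is to count only warehouse-5 detail lines and keep one row per order so the order totals aren't multiplied by the join, is to pre-aggregate the detail table:

    SELECT LEFT(o.invoice_date, 10) AS fix_date,
           COUNT(*) AS countOrders,
           SUM(d.qty) AS countProducts,
           SUM(o.total_paid_tax_excl / o.conversion_rate) AS totalSales
    FROM ps_orders o
    JOIN (SELECT id_order, SUM(product_quantity) AS qty
          FROM ps_order_detail
          WHERE id_warehouse = 5
          GROUP BY id_order) d ON d.id_order = o.id_order
    WHERE o.valid = 1
      AND o.invoice_date BETWEEN '2022-09-01 00:00:00' AND '2022-09-30 23:59:59'
      AND o.id_shop IN (1)
    GROUP BY LEFT(o.invoice_date, 10);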
Ahmed Ibrahim (1 rep)
Sep 27, 2022, 01:19 PM • Last activity: Sep 28, 2022, 09:57 AM
0 votes
1 answer
564 views
How to configure InnoDB/MariaDB to not fill ibdata1 with (rollback/undo?) data that is useless in an in-memory database?
I'm trying to run a tiny (~100-200 MB) database in-memory, but ibdata1 grows out of control. Situation: MariaDB is running on resource-constrained hardware (Raspberry Pi, 1 GB RAM), storing non-critical sensor data (like room temperature). The real file system is an SD card, so frequent writes destroy it. Storing data in-memory would be a good option. What is not a solution:

- the Memory storage engine can't handle TEXT columns
- the Aria storage engine can't handle foreign keys
- the MyRocks storage engine is not available for 32-bit platforms
- SQLite ///:memory: can't handle the "load"; it throws countless 'cannot commit - no transaction is active' errors

What I've tried:

- using tmpfs for datadir and tmpdir
- tweaking InnoDB to minimal resource usage
- disabling doublewrite buffering, disabling change buffering, using READ-UNCOMMITTED isolation

See [configuration](https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory/blob/master/mariadb/rootfs/etc/my.cnf.d/mariadb-server.cnf). But when I purge old data, the DELETE causes ibdata1 to grow to nearly 100% of the size of the table I delete 10-15% of the rows from. My **guess** is that the rollback segments and undo tablespaces are still very active, but I can't "disable" MVCC altogether. **How do I configure InnoDB/MariaDB to not fill ibdata1 with (rollback/undo?) data that is useless in an in-memory database?**
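One knob that helps regardless of engine settings is keeping each purge transaction small, so the undo generated by any single DELETE stays tiny and can be purged quickly; a sketch with placeholder table/column names:

    -- repeat until ROW_COUNT() returns 0
    DELETE FROM sensor_data
    WHERE recorded_at < NOW() - INTERVAL 30 DAY
    LIMIT 1000;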
lmagyar (111 rep)
Feb 8, 2021, 12:30 PM • Last activity: Feb 9, 2021, 11:16 AM
15 votes
1 answer
8927 views
What is "special" about PostgreSQL update vs delete+insert
My understanding is that an update locks a tuple, marks it as deleted, and then adds a new tuple. In other words, update = delete + insert. Or so I had believed. But it appears there is something fundamentally different about UPDATE compared with DELETE+INSERT in MVCC.

---

Setup:
CREATE TABLE example (a int PRIMARY KEY, b int);
INSERT INTO example VALUES (1, 1);
Method 1: Update
-- session A                          session B
BEGIN;
UPDATE example SET b = 2 WHERE a = 1;
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there are 0 rows in table example (1 row was deleted by session B)
Method 2: Delete and insert
-- session A                          session B
BEGIN;
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there is 1 row in table example (0 rows deleted by session B)
Thus
UPDATE example SET b = 2 WHERE a = 1;
is different from
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
**How am I to understand the MVCC nature of update?** Does the tuple have some sort of MVCC "identity" that is preserved during the update? What is it?
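What preserves the "identity" is the update chain: the UPDATE stamps the old version's xmax and points its t_ctid at the new version, and session B's blocked DELETE follows that chain (the EvalPlanQual re-check) to the new tuple, re-evaluates a = 1, and deletes it; after DELETE+INSERT there is no such link, so B's DELETE finds nothing and the newly inserted row survives. A hedged sketch for looking at the raw tuple headers with the pageinspect extension:

    create extension if not exists pageinspect;
    -- both the old and the new version of the row show up on the page,
    -- with the old one's t_ctid pointing at its replacement
    select lp, t_xmin, t_xmax, t_ctid
    from heap_page_items(get_raw_page('example', 0));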
Paul Draper (800 rep)
Oct 22, 2020, 01:41 AM • Last activity: Nov 13, 2020, 06:57 PM
2 votes
1 answer
474 views
Are snapshots generated for each sub query inside a Postgres READ_COMMITTED transaction?
From what I understand, in a READ_COMMITTED Postgres transaction “the transaction obtains a snapshot whenever an SQL command is executed” (source). Does this mean a snapshot will be obtained for each subquery in a nested query? Does the use of CTEs in place of subqueries affect this behavior?
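A rough way to convince oneself, with t as a placeholder table: run a single statement containing two subqueries separated by a sleep, and commit an insert from another session during the sleep; under READ COMMITTED both subqueries still return the same count, because the snapshot is taken once per statement (and CTEs behave the same way).

    -- session A (read committed):
    select (select count(*) from t) as first_count,
           pg_sleep(10),
           (select count(*) from t) as second_count;

    -- session B, while A is sleeping (autocommit):
    insert into t default values;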
Jordan (151 rep)
Jul 30, 2020, 04:29 AM • Last activity: Jul 31, 2020, 09:09 PM
3 votes
0 answers
410 views
Is there a reliable way to determine the order in which Postgres transactions were committed without logical decoding?
Is there a reliable way to determine the serial order in which Postgres transactions were committed without logical decoding? From what I understand, pg_xact_commit_timestamp can't necessarily be relied upon for this (and I'm not even sure I'll be able to turn it on in this environment). I don't need to know the exact times transactions were committed - I just care about the order.

I should probably state the problem, as perhaps there's a better way to go about this. I have a series of tables which I am using to generate quotes. For any one of those quotes I need to be able to go back and prove that the correct quote was generated. Therefore, for any tables that informed this calculation, I also need to be able to reproduce their state at the time the quote was generated. As this is derived data, it makes the problem a bit harder. My initial thought was that I could generate these quotes using a REPEATABLE READ READ ONLY transaction and record txid_current_snapshot() so I know which transactions were still in flight and therefore which snapshots to rule out from my history (I need to do this because with the current audit logs the timestamps are from before the transaction was committed and therefore cannot be used reliably to determine order). If there's no way to reliably determine transaction ordering, this solution falls apart. This is what I'm using for my audit history: https://wiki.postgresql.org/wiki/Audit_trigger_91plus

Streaming the WAL via logical decoding to some upstream consumer is the eventual goal, but it's not something I have time to do at the moment.
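For completeness, the sketch below shows how the commit-timestamp route looks when it is available; it only narrows things down (two transactions can commit within the same microsecond), which is exactly the limitation the question is worried about. Assumes track_commit_timestamp was already on when the transactions of interest committed; the table name is illustrative.

    -- postgresql.conf (requires a restart):
    --   track_commit_timestamp = on

    select xmin, pg_xact_commit_timestamp(xmin) as committed_at
    from quotes
    order by committed_at;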
Jordan (151 rep)
Jul 30, 2020, 03:41 AM
1 vote
1 answer
204 views
Is it possible to have xid before and after wrap around with the same timestamp?
I want to create a sync API in which users can get the changes since their last sync request; the API exposes cursor-based pagination (a sync token). A possible cursor may be to use the updatedAt column along with the xmin column (I need xmin because many rows may have the same timestamp). My only concern is whether it's possible to have rows with the same updatedAt timestamp and a wrapped-around xmin:
+---------------------+------------+
| timestamp           | xmin       |
+---------------------+------------+
| 2011-01-01 00:00:00 | 4556455456 | -> first row
+---------------------+------------+
| 2011-01-01 00:00:00 | 6          | -> second row, wrapped around xid
+---------------------+------------+
Within a specific timestamp (moment), I can be sure xmin will behave like a simple sequence, right?
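An alternative that sidesteps xmin entirely: maintain a monotonically increasing sync counter with a trigger and use it (or the pair updatedAt, sync_id) as the cursor. A sketch with illustrative names, assuming PostgreSQL 11+ for EXECUTE FUNCTION:

    create sequence sync_seq;

    alter table items
        add column sync_id bigint not null default nextval('sync_seq');

    create function bump_sync_id() returns trigger as $$
    begin
        new.sync_id := nextval('sync_seq');
        return new;
    end;
    $$ language plpgsql;

    create trigger items_bump_sync_id
        before update on items
        for each row execute function bump_sync_id();

The usual caveat applies: sync_id is assigned when the row is written, not when the transaction commits, so rows can become visible out of sync_id order; clients typically re-read a small overlap window to compensate.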
Michael (133 rep)
Jun 19, 2020, 02:03 PM • Last activity: Jun 20, 2020, 02:03 AM
-1 votes
2 answers
1291 views
Postgres 40001 Exception Conditional Insert
I have 3 tables, A, B and C, and a function actOnABC(i) which is a read-only operation on the tables:

    IF actOnABC(i) = 0 THEN
        RAISE EXCEPTION 'Message';
    ELSE
        INSERT INTO TABLE B VALUES(...);
    END IF;

I am running this with Spring JDBC, with the transaction isolation level set to SERIALIZABLE. As the load/number of concurrent transactions increases, I get the exception with message:

> ERROR: could not serialize access due to read/write dependencies among transactions.

Let's say I have two concurrent transactions T1 and T2, and both are attempting to insert. If, say, T1 is successful in making the insert into table B first, then potentially the result of actOnABC(i) which was already evaluated by T2 could change. Hence T2 will throw an exception. Have I understood the situation correctly? Also, is there any way I can combine the actOnABC(i) call with the INSERT statement so as to avoid this race condition? I did think of using transaction-level locks, but I am afraid that won't work as I have other stored procedures actively updating tables A, B and C. Is there a smarter way to deal with the situation?

EDIT: Some more information on the stored procedure actOnABC(). This procedure returns an integer which is calculated as below:

    SELECT count(*) into tmp_a from (... access tables..);
    SELECT count(*) into tmp_b from (... access tables...);
    return tmp_a - tmp_b;
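The understanding is right: under SERIALIZABLE the rows read by actOnABC(i) create read/write dependencies, and a concurrent insert that would change its result forces one transaction to abort. The check and the INSERT can be collapsed into one statement (sketch below, with placeholder column names and values), but that does not remove the dependency, so the caller still has to catch SQLSTATE 40001 and retry the whole transaction:

    -- inside the function, replacing the IF/ELSE:
    INSERT INTO b (col1, col2)
    SELECT v1, v2
    WHERE actOnABC(i) <> 0;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Message';
    END IF;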
cplusplusrat (99 rep)
Mar 18, 2017, 12:22 AM • Last activity: Oct 20, 2019, 01:02 PM
0 votes
0 answers
59 views
Benchmarking MariaDB/InnoDB using TATP - Writers slow down readers?
I'm benchmarking MariaDB 10.4 with TATP. The InnoDB config file has the following parameters set:

    [mysqld]
    innodb_data_home_dir=/usr/local/mysql/data
    innodb_undo_directory=/usr/local/mysql/data
    innodb_undo_log_truncate=1
    innodb_undo_tablespaces=3
    innodb_flush_method=O_DIRECT
    innodb_page_size=65536

When I run just the read mix of TATP (i.e., the read queries alone), I get a result as follows:

    I 2019-10-06 16:06:06 STATISTICS 90% response time (us), by transaction
    I 2019-10-06 16:06:06 STATISTICS GET_SUBSCRIBER_DATA: 744
    I 2019-10-06 16:06:06 STATISTICS GET_NEW_DESTINATION: 1323
    I 2019-10-06 16:06:06 STATISTICS GET_ACCESS_DATA: 738
    I 2019-10-06 16:06:06 CONTROL MQTh for test run is 6772

Focusing on the first query, GET_SUBSCRIBER_DATA, it has a 90% response time of 744 us. However, when I run the R90/W10 mix (i.e., 90% read requests, 10% write requests), the result is as follows:

    I 2019-10-07 01:23:27 STATISTICS 90% response time (us), by transaction
    I 2019-10-07 01:23:27 STATISTICS GET_SUBSCRIBER_DATA: 3771
    I 2019-10-07 01:23:27 STATISTICS GET_NEW_DESTINATION: 4712
    I 2019-10-07 01:23:27 STATISTICS GET_ACCESS_DATA: 3687
    I 2019-10-07 01:23:27 STATISTICS UPDATE_SUBSCRIBER_DATA: 18801
    I 2019-10-07 01:23:27 STATISTICS UPDATE_LOCATION: 18831
    I 2019-10-07 01:23:27 STATISTICS INSERT_CALL_FORWARDING: 20017
    I 2019-10-07 01:23:27 STATISTICS DELETE_CALL_FORWARDING: 19027
    I 2019-10-07 01:23:27 CONTROL MQTh for test run is 1464

The response time of GET_SUBSCRIBER_DATA is much slower. Queries are generated probabilistically, and keys are chosen uniformly at random. However, what I don't understand is how the writers are slowing down the readers by that much, especially when MariaDB/InnoDB has **row-level locking** to avoid precisely this issue of writers blocking readers. Are there any configuration parameters that need to be tweaked?
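The [mysqld] section shown tunes only the undo/flush layout, so before blaming MVCC it is worth ruling out the usual suspects for a mixed read/write run; a hedged sketch of settings that commonly matter here (values are illustrative, not recommendations):

    [mysqld]
    innodb_buffer_pool_size = 8G     # keep the TATP working set fully in memory
    innodb_log_file_size    = 2G     # larger redo log, fewer flush stalls
    innodb_purge_threads    = 4      # keep the undo history list short
    innodb_io_capacity      = 2000   # let background flushing keep up with writers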
Aarati K (31 rep)
Oct 7, 2019, 05:04 PM • Last activity: Oct 7, 2019, 05:27 PM
0 votes
2 answers
3250 views
How do writers not block reads in PostgreSql
Since readers not blocking writers and writers not blocking readers is what MVCC is all about, how does PostgreSQL prevent writers who are in the process of committing their changes to blocks/pages from letting a reader read this inconsistent data? From what I understand about PostgreSQL's MVCC, each table is represented as a heap file, and each heap file contains a collection of pages/blocks, which are 8 KB segments of information. So say two transactions T1 and T2 are running concurrently and T1 decides to make some modification to block1. At first, T1 reads block1 and makes modifications to it in memory, and when it's finally done, it decides to commit, i.e., actually write back these in-memory changes to block1. I understand that the xmax of the previous entries it modified is set to id(T1) and a new entry is created with xmin set to id(T1). Now what I find difficult is this: say that while the actual commit process for T1 is occurring (the write by T1 has started but not ended), T2 reads. How is this case handled? Any answer would be appreciated.
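A hedged sketch of the two pieces that make this safe: short-lived buffer locks protect a page while it is physically being changed, and visibility is then decided per tuple from its xmin/xmax, the commit log, and the reader's snapshot rather than from whatever the writer happens to be doing at that moment:

    -- the reader's snapshot records which transactions were still in progress
    select txid_current_snapshot();   -- e.g. 740:746:741,743

    -- a tuple whose xmin is in that in-progress list, or whose transaction
    -- has not committed according to the commit log, is simply skipped
    select txid_status(741);          -- 'in progress' | 'committed' | 'aborted'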
paiakshay998 (31 rep)
Jul 24, 2019, 02:08 AM • Last activity: Jul 24, 2019, 07:37 PM
2 votes
1 answer
1478 views
InnoDB undo logs vs history list
I was reading about InnoDB undo logs, and in the process the articles I was reading ended up confusing me about what undo logs are and whether they are different from the history list or the same thing. Quoting article one, by the MariaDB team, they defined the history list as:

> There is also a global history list of the data. When a transaction is committed, its history is added to this history list. The order of the list is the chronological order of the commits.

Then in the same article they went on to say this, which brought even more confusion about the relationship between undo logs, the history list, and the views of rows in the apparent undo logs:

> Understanding how the undo log works helps with understanding the negative effects of long transactions.

The second article, by Percona, describes the history list as:

> History list length 6 is number of unpurged transactions in undo space. It is increased as transactions which have done updates are commited and decreased as purge runs.

And lastly, the third article tried to address the confusion many have about what the history list is, but in my opinion just brought more confusion when distinguishing the undo logs from the history list and how they both relate to the views of rows that InnoDB transactions use to enforce consistency via MVCC. Here is how the third article defined the history list:

> The best name for the unit (history list) is undo logs, which are “update undo logs for committed transactions”, to quote the source. But an “undo log” is a special term with specific meaning in InnoDB. An undo log is a single set of atomic changes a transaction makes, which may actually modify multiple records.

The first article added this, which confused me even more:

> Understanding how the undo log works helps with understanding the negative effects of long transactions.
>
> - Long transactions generate several old versions of the rows in the undo log. Those rows will probably be needed for a longer time, because other long transactions will need them. Since those transactions will generate more modified rows, a sort of combinatory explosion can be observed. Thus, the undo log requires more space.
>
> - Transactions may need to read very old versions of the rows in the history list, thus their performance will degrade.

Does the InnoDB undo log contain two types of data, one being views of rows for transactions to use, stored as linked lists, and the other being the history list, which keeps track of committed transactions in the buffer pool (not yet flushed to disk)? Can someone please help make a clear distinction between the undo logs, how they relate to views of rows (created for running transactions to use), and the history list?

https://mariadb.com/kb/en/library/xtradbinnodb-undo-log/
https://www.percona.com/blog/2006/07/17/show-innodb-status-walk-through/
https://www.vividcortex.com/blog/2015/07/20/what-is-innodb-history-list-length/
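One concrete way to see the distinction: the history list is the queue of undo logs belonging to committed transactions that purge has not processed yet, and its length is visible both in SHOW ENGINE INNODB STATUS and as an InnoDB metric (a small sketch):

    SHOW ENGINE INNODB STATUS\G        -- look for "History list length N" under TRANSACTIONS

    SELECT name, `count`
    FROM information_schema.innodb_metrics
    WHERE name = 'trx_rsegs_history_len';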
giogio (23 rep)
Apr 11, 2019, 08:37 AM • Last activity: Apr 12, 2019, 01:25 AM
0 votes
1 answer
1351 views
Does Amazon Redshift support MVCC?
According to the documentation, Amazon Redshift is derived from PostgreSQL 8. PostgreSQL 8 supports Multiversion Concurrency Control (MVCC), which in my understanding is achieved at the physical level by appending new row versions rather than updating in place, and by marking rows for deletion rather than deleting them in place. The advantages of MVCC are:

> MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.

The Redshift documentation states that

> the specialized data storage schema and query execution engine that Amazon Redshift uses are completely different from the PostgreSQL implementation

Also, unlike Postgres, which has row-based storage, Redshift is a column-based store. At the same time, Redshift has vacuuming, which makes it similar to PostgreSQL. From the Redshift documentation:

> Amazon Redshift does not automatically reclaim and reuse space that is freed when you delete rows and update rows. To perform an update, Amazon Redshift deletes the original row and appends the updated row, so every update is effectively a delete followed by an insert. When you perform a delete, the rows are marked for deletion, but not removed. The query processor needs to scan the deleted rows as well as undeleted rows, so too many deleted rows can cost unnecessary processing. You should vacuum following a significant number of deletes or updates to reclaim space and improve query performance.

In essence, this is similar to the underlying MVCC mechanisms in Postgres. However, I wasn't able to find confirmation that Redshift supports MVCC. What I am looking for is confirmation that parallel read and write transactions on Redshift do not interfere because they operate on independent versions of rows. Hence the question: does Amazon Redshift support MVCC?
Mike (111 rep)
Apr 5, 2017, 04:50 PM • Last activity: Oct 22, 2018, 07:00 AM
3 votes
1 answer
209 views
Flags UPDATE optimization for MVCC
We have a table with 200k rows where we change some flags multiple times per day. If I have understood correctly, an UPDATE in Postgres is a DELETE+INSERT on disk. I'm wondering if this is highly inefficient for our use case. What if each tuple has a big size? Are the entire tuples written again? I was thinking of moving those status flags to a separate table so that we rewrite only small tuples, reducing useless I/O. Is this a correct approach, or am I on the wrong track?
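Yes on the mechanics: the new row version copies the whole heap tuple, although values TOASTed out of line (large text/jsonb) are not re-copied when unchanged, only their pointers. Moving the flags to a narrow side table is a reasonable approach; an alternative sketch that keeps one table but leaves room on each page so the flag updates can be HOT (heap-only tuple) updates, assuming the flag columns themselves are not indexed:

    -- leave ~30% of each page free for new row versions
    alter table items set (fillfactor = 70);

    -- rewrite existing pages so the setting takes effect
    vacuum full items;   -- or pg_repack, to avoid the long exclusive lock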
Luca Looz (101 rep)
Jul 20, 2017, 06:03 AM • Last activity: Oct 13, 2017, 11:01 AM