
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
224 views
How are repeatable read and other isolation levels implemented in distributed/replicated databases?
I'm studying distributed systems/DBs and I'm struggling to understand isolation levels when we talk about distributed systems. Avoiding problems like dirty reads, non-repeatable reads, phantom reads, write skew, etc. when we have a single DB is pretty straightforward with the introduction of optimistic / pessimistic concurrency control algorithms. Nevertheless, I'm really not understanding how the same problems are avoided when we deal with distributed systems.

## Example: simple DB cluster with 3 nodes in a strong-consistency setup

Let's say that we have three total nodes (*N = 3*) for our DB and we want strong consistency for some reason (*R = 2* and *W = 2*, so *R + W > N*). Let's say now that we have two transactions, T1 and T2.

- T1:

      SELECT * FROM X WHERE X.field = 'something'

      ... DO WORK ...

      SELECT * FROM X WHERE X.field = 'something'

- T2:

      INSERT INTO X VALUES(..,..,..)   -- impacts T1's search criteria

T2 will commit while T1 is in the "DO WORK" phase, so we will have a *phantom read* problem.

## Question

How is this situation handled in the system illustrated above? Do systems like this use a 2PC-like algorithm and rely on the fact that one transaction will fail on one node due to the R + W > N constraint? If yes, is that a solution that is actually used? I would say that it is complex (when we have to roll back the transaction already committed on Node_X) and probably also slow. Do you have any useful material that I can check to continue studying this topic? I really cannot find much about this; there is very little material that discusses isolation levels in distributed systems. Feel free to correct the above if I made a mistake. Thank you.
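The quorum arithmetic in the question can be sketched in a few lines. This is a toy model in Python (the node list, version counters, and quorum selection are invented for illustration, not how any real database works); it shows why R + W > N makes a read see the latest write, while saying nothing about isolation between in-flight transactions, which is exactly the gap the question is about.

```python
# Toy quorum model: N replicas, write to W of them, read from R of them.
# Because R + W > N, every read set intersects every write set.
N, W, R = 3, 2, 2

nodes = [dict() for _ in range(N)]       # each replica: key -> (version, value)

def quorum_write(key, value, version):
    for node in nodes[:W]:               # write to any W replicas
        node[key] = (version, value)

def quorum_read(key):
    replies = [node.get(key, (0, None)) for node in nodes[-R:]]  # any R replicas
    return max(replies, key=lambda r: r[0])   # keep the newest version seen

quorum_write("x", "v1", version=1)
print(quorum_read("x"))                  # (1, 'v1'): read quorum overlaps write quorum
```

At least one replica in the read set also took the write, so picking the highest version returns the latest committed value. Nothing here prevents another transaction's committed insert from appearing between two reads of the same predicate, i.e. the phantom-read concern in the question remains.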
Dev (1 rep)
Aug 14, 2022, 04:20 PM • Last activity: Jun 11, 2025, 07:07 PM
1 vote
3 answers
1638 views
Replacing MySQL table with data from another table - no downtime
I have two tables with identical structures, Table1 and Table2:

    id INT PRIMARY KEY,
    a TINYINT,
    b TINYINT,
    c TINYINT

Table1 is actively being used by queries from other sessions; Table2 is not referenced/accessed by anything else. My goal is to end up with just Table1, but having the data from Table2. (Note that Table2 and Table1 do not have the same set of id primary keys.) What is the best way to do this without risking Table1 not existing momentarily for other sessions? I know I can do something like:

    DROP TABLE Table1;
    ALTER TABLE Table2 RENAME Table1;

But I am not sure how to ensure nothing tries to query Table1 while it doesn't exist or is empty. Do I use transactions, lock the tables, or something else? (Ideally, I'd want any other sessions' queries to return either old Table1 or new Table1 data rather than failing.) I am using CodeIgniter in PHP to make the calls.

**UPDATE:** Looks like this post answers it using a method from the MySQL RENAME TABLE documentation:

    RENAME TABLE Table1 TO Table1_old, Table2 TO Table1;

And then I can DROP Table1_old at my leisure.
Phil W. (11 rep)
Apr 4, 2017, 06:53 PM • Last activity: Feb 7, 2025, 07:06 AM
7 votes
4 answers
3032 views
Solving supertype-subtype relationship without sacrificing data consistency in a relational database
Trying to get better at designing databases, I noticed I'm always stuck trying to solve variations of the exact same problem. Here is an example using common requirements:

1. An online store sells different categories of product.
2. The system must be able to retrieve the list of all product categories, say food and furniture.
3. A customer may order any product and retrieve his order history.
4. The system must store specific properties depending on the product category; say the expiration_date and calories for any food product and the manufacture_date for any furniture product.

If it weren't for requirement 4, the model could be quite straightforward:

[ER diagram image]

The problem is trying to solve requirement 4. I thought of something like this:

[ER diagram image]

In this approach, the relationships product-furniture and product-food are *supertype-subtype* (or superclass-subclass) associations; the primary key of the subtype is also a foreign key to the supertype's primary key. However, this approach cannot guarantee that the category referenced via a foreign key to the product will be consistent with its actual subtype. For instance, nothing stops me from setting the food category on a product tuple that has a subtype row in the Furniture table. I read various articles about inheritance in relational database modelling, especially this one and this one, which were very helpful but didn't solve my problem for the reason mentioned above. Whatever model I come up with, I'm never satisfied with the data consistency. How can I solve requirement 4 without sacrificing data consistency? Am I going all wrong here? If so, what would be the best way to solve this problem based on these requirements?
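One widely used fix for exactly the consistency gap described here (not taken from the question itself; a sketch using Python's sqlite3 standard library, with invented table and column names) is to make the category part of a composite key that the subtype tables reference, so the database itself rejects a food row for a furniture product:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # sqlite has FK enforcement off by default
conn.executescript("""
CREATE TABLE product (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL CHECK (category IN ('food', 'furniture')),
    UNIQUE (id, category)                  -- extra key the subtypes can reference
);
CREATE TABLE food (
    product_id      INTEGER PRIMARY KEY,
    category        TEXT NOT NULL DEFAULT 'food' CHECK (category = 'food'),
    expiration_date TEXT NOT NULL,
    calories        INTEGER NOT NULL,
    FOREIGN KEY (product_id, category) REFERENCES product (id, category)
);
""")
conn.execute("INSERT INTO product (id, category) VALUES (1, 'furniture')")
try:
    # product 1 is furniture, so a food subtype row cannot reference it
    conn.execute("INSERT INTO food (product_id, expiration_date, calories) "
                 "VALUES (1, '2025-01-01', 100)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The food.category column is pinned to 'food' by its CHECK constraint, and the composite foreign key then requires the referenced product row to carry the same category, which closes the inconsistency described in the question.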
user174739 (71 rep)
Mar 15, 2019, 02:41 PM • Last activity: Aug 3, 2024, 03:49 PM
0 votes
0 answers
72 views
Backing up a database in SQL Server 2012 and restoring the backup in SQL Server 2014
The first thing I do is check the consistency of the database on SQL Server 2012:

    DBCC CHECKDB (DATABASE1)

It is OK: no errors, no warnings. Next I take a FULL backup and restore it on another machine, which has SQL Server 2014, using SQL Server Management Studio with the WITH REPLACE option. However, after I run DBCC CHECKDB (DATABASE1) there, I receive consistency error messages:

    Msg 8939, Level 16, State 98, Line 1
    Table error: Object ID 293576084, index ID 67, partition ID 72057594074628096, alloc unit ID 72057594126598144 (type In-row data), page (1:5589561). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 133129 and -4.
    Msg 8928, Level 16, State 1, Line 1
    Object ID 293576084, index ID 67, partition ID 72057594074628096, alloc unit ID 72057594126598144 (type In-row data): Page (1:5589561) could not be processed. See other errors for details.
    Msg 8976, Level 16, State 1, Line 1
    Table error: Object ID 293576084, index ID 67, partition ID 72057594074628096, alloc unit ID 72057594126598144 (type In-row data). Page (1:5589561) was not seen in the scan although its parent (1:5580238) and previous (1:5589560) refer to it. Check any previous errors.
    Msg 8978, Level 16, State 1, Line 1
    Table error: Object ID 293576084, index ID 67, partition ID 72057594074628096, alloc unit ID 72057594126598144 (type In-row data). Page (1:5589562) is missing a reference from previous page (1:5589561). Possible chain linkage problem.
    Msg 8939, Level 16, State 98, Line 1
    Table error: Object ID 293576084, index ID 1, partition ID 72057594073645056, alloc unit ID 72057594125615104 (type In-row data), page (1:21142097). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 133129 and -4.
    Msg 8928, Level 16, State 1, Line 1
    Object ID 293576084, index ID 1, partition ID 72057594073645056, alloc unit ID 72057594125615104 (type In-row data): Page (1:21142097) could not be processed. See other errors for details.
    Msg 8976, Level 16, State 1, Line 1
    Table error: Object ID 293576084, index ID 1, partition ID 72057594073645056, alloc unit ID 72057594125615104 (type In-row data). Page (1:21142097) was not seen in the scan although its parent (1:21112030) and previous (1:21142096) refer to it. Check any previous errors.
    Msg 8978, Level 16, State 1, Line 1
    Table error: Object ID 293576084, index ID 1, partition ID 72057594073645056, alloc unit ID 72057594125615104 (type In-row data). Page (1:21142098) is missing a reference from previous page (1:21142097). Possible chain linkage problem.
    CHECKDB found 0 allocation errors and 8 consistency errors in table 'TABLE1' (object ID 293576084).
    CHECKDB found 0 allocation errors and 8 consistency errors in database 'DATABASE1'.
    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DATABASE1).

Am I doing something wrong with the backup restoration?
piotrassss (101 rep)
Apr 3, 2024, 08:31 AM • Last activity: Apr 3, 2024, 09:34 AM
3 votes
1 answer
248 views
What happens to MySQL synchronous replication when a read replica can't respond?
In the official doc https://dev.mysql.com/doc/mysql-replication-excerpt/8.0/en/replication.html it is mentioned that if synchronous replication is required, NDB should be used instead:

> For scenarios where synchronous replication is required, use NDB Cluster (see MySQL NDB Cluster 7.5 and NDB Cluster 7.6).

**Question 1.** Does MySQL not support synchronous replication, with only MySQL Cluster supporting it?

**Question 2.** Where is the doc for synchronous replication for NDB? I see multiple posts mentioning that NDB replication is synchronous by default: https://stackoverflow.com/questions/53149674/can-i-implement-synchronous-and-asynchronous-replication-with-the-mysql-cluster But the official doc only mentions asynchronous and semisynchronous replication: https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-replication.html

> NDB Cluster supports asynchronous replication, more usually referred to simply as “replication”. This section explains how to set up and manage a configuration in which one group of computers operating as an NDB Cluster replicates to a second computer or group of computers. We assume some familiarity on the part of the reader with standard MySQL replication as discussed elsewhere in this Manual. (See Chapter 19, Replication).

**Question 3.** If MySQL or NDB supports synchronous replication, does it use 2PC? What happens during a network partition, or when replica nodes are not available? Does NDB sacrifice availability for consistency? Does NDB do leader election? This post says it does: https://dev.mysql.com/blog-archive/2-phase-commit-in-ndbcluster/ But I can't find documentation on the availability-vs-consistency trade-off during a network partition or replica failure.
olaf (143 rep)
Feb 17, 2024, 07:05 PM • Last activity: Feb 17, 2024, 10:55 PM
0 votes
1 answer
230 views
How to ensure data consistency in a distributed system
I'm developing a distributed system with microservices, with only one write database and the rest read-only replicas kept up to date by logical replication. My challenge is to have consistency in the data, especially in the balance that the client has in the account, since my business model necessarily requires microservices to carry out updates to the database tables, and depending on the situation a balance is added to the customer's account.

My question is: how do I guarantee consistency of the customer's balance if the customer carries out some process that reduces the account balance at the same time that some microservice is also manipulating that balance?

I thought about creating a transaction in the database and immediately using this transaction to change the balance while the system finishes processing updates to the other tables that are part of the process, since this blocks reads of that customer's balance until the transaction is finished. Is this the most correct way?
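One common pattern for this race (a sketch only; the account table, column names, and the use of Python's sqlite3 as a stand-in database are all assumptions for the example) is optimistic locking with a version column: every balance update carries the version it read, and it only succeeds if nobody changed the row in between, so a losing writer re-reads and retries instead of silently overwriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL, version INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")

def debit(conn, account_id, amount):
    """Try to debit once; returns True on success, False if a concurrent
    writer changed the row first (caller should re-read and retry)."""
    balance, version = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?",
        (account_id,)).fetchone()
    if balance < amount:
        raise ValueError("insufficient funds")
    cur = conn.execute(
        "UPDATE account SET balance = balance - ?, version = version + 1 "
        "WHERE id = ? AND version = ?", (amount, account_id, version))
    return cur.rowcount == 1   # 0 rows means someone else won the race

print(debit(conn, 1, 30))      # True: the update matched version 0
```

The alternative the question describes (holding a transaction open and letting it block readers) also works, but it serializes every touch of that row for the whole duration of the surrounding process; the version-check pattern only conflicts at the moment of the balance write.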
Miqueias Kevison (750 rep)
Jan 2, 2024, 08:24 PM • Last activity: Feb 9, 2024, 11:52 AM
2 votes
1 answer
1473 views
Non-Repeatable Read vs Phantom Read - Real World Scenario
About the two mentioned data concurrency problems, below is what I understand.

1. **Non-Repeatable Read** - within the same transaction, running the same query twice returns two different values for the same record, because another transaction updated that record before the first transaction ran the query the second time.
2. **Phantom Read** - similar, but about the result set instead of a single record. The first run of the query returned 10 rows, but the second run returns 11, because between the runs another transaction made a change (insert/delete) that affected the query output.

And this is what I can't wrap my head around: why exactly would **a transaction run the same query twice** in a **real-world** scenario? If anyone can give a real-world example from their experience, where the same query has to run twice in the same transaction and could hit the scenario above, that would be helpful. I'm trying to understand why this concurrency problem matters based on a real-world example. Is the script below for placing an order a realistic example of where we have to run the same SELECT twice?
BEGIN TRANSACTION

IF NOT EXISTS(SELECT 1 FROM products WHERE product_id = @product_id AND available_quantity > @quantity)
BEGIN
	ROLLBACK TRANSACTION; 
    RAISERROR('Quantity is not available.', 16, 1);
    RETURN;
END

-- Place order
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES (@customer_id, @product_id, @quantity, GETDATE());

-- Update product quantity 
IF NOT EXISTS(SELECT 1 FROM products WHERE product_id = @product_id AND available_quantity > @quantity)
BEGIN
    ROLLBACK TRANSACTION; 
    RAISERROR('Quantity is not available.', 16, 1);
    RETURN;
END

UPDATE products SET available_quantity = available_quantity - @quantity WHERE product_id = @product_id;
COMMIT;
RETURN;
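For what it's worth, the double SELECT in a script like the one above can often be avoided entirely by folding the availability check into the UPDATE and inspecting the affected row count. Here is a sketch of that pattern using Python's sqlite3 with a hypothetical products table (the question's script is T-SQL, so this illustrates the idea rather than being a drop-in replacement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, "
             "available_quantity INTEGER NOT NULL)")
conn.execute("INSERT INTO products VALUES (1, 5)")

def reserve(conn, product_id, quantity):
    # Atomic check-and-decrement: the UPDATE only fires if enough stock remains,
    # so there is no window between a SELECT and the UPDATE for a phantom insert
    # or a concurrent decrement to invalidate the check.
    cur = conn.execute(
        "UPDATE products SET available_quantity = available_quantity - ? "
        "WHERE product_id = ? AND available_quantity >= ?",
        (quantity, product_id, quantity))
    return cur.rowcount == 1

print(reserve(conn, 1, 3))   # True: 5 -> 2
print(reserve(conn, 1, 3))   # False: only 2 left, order rejected
```

When zero rows are affected, the caller rolls back and reports "quantity not available", which removes one of the repeated reads that the question is asking about.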
Dhanuka Jayasinghe (185 rep)
Feb 15, 2023, 04:24 PM • Last activity: Feb 16, 2023, 12:21 AM
3 votes
1 answer
723 views
Table Consistency Errors
We have encountered a database corruption / consistency issue in one of our database tables, which I'd like to run past the community.

**Disclaimer**: This is a MSSQL database we (the DBA team) have inherited. There are no non-corrupted backups of the database. The corruption has been there for an unknown amount of time (no checks were being performed prior to us inheriting it). We've restored a copy of the database to a safe location while we work on this issue.

I started off by running this statement against the database table:

    DBCC CHECKTABLE ('dbo.TableA') WITH NO_INFOMSGS, ALL_ERRORMSGS, TABLOCK;

Here is the result set:

> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197568), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197569), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197570), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197571), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197572), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197573), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197574), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8965, Level 16, State 1, Line 1 Table error: Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057603756130304 (type LOB data). The off-row data node at page (1:197575), slot 0, text ID 14217500622848 is referenced by page (1:191179), slot 0, but was not seen in the scan.
>
> Msg 8929, Level 16, State 1, Line 1 Object ID 1436584206, index ID 1, partition ID 72057637929943040, alloc unit ID 72057638792658944 (type In-row data): Errors found in off-row data with ID 14217500622848 owned by data record identified by RID = (1:319428:15)
>
> CHECKTABLE found 0 allocation errors and 9 consistency errors in table 'TableA' (object ID 1436584206). repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (DB1.dbo.TableA).

Armed with this information, I then proceeded to use the DBCC PAGE command to check each of the pages mentioned in the error output above.

    DBCC TRACEON(3604)
    GO
    DBCC PAGE ('DB1', 1, 197568, 1);
    GO

And here was the result:

[screenshot of the DBCC PAGE output]

As you can see from the page information above, there is no data in these pages anymore, yet they still remain. This prompts me to ask two questions:

1. Has anyone else experienced this issue before?
2. I understand that using the repair_allow_data_loss option will drop the pages mentioned above, but because there is no data within the pages, no data should actually be lost from the table. Is that thinking correct?
Tom (221 rep)
Sep 21, 2022, 01:57 PM • Last activity: Sep 22, 2022, 05:11 PM
10 votes
2 answers
2306 views
What is the difference between integrity and consistency?
Whether I read about CAP or ACID, I see consistency described as ensuring the DB's integrity constraints. So I do not understand: why are two terms used to refer to the same thing, or is there a difference between integrity and consistency? I read that

> Anyhow, Atomic, Consistent, Isolated, Durable are properties of the transactions. It is true that atomicity + isolation is enough for you to roll-your-own consistency. But we can also roll our own atomicity, roll our own isolation, and roll our own persistence (durability). When we roll our own, we must pay for features with our own blood, sweat, and parentheses. We don't say the properties were given to us by a transaction system.

which suggests that consistency is something the user application can provide itself on top of database integrity constraints. It is not a property provided by the database, as the A, I, and D properties are. So why give C the same title as the other, system-provided A, I, D properties?
Little Alien (299 rep)
Dec 2, 2016, 08:57 PM • Last activity: Sep 15, 2021, 03:31 PM
2 votes
1 answer
2733 views
When to use BEGIN DISTRIBUTED TRANSACTION instead of BEGIN TRANSACTION in SQL Server?
For this question let's assume there are two servers, **ServerA** and **ServerB**, and there is a linked server set up on **ServerA** pointing to **ServerB**. Let's also assume I have [REMOTE_PROC_TRANSACTIONS](https://learn.microsoft.com/en-us/sql/t-sql/statements/set-remote-proc-transactions-transact-sql?view=sql-server-ver15) set to ON for both servers. According to Microsoft's BOL for [BEGIN TRANSACTION](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15):

> The local transaction started by the BEGIN TRANSACTION statement is **escalated to a distributed transaction** if the following actions are performed before the statement is committed or rolled back:
> - An INSERT, DELETE, or UPDATE statement that references a remote table on a linked server is executed. The INSERT, UPDATE, or DELETE statement fails if the OLE DB provider used to access the linked server does not support the ITransactionJoin interface.
> - A call is made to a remote stored procedure when the REMOTE_PROC_TRANSACTIONS option is set to ON.

Given the above, are there any cases where it makes sense or is mandatory to explicitly use BEGIN DISTRIBUTED TRANSACTION? Specifically, I'm planning to call a stored procedure (that executes DML queries) on **ServerB** via the linked server from **ServerA**, with the same **transaction** wrapped around some DML statements executed on **ServerA** as well. My goal is to achieve transactional consistency for both sets of DML queries happening between the two servers.
J.D. (40893 rep)
Jun 10, 2021, 01:14 PM • Last activity: Jun 10, 2021, 02:03 PM
0 votes
1 answer
221 views
Choosing between NoSQL and Relational databases while having both flexibility and consistency requirements
I've never worked with databases professionally and have only toyed with database coding using a small SQLite DB for my own music collection. Now I'm asked to build an internal database system at work for multiple users. After basic reading on NoSQL vs. relational databases, I'd like some ideas on how to choose the right technology.

## Context

- Before this database, the users worked with spreadsheets entirely.
- Those sheets differ in structure (schema) from project to project.
- Sometimes the same project has several different sheets that need to be isolated but still exist in the same database.
- We want to import all those sheets into a database, with new fields added and old field names/values updated on the fly, without breaking the original spreadsheets.
- When source control operations happen, we want to be able to connect to the database and update certain fields regarding the source control management status.
- The database will be used by various users from different locations, all through the in-house network for now. There won't be a lot of users like on Facebook or Twitter.
- All the data must be localized into multiple languages and ideally co-exist in the same database.

## Questions

- Since we'll have different sheet structures, is NoSQL a better option than a relational database?
- The users will go on using spreadsheets and may change their sheet structure on the fly. Will this be a lot of trouble for NoSQL or for relational databases such as SQLite?
- NoSQL does not support the ACID guarantees of relational databases; does that mean data may get corrupted when multiple users work on the same record? We would normally like consistency.
- If we maintain an optimized database but still want to be able to export selected portions or the whole thing as CSV or JSON files, will NoSQL or relational work better?
- How do we handle access control from scratch if we do it online? Will this affect the choice of database tech?
- For this type of system, should we hire a database expert dev and/or maintainer? Can we build on some free/open-source infrastructure that requires the least amount of backend/frontend and security R&D?

Thanks for your input in advance!
kakyo (111 rep)
May 26, 2021, 10:05 AM • Last activity: May 27, 2021, 02:09 AM
5 votes
4 answers
46180 views
DBCC CHECKDB consistency-based I/O error
Running

    DBCC CHECKDB(DatabaseName) WITH NO_INFOMSGS

gives me the following error:

> Msg 824, Level 24, State 2, Line 1
> SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:7753115; actual 0:0). It occurred during a read of page (1:7753115) in database ID 11 at offset 0x00000ec9b36000 in file 'K:\UAT Databases\dbname.MDF'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

I also found an entry in **dbo.suspect_pages**. Please advise.
user3688698 (69 rep)
May 24, 2016, 02:33 PM • Last activity: May 13, 2021, 10:20 AM
4 votes
3 answers
624 views
Inconsistency in Relational Databases
I generally get confused when the term consistency is used. NoSQL tutorials always refer to reads, whereas relational tutorials refer to a consistent state (ensuring referential integrity constraints). When the data is distributed across multiple servers (e.g. a 1 master - n slaves configuration):

1. Do relational databases ensure consistency in reads? I mean, is a committed write immediately available to be read by other transactions? I suspect not, given that a network is involved, which would make relational DBs eventually consistent.
2. Do relational databases enforce the referential integrity constraint well?
vvs14 (161 rep)
Apr 30, 2021, 12:04 PM • Last activity: May 1, 2021, 01:18 PM
1 vote
0 answers
102 views
The overhead of OCC validation phase
The validation phase of optimistic concurrency control has two directions: one is backward validation, which checks for conflicts with any previously validated transaction; the other is forward validation, which checks for conflicts with transactions that have not yet committed. Validated transactions install their modifications into the "global" database, which means the main work backward validation needs to do is checking for conflicts against that "global" database. Forward validation, however, needs to check for conflicts with each running transaction. That introduces expensive communication between threads if the database supports multithreading, and also extensive memory reads when the concurrency level of transactions is pretty high. As far as I know, forward validation is more widely adopted than backward validation. Why? Which cases are suited to forward validation and backward validation respectively?
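The backward direction described here can be illustrated with a toy validation loop (plain Python with invented structures; real engines keep far more compact metadata): a committing transaction checks its read set against the write sets of transactions that committed after it started.

```python
# Toy backward validation: a transaction is valid only if no transaction that
# committed after it started wrote a key this transaction read.
committed = []   # list of (commit_ts, write_set) for finished transactions
clock = 0

def begin():
    return {"start_ts": clock, "read_set": set(), "write_set": set()}

def backward_validate(txn):
    return not any(commit_ts > txn["start_ts"] and (write_set & txn["read_set"])
                   for commit_ts, write_set in committed)

def commit(txn):
    global clock
    if not backward_validate(txn):
        return False             # abort: a committed writer invalidated our reads
    clock += 1
    committed.append((clock, frozenset(txn["write_set"])))
    return True

t1 = begin(); t2 = begin()
t1["read_set"].add("x")
t2["write_set"].add("x")
print(commit(t2))   # True: nothing committed since t2 started
print(commit(t1))   # False: t2 committed after t1 started and wrote "x"
```

Note that the validator only consults the append-only `committed` log, not any live transaction's private state; a forward validator would instead have to inspect the in-flight read/write sets of every concurrent transaction, which is the cross-thread communication cost the question mentions.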
Hovin (41 rep)
Jan 5, 2021, 04:40 AM
1 vote
3 answers
745 views
What is the difference between concurrency control in operating systems and in transactional databases?
Recently I've been learning concurrency control techniques in transactional databases. However, I am confused about the differences between concurrency control in operating systems and in transactional databases. As I understand it, the concurrency control techniques introduced in the database literature can be used in a multithreaded program whose threads share some variables with each other, and vice versa: the techniques used in multithreaded programs for sharing variables between threads can also be used in databases for concurrency control. Why do we bother to introduce this differently in the database literature and in operating systems?
Hovin (41 rep)
Dec 20, 2020, 08:46 AM • Last activity: Jan 4, 2021, 12:40 PM
2 votes
1 answer
890 views
CAP Theorem and MySQL
There's the quorum condition often discussed alongside the CAP theorem: R + W > N. Let's say I have W = 1 (the master), R = 6 (the slaves), and N = 6, so every read quorum overlaps the node that took the write. As the condition says, we can be sure of strong consistency (we will always get the newest value). I read that the data store could check the version and return the newest one. But is this true in practice when using MySQL, or do we need some kind of middleman? How does the CAP theorem apply to MySQL?
Alex Yeremenko (23 rep)
Sep 10, 2020, 03:40 PM • Last activity: Sep 10, 2020, 07:04 PM
1 votes
1 answers
312 views
How to avoid errors in server log when conflict fails within a transaction before retry?
We have a table such as create table pages ( id uuid primary key not null, parent_id uuid not null references pages(id), shortname text not null, shorturl text not null, constraint globally_unique_shorturl unique (shorturl) constraint locally_unique_shortname unique (parent_id,shortname) ) where the...
We have a table such as create table pages ( id uuid primary key not null, parent_id uuid not null references pages(id), shortname text not null, shorturl text not null, constraint globally_unique_shorturl unique (shorturl), constraint locally_unique_shortname unique (parent_id, shortname) ) where the logical intent is to keep a collection of web pages whose full URL is constructed from shortname fields defined by parent_id references. The root of the system points to itself via parent_id. As such, any given child must have a unique shortname per parent (constraint locally_unique_shortname). In addition, we have a globally unique shorturl (think bit.ly) that obviously must be unique across the whole table. The target is to keep both shortname and shorturl as short as possible, so we generate short random identifiers for shorturl. In addition, the shortname is selectable by the user, allowing the system to create semantic URLs for every page. When I try to add a new page to the system (or modify an existing page), we do it in a transaction and create a new SAVEPOINT immediately before trying to add the page. If we get the error ERROR: duplicate key value violates unique constraint 'XYZ', we know we need to roll back to the savepoint and then figure out whether shorturl or shortname should be modified before retrying. Note that if shorturl fails, we can generate a new one automatically, but if shortname fails, we need to prompt the user for a new value, so we really need to know which constraint failed. (In addition, this query fails pretty seldom, so I'd prefer not to try a SELECT with the new value to figure out whether it's already taken, because the insert mostly succeeds without any extra queries. Also note that such a SELECT would still race against other concurrent transactions, and I'd still end up with the same issue every now and then.) We're using serializable transactions, in case it makes a difference.
However, this causes PostgreSQL to always record the error in the server log, even though the code obviously handles it correctly (a rollback is issued immediately after the query fails). These errors spam the log and make other possible *real* problems harder to notice. Obviously, I could fall back to filtering the log before monitoring it, but *that* would feel like a pretty poor solution. To work around this, https://dba.stackexchange.com/q/246302/29183 suggests using the syntax INSERT ... ON CONFLICT DO NOTHING. This would otherwise be fine (I could attempt the insert, and if no rows were affected I could assume a conflict occurred, with nothing added to the server log), **but** I'm missing information about which constraint failed when there are multiple possible constraints. **Is there any nice way to solve this without having to modify the source code of PostgreSQL or load some custom extension?**
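For the "which constraint failed" half of the problem, the driver usually exposes the constraint identity: with psycopg2, for example, the exception carries `e.diag.constraint_name`. A SQLite stand-in (so it is self-contained; SQLite reports violated columns in the error message rather than a constraint name, and it has no server log, so this only sketches the branching logic):

```python
# SQLite stand-in (not PostgreSQL) for telling which unique constraint
# failed and reacting differently: auto-retry with a fresh shorturl versus
# prompting the user for a new shortname. Table shape mirrors the question.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pages (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        shortname TEXT NOT NULL,
        shorturl TEXT NOT NULL UNIQUE,
        UNIQUE (parent_id, shortname)
    )
""")
conn.execute("INSERT INTO pages VALUES (1, 1, 'root', 'aaa')")

def add_page(pid, parent, name, url):
    try:
        conn.execute("INSERT INTO pages VALUES (?, ?, ?, ?)",
                     (pid, parent, name, url))
        return "ok"
    except sqlite3.IntegrityError as e:
        # SQLite: "UNIQUE constraint failed: pages.shorturl" etc.
        # PostgreSQL/psycopg2 would expose e.diag.constraint_name instead.
        if "shorturl" in str(e):
            return "retry-with-new-shorturl"
        return "ask-user-for-new-shortname"

assert add_page(2, 1, "blog", "aaa") == "retry-with-new-shorturl"
assert add_page(3, 1, "root", "bbb") == "ask-user-for-new-shortname"
assert add_page(4, 1, "blog", "ccc") == "ok"
```

This does not by itself keep the error out of the PostgreSQL server log; it only shows that distinguishing the two constraints does not require a pre-flight SELECT.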
Mikko Rantalainen (1059 rep)
Sep 1, 2020, 12:52 PM • Last activity: Sep 1, 2020, 02:37 PM
0 votes
1 answers
44 views
How can I generate a consistent entity graph (say from 10 tables) from a database fetch
Suppose I have many tables in a database that need to be joined together to get a single consistent view of the entire entity graph. Am I forced to do this as a single query? I don't care about locks, I just want to ensure the graph is consistent as of the time of the query execution. My concern is...
Suppose I have many tables in a database that need to be joined together to get a single consistent view of the entire entity graph. Am I forced to do this as a single query? I don't care about locks; I just want to ensure the graph is consistent as of the time of query execution. My concern is the performance of joining a large number of entities together in a single query. Is it possible to do this in multiple queries and still get the same level of consistency as performing those individual selects as a single query?
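The usual answer is snapshot isolation: run all the individual SELECTs inside one transaction whose reads are pinned to a single snapshot (in PostgreSQL, `BEGIN ISOLATION LEVEL REPEATABLE READ`; in InnoDB, a consistent-read transaction). A SQLite sketch in WAL mode, where a read transaction pins its snapshot at the first read:

```python
# Sketch: several queries inside one snapshot transaction see a single
# consistent database state, as if they were one query. SQLite in WAL mode
# stands in for any MVCC engine here.
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "graph.db")
reader = sqlite3.connect(path, isolation_level=None)  # manual transactions
reader.execute("PRAGMA journal_mode=WAL")
reader.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY)")
reader.execute("INSERT INTO nodes VALUES (1)")
writer = sqlite3.connect(path, isolation_level=None)

reader.execute("BEGIN")  # snapshot is pinned at the first read below
n1 = reader.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
writer.execute("INSERT INTO nodes VALUES (2)")  # commits concurrently
n2 = reader.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
reader.execute("COMMIT")
n3 = reader.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]

assert (n1, n2, n3) == (1, 1, 2)  # both in-transaction reads agree
```

So the queries can be split freely for performance; the consistency comes from the surrounding snapshot transaction, not from being a single statement.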
Newbie Guest (3 rep)
Aug 29, 2020, 05:48 PM • Last activity: Aug 30, 2020, 09:41 AM
4 votes
2 answers
1450 views
Transaction atomicity implies consistency
Transactions are said to need both atomicity and consistency. But, what is the point of consistency? Once you ensure that you either update both accounts on money transfer, you are consistent. Wikipedia article on consistency says that consistency is a set of programmatic constraints, invariants. Bu...
Transactions are said to need both atomicity and consistency. But what is the point of consistency? Once you ensure that you update either both accounts or neither on a money transfer, you are consistent. The Wikipedia article on consistency says that consistency is a set of programmatic constraints, invariants. But you are free to code your transaction operations any way you like. So I feel that a programmer only needs Atomicity, Isolation and Durability. Consistency is an additional convenience DB tool, like a trigger or stored procedure, and is not an absolute low-level requirement. Given AID, the programmer can always ensure consistency: it is as simple as checking user data and rejecting the request if some rule is broken. Right? It looks like Wikipedia identifies Atomicity with Consistency in the Data Consistency article, but not in the article on ACID transactions. Listening to discussions of eventual consistency, I see that developers mean that the DB is always in some correct snapshot state, i.e. the data are stabilized and you can query or issue your transaction against the DB, which you could not do if the DB were visible mid-transition. That is, a consistent DB is one which satisfies the constraints at any moment in time. But it satisfies the constraints if operations are atomic: atomicity implies that we see only transaction results, which implies that we see only snapshots, which means consistency. Why do we need to single consistency out as if it were an independent property?
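One concrete way to separate the two properties: atomicity alone does not stop a perfectly atomic transaction from committing a state that breaks an invariant; a declared constraint (consistency as a DB-enforced property) does. A toy SQLite illustration (schema and names are my own):

```python
# Atomicity vs. consistency: the transaction below is atomic either way,
# but only the CHECK constraint stops it from committing a negative
# balance. The rejection then rides on atomicity to roll back cleanly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 30)")

def withdraw(amount):
    try:
        with conn:  # atomic: commits or rolls back as a unit
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint violated: whole transaction rolled back

assert withdraw(20) is True    # 30 -> 10, invariant holds
assert withdraw(50) is False   # would go to -40: rejected by the CHECK
balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
assert balance == 10           # rollback preserved the invariant
```

Without the CHECK, `withdraw(50)` would commit atomically and leave the balance at -40, which is the sense in which consistency is a distinct property the application or the schema must supply.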
Valentin Tihomirov (1 rep)
Mar 11, 2016, 02:07 PM • Last activity: Jan 22, 2020, 11:19 PM
5 votes
1 answers
865 views
How is a single SQL statement isolated?
All the books I've seen so far, talking about transactions show scenarios with several SQL statements involved. But what's about single statements? What level of isolation do they have? Is it specified somewhere in a standard? Or does it depend on RDBMS and isolation level? Let me show a couple of e...
All the books I've seen so far that talk about transactions show scenarios involving several SQL statements. But what about single statements? What level of isolation do they have? Is it specified somewhere in a standard? Or does it depend on the RDBMS and the isolation level? Let me show a couple of examples. 1. UPDATE table SET value = value + 1 WHERE id = 1; This is a composite read-update-write operation. Can a parallel transaction change the value between the read and the write? Some books state that this operation is atomic (in the multithreaded-programming sense) in _most_ RDBMSs. 2. SELECT * FROM table t1 JOIN table t2 USING (id); If the table is quite large (or the query has some complicated filtering clause), is it possible at some isolation level that the t1.* and t2.* columns would differ due to a parallel update? 3. SELECT * FROM table1 WHERE id IN (SELECT t_id FROM table2); Is it possible that some records from table1 are removed after the sub-select has been executed? (Here I assume that table2.t_id references table1.id with cascaded removal.) 4. CTEs... Links to useful manuals that fully explain all the details of transactions are also appreciated.
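For the first example, the hazard appears only when the read and the write are split into separate statements; within a single UPDATE the engine does not let another writer slip in between its read and its write. A deterministic SQLite illustration (manually interleaved autocommit connections, no real parallelism):

```python
# Why UPDATE ... SET value = value + 1 matters: two autocommit connections
# that each do SELECT-then-UPDATE lose an update, while the single-statement
# form reads and writes within one statement. SQLite stands in for any RDBMS.
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, isolation_level=None)  # autocommit
b = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
a.execute("INSERT INTO t VALUES (1, 0)")

# Read-modify-write split across statements: both read 0, both write 1.
va = a.execute("SELECT value FROM t WHERE id = 1").fetchone()[0]
vb = b.execute("SELECT value FROM t WHERE id = 1").fetchone()[0]
a.execute("UPDATE t SET value = ? WHERE id = 1", (va + 1,))
b.execute("UPDATE t SET value = ? WHERE id = 1", (vb + 1,))
lost = a.execute("SELECT value FROM t WHERE id = 1").fetchone()[0]
assert lost == 1  # one increment was lost

# Single-statement form: the read and write happen inside one statement.
a.execute("UPDATE t SET value = value + 1 WHERE id = 1")
b.execute("UPDATE t SET value = value + 1 WHERE id = 1")
final = a.execute("SELECT value FROM t WHERE id = 1").fetchone()[0]
assert final == 3  # both increments applied
```

This only demonstrates the application-level split; how concurrent single statements interleave inside the engine still depends on the RDBMS and its isolation level, as the question suspects.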
viator (151 rep)
Nov 16, 2019, 11:55 AM • Last activity: Nov 16, 2019, 06:02 PM