
How are repeatable read and other isolation levels implemented in distributed/replicated databases?

0 votes
1 answer
224 views
I'm studying distributed systems/DBs and I'm struggling to understand isolation levels when we talk about distributed systems. Avoiding problems like dirty reads, non-repeatable reads, phantom reads, write skew, etc. when we have a single DB is pretty straightforward with the introduction of optimistic/pessimistic concurrency control algorithms. Nevertheless, I really don't understand how the same problems are avoided when we deal with distributed systems.

## Example

Simple DB cluster with 3 nodes in a strong-consistency setup. Let's say that we have three total nodes (*N = 3*) for our DB and we want strong consistency for some reason (*R = 2* and *W = 2*, so *R + W > N*; a sketch of what I mean by these quorums follows the two transactions below). Let's say now that we have two transactions, T1 and T2:

- T1:
    SELECT * FROM X WHERE X.field = 'something'

    -- ... DO WORK ...

    SELECT * FROM X WHERE X.field = 'something'
- T2:
    INSERT INTO X VALUES(..,..,..)   -- impacts T1's search criteria
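
To make the quorum setup concrete, here is a minimal sketch of how I picture *R*/*W* quorum reads and writes (the dict-based nodes and the version counter are my own illustrative assumptions, not any real database's API):

    # Minimal quorum read/write sketch: N = 3 nodes, write quorum W = 2,
    # read quorum R = 2, so R + W > N and the two quorums must overlap.
    N, W, R = 3, 2, 2
    assert R + W > N

    nodes = [dict() for _ in range(N)]  # each node stores key -> (version, value)

    def quorum_write(key, value, version):
        acks = 0
        for node in nodes:              # real systems issue these in parallel
            node[key] = (version, value)
            acks += 1
            if acks >= W:               # success once W replicas acknowledge
                return True
        return False

    def quorum_read(key):
        replies = [n[key] for n in nodes[:R] if key in n]  # contact R replicas
        if not replies:
            return None
        version, value = max(replies)   # quorum overlap guarantees the
        return value                    # latest version is in some reply

    quorum_write("x", "something", version=1)
    print(quorum_read("x"))             # -> 'something'

(This sketch is only about replica consistency of a single key; my question is how transaction isolation is layered on top of it.)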
T2 will commit while T1 is in the "DO WORK" phase, so we will have a *phantom read* problem.

## Question

How is this situation handled in the system illustrated above? Do systems like this use a 2PC-like algorithm and rely on the fact that the transaction will fail on one node because of the *R + W > N* constraint? If yes, is this solution used in practice? I would say it is complex (when we have to roll back the already-committed transaction on Node_X) and probably also slow; a sketch of what I mean by "2PC-like" is below.

Do you have any useful material that I can check to continue studying this topic? I really cannot find much about it; there is very little material that discusses isolation levels in distributed systems.

Feel free to correct the above if I made a mistake. Thank you.
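
To clarify what I mean by a "2PC-like algorithm", here is a minimal sketch (the `Txn`/`Participant` classes and all method names are made up for illustration): any single node that detects a conflict during the prepare phase votes "no", so the transaction aborts everywhere before any node commits:

    # Minimal 2PC sketch: all names here are illustrative, not a real API.
    class Txn:
        def __init__(self, txn_id, conflict_nodes=frozenset()):
            self.txn_id = txn_id
            self.conflict_nodes = conflict_nodes  # nodes that detect a conflict

    class Participant:
        def __init__(self, name):
            self.name = name

        def prepare(self, txn):
            # Phase 1: a node would acquire locks / validate here; under
            # serializable isolation this is where T2's INSERT could be
            # refused because it matches T1's predicate X.field = 'something'.
            return self.name not in txn.conflict_nodes

        def commit(self, txn):
            print(f"{self.name}: commit {txn.txn_id}")

        def abort(self, txn):
            print(f"{self.name}: abort {txn.txn_id}")

    def two_phase_commit(txn, participants):
        if all(p.prepare(txn) for p in participants):  # phase 1: collect votes
            for p in participants:                     # phase 2: commit everywhere
                p.commit(txn)
            return True
        for p in participants:                         # any "no" vote aborts all,
            p.abort(txn)                               # so no post-commit rollback
        return False

    cluster = [Participant(f"node{i}") for i in range(3)]
    committed = two_phase_commit(Txn("T2", conflict_nodes={"node1"}), cluster)
    print("T2 committed?", committed)  # False: node1 saw the predicate conflict

In this sketch the abort happens before any node has committed, which would avoid the rollback-after-commit problem I mentioned; I don't know whether real systems actually work this way, which is exactly what I'm asking.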
Asked by Dev (1 rep)
Aug 14, 2022, 04:20 PM
Last activity: Jun 11, 2025, 07:07 PM