
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
910 views
Queries return ReadFailure exception, "Replica(s) failed to execute read"
I have a cluster consisting of 5 Cassandra nodes. The Cassandra version used is cassandra-3.11.3-1.noarch. The `keyspace` replication strategy is defined as follows:

```
CREATE KEYSPACE my_keyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '2'}
AND durable_writes = true;
```

Now, running queries on the cluster, the following issue occurs:

> ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read]
> message="Operation failed - received 0 responses and 1 failures"
> info={'consistency': 'LOCAL_ONE', 'received_responses': 0, 'required_responses': 1, 'failures': 1}
> info={'failures': None, 'consistency': 'Not Set', 'required_responses': None, 'received_responses': None}

Does anyone know what is causing this? If more information is needed to better debug this problem, please let me know!

**UPDATE 1**

```
root# awk '!/#/' /etc/cassandra/conf/cassandra-rackdc.properties
dc=datacenter1
rack=rack1
```
Valentin Bajrami (111 rep)
Aug 22, 2018, 08:29 AM • Last activity: Jul 30, 2025, 05:07 AM
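A minimal diagnostic sketch for the question above (the keyspace name is from the question; `some_table` is a placeholder): reads that fail with code=1300 are worth retrying in cqlsh at an explicit consistency level, after confirming the replication map really names the datacenter that `cassandra-rackdc.properties` reports.

```
-- cqlsh session; my_keyspace is from the question, some_table is hypothetical
DESCRIBE KEYSPACE my_keyspace;   -- replication map should name 'datacenter1'
CONSISTENCY LOCAL_ONE;           -- cqlsh command: set the session consistency level
SELECT * FROM my_keyspace.some_table LIMIT 10;  -- retry the failing read
```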
3 votes
1 answer
181 views
How do distributed databases enforce unique constraints?
Let's say I have an application where users can register, and the username has to be a unique value. Now let's say I have `N partitions`, and for each partition I have `M replicas with multiple leaders`. I have questions regarding these scenarios:

First:
1) User 1 attempts to register with username user1: the write request gets routed to partition1 and to leader1.
2) User 2 attempts to register with username user1: the write request gets routed to the same partition1 and also to leader1.

In this scenario the behavior is the same as if we had just one database: the first transaction occurs and the second one fails, since the user1 value is already there and we are operating on the same replica.

Second:
1) User 1 attempts to register with username user1: the write request gets routed to partition1 and to leader1.
2) User 2 attempts to register with username user1: the write request gets routed to the same partition1 but to leader2.

In this case we have a concurrent write. How is it determined which registration fails and which succeeds? We can look at this as the no-partition, multiple-leader case, and as far as I've researched, the typical solution is to either 1) prevent this by reducing it to the first scenario, or 2) merge the values, which is not acceptable in this case. Resolving conflicts at the application level is also not acceptable. How do DBs deal with this?

Third:
1) User 1 attempts to register with username user1: the write request gets routed to partition1 and to leader1.
2) User 2 attempts to register with username user1: the write request gets routed to partition2 and to leader3.

In this case the writes go to different partitions (though it makes sense to me that this probably wouldn't happen in real life, since both requests carry the same value and should thus be routed to one partition). How would the DB resolve which registration succeeds and which fails? How would it lock things or check whether the value exists, and so on? The more I read about distributed DBs and how they work (even at a high level), the more confused I get. Thanks for answers!
Johnyb (131 rep)
May 11, 2023, 01:39 PM • Last activity: Jun 30, 2025, 03:02 PM
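One concrete mechanism behind the scenarios above (a sketch, not the only design): Cassandra-style lightweight transactions run a consensus round among the replicas that own the username's partition, so exactly one of two concurrent inserts is applied. The table and columns below are hypothetical.

```
-- CQL: a consensus-backed conditional insert
CREATE TABLE users (
    username text PRIMARY KEY,   -- partition key: all replicas of 'user1' must agree
    user_id  uuid
);

INSERT INTO users (username, user_id) VALUES ('user1', uuid())
IF NOT EXISTS;
-- exactly one of two concurrent attempts sees [applied] = True;
-- the loser gets back the existing row instead of overwriting it
```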
0 votes
1 answer
224 views
How are repeatable read and other isolation levels implemented in distributed/replicated databases?
I'm studying distributed systems/DBs and I'm struggling to understand isolation levels when we talk about distributed systems. Avoiding problems like dirty reads, non-repeatable reads, phantom reads, write skew, etc. when we have a single DB is pretty straightforward with the introduction of optimistic/pessimistic concurrency control algorithms. Nevertheless, I'm really not understanding how the same problems are avoided when we deal with distributed systems.

## Example: simple DB cluster with 3 nodes in a strong-consistency setup

Let's say that we have three total nodes (*N = 3*) for our DB and we want strong consistency for some reason (*R = 2* and *W = 2*, so *R + W > N*). Let's say now that we have two transactions, T1 and T2.

- T1:

```
SELECT * FROM X WHERE X.field = 'something'

... DO WORK ...

SELECT * FROM X WHERE X.field = 'something'
```

- T2:

```
INSERT INTO X VALUES(..,..,..)   -- impacts T1's search criteria
```

T2 will commit while T1 is in its "DO WORK" phase, so we will have a *phantom read* problem.

## Question

How is this situation handled in the system illustrated above? Do systems like this use a 2PC-like algorithm and rely on the fact that one transaction will fail on one node due to the R + W > N constraint? If yes, is that solution used in practice? I would say that it is complex (when we have to roll back the already-committed transaction on Node_X) and probably slow as well. Do you have any useful material that I can check to continue studying this topic? I really cannot find much about this; there is very little material discussing isolation levels in distributed systems. Feel free to correct the above if I made a mistake. Thank you.
Dev (1 rep)
Aug 14, 2022, 04:20 PM • Last activity: Jun 11, 2025, 07:07 PM
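For contrast with the distributed case the question asks about, this is how the single-node version of the same phantom is prevented; the open issue is then how every replica agrees on the snapshot or the predicate lock. A generic SQL sketch using the question's table:

```
-- PostgreSQL-style placement; some engines set the level before BEGIN
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM X WHERE X.field = 'something';
-- ... DO WORK: T2's INSERT into this predicate now conflicts, so it
-- either blocks (lock-based engines) or forces one of the two
-- transactions to abort at commit (SSI-style engines) ...
SELECT * FROM X WHERE X.field = 'something';  -- same result set as the first read
COMMIT;
```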
2 votes
1 answer
490 views
Distribute records across different MySQL databases - MySQL Proxy alternative
My scenario is the following: right now I am using one big MySQL database with multiple tables to store user data. Many tables contain auto-increment columns. I would like to split this into 2 or more databases. The distribution should be done by user_id and is deterministic (cannot be randomized). E.g. users 1 and 2 should be on database1, user 3 on database2, user 4 on database3.

Since I don't want to change my whole frontend, I would like to still use one DB adapter and add a kind of layer between the query generation (frontend) and the query execution (on the right database). This layer should distribute the queries to the right database based on the user_id.

I have found MySQL Proxy, which sounds exactly like what I need. Unfortunately, it's in alpha and not recommended for use in a production environment. For PHP there is the MySQL Native Driver Plugin API, which sounds promising, but then I would need a layer that supports at least PHP *and* Java. Is there any other way I can achieve my objectives? Thanks!
horen (129 rep)
Mar 25, 2014, 08:47 AM • Last activity: Mar 3, 2025, 02:05 PM
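Absent a production-ready proxy, one pattern that keeps the frontend almost unchanged is a small directory table that maps each user_id to a shard; the adapter consults it before picking a connection, and the same lookup is easy to implement in both PHP and Java. All names below are illustrative.

```
-- MySQL: shard directory kept on a small central database
CREATE TABLE shard_map (
    user_id    INT UNSIGNED PRIMARY KEY,
    shard_name VARCHAR(32) NOT NULL   -- e.g. 'database1', 'database2', 'database3'
);

INSERT INTO shard_map (user_id, shard_name) VALUES
    (1, 'database1'), (2, 'database1'), (3, 'database2'), (4, 'database3');

-- the routing layer runs this once per request, then connects to that shard
SELECT shard_name FROM shard_map WHERE user_id = 3;   -- -> 'database2'
```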
1 vote
0 answers
1330 views
Postgres Citus error "complex joins are only supported when all distributed tables are joined on their distribution columns with equal operator"
I have two tables defined as distributed in Citus based on the same field:

```
select create_distributed_table('gedi','clould_metadata_id');
select create_distributed_table('cloud_metadata','clould_metadata_id');
```

The clould_metadata_id column is unique in cloud_metadata; the gedi table stores millions of records, while cloud_metadata stores around 3,000. When I try to inner join those tables using:

```
select * from cloud_metadata cm
inner join gedi g
on cm.clould_metadata_id = g.clould_metadata_id
```

I get the error message:

> SQL Error [0A000]: ERROR: complex joins are only supported when all distributed tables are joined on their distribution columns with equal operator

I believe that's precisely what I'm trying to do: joining those tables on their distribution columns. So what am I doing wrong?
Mauro Assis (111 rep)
Sep 26, 2021, 01:17 PM • Last activity: Mar 3, 2025, 07:04 AM
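A commonly suggested rewrite for this shape of schema (a sketch; whether it applies depends on the Citus version and workload): since cloud_metadata holds only ~3,000 rows, make it a reference table, which Citus replicates to every worker so it can join with any distributed table without the co-location restriction.

```
-- Citus: convert the small table to a reference table
-- (undistribute_table is available from Citus 10 onward)
SELECT undistribute_table('cloud_metadata');
SELECT create_reference_table('cloud_metadata');

-- the original join should then plan without the complex-join error
SELECT * FROM cloud_metadata cm
INNER JOIN gedi g ON cm.clould_metadata_id = g.clould_metadata_id;
```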
0 votes
1 answer
54 views
Strict Serializable vs Serializable Transaction Latency and Throughput
Is the transaction latency and throughput performance of a distributed database with a consistency level of strict serializable significantly worse than with serializable? I'm trying to understand consistency levels. My tentative answer is that the higher the consistency level, the lower the transaction performance, because the distributed database behaves less concurrently/in parallel and waits for the state to be ideal when executing a transaction. Consistency level context: https://jepsen.io/consistency
muazhari (1 rep)
Sep 8, 2024, 01:32 PM • Last activity: Jan 3, 2025, 04:57 PM
1 vote
1 answer
763 views
Multi-master vs leader-follower
Sorry if I ask a basic question. I am reading different articles and trying to understand how this works. Let's suppose I have two master DBs (MySQL) in two different availability zones and two different regions, which are used for writing data, and data is replicated synchronously because I do not want any data to be lost during replication. And there are 4 slaves (as per my understanding these are called read replicas) which are used for reading the data. The data is replicated from master to slave as per the steps here: https://dev.mysql.com/doc/internals/en/replication.html . Is that synchronous or asynchronous replication?

The main purpose of a multi-master setup, as I understood it, is that in case one master node is down, another master node is always there. But then I am thinking: why isn't a leader-follower algorithm used here? For example, let's suppose I have one master node for writing and 4 slaves for reading in different availability zones. The master DB is in region US and AZ-1, but the slave DBs are in different AZs, and data is synchronized in the same way as before. Now, if the master node goes down, one of the slave nodes can become master via a leader-follower algorithm, so we do not need multi-master.

As per my understanding, in approach B there is a possibility that data can be lost if the master node goes down, and concurrent updates on multiple master nodes can also cause data inconsistency. What are the pros and cons of each approach? Could you please give some use cases where each of the approaches fits?
Learner (11 rep)
Jun 27, 2022, 09:07 AM • Last activity: Dec 19, 2024, 06:09 PM
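One factual anchor for the question above: the replication steps linked there describe MySQL's default, which is asynchronous. The closest built-in option to "don't acknowledge a write a replica hasn't received" is semi-synchronous replication; a sketch, assuming the semisync plugins ship with the MySQL build (variable names are the pre-8.0.26 spellings):

```
-- on the source/master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait for a replica ack
                                                 -- before falling back to async
-- on each replica
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```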
0 votes
2 answers
771 views
What is the real advantage of using a UUID instead of an auto-incremented element?
I've been looking at resources to understand UUIDs. I now understand what they are, but I still don't really see where they are truly useful. Most of the information I've found revolves around the idea of a distributed system: you get almost truly unique IDs generated automatically, and across all the distributed systems you shouldn't expect to see an ID twice. But why do you want that? What is the issue if two different databases have the same ID? Is this only useful if you want to collect all the data from all databases and have the records be unique?
masonCherry (101 rep)
Jan 25, 2023, 11:17 PM • Last activity: Aug 11, 2024, 04:03 PM
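A small illustration of the collision the question circles around, with the two stock MySQL escapes: UUIDs, or interleaved auto-increment ranges. Both are shown below; which fits depends on whether the databases' rows ever meet.

```
-- two independent servers both start AUTO_INCREMENT at 1, so ids collide on merge

-- escape 1: UUIDs are unique across servers
SELECT UUID();   -- e.g. '6ccd780c-baba-1026-9564-5b8c656024db'

-- escape 2: classic two-server workaround, disjoint id sequences
-- server A:
SET GLOBAL auto_increment_increment = 2, GLOBAL auto_increment_offset = 1;  -- 1,3,5,...
-- server B:
SET GLOBAL auto_increment_increment = 2, GLOBAL auto_increment_offset = 2;  -- 2,4,6,...
```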
0 votes
1 answer
75 views
I need to move from sqlite to a distributed setup. What are my options?
I have an SQLite DB that has grown to 30 GB, and I'm still pushing data to it every day. I have a couple of services that write to it using libsqlite3. Soon it will become too big to keep locally. What is the best way for me to migrate to a distributed setup? One way I can think of is sshfs, but that won't work if my machine is offline. A solution that caches the most recently used parts of my DB locally and keeps the rest somewhere else would be perfect. Most of the time I'll be using recently pushed data. I could start archiving years-old data, but accessing it will require manual work.
thewolf (103 rep)
May 26, 2024, 01:27 PM • Last activity: Jun 3, 2024, 09:19 PM
2 votes
1 answer
1151 views
Replication Subscribers and AlwaysOn Availability Groups (SQL Server)
I have AlwaysOn Availability Groups configured with Replication Subscribers, and it is currently running successfully with one publisher using the commands below.

```
-- commands to execute at the publisher, in the publisher database:
EXEC sp_addsubscription 
    @publication = N'yocc_pub', 
    @subscriber = N'repl_agl', 
    @destination_db = N'YOCCDB', 
    @subscription_type = N'Push', 
    @sync_type = N'automatic', 
    @article = N'all', 
    @update_mode = N'read only', 
    @subscriber_type = 0;
GO
    
EXEC sp_addpushsubscription_agent 
    @publication = N'yocc_pub', 
    @subscriber = N'repl_agl', 
    @subscriber_db = N'YOCCDB', 
    @job_login = null, 
    @job_password = null, 
    @subscriber_security_mode = 1;
GO
```

All commands complete successfully.

**My first problem:** I want to add one more publisher to the AlwaysOn subscriber, but I am getting the errors below:

> SQL Server error 21488
> SQL Server error 18752

**Note:**

- node-1 (AlwaysOn primary node and replication subscriber)
- node-2 (secondary replica and report server)
- node-3 (publisher 1) / yocc_pub (configured as distributor)
- node-4 (publisher 2) / yocc_pub2 (configured successfully except for the two errors above, #21488 and #18752)
- listener name: repl_agl
- database: yoccdb

**Second problem:** How do I maintain primary key values to avoid conflicts between publisher 1 and publisher 2 when they sync with the database on node-1?
Gulrez Khan (363 rep)
Jul 22, 2016, 06:13 AM • Last activity: May 8, 2024, 02:46 PM
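On the second problem, one stock approach (values illustrative) is to give each publisher a disjoint IDENTITY seed/increment so the rows they publish can never collide at the subscriber:

```
-- publisher 1: odd ids
CREATE TABLE dbo.Orders (
    id      INT IDENTITY(1, 2) PRIMARY KEY,   -- 1, 3, 5, ...
    payload NVARCHAR(100) NOT NULL
);

-- publisher 2: even ids
CREATE TABLE dbo.Orders (
    id      INT IDENTITY(2, 2) PRIMARY KEY,   -- 2, 4, 6, ...
    payload NVARCHAR(100) NOT NULL
);
```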
0 votes
0 answers
20 views
Distributions when there is only one physical node
Working on Azure Synapse, we have around 30 tables in a dev environment for now. I want to optimize the tables before replicating them in the qal and prod environments. As far as I understand, we only have one physical node (subscription DW100c), and when running

```
SELECT DISTINCT pdw_node_id FROM sys.pdw_nodes_indexes
```

I get a single id. At the moment everything is round-robin, and because there is only one node, I guess that transitioning some tables to hash distribution won't lead to any performance improvement, and will rather lead to longer loading times for the PowerBI semantic models that refresh every day. Could you please confirm?
Gregoire_M (1 rep)
Feb 5, 2024, 05:05 PM • Last activity: Feb 5, 2024, 05:13 PM
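For reference, the alternative being weighed is a hash-distributed CTAS like the sketch below (table and column names are illustrative). One detail worth keeping in mind: a dedicated SQL pool always shards tables into 60 distributions, even when all of them live on one physical node, so hash vs. round-robin can still affect data movement in joins.

```
-- hash-distributed copy of a round-robin table (illustrative names)
CREATE TABLE dbo.FactSales_hash
WITH (
    DISTRIBUTION = HASH(customer_id),
    CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM dbo.FactSales_rr;
```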
1 vote
2 answers
197 views
Safe upgrade routine for Cassandra
I've been tasked with upgrading an Ubuntu-based Cassandra cluster, having no prior experience with Cassandra. I've tried digging through the docs but have been unable to find any instructions on how to do upgrades in a supported manner. Luckily I have a test cluster which upgrades fine by just ensuring that apt points to the correct repo and doing `apt upgrade`, but perhaps that is just due to the database not being under any particular load. Should the nodes be cordoned prior to upgrade or something similar? Or is it safe to trust apt to do its thing? Should, for instance, `nodetool upgradesstables` be run manually after an upgrade from 4.0 to 4.1? Is there documentation on this topic that I've failed to locate? I managed to find this resource, and it is quite a lot more involved than the `apt upgrade` that I felt was completely successful.
azzid (113 rep)
Nov 20, 2023, 12:40 PM • Last activity: Nov 21, 2023, 07:04 PM
0 votes
1 answer
5233 views
The remote copy of database has not been rolled forward to a point in time that is encompassed in the local copy of the database log
Can someone help me with the issue I'm dealing with? I had to force a failover from my primary datacenter to my secondary datacenter, and I forgot to disable the log backup job. Meanwhile the new secondary servers were shut down a couple of times because of a switch update. The new changes on the database weren't committed on them, but the log backup job on the primary datacenter server started, and the log file was reused without acknowledgment from the secondary nodes! As a result, the database on the secondary server became unhealthy. I have no idea how the log file was reused without acknowledgment! Can you help me with this? I checked the error logs on all servers and I got this error:

> The remote copy of database has not been rolled forward to a point in time that is encompassed in the local copy of the database log

I wasn't expecting the log file to be reused without acknowledgment from the second server.
alireza (1 rep)
Jul 23, 2023, 12:01 PM • Last activity: Jul 25, 2023, 03:22 PM
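Two diagnostic queries that speak to this situation (a sketch; the database name is a placeholder): what is currently holding log truncation, and how far each replica's hardened and truncation LSNs have progressed.

```
-- what, if anything, is preventing log truncation right now
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';   -- hypothetical name

-- per-replica availability group progress
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.last_hardened_lsn,
       drs.truncation_lsn
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
JOIN sys.availability_groups   AS ag ON ag.group_id   = drs.group_id;
```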
0 votes
1 answer
42 views
Should I create 2 different databases if they would keep 70% similar data?
I am writing a microservices-based NodeJS-Cassandra application and I have a few services that would need ~70% similar data (like username, avatar, videos, etc.), and I am just wondering whether it is a good idea to create 2 databases like:

DB1:
> A,B,C,D,E,F,G,H,I,J,K,L,M,N

DB2:
> A,B,C,D,E,F,G,H,I,J,W,X,Y,Z

Or just create one database like:

> A,B,C,D,E,F,G,H,I,J,K,L,M,N,W,X,Y,Z

I know from the microservices concept that we should give each service its own database, but since my data contains some massive items like video files, I don't know what I should do in this specific case. I'd like to know if there is any rule of thumb that says **"if your databases would have more than x% of similar data, or the data you would replicate in both databases exceeds x GB, it's better to keep them in 1 database"**?
best_of_man (117 rep)
May 22, 2023, 05:34 PM • Last activity: May 23, 2023, 12:04 AM
2 votes
2 answers
11760 views
What configurations do I need to fix the "unable to begin a distributed transaction" error when trying to run a remote procedure?
This is the error message I'm receiving:

> Msg 7391, Level 16, State 2, Procedure spStoredProc, Line 62 [Batch Start Line 1]
> The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "MyLinkedServer" was unable to begin a distributed transaction.

As a test, my stored procedure's query is just `SELECT 1 AS A`; it works locally on the server but doesn't work when I call it remotely on a linked server.
J.D. (40893 rep)
Feb 24, 2020, 11:07 PM • Last activity: Dec 3, 2022, 12:05 PM
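Beyond MSDTC network configuration, one server-side switch is directly relevant here (a sketch; whether it is appropriate depends on whether atomicity across the two servers is actually required): calls to a linked server made inside a local transaction are promoted to distributed transactions unless promotion is turned off.

```
-- stop promoting remote procedure calls to MSDTC transactions
EXEC sp_serveroption
    @server   = N'MyLinkedServer',
    @optname  = N'remote proc transaction promotion',
    @optvalue = N'false';
```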
9 votes
3 answers
7482 views
Why is it more difficult to horizontally scale a relational database than a NoSQL database like MongoDB?
Is the main reason for the difficulty in horizontally scaling / distributing an RDBMS because of an adherence to ACID transactions? Is it the fact that multiple tables are so interconnected, or something else? My impression is that it’s mostly the ACID requirements, given that different nodes in a cluster may have different values at any given time. But heck, I’m fuzzy on how ACID even works. Conversely, why are some NoSQL databases so much easier to distribute? I don’t know enough about distributed databases to understand why one db can be easily distributed but the other cannot. Can anyone shed some light?
Greg Thomas (201 rep)
Jul 23, 2022, 12:38 AM • Last activity: Aug 27, 2022, 05:44 AM
0 votes
1 answer
422 views
Shortest paths on huge graphs: Neo4J or OrientDB?
Kia Ora, I have a program that very frequently requires finding the fastest path (both the node sequence and total cost/length) on graphs containing ~50k nodes. Per run, I require on the order of millions of shortest path requests. I have just finished an OrientDB implementation which has significantly improved the compute time over my initial, non-graphDB attempt (which simply crashed). To perform testing, I am running the server locally on a series of distributed machines. However, in theory, would Neo4J, or another such platform, be faster again? What gains could I expect to receive? Could I host this process online, for example? Ngā mihi.
Jordan MacLachlan (3 rep)
Dec 10, 2021, 04:18 AM • Last activity: Aug 23, 2022, 06:00 PM
0 votes
1 answer
160 views
Generate a unique primary key for all clients from the local database and NOT the server
I'm using the MySQL InnoDB engine in a database distributed among many places; each place will also work offline, but in the end all the data will be centralized on one server. I want to generate a primary key for the registration table that will be unique across all clients. First I thought about an INT AUTO_INCREMENT column, but then I realized this will cause conflicts between clients, as all of them will generate the next auto-increment value from their local database and not from the server, so all of them will start from 1 and increment for each new registration.

Now I'm thinking of using a CHAR or VARCHAR column and putting a per-client prefix on each registration in the client's local database. Or is there another way to do this?
Bayar (101 rep)
Aug 31, 2016, 06:49 AM • Last activity: Dec 2, 2021, 05:06 AM
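A sketch of the prefixed-key idea from the question (all names illustrative): each site keeps a local counter and stamps its own site code into the key, so keys stay unique when the offline databases are merged on the central server.

```
-- per-site sequence via an AUTO_INCREMENT helper table
CREATE TABLE reg_seq (
    n INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE registration (
    reg_id     CHAR(12) PRIMARY KEY,   -- e.g. 'BR1-00000042'
    created_at DATETIME NOT NULL
);

-- 'BR1' is this site's code; no other site ever generates it
INSERT INTO reg_seq VALUES (NULL);
INSERT INTO registration (reg_id, created_at)
VALUES (CONCAT('BR1-', LPAD(LAST_INSERT_ID(), 8, '0')), NOW());
```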
-1 votes
1 answer
2046 views
Perform a transaction across multiple databases at the same time
I have a system which performs many actions frequently, and I am looking at splitting its databases up a bit to spread out the load and hopefully speed up the system. The fear is that by splitting up these databases, transactions may become out of sync. My question is: if I split up my databases, can I still perform transactions on, say, 3 to 6 databases at the same time? Meaning, if there's some sort of error, roll them all back at the same time?

Edit: I am looking at splitting up the databases since there are certain areas that will be called frequently but are unrelated to each other.

Edit 2: By database I mean a separate machine, though I'm now seeing that instance is probably the word I should use.
Trevor (137 rep)
Oct 11, 2021, 01:42 AM • Last activity: Oct 12, 2021, 12:27 AM
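One concrete instance of this, assuming SQL Server (the question doesn't name an engine): the built-in mechanism for "commit or roll back all of them together" across machines is a distributed transaction coordinated by MSDTC. A sketch with illustrative server and table names:

```
BEGIN DISTRIBUTED TRANSACTION;   -- MSDTC coordinates the two-phase commit

    UPDATE dbo.Accounts
    SET balance = balance - 100
    WHERE id = 1;                -- local instance

    UPDATE [OtherServer].[OtherDb].dbo.Accounts
    SET balance = balance + 100
    WHERE id = 2;                -- linked server: same commit/rollback fate

COMMIT TRANSACTION;
```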
0 votes
0 answers
278 views
SQL Server linked server query issue
I am using SQL Server 2016, and recently one of the stored procedures I have been using started throwing an error. I use a SELECT statement with joins on many tables on the remote server via a linked server.

**SQL**

```
BEGIN TRANSACTION
SELECT a.col1, b.col1
FROM [LinkedServer].[Database].[Schema].[TableA] AS a
JOIN [LinkedServer].[Database].[Schema].[TableB] AS b ON a.col2 = b.col2
COMMIT TRANSACTION
```

**Error**

> New request is not allowed to start because it should come with valid transaction descriptor

When I change the BEGIN TRANSACTION to BEGIN DISTRIBUTED TRANSACTION, I run into a different error:

> OLE DB provider "SQLNCLI11" for linked server was unable to begin a distributed transaction.

The remote server is also SQL Server 2016. I am not sure how to resolve these errors.
ITHelpGuy (109 rep)
Oct 6, 2021, 11:26 PM