Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
139
views
How to configure mysql group replication with different ports?
I have a k8s cluster in which the `mysql` instances are all well connected to each other, plus another `mysql` server outside the cluster. Each `mysql` instance inside the cluster has the default `mysql` port, which is `3306`, and an external port, which is randomly assigned. When I start `mysql` group replication with only the instances that are inside the cluster, everything works fine.
The problem is that the `mysql` instance outside the cluster tries to connect to the default port `3306` with the `repl` user, when it should be connecting to the randomly generated port, and I don't know how to tell it which port to use.
**How can I make the outside instance use that randomly generated port to connect to the other instances inside the cluster for MySQL group replication?**
Here is my error log:
error connecting to master 'repl@db1-headless:3306'
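For reference, the group replication recovery channel connects to whatever host/port a donor member advertises, so the usual place to look is what each in-cluster member reports about itself. A minimal sketch, with the host and port values as placeholders rather than anything taken from this setup:
```
-- Sketch only; the host and port values below are placeholders.
-- What each member advertises to the group (run on any member):
SELECT MEMBER_HOST, MEMBER_PORT
FROM performance_schema.replication_group_members;

-- What this particular instance reports about itself:
SELECT @@report_host, @@report_port;

-- report_host/report_port are read-only at runtime, so they would have to be set
-- in my.cnf (or as startup flags) to the externally reachable address, e.g.:
--   [mysqld]
--   report_host = db1.example.com   -- placeholder: externally resolvable name
--   report_port = 31234             -- placeholder: the randomly assigned external port
```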
Hasan Parasteh
(103 rep)
Aug 13, 2022, 02:31 PM
• Last activity: Aug 6, 2025, 01:03 AM
0
votes
0
answers
10
views
cassandra disk space not freeing up
I have Cassandra on K8s. It's being used as part of a temporal persistence store. I'm seeing that my disk space is ever growing. I'm deleting some completed workflows, but they still occupy disk space.
I have set **gc_grace_period** to 6 hrs, and **LCS** is the compaction strategy being used. Forcing manual compaction will result in high disk I/O and CPU pressure, which can be problematic in my production environment. Is there any other way to solve this problem?
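For context, both the grace period and the tombstone-related compaction behaviour are table-level settings in CQL; a minimal sketch, where the keyspace/table name and the tuning values are placeholders rather than anything from this deployment:
```
-- Sketch: ks.workflows is a placeholder table; values are illustrative.
ALTER TABLE ks.workflows
  WITH gc_grace_seconds = 21600            -- the 6-hour grace period mentioned above
  AND compaction = {
    'class': 'LeveledCompactionStrategy',
    'tombstone_threshold': '0.2',          -- trigger single-SSTable tombstone compactions earlier
    'unchecked_tombstone_compaction': 'true'
  };
```
Deleted rows are only reclaimed once the SSTables holding their tombstones are compacted after `gc_grace_seconds` has passed, so tuning these subproperties is the usual lower-impact alternative to forcing a full manual compaction.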
Mohit Agarwal
(1 rep)
Aug 3, 2025, 04:57 AM
• Last activity: Aug 3, 2025, 05:40 AM
3
votes
1
answers
63
views
remote query of postgres on kubernetes
I need to run queries against Postgres. Postgres runs on Kubernetes pods (HA architecture). In the old system this was done by copying a script onto the pod and running it locally. If I try that now, I get:
>psql: FATAL: no pg_hba.conf entry for host "[local]" my_RO_user
From some research, it seems better practice to run my query from another machine. I can find the master pod. What would I need as a connect string and query string?
Note that /pgdata/pg11/pg_hba.conf contains:
# Do not edit this file manually!
# It will be overwritten by Patroni!
local all "postgres" peer
hostssl replication "_crunchyrepl" all cert
hostssl "postgres" "_crunchyrepl" all cert
host all "_crunchyrepl" all reject
host all "ccp_monitoring" "127.0.0.0/8" md5
host all "ccp_monitoring" "::1/128" md5
local all postgres peer
hostssl replication _crunchyrepl all cert
hostssl postgres _crunchyrepl all cert
host all _crunchyrepl all reject
hostssl all all all md5
host platform_analytics_wh mstr_pa all password
The database is `analysisdb`, the user is `rouser`, and the database listens on port 5432.
Can anyone suggest a way forward? Either fix that pg_hba error, or clarify how to query remotely?
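For reference, the final `hostssl all all all md5` line already allows password-authenticated SSL connections from anywhere, so a libpq-style connect string (in the same URI form used elsewhere on this page) should work; the host name below is a placeholder for whatever Service or address fronts the primary pod:
```
# Sketch: host name and sslmode are assumptions; database, user and port come from the question.
psql "postgresql://rouser@my-postgres-primary:5432/analysisdb?sslmode=require" -c "SELECT 1;"
```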
Koos Schut
(31 rep)
Jul 4, 2025, 08:51 AM
• Last activity: Jul 4, 2025, 11:18 AM
0
votes
1
answers
295
views
MySQL InnoDB Cluster config - configure node address
I first asked this question on Stack Overflow but I woke up this morning to the realization that it belongs over here.
I'm setting up an InnoDB Cluster using `mysqlsh`. This is in Kubernetes, but I think this question applies more generally.
When I use `cluster.configureInstance()` I see messages that include:
>This instance reports its own address as node-2:3306
However, the nodes can only find *each other* through DNS at an address like `node-2.cluster:3306`. The problem comes when adding instances to the cluster; they try to find the other nodes without the qualified name. Errors are of the form:
[GCS] Error on opening a connection to peer node node-0:33061 when joining a group. My local port is: 33061.
It is using `node-n:33061` rather than `node-n.cluster:33061`.
If it matters, the "DNS" is set up as a headless service in Kubernetes that provides consistent addresses as pods come and go. It's very simple, and I named it "cluster" to create addresses of the form `node-n.cluster`. The cluster itself is a statefulset that numbers nodes sequentially. I don't want to cloud this question with detail I don't think matters, however, as surely other configurations require the instances in the cluster to use DNS as well.
I thought that setting `localAddress` when creating the cluster and adding the nodes would solve the problem. Indeed, after I added that to the `createCluster` options, I can look in the database and see
| group_replication_local_address | node-0.cluster:33061 |
Yet when I look at the topology, it seems that the `localAddress` option has no effect whatsoever (at this point I don't see what it does at all):
{
"clusterName": "mycluster",
"defaultReplicaSet": {
"name": "default",
"primary": "node-0:3306",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures.",
"topology": {
"node-0:3306": {
"address": "node-0:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": null,
"role": "HA",
"status": "ONLINE",
"version": "8.0.29"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "node-0:3306"
}
And adding more instances continues to fail with the same communication errors.
Notable perhaps that I cannot set `localAddress` to `node-0.cluster:3306`; I get a message that that port is already in use. I tried 3307 but as before it had no effect.
How do I convince each instance that the address it needs to advertise is different? I will try other permutations of the `localAddress` setting, but it doesn't look like it's intended to fix the problem I'm having. How do I reconcile the address the instance reports for itself with the address that's actually useful for other instances to find it?
Edit to add: Maybe it is a Kubernetes thing? Kubernetes sets the hostname of the pod to match the name of the pod. If so, how do I override it?
There has to be a way to tell the cluster to use something other than the machines' hostnames for discovering each other.
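For what it's worth, the "reports its own address as node-2:3306" message reflects the instance's `report_host`, which falls back to the machine (pod) hostname when unset; a minimal sketch of checking it, with the qualified name treated as an example rather than a verified fix:
```
-- Sketch: node-0.cluster is the headless-service name from the question, used as an example.
SELECT @@report_host, @@hostname;   -- what the instance advertises vs. the pod hostname

-- report_host cannot be changed at runtime; it would have to go into my.cnf or a startup flag:
--   [mysqld]
--   report_host = node-0.cluster
```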
Jerry
(101 rep)
Jun 30, 2022, 04:43 PM
• Last activity: May 16, 2025, 08:09 AM
0
votes
0
answers
14
views
Commit Log Corruption & Cluster Resilience in K8ssandra on GKE
I'm Eran Betzalel from ExposeBox. We're migrating our big data infrastructure from Hadoop/HDFS/HBase VMs to a modern stack using GKE, GCS, and K8ssandra. While the move is exciting, we're currently facing a critical issue that I hope to get some insights on.
**Issue Overview:**
1. **Commit Log Corruption:**
* Our 8-node Cassandra cluster is experiencing frequent commit log corruptions. The latest occurred on two Cassandra nodes. Despite only two nodes showing the error, the entire cluster is affected, leading to a complete halt.
* The error message points to a bad header in one of the commit log files, suggesting a possible incomplete flush or other disk-related issues.
2. **Kubernetes Node Failure:**
* We detected that a Kubernetes node went down around the time the issue occurred. I'm curious how this event might be contributing to the corruption and what steps can be taken to shield Cassandra from such disruptions.
3. **Reliability Concerns:**
* It’s puzzling why corruption on just two nodes cascades to affect the whole cluster.
* I’m looking for recommendations on enhancing cluster resilience. Are there specific configurations or best practices to ensure that issues on individual nodes don’t compromise the entire cluster?
**Questions for the Community:**
* **Root Cause:**
* How can commit log corruption on only two nodes cause a complete cluster halt?
* **Resilience Strategies:**
* What are the best practices for configuring K8ssandra to handle node failures and unexpected Kubernetes disruptions?
* Are there specific settings or architectural changes that can help prevent such commit log issues from propagating cluster-wide?
* **Kubernetes Integration:**
* Given that a K8s node failure was detected around the time of the error, how can we make Cassandra more resilient in a dynamic, containerized environment?
Below is the stack trace for your reference:
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 8105604 of commit log /opt/cassandra/data/commitlog/CommitLog-8-1740071674398.log, with bad position but valid CRC
CommitLogReplayer.java:536 - Ignoring commit log replay error likely due to incomplete flush to disk
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:865)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:727)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:345)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:233)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:97)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:124)
Eran Betzalel
(101 rep)
Feb 23, 2025, 02:41 PM
0
votes
1
answers
371
views
Cassandra node fails to reconnect to the cluster on Kubernetes with Istio
I am intending to create a multi-datacenter Cassandra cluster with two datacenters distributed between two Kubernetes clusters. The cluster interconnection is configured by means of an Istio multicluster multi-primary setup. To provide intercluster service discovery, DNS capture is enabled on both Istio control planes. The Cassandra nodes in a cluster are configured as a Kubernetes StatefulSet.
The caveat is that the StatefulSet's unique domain names for each pod in the set are not detected and resolved by Istio DNS capture. I have bypassed this by inserting an init container before the actual Cassandra container, which creates a regular Kubernetes service pointing to the exact pod in the StatefulSet, and I have declared this service name as the `broadcast_address` in the Cassandra configuration.
This setup works almost perfectly: the cluster is created, connection between remote datacenters is established and the data gets distributed between clusters.
The problem appears if the pod with the Cassandra node is deleted and recreated. In that case, the node cannot reconnect to the cluster. In the log I can see that the node tries to connect to the IP addresses of the custom services created by the init container and fails. But if the custom service is deleted before the node is restarted (so it gets a new IP address), the node successfully reconnects to the cluster.
My possible guesses about the cause of the problem are:
1) Cassandra problem: for some reason the node cannot identify itself in a proper way and reconnect to the cluster
2) Istio problem: the Envoy proxy on querying the existing service returns a modified response, which, for some reason cannot be properly identified by Cassandra
Дмитро Іванов
(101 rep)
Mar 16, 2023, 07:40 AM
• Last activity: Jan 24, 2025, 04:07 AM
0
votes
0
answers
24
views
How to failover in Oracle Kubernetes environment when primary pod deleted with Observer?
I am running my Oracle primary database instance on the `oracle-0` pod (SID=ORCLP) and my standby database instance on the `oracle-1` Kubernetes pod. They are connected with a headless service. Data Guard broker and fast_start failover are all enabled and configured properly. I am running a Data Guard broker observer on another pod.
When I issue a `shutdown abort` command from the primary pod, the observer successfully fails the primary over to the standby pod `oracle-1`.
But when I delete the primary pod (SID=ORCLP) to test whether the observer handles failover properly, the observer does not fail the primary over to the standby pod. It keeps giving me this error log:
[W000 2025-01-08T05:53:05.356+00:00] Standby is in the SUSPEND state.
[W000 2025-01-08T05:53:06.355+00:00] Primary database cannot be reached.
[W000 2025-01-08T05:53:06.355+00:00] Fast-Start Failover suspended. Reset FSFO timer.
[W000 2025-01-08T05:53:06.355+00:00] Fast-Start Failover threshold has not exceeded. Retry for the next 30 seconds
[W000 2025-01-08T05:53:07.355+00:00] Try to connect to the primary.
[P005 2025-01-08T05:53:08.044+00:00] Failed to attach to ORCLP.
ORA-12545: Connect failed because target host or object does not exist
Unable to connect to database using ORCLP
Instead of failing over after the failover threshold, it continuously tries to connect to the deleted primary pod (SID=ORCLP).
Is there any way to solve this?
Sayed
(1 rep)
Jan 9, 2025, 05:16 AM
• Last activity: Jan 9, 2025, 05:17 AM
2
votes
1
answers
455
views
Cassandra cluster still trying to reach old IP
I've seen a strange behaviour since a node decommission; regularly some Cassandra nodes start massively outputting these logs:
OutboundTcpConnection.java:570 - Cannot handshake version with /10.208.58.4
The issue is that that IP matches a non-Cassandra pod. It's as if that IP is a leftover from an old pod; how can I solve it?
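One way to check whether the old address is still recorded in the cluster's own metadata is to query the system tables on each node; a minimal sketch using the IP from the log above:
```
-- Sketch: run with cqlsh on each node reporting the handshake errors.
SELECT peer, host_id, data_center, rack
FROM system.peers
WHERE peer = '10.208.58.4';
```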
Aaron
(4420 rep)
Sep 23, 2022, 12:44 PM
• Last activity: Sep 12, 2024, 04:11 AM
0
votes
1
answers
122
views
Kubernetes create cluster multimaster using bitnami/mariadb-galera
I'm trying to deploy mariadb-galera multi-master on Kubernetes, but when I set an external volume I receive this error:
mkdir: cannot create directory '/bitnami/mariadb/data': Permission denied
This is my YAML:
# PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
name: datadir-galera-0
namespace: mon-zabbix
labels:
app: galera-ss
podindex: "0"
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 1Gi
hostPath:
path: /var/openebs/galera-0/datadir
---
# Persistent Volumes Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-datadir-galera-ss-0
namespace: mon-zabbix
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
selector:
matchLabels:
app: galera-ss
podindex: "0"
---
# Service
apiVersion: v1
kind: Service
metadata:
name: galera-ss
namespace: mon-zabbix
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
selector:
app: galera-ss
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: galera-ss
namespace: mon-zabbix
spec:
serviceName: galera-ss
replicas: 3
selector:
matchLabels:
app: galera-ss
template:
metadata:
labels:
app: galera-ss
spec:
containers:
- name: galera
image: bitnami/mariadb-galera:latest
ports:
- containerPort: 3306
env:
- name: MARIADB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: MYSQL_ROOT_PASSWORD
- name: MARIADB_GALERA_CLUSTER_NAME
valueFrom:
configMapKeyRef:
key: CLUSTER_NAME
name: configmap
- name: MARIADB_GALERA_NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MARIADB_GALERA_CLUSTER_BOOTSTRAP
value: "yes"
- name: MARIADB_GALERA_NODE_ADDRESS
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: MARIADB_GALERA_CLUSTER_ADDRESS
value: "galera-ss.default.svc.cluster.local"
- name: MARIADB_GALERA_MARIABACKUP_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: XTRABACKUP_PASSWORD
volumeMounts:
- name: mysql-datadir
mountPath: /bitnami/mariadb
securityContext:
allowPrivilegeEscalation: false
volumeClaimTemplates:
- metadata:
name: mysql-datadir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
I tried the solutions found in other posts, but they don't work for me. I'm using `talos` to create the Kubernetes cluster (v1.28.0).
Could someone show me how to fix this, please?
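A common cause with Bitnami images is that the container runs as a non-root user (UID 1001 in the stock image, which is an assumption here) and cannot write to a root-owned hostPath. A minimal pod-level securityContext sketch that would go under `spec.template.spec` of the StatefulSet:
```
# Sketch: 1001 is the UID/GID the stock bitnami/mariadb-galera image is expected to run as.
securityContext:
  runAsUser: 1001
  fsGroup: 1001   # asks Kubernetes to make the mounted volume group-writable for this user
```
Note that `fsGroup` is not honoured for every volume type; with a plain hostPath, the directory on the node (e.g. /var/openebs/galera-0/datadir) may still need to be chown'd to 1001 manually.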
Marco Ferrara
(121 rep)
Apr 3, 2024, 03:12 PM
• Last activity: Sep 4, 2024, 02:50 PM
1
votes
1
answers
25
views
GridDB via Kubernetes
I want to deploy an application using Kubernetes; one of the components will be a GridDB database. I used the following article to deploy it - https://griddb.net/en/blog/creating-a-kubernetes-application-using-griddb-and-go/ - and the deployment manifest is shown below. One thing that I want to change is the securityContext: I'd like to avoid running as the root user. However, the documentation says "we need to run as root user to have the sufficient permissions to save the changes to the config file". Any advice on how I should proceed?
apiVersion: apps/v1
kind: Deployment
metadata:
name: griddb-server-deployment
spec:
replicas: 3
selector:
matchLabels:
app: griddb-server
template:
metadata:
labels:
app: griddb-server
spec:
volumes:
- name: griddb-pv-storage
persistentVolumeClaim:
claimName: griddb-server-pvc
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: griddb-server
containers:
- name: griddbcontainer
image: localhost:5000/griddb-server:01
imagePullPolicy: IfNotPresent
ports:
- containerPort: 10001
volumeMounts:
- mountPath: "/var/lib/gridstore/data"
name: griddb-pv-storage
securityContext:
runAsUser: 0
runAsGroup: 0
env:
- name: NOTIFICATION_MEMBER
value: '1'
- name: GRIDDB_CLUSTER_NAME
value: "myCluster"
Jacob_P
(29 rep)
Aug 10, 2024, 05:21 PM
• Last activity: Aug 19, 2024, 10:42 PM
0
votes
1
answers
26
views
Is there potential for inconsistencies when using LOCAL_QUORUM with Reaper in a sidecar?
I am preparing to run Cassandra Reaper 3.5.0 against Cassandra 4.1.x as a sidecar in a Kubernetes Pod (in a single Cassandra cluster).
Is there any potential, given that, for inconsistent data across DCs when LOCAL_QUORUM is used consistently, if the hints window isn't exceeded?
tdhso
(5 rep)
Mar 25, 2024, 04:45 PM
• Last activity: Aug 1, 2024, 11:03 AM
0
votes
1
answers
32
views
What is a viable low-cost DR option for a large cluster?
We have a Cassandra cluster running on GKE with a 32-CPU node pool and SSD disks. The current cluster size is nearly 1 PB, with each node utilizing an average of 5 TB of its 10 TB allocated SSD disk. The cluster comprises 200 nodes, each with 10 TB disks, for a total of 2 PB allocated.
Given this cluster size, the maintenance costs are substantial. How can we achieve low-cost disaster recovery for such a large cluster?
One option I am considering is creating a new data center in a different region with a replication factor of 1 (RF1). While this is not recommended, it would at least reduce the cluster size by a factor of three.
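For reference, the RF-1 DR datacenter idea is expressed per keyspace in the replication settings; a minimal sketch with placeholder keyspace and datacenter names, after which the new DC would typically be populated with `nodetool rebuild`:
```
-- Sketch: keyspace and datacenter names are placeholders.
ALTER KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_primary': '3',   -- existing datacenter keeps RF 3
    'dc_dr': '1'         -- new DR datacenter holds a single copy
  };
```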
Any suggestions would be greatly appreciated.
Sai
(39 rep)
Jul 12, 2024, 07:47 PM
• Last activity: Jul 26, 2024, 09:59 AM
0
votes
0
answers
1243
views
Postgresql replication slot does not exist
I am trying to set up logical replication between two databases (deployed via kubernetes).
On the origin database, I created a logical replication slot as follows:
SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput')
and a publication
CREATE PUBLICATION pub FOR TABLE "AOI"
On my subscriber db, I create the subscription
CREATE SUBSCRIPTION sub_test CONNECTION 'postgresql://user:password@localhost:5432/postgres'
PUBLICATION pub
WITH (slot_name=sub_test, create_slot=false, copy_data=True)
However, in the logs for my subscriber database, I am seeing:
2024-05-30 15:49:46.654 UTC ERROR: replication slot "sub_test" does not exist
2024-05-30 15:49:46.654 UTC STATEMENT: START_REPLICATION SLOT "sub_test" LOGICAL 0/0 (proto_version '1', publication_names '"pub"')
2024-05-30 15:49:46.654 UTC ERROR: could not start WAL streaming: ERROR: replication slot "sub_test" does not exist
Why does the replication slot not exist, when I have just created it? I am new to this and do not totally understand what the error is meant to tell me.
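One detail worth checking: the slot must exist on the server the subscriber actually connects to, and the `CONNECTION` string above points at `localhost:5432`, which from the subscriber's pod would be the subscriber itself rather than the origin database (an observation, not a confirmed diagnosis). A quick way to verify where the slot and publication live:
```
-- Run on the origin (publisher) database:
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots;

-- And confirm the publication is defined there as well:
SELECT pubname FROM pg_publication;
```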
jm22b
(101 rep)
May 30, 2024, 09:14 PM
0
votes
1
answers
40
views
Cloned Cassandra cluster returns gossip IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
After cloning a Cassandra database from one cluster to another using `nodetool refresh`, we are often seeing the errors below. This might not be related to the migration, but I'm not sure. What causes this issue? Is there an explanation? How do I fix it, and does it cause any problems?
ERROR 16:27:50 java.lang.IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
at org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178)
at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465)
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895)
at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78)
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240)
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
at org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) [apache-cassandra-3.11.5.jar:3.11.5]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_222]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_222]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]
Sai
(39 rep)
Sep 18, 2023, 09:03 PM
• Last activity: May 15, 2024, 05:12 AM
1
votes
2
answers
74
views
Access a MySQL storage with two independent instances
I tried to deploy a MySQL Deployment with Kubernetes, having three replicas which access the same storage (PVC). Here is the configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
name: mysql-pv
labels:
type: local
spec:
persistentVolumeReclaimPolicy: Retain
capacity:
storage: 1Gi
accessModes:
- ReadWriteMany
hostPath:
path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pvc
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
spec:
type: NodePort
ports:
- protocol: TCP
port: 3307
targetPort: 3306
nodePort: 30091
selector:
app: mysql
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql
spec:
replicas: 2
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- image: mysql:latest
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: pwd
ports:
- containerPort: 3306
volumeMounts:
- name: mysql-storage
mountPath: /var/lib/mysql
volumes:
- name: mysql-storage
persistentVolumeClaim:
claimName: mysql-pvc
When you apply this configuration file with `kubectl apply -f file_name.yaml`, you create three pods, which access the same storage for the databases. When you check the pods' status with `kubectl get pods`, you can see that only one pod becomes Running and the others end up in a CrashLoop state. What is happening is that, when more than one instance uses the common storage, only one instance can acquire the lock on the `ibdata1` file. That's why only one pod becomes healthy and the others crash-loop (you can see this using `kubectl logs pod-name`). What I want is:
1. Can I release the lock on the `ibdata1` file and use the storage for all the pods? (This mostly cannot be done, because of consistency issues.)
2. If not, how can I implement the proposed idea (accessing a single storage/volume using multiple pod instances)?
3. Would you suggest other ideas to achieve accessing a single storage using multiple pod instances?
Your answers and help are welcomed.
Sivakajan
(23 rep)
May 6, 2024, 12:30 PM
• Last activity: May 7, 2024, 07:29 AM
2
votes
0
answers
26
views
Kubernetes: Influxdb 1.8.10 container can’t create users
I deployed **InfluxDB v1.8.10** on Docker with this command:
docker run --name influxdb -t \
-e INFLUXDB_HTTP_AUTH_ENABLED="true" \
-e INFLUXDB_DB="mydatabase" \
-e INFLUXDB_USER="user" \
-e INFLUXDB_USER_PASSWORD="user" \
-e INFLUXDB_ADMIN_USER="admin" \
-e INFLUXDB_ADMIN_PASSWORD="admin" \
--restart unless-stopped \
-d influxdb:1.8.10
When I connect, I see that the new **Admin** user has been created.
Now I would like to deploy it on Kubernetes; this is my code:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: influxdb
namespace: mon-grafana
spec:
serviceName: influxdb
replicas: 3
selector:
matchLabels:
app: influxdb
template:
metadata:
labels:
app: influxdb
spec:
containers:
- name: influxdb
image: influxdb:1.8.10
ports:
- containerPort: 8086
name: influxdb
protocol: TCP
resources:
requests:
cpu: 250m
memory: 500Mi
limits:
cpu: 2
memory: 500Mi
env:
- name: INFLUXDB_HTTP_AUTH_ENABLED
value: "true"
- name: INFLUXDB_DB
value: "mydatabase"
- name: INFLUXDB_USER
value: "user"
- name: INFLUXDB_USER_PASSWORD
value: "user"
- name: INFLUXDB_ADMIN_USER
value: "admin"
- name: INFLUXDB_ADMIN_PASSWORD
value: "admin"
volumeMounts:
- name: pvc-influxdb
mountPath: /var/lib/influxdb
- name: influxdb-config
mountPath: "/etc/influxdb/influxdb.conf"
subPath: influxdb.conf
securityContext:
allowPrivilegeEscalation: false
volumes:
- name: influxdb-config
configMap:
name: configmap
items:
- key: influxdb.conf
path: influxdb.conf
volumeClaimTemplates:
- metadata:
name: pvc-influxdb
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Gi
This is my **influxdb.conf** file:
root@influxdb-0:/# cat /etc/influxdb/influxdb.conf
reporting-enabled = false
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
max-series-per-database = 1000000
max-values-per-tag = 100000
index-version = "tsi1"
max-index-log-file-size = "100k"
[http]
log-enabled = false
enabled = true
bind-address = "0.0.0.0:8086"
https-enabled = false
flux-enabled = true
After deploying it in Kubernetes, the **Admin** user is not created during the deploy, and when I try to connect I receive this error:
> root@influxdb-0:/# influx -username admin -password admin Connected to
> http://localhost:8086 version 1.8.10 InfluxDB shell version: 1.8.10
>
> show databases; ERR: error authorizing query: create admin user first
> or disable authentication Warning: It is possible this error is due to
> not setting a database. Please set a database with the command “use ”.
How can I create the Admin user during the initial deploy (like I did in Docker)?
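For reference, the environment-variable initialization in the stock `influxdb:1.8` image only runs on a first start against an empty data directory, so a pre-populated PVC can skip it entirely (an assumption about the image's entrypoint, not something confirmed here). The error itself can be cleared by creating the first admin user manually, which InfluxDB 1.x allows while auth is enabled but no users exist yet:
```
-- Run `influx` inside the pod; with auth enabled and no users defined, InfluxDB 1.x
-- accepts this one statement without credentials.
CREATE USER admin WITH PASSWORD 'admin' WITH ALL PRIVILEGES
```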
Marco Ferrara
(121 rep)
May 2, 2024, 08:14 AM
• Last activity: May 2, 2024, 09:13 AM
1
votes
3
answers
557
views
Need a Role that can CREATE USER but not allowed to GRANT Predefined Roles in PostgreSQL <16
We are administrating DBs in Kubernetes for our customers.
For each new cluster, we create a new user with the `CREATE ROLE` and `CREATE DB` privileges, so that they can create their own databases and new users with stricter roles.
The problem is that, with the `CREATE ROLE` permission, the user can GRANT himself the `pg_execute_server_program` role, then use a reverse-shell attack to get a shell in our pod and read the environment variables, which is not desired. E.g. we have several secrets in env vars that the customer could take advantage of to increase their attack range and take over more things.
In short, I want to have a user that can create new users, but can't grant himself specific predefined roles.
Update: I looked and found that `ADMIN_OPTION` was added in PostgreSQL 16 to resolve this kind of issue. My problem is that we're using PostgreSQL versions 13, 14, and 15, and we can't just force-upgrade all clusters.
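One workaround pattern on pre-16 versions is to not grant `CREATE ROLE` to the customer at all and instead expose a `SECURITY DEFINER` function, owned by a trusted role, that creates login roles with fixed attributes. A minimal sketch; the function name and the `customer_admin` role are placeholders:
```
CREATE OR REPLACE FUNCTION create_app_user(p_name text, p_password text)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pg_catalog
AS $$
BEGIN
  -- %I/%L quote the identifier and the literal safely.
  EXECUTE format('CREATE ROLE %I LOGIN PASSWORD %L', p_name, p_password);
END;
$$;

REVOKE ALL ON FUNCTION create_app_user(text, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION create_app_user(text, text) TO customer_admin;
```
The customer can then create users without ever holding `CREATEROLE`, so they have nothing with which to grant themselves predefined roles.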
imans77
(111 rep)
Feb 4, 2024, 06:11 PM
• Last activity: Feb 24, 2024, 06:44 PM
0
votes
1
answers
142
views
How to migrate 2 different Cassandra databases into one cluster
We have two separate Cassandra clusters, each with six nodes: Cluster A and Cluster B. Our requirement is to merge these two clusters to save costs. Because this is a critical application, we need to plan this with no downtime or a very minimal cutover window.
We can create tables from Cluster B to Cluster A and copy data using snapshots, but how can we ensure continuous writes for live data into the new Cluster A and minimize downtime? Any suggestions on how to proceed with this activity?
Sai
(39 rep)
Feb 14, 2024, 08:23 PM
• Last activity: Feb 19, 2024, 04:48 PM
0
votes
0
answers
102
views
Block access from pgAdmin to Postgres in kubernetes
We are trying to set up pgAdmin but want to prevent the root user from connecting and creating, updating or deleting anything.
We have created a new read-only user for this case, and this works fine. But it is still possible to register a server in pgAdmin with the root credentials. Our idea was to block the connection for root, but in Kubernetes there are no static IPs for services. That means we can't block root access without also blocking the backend and Keycloak.
Are there any other approaches we are missing that would be helpful for our use case?
user6266369
(101 rep)
Jan 15, 2024, 01:39 PM
0
votes
1
answers
62
views
MSSQL 2019 CU16 on Linux: virtual_address_space_committed_kb vs RSS
I am running SQL Server 2019 CU15 on Linux in Kubernetes and am trying to investigate why the SQL Server pod gets killed by the OOMKiller. I see some inconsistency in the reporting of memory consumption.
The SQL Server pod currently has a limit of 16 GB of RAM (both as a limit and as a startup argument) and only 14000 MB as Max Server Memory. Target Server Memory is currently reported as 11397920 KB, which is aligned with what I get by summing up the memory clerks, so inside SQL Server it all looks consistent. If I query sys.dm_os_process_memory, then virtual_address_space_committed_kb equals 11410248, which is almost the same number.
But I can't get a number of that order from the OS level by any means. I am using the image from Microsoft, so I cannot add anything for extended diagnostics. The RSS value in ps shows 15 gigs; same for top. I have a gut feeling that this difference is exactly what crosses the limit and causes the OOMKiller to kill the pod. The knee-jerk reaction would be to add this gap to any SQL Server pod's limit while keeping max server memory the same. But I still want to have an understanding of **what is that extra memory used for and how to track it internally?**
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000630+ 7 0.0 0.0 61636 23524 ? Sl Aug03 0:00 /opt/mssql/bin/sqlservr
1000630+ 9 26.9 11.4 278592548 15122076 ? Sl Aug03 12325:54 /opt/mssql/bin/sqlservr
1000630+ 400010 0.0 0.0 2628 544 pts/0 Ss 11:15 0:00 sh -i -c TERM=xterm sh
1000630+ 400016 0.0 0.0 2628 548 pts/0 S 11:15 0:00 sh
1000630+ 400019 0.0 0.0 6192 3388 pts/0 S+ 11:15 0:00 top
1000630+ 400254 0.0 0.0 2628 612 pts/1 Ss 11:42 0:00 sh -i -c TERM=xterm sh
1000630+ 400260 0.0 0.0 2628 612 pts/1 S 11:42 0:00 sh
1000630+ 400262 0.0 0.0 6192 3296 pts/1 S+ 11:42 0:00 top
1000630+ 400880 0.0 0.0 2628 544 pts/2 Ss 12:51 0:00 sh -i -c TERM=xterm sh
1000630+ 400886 0.0 0.0 2628 544 pts/2 S 12:51 0:00 sh
1000630+ 400954 0.0 0.0 5912 2904 pts/2 R+ 12:59 0:00 ps aux
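For tracking that gap from inside SQL Server, a usual starting point is to compare the process-level counters with a per-clerk breakdown; a minimal sketch, with no claim about what it will show on this particular pod:
```
-- Process-level view (the same DMV referenced above):
SELECT physical_memory_in_use_kb,
       locked_page_allocations_kb,
       virtual_address_space_committed_kb,
       memory_utilization_percentage
FROM sys.dm_os_process_memory;

-- Per-clerk breakdown, to see which consumers the committed memory belongs to:
SELECT TOP (20) [type], name, SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
GROUP BY [type], name
ORDER BY size_mb DESC;
```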
Oleg Lavrov
(1 rep)
Sep 4, 2023, 01:20 PM
• Last activity: Sep 4, 2023, 02:52 PM