Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
139
views
How to configure mysql group replication with different ports?
I have a k8s cluster in which the `mysql` instances are all well connected to each other, plus another `mysql` server outside the cluster. Each `mysql` instance inside the cluster has the default `mysql` port, which is `3306`, and an external port, which is randomly assigned. When I start `mysql` group replication with only the instances that are inside the cluster, everything works fine.
The problem is that the `mysql` instance outside the cluster tries to connect to the default port `3306` with the `repl` user, when it should be connecting to the randomly generated port, and I don't know how to tell it which port to use.
**How can I make the outside instance use that randomly generated port to connect to the other instances inside the cluster for MySQL group replication?**
Here is my error log:
error connecting to master 'repl@db1-headless:3306'
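For reference, the group replication recovery channel connects to whatever host/port a donor member advertises, so the usual place to look is what each in-cluster member reports about itself. A minimal sketch, with the host and port values as placeholders rather than anything taken from this setup:
```
-- Sketch only; the host and port values below are placeholders.
-- What each member advertises to the group (run on any member):
SELECT MEMBER_HOST, MEMBER_PORT
FROM performance_schema.replication_group_members;

-- What this particular instance reports about itself:
SELECT @@report_host, @@report_port;

-- report_host/report_port are read-only at runtime, so they would have to be set
-- in my.cnf (or as startup flags) to the externally reachable address, e.g.:
--   [mysqld]
--   report_host = db1.example.com   -- placeholder: externally resolvable name
--   report_port = 31234             -- placeholder: the randomly assigned external port
```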
Hasan Parasteh
(103 rep)
Aug 13, 2022, 02:31 PM
• Last activity: Aug 6, 2025, 01:03 AM
0
votes
0
answers
10
views
cassandra disk space not freeing up
I have Cassandra on K8s. It's being used as part of a temporal persistence store. I'm seeing that my disk space is ever growing. I'm deleting some completed workflows, but they still occupy disk space.
I have set **gc_grace_period** to 6 hrs, and **LCS** is the compaction strategy being used. Forcing manual compaction will result in high disk I/O and CPU pressure, which can be problematic in my production environment. Is there any other way to solve this problem?
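For context, both the grace period and the tombstone-related compaction behaviour are table-level settings in CQL; a minimal sketch, where the keyspace/table name and the tuning values are placeholders rather than anything from this deployment:
```
-- Sketch: ks.workflows is a placeholder table; values are illustrative.
ALTER TABLE ks.workflows
  WITH gc_grace_seconds = 21600            -- the 6-hour grace period mentioned above
  AND compaction = {
    'class': 'LeveledCompactionStrategy',
    'tombstone_threshold': '0.2',          -- trigger single-SSTable tombstone compactions earlier
    'unchecked_tombstone_compaction': 'true'
  };
```
Deleted rows are only reclaimed once the SSTables holding their tombstones are compacted after `gc_grace_seconds` has passed, so tuning these subproperties is the usual lower-impact alternative to forcing a full manual compaction.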
Mohit Agarwal
(1 rep)
Aug 3, 2025, 04:57 AM
• Last activity: Aug 3, 2025, 05:40 AM
3
votes
1
answers
63
views
remote query of postgres on kubernetes
I need to run queries against Postgres. Postgres runs on Kubernetes pods (HA architecture). In the old system this was done by copying a script onto the pod and running it locally. If I try that now, I get:
>psql: FATAL: no pg_hba.conf entry for host "[local]" my_RO_user
From some research, it seems better practice to run my query from another machine. I can find the master pod. What would I need as a connect string and query string?
Note that /pgdata/pg11/pg_hba.conf contains:
# Do not edit this file manually!
# It will be overwritten by Patroni!
local all "postgres" peer
hostssl replication "_crunchyrepl" all cert
hostssl "postgres" "_crunchyrepl" all cert
host all "_crunchyrepl" all reject
host all "ccp_monitoring" "127.0.0.0/8" md5
host all "ccp_monitoring" "::1/128" md5
local all postgres peer
hostssl replication _crunchyrepl all cert
hostssl postgres _crunchyrepl all cert
host all _crunchyrepl all reject
hostssl all all all md5
host platform_analytics_wh mstr_pa all password
The database is `analysisdb`, the user is `rouser`, and the database listens on port 5432.
Can anyone suggest a way forward? Either fix that pg_hba error, or clarify how to query remotely?
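For reference, the final `hostssl all all all md5` line already allows password-authenticated SSL connections from anywhere, so a libpq-style connect string (in the same URI form used elsewhere on this page) should work; the host name below is a placeholder for whatever Service or address fronts the primary pod:
```
# Sketch: host name and sslmode are assumptions; database, user and port come from the question.
psql "postgresql://rouser@my-postgres-primary:5432/analysisdb?sslmode=require" -c "SELECT 1;"
```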
Koos Schut
(31 rep)
Jul 4, 2025, 08:51 AM
• Last activity: Jul 4, 2025, 11:18 AM
0
votes
1
answers
295
views
MySQL InnoDB Cluster config - configure node address
I first asked this question on Stack Overflow but I woke up this morning to the realization that it belongs over here.
I'm setting up an InnoDB Cluster using `mysqlsh`. This is in Kubernetes, but I think this question applies more generally.
When I use `cluster.configureInstance()` I see messages that include:
>This instance reports its own address as node-2:3306
However, the nodes can only find *each other* through DNS at an address like `node-2.cluster:3306`. The problem comes when adding instances to the cluster; they try to find the other nodes without the qualified name. Errors are of the form:
[GCS] Error on opening a connection to peer node node-0:33061 when joining a group. My local port is: 33061.
It is using `node-n:33061` rather than `node-n.cluster:33061`.
If it matters, the "DNS" is set up as a headless service in Kubernetes that provides consistent addresses as pods come and go. It's very simple, and I named it "cluster" to create addresses of the form `node-n.cluster`. The cluster itself is a statefulset that numbers nodes sequentially. I don't want to cloud this question with detail I don't think matters, however, as surely other configurations require the instances in the cluster to use DNS as well.
I thought that setting `localAddress` when creating the cluster and adding the nodes would solve the problem. Indeed, after I added that to the `createCluster` options, I can look in the database and see
| group_replication_local_address | node-0.cluster:33061 |
Yet when I look at the topology, it seems that the `localAddress` option has no effect whatsoever (at this point I don't see what it does at all):
{
"clusterName": "mycluster",
"defaultReplicaSet": {
"name": "default",
"primary": "node-0:3306",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures.",
"topology": {
"node-0:3306": {
"address": "node-0:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": null,
"role": "HA",
"status": "ONLINE",
"version": "8.0.29"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "node-0:3306"
}
And adding more instances continues to fail with the same communication errors.
Notable perhaps that I cannot set `localAddress` to `node-0.cluster:3306`; I get a message that that port is already in use. I tried 3307 but as before it had no effect.
How do I convince each instance that the address it needs to advertise is different? I will try other permutations of the `localAddress` setting, but it doesn't look like it's intended to fix the problem I'm having. How do I reconcile the address the instance reports for itself with the address that's actually useful for other instances to find it?
Edit to add: Maybe it is a Kubernetes thing? Kubernetes sets the hostname of the pod to match the name of the pod. If so, how do I override it?
There has to be a way to tell the cluster to use something other than the machines' hostnames for discovering each other.
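For what it's worth, the "reports its own address as node-2:3306" message reflects the instance's `report_host`, which falls back to the machine (pod) hostname when unset; a minimal sketch of checking it, with the qualified name treated as an example rather than a verified fix:
```
-- Sketch: node-0.cluster is the headless-service name from the question, used as an example.
SELECT @@report_host, @@hostname;   -- what the instance advertises vs. the pod hostname

-- report_host cannot be changed at runtime; it would have to go into my.cnf or a startup flag:
--   [mysqld]
--   report_host = node-0.cluster
```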
Jerry
(101 rep)
Jun 30, 2022, 04:43 PM
• Last activity: May 16, 2025, 08:09 AM
0
votes
0
answers
14
views
Commit Log Corruption & Cluster Resilience in K8ssandra on GKE
I'm Eran Betzalel from ExposeBox. We're migrating our big data infrastructure from Hadoop/HDFS/HBase VMs to a modern stack using GKE, GCS, and K8ssandra. While the move is exciting, we're currently facing a critical issue that I hope to get some insights on.
**Issue Overview:**
1. **Commit Log Corruption:**
* Our 8-node Cassandra cluster is experiencing frequent commit log corruptions. The latest occurred on two Cassandra nodes. Despite only two nodes showing the error, the entire cluster is affected, leading to a complete halt.
* The error message points to a bad header in one of the commit log files, suggesting a possible incomplete flush or other disk-related issues.
2. **Kubernetes Node Failure:**
* We detected that a Kubernetes node went down around the time the issue occurred. I'm curious how this event might be contributing to the corruption and what steps can be taken to shield Cassandra from such disruptions.
3. **Reliability Concerns:**
* It’s puzzling why corruption on just two nodes cascades to affect the whole cluster.
* I’m looking for recommendations on enhancing cluster resilience. Are there specific configurations or best practices to ensure that issues on individual nodes don’t compromise the entire cluster?
**Questions for the Community:**
* **Root Cause:**
* How can commit log corruption on only two nodes cause a complete cluster halt?
* **Resilience Strategies:**
* What are the best practices for configuring K8ssandra to handle node failures and unexpected Kubernetes disruptions?
* Are there specific settings or architectural changes that can help prevent such commit log issues from propagating cluster-wide?
* **Kubernetes Integration:**
* Given that a K8s node failure was detected around the time of the error, how can we make Cassandra more resilient in a dynamic, containerized environment?
Below is the stack trace for your reference:
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 8105604 of commit log /opt/cassandra/data/commitlog/CommitLog-8-1740071674398.log, with bad position but valid CRC
CommitLogReplayer.java:536 - Ignoring commit log replay error likely due to incomplete flush to disk
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:865)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:727)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:345)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:233)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:97)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:124)
Eran Betzalel
(101 rep)
Feb 23, 2025, 02:41 PM
0
votes
1
answers
371
views
Cassandra node fails to reconnect to the cluster on Kubernetes with Istio
I am intending to create a multi-datacenter Cassandra cluster with two datacenters distributed between two Kubernetes clusters. The cluster interconnection is configured by means of an Istio multicluster multi-primary setup. To provide intercluster service discovery, DNS capture is enabled on both Istio control planes. The Cassandra nodes in a cluster are configured as a Kubernetes StatefulSet.
The caveat is that the StatefulSet's unique domain names for each pod in the set are not detected and resolved by Istio DNS capture. I have bypassed this by inserting an init container before the actual Cassandra container, which creates a regular Kubernetes service pointing to the exact pod in the StatefulSet, and I have declared this service name as the `broadcast_address` in the Cassandra configuration.
This setup works almost perfectly: the cluster is created, connection between remote datacenters is established and the data gets distributed between clusters.
The problem appears if the pod with the Cassandra node is deleted and recreated. In that case, the node cannot reconnect to the cluster. In the log I can see that the node tries to connect to the IP addresses of the custom services created by the init container and fails. But if the custom service is deleted before the node is restarted (so it gets a new IP address), the node successfully reconnects to the cluster.
My possible guesses about the cause of the problem are:
1) Cassandra problem: for some reason the node cannot identify itself in a proper way and reconnect to the cluster
2) Istio problem: the Envoy proxy on querying the existing service returns a modified response, which, for some reason cannot be properly identified by Cassandra
Дмитро Іванов
(101 rep)
Mar 16, 2023, 07:40 AM
• Last activity: Jan 24, 2025, 04:07 AM
0
votes
0
answers
24
views
How to failover in Oracle Kubernetes environment when primary pod deleted with Observer?
I am running my Oracle primary database instance on the `oracle-0` pod (SID=ORCLP) and my standby database instance on the `oracle-1` Kubernetes pod. They are connected with a headless service. Data Guard broker and fast_start failover are all enabled and configured properly. I am running a Data Guard broker observer on another pod.
When I issue a `shutdown abort` command from the primary pod, the observer successfully fails the primary over to the standby pod `oracle-1`.
But when I delete the primary pod (SID=ORCLP) to test whether the observer handles failover properly, the observer does not fail the primary over to the standby pod. It keeps giving me this error log:
[W000 2025-01-08T05:53:05.356+00:00] Standby is in the SUSPEND state.
[W000 2025-01-08T05:53:06.355+00:00] Primary database cannot be reached.
[W000 2025-01-08T05:53:06.355+00:00] Fast-Start Failover suspended. Reset FSFO timer.
[W000 2025-01-08T05:53:06.355+00:00] Fast-Start Failover threshold has not exceeded. Retry for the next 30 seconds
[W000 2025-01-08T05:53:07.355+00:00] Try to connect to the primary.
[P005 2025-01-08T05:53:08.044+00:00] Failed to attach to ORCLP.
ORA-12545: Connect failed because target host or object does not exist
Unable to connect to database using ORCLP
Instead of failing over after the failover threshold, it continuously tries to connect to the deleted primary pod (SID=ORCLP).
Is there any way to solve this?
Sayed
(1 rep)
Jan 9, 2025, 05:16 AM
• Last activity: Jan 9, 2025, 05:17 AM
2
votes
1
answers
455
views
Cassandra cluster still trying to reach old IP
I've seen a strange behaviour since a node decommission; regularly some Cassandra nodes start massively outputting these logs:
OutboundTcpConnection.java:570 - Cannot handshake version with /10.208.58.4
The issue is that that IP matches a non-Cassandra pod. It's as if that IP is a leftover from an old pod; how can I solve it?
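One way to check whether the old address is still recorded in the cluster's own metadata is to query the system tables on each node; a minimal sketch using the IP from the log above:
```
-- Sketch: run with cqlsh on each node reporting the handshake errors.
SELECT peer, host_id, data_center, rack
FROM system.peers
WHERE peer = '10.208.58.4';
```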
Aaron
(4420 rep)
Sep 23, 2022, 12:44 PM
• Last activity: Sep 12, 2024, 04:11 AM
0
votes
1
answers
122
views
Kubernetes create cluster multimaster using bitnami/mariadb-galera
I'm trying to deploy mariadb-galera multi-master on Kubernetes, but when I set an external volume I receive this error:
mkdir: cannot create directory '/bitnami/mariadb/data': Permission denied
This is my YAML:
# PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
name: datadir-galera-0
namespace: mon-zabbix
labels:
app: galera-ss
podindex: "0"
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 1Gi
hostPath:
path: /var/openebs/galera-0/datadir
---
# Persistent Volumes Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-datadir-galera-ss-0
namespace: mon-zabbix
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
selector:
matchLabels:
app: galera-ss
podindex: "0"
---
# Service
apiVersion: v1
kind: Service
metadata:
name: galera-ss
namespace: mon-zabbix
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
selector:
app: galera-ss
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: galera-ss
namespace: mon-zabbix
spec:
serviceName: galera-ss
replicas: 3
selector:
matchLabels:
app: galera-ss
template:
metadata:
labels:
app: galera-ss
spec:
containers:
- name: galera
image: bitnami/mariadb-galera:latest
ports:
- containerPort: 3306
env:
- name: MARIADB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: MYSQL_ROOT_PASSWORD
- name: MARIADB_GALERA_CLUSTER_NAME
valueFrom:
configMapKeyRef:
key: CLUSTER_NAME
name: configmap
- name: MARIADB_GALERA_NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MARIADB_GALERA_CLUSTER_BOOTSTRAP
value: "yes"
- name: MARIADB_GALERA_NODE_ADDRESS
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: MARIADB_GALERA_CLUSTER_ADDRESS
value: "galera-ss.default.svc.cluster.local"
- name: MARIADB_GALERA_MARIABACKUP_PASSWORD
valueFrom:
secretKeyRef:
name: secrets
key: XTRABACKUP_PASSWORD
volumeMounts:
- name: mysql-datadir
mountPath: /bitnami/mariadb
securityContext:
allowPrivilegeEscalation: false
volumeClaimTemplates:
- metadata:
name: mysql-datadir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
I tried the solutions found in other posts, but they don't work for me. I'm using `talos` to create the Kubernetes cluster (v1.28.0).
Could someone show me how to fix this, please?
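A common cause with Bitnami images is that the container runs as a non-root user (UID 1001 in the stock image, which is an assumption here) and cannot write to a root-owned hostPath. A minimal pod-level securityContext sketch that would go under `spec.template.spec` of the StatefulSet:
```
# Sketch: 1001 is the UID/GID the stock bitnami/mariadb-galera image is expected to run as.
securityContext:
  runAsUser: 1001
  fsGroup: 1001   # asks Kubernetes to make the mounted volume group-writable for this user
```
Note that `fsGroup` is not honoured for every volume type; with a plain hostPath, the directory on the node (e.g. /var/openebs/galera-0/datadir) may still need to be chown'd to 1001 manually.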
Marco Ferrara
(121 rep)
Apr 3, 2024, 03:12 PM
• Last activity: Sep 4, 2024, 02:50 PM
1
votes
1
answers
25
views
GridDB via Kubernetes
I want to deploy an application using Kubernetes; one of the components will be a GridDB database. I used the following article to deploy it - https://griddb.net/en/blog/creating-a-kubernetes-application-using-griddb-and-go/ - and the deployment manifest is shown below. One thing that I want to change is the securityContext: I'd like to avoid running as the root user. However, the documentation says "we need to run as root user to have the sufficient permissions to save the changes to the config file". Any advice on how I should proceed?
apiVersion: apps/v1
kind: Deployment
metadata:
name: griddb-server-deployment
spec:
replicas: 3
selector:
matchLabels:
app: griddb-server
template:
metadata:
labels:
app: griddb-server
spec:
volumes:
- name: griddb-pv-storage
persistentVolumeClaim:
claimName: griddb-server-pvc
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: griddb-server
containers:
- name: griddbcontainer
image: localhost:5000/griddb-server:01
imagePullPolicy: IfNotPresent
ports:
- containerPort: 10001
volumeMounts:
- mountPath: "/var/lib/gridstore/data"
name: griddb-pv-storage
securityContext:
runAsUser: 0
runAsGroup: 0
env:
- name: NOTIFICATION_MEMBER
value: '1'
- name: GRIDDB_CLUSTER_NAME
value: "myCluster"
Jacob_P
(29 rep)
Aug 10, 2024, 05:21 PM
• Last activity: Aug 19, 2024, 10:42 PM
0
votes
1
answers
26
views
Is there potential for inconsistencies when using LOCAL_QUORUM with Reaper in a sidecar?
I am preparing to run Cassandra Reaper 3.5.0 against Cassandra 4.1.x as a sidecar in a Kubernetes Pod (in a single Cassandra cluster).
Is there any potential, given that, for inconsistent data across DCs when LOCAL_QUORUM is used consistently, if the hints window isn't exceeded?
tdhso
(5 rep)
Mar 25, 2024, 04:45 PM
• Last activity: Aug 1, 2024, 11:03 AM
0
votes
1
answers
32
views
What is a viable low-cost DR option for a large cluster?
We have a Cassandra cluster running on GKE with a 32-CPU node pool and SSD disks. The current cluster size is nearly 1 PB, with each node utilizing an average of 5 TB of its 10 TB allocated SSD disk. The cluster comprises 200 nodes, each with 10 TB disks, for a total of 2 PB allocated.
Given this cluster size, the maintenance costs are substantial. How can we achieve low-cost disaster recovery for such a large cluster?
One option I am considering is creating a new data center in a different region with a replication factor of 1 (RF1). While this is not recommended, it would at least reduce the cluster size by a factor of three.
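For reference, the RF-1 DR datacenter idea is expressed per keyspace in the replication settings; a minimal sketch with placeholder keyspace and datacenter names, after which the new DC would typically be populated with `nodetool rebuild`:
```
-- Sketch: keyspace and datacenter names are placeholders.
ALTER KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_primary': '3',   -- existing datacenter keeps RF 3
    'dc_dr': '1'         -- new DR datacenter holds a single copy
  };
```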
Any suggestions would be greatly appreciated.
Sai
(39 rep)
Jul 12, 2024, 07:47 PM
• Last activity: Jul 26, 2024, 09:59 AM
0
votes
0
answers
1243
views
Postgresql replication slot does not exist
I am trying to set up logical replication between two databases (deployed via kubernetes).
On the origin database, I created a logical replication slot as follows:
SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput')
and a publication
CREATE PUBLICATION pub FOR TABLE "AOI"
On my subscriber db, I create the subscription
CREATE SUBSCRIPTION sub_test CONNECTION 'postgresql://user:password@localhost:5432/postgres'
PUBLICATION pub
WITH (slot_name=sub_test, create_slot=false, copy_data=True)
However, in the logs for my subscriber database, I am seeing:
2024-05-30 15:49:46.654 UTC ERROR: replication slot "sub_test" does not exist
2024-05-30 15:49:46.654 UTC STATEMENT: START_REPLICATION SLOT "sub_test" LOGICAL 0/0 (proto_version '1', publication_names '"pub"')
2024-05-30 15:49:46.654 UTC ERROR: could not start WAL streaming: ERROR: replication slot "sub_test" does not exist
Why does the replication slot not exist, when I have just created it? I am new to this and do not totally understand what the error is meant to tell me.
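One detail worth checking: the slot must exist on the server the subscriber actually connects to, and the `CONNECTION` string above points at `localhost:5432`, which from the subscriber's pod would be the subscriber itself rather than the origin database (an observation, not a confirmed diagnosis). A quick way to verify where the slot and publication live:
```
-- Run on the origin (publisher) database:
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots;

-- And confirm the publication is defined there as well:
SELECT pubname FROM pg_publication;
```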
jm22b
(101 rep)
May 30, 2024, 09:14 PM
0
votes
1
answers
40
views
Cloned Cassandra cluster returns gossip IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
After cloning a Cassandra database from one cluster to another using `nodetool refresh`, we are often seeing the errors below. This might not be related to the migration, but I'm not sure. What causes this issue? Is there an explanation? How do I fix it, and does it cause any problems?
ERROR 16:27:50 java.lang.IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
at org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178)
at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465)
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895)
at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78)
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240)
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalStateException: Attempting gossip state mutation from illegal thread: GossipTasks:1
at org.apache.cassandra.gms.Gossiper.checkProperThreadForStateMutation(Gossiper.java:178) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.evictFromMembership(Gossiper.java:465) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:895) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:78) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:240) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) [apache-cassandra-3.11.5.jar:3.11.5]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_222]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_222]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]
Sai
(39 rep)
Sep 18, 2023, 09:03 PM
• Last activity: May 15, 2024, 05:12 AM
1
votes
2
answers
74
views
Access a MySQL storage with two independent instances
I tried to deploy a MySQL Deployment with Kubernetes, having three replicas which access the same storage (PVC). Here is the configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
name: mysql-pv
labels:
type: local
spec:
persistentVolumeReclaimPolicy: Retain
capacity:
storage: 1Gi
accessModes:
- ReadWriteMany
hostPath:
path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pvc
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
spec:
type: NodePort
ports:
- protocol: TCP
port: 3307
targetPort: 3306
nodePort: 30091
selector:
app: mysql
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql
spec:
replicas: 2
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- image: mysql:latest
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: pwd
ports:
- containerPort: 3306
volumeMounts:
- name: mysql-storage
mountPath: /var/lib/mysql
volumes:
- name: mysql-storage
persistentVolumeClaim:
claimName: mysql-pvc
When you apply this configuration file with `kubectl apply -f file_name.yaml`, you create three pods, which access the same storage for the databases. When you check the pods' status with `kubectl get pods`, you can see that only one pod becomes Running and the others end up in a CrashLoop state. What is happening is that, when more than one instance uses the common storage, only one instance can acquire the lock on the `ibdata1` file. That's why only one pod becomes healthy and the others crash-loop (you can see this using `kubectl logs pod-name`). What I want is:
1. Can I release the lock on the `ibdata1` file and use the storage for all the pods? (This mostly cannot be done, because of consistency issues.)
2. If not, how can I implement the proposed idea (accessing a single storage/volume using multiple pod instances)?
3. Would you suggest other ideas to achieve accessing a single storage using multiple pod instances?
Your answers and help are welcomed.
Sivakajan
(23 rep)
May 6, 2024, 12:30 PM
• Last activity: May 7, 2024, 07:29 AM
2
votes
0
answers
26
views
Kubernetes: Influxdb 1.8.10 container can’t create users
I deployed **InfluxDB v1.8.10** on Docker with this command:
docker run --name influxdb -t \
-e INFLUXDB_HTTP_AUTH_ENABLED="true" \
-e INFLUXDB_DB="mydatabase" \
-e INFLUXDB_USER="user" \
-e INFLUXDB_USER_PASSWORD="user" \
-e INFLUXDB_ADMIN_USER="admin" \
-e INFLUXDB_ADMIN_PASSWORD="admin" \
--restart unless-stopped \
-d influxdb:1.8.10
When I connect, I see that the new **Admin** user has been created.
Now I would like to deploy it on Kubernetes; this is my code:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: influxdb
namespace: mon-grafana
spec:
serviceName: influxdb
replicas: 3
selector:
matchLabels:
app: influxdb
template:
metadata:
labels:
app: influxdb
spec:
containers:
- name: influxdb
image: influxdb:1.8.10
ports:
- containerPort: 8086
name: influxdb
protocol: TCP
resources:
requests:
cpu: 250m
memory: 500Mi
limits:
cpu: 2
memory: 500Mi
env:
- name: INFLUXDB_HTTP_AUTH_ENABLED
value: "true"
- name: INFLUXDB_DB
value: "mydatabase"
- name: INFLUXDB_USER
value: "user"
- name: INFLUXDB_USER_PASSWORD
value: "user"
- name: INFLUXDB_ADMIN_USER
value: "admin"
- name: INFLUXDB_ADMIN_PASSWORD
value: "admin"
volumeMounts:
- name: pvc-influxdb
mountPath: /var/lib/influxdb
- name: influxdb-config
mountPath: "/etc/influxdb/influxdb.conf"
subPath: influxdb.conf
securityContext:
allowPrivilegeEscalation: false
volumes:
- name: influxdb-config
configMap:
name: configmap
items:
- key: influxdb.conf
path: influxdb.conf
volumeClaimTemplates:
- metadata:
name: pvc-influxdb
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Gi
This is my **influxdb.conf** file:
root@influxdb-0:/# cat /etc/influxdb/influxdb.conf
reporting-enabled = false
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
max-series-per-database = 1000000
max-values-per-tag = 100000
index-version = "tsi1"
max-index-log-file-size = "100k"
[http]
log-enabled = false
enabled = true
bind-address = "0.0.0.0:8086"
https-enabled = false
flux-enabled = true
After deploying it in Kubernetes, the **Admin** user is not created during the deploy, and when I try to connect I receive this error:
> root@influxdb-0:/# influx -username admin -password admin Connected to
> http://localhost:8086 version 1.8.10 InfluxDB shell version: 1.8.10
>
> show databases; ERR: error authorizing query: create admin user first
> or disable authentication Warning: It is possible this error is due to
> not setting a database. Please set a database with the command “use ”.
How can I create the Admin user during the initial deploy (like I did in Docker)?
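For reference, the environment-variable initialization in the stock `influxdb:1.8` image only runs on a first start against an empty data directory, so a pre-populated PVC can skip it entirely (an assumption about the image's entrypoint, not something confirmed here). The error itself can be cleared by creating the first admin user manually, which InfluxDB 1.x allows while auth is enabled but no users exist yet:
```
-- Run `influx` inside the pod; with auth enabled and no users defined, InfluxDB 1.x
-- accepts this one statement without credentials.
CREATE USER admin WITH PASSWORD 'admin' WITH ALL PRIVILEGES
```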
Marco Ferrara
(121 rep)
May 2, 2024, 08:14 AM
• Last activity: May 2, 2024, 09:13 AM
1
votes
3
answers
557
views
Need a Role that can CREATE USER but not allowed to GRANT Predefined Roles in PostgreSQL <16
We are administrating DBs in Kubernetes for our customers.
For each new cluster, we create a new user with the `CREATE ROLE` and `CREATE DB` privileges, so that they can create their own databases and new users with stricter roles.
The problem is that, with the `CREATE ROLE` permission, the user can GRANT himself the `pg_execute_server_program` role, then use a reverse-shell attack to get a shell in our pod and read the environment variables, which is not desired. E.g. we have several secrets in env vars that the customer could take advantage of to increase their attack range and take over more things.
In short, I want to have a user that can create new users, but can't grant himself specific predefined roles.
Update: I looked and found that `ADMIN_OPTION` was added in PostgreSQL 16 to resolve this kind of issue. My problem is that we're using PostgreSQL versions 13, 14, and 15, and we can't just force-upgrade all clusters.
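One workaround pattern on pre-16 versions is to not grant `CREATE ROLE` to the customer at all and instead expose a `SECURITY DEFINER` function, owned by a trusted role, that creates login roles with fixed attributes. A minimal sketch; the function name and the `customer_admin` role are placeholders:
```
CREATE OR REPLACE FUNCTION create_app_user(p_name text, p_password text)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pg_catalog
AS $$
BEGIN
  -- %I/%L quote the identifier and the literal safely.
  EXECUTE format('CREATE ROLE %I LOGIN PASSWORD %L', p_name, p_password);
END;
$$;

REVOKE ALL ON FUNCTION create_app_user(text, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION create_app_user(text, text) TO customer_admin;
```
The customer can then create users without ever holding `CREATEROLE`, so they have nothing with which to grant themselves predefined roles.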
imans77
(111 rep)
Feb 4, 2024, 06:11 PM
• Last activity: Feb 24, 2024, 06:44 PM
0
votes
1
answers
142
views
How to migrate 2 different Cassandra databases into one cluster
We have two separate Cassandra clusters, each with six nodes: Cluster A and Cluster B. Our requirement is to merge these two clusters to save costs. Because this is a critical application, we need to plan this with no downtime or a very minimal cutover window.
We can create tables from Cluster B to Cluster A and copy data using snapshots, but how can we ensure continuous writes for live data into the new Cluster A and minimize downtime? Any suggestions on how to proceed with this activity?
Sai
(39 rep)
Feb 14, 2024, 08:23 PM
• Last activity: Feb 19, 2024, 04:48 PM
0
votes
0
answers
102
views
Block access from pgAdmin to Postgres in kubernetes
We are trying to set up pgAdmin but want to prevent the root user from connecting and creating, updating or deleting anything.
We have created a new read-only user for this case, and this works fine. But it is still possible to register a server in pgAdmin with the root credentials. Our idea was to block the connection for root, but in Kubernetes there are no static IPs for services. That means we can't block root access without also blocking the backend and Keycloak.
Are there any other approaches we are missing that would be helpful for our use case?
user6266369
(101 rep)
Jan 15, 2024, 01:39 PM
0
votes
1
answers
62
views
MSSQL 2019 CU16 on Linux: virtual_address_space_committed_kb vs RSS
I am running SQL Server 2019 CU15 on Linux in Kubernetes and am trying to investigate why the SQL Server pod gets killed by the OOMKiller. I see some inconsistency in the reporting of memory consumption.
The SQL Server pod currently has a limit of 16 GB of RAM (both as a limit and as a startup argument) and only 14000 MB as Max Server Memory. Target Server Memory is currently reported as 11397920 KB, which is aligned with what I get by summing up the memory clerks, so inside SQL Server it all looks consistent. If I query sys.dm_os_process_memory, then virtual_address_space_committed_kb equals 11410248, which is almost the same number.
But I can't get a number of that order from the OS level by any means. I am using the image from Microsoft, so I cannot add anything for extended diagnostics. The RSS value in ps shows 15 gigs; same for top. I have a gut feeling that this difference is exactly what crosses the limit and causes the OOMKiller to kill the pod. The knee-jerk reaction would be to add this gap to any SQL Server pod's limit while keeping max server memory the same. But I still want to have an understanding of **what is that extra memory used for and how to track it internally?**
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000630+ 7 0.0 0.0 61636 23524 ? Sl Aug03 0:00 /opt/mssql/bin/sqlservr
1000630+ 9 26.9 11.4 278592548 15122076 ? Sl Aug03 12325:54 /opt/mssql/bin/sqlservr
1000630+ 400010 0.0 0.0 2628 544 pts/0 Ss 11:15 0:00 sh -i -c TERM=xterm sh
1000630+ 400016 0.0 0.0 2628 548 pts/0 S 11:15 0:00 sh
1000630+ 400019 0.0 0.0 6192 3388 pts/0 S+ 11:15 0:00 top
1000630+ 400254 0.0 0.0 2628 612 pts/1 Ss 11:42 0:00 sh -i -c TERM=xterm sh
1000630+ 400260 0.0 0.0 2628 612 pts/1 S 11:42 0:00 sh
1000630+ 400262 0.0 0.0 6192 3296 pts/1 S+ 11:42 0:00 top
1000630+ 400880 0.0 0.0 2628 544 pts/2 Ss 12:51 0:00 sh -i -c TERM=xterm sh
1000630+ 400886 0.0 0.0 2628 544 pts/2 S 12:51 0:00 sh
1000630+ 400954 0.0 0.0 5912 2904 pts/2 R+ 12:59 0:00 ps aux
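For tracking that gap from inside SQL Server, a usual starting point is to compare the process-level counters with a per-clerk breakdown; a minimal sketch, with no claim about what it will show on this particular pod:
```
-- Process-level view (the same DMV referenced above):
SELECT physical_memory_in_use_kb,
       locked_page_allocations_kb,
       virtual_address_space_committed_kb,
       memory_utilization_percentage
FROM sys.dm_os_process_memory;

-- Per-clerk breakdown, to see which consumers the committed memory belongs to:
SELECT TOP (20) [type], name, SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
GROUP BY [type], name
ORDER BY size_mb DESC;
```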
Oleg Lavrov
(1 rep)
Sep 4, 2023, 01:20 PM
• Last activity: Sep 4, 2023, 02:52 PM