Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
2
answers
51
views
Cassandra and Prometheus communication over docker compose
I am sorry to come back to this point, but it is still unsolved. Let me introduce the scenario. Working on a Linux machine (3.10.0-1160.el7.x86_64), I am using docker compose to deploy several containers such as cassandra, kafka, prometheus, grafana, etc. After deploying the docker compose file, all the containers appear to be running:
docker ps --format "{{.Names}}: {{.Status}}"
jenkins-schema-registry-1: Up 41 minutes
jenkins-broker-1: Up 41 minutes
jenkins-storage-1: Up 41 minutes
jenkins-prometheus-1: Up 41 minutes
jenkins-acs-1: Up 41 minutes
jenkins-zookeeper-1: Up 41 minutes
jenkins-logaggregator-1: Up 41 minutes
jenkins-grafana-1: Up 41 minutes
jenkins-loki-1: Up 41 minutes
jenkins-promtail-1: Up 41 minutes
jenkins-sql_acadacdb-1: Up 41 minutes (healthy)
jenkins-opcuasimulatoraas-1: Up 41 minutes
jenkins-opcuasimulatormon-1: Up 41 minutes
jenkins-logsimulator-1: Up 41 minutes
jenkins-hmi_redis-1: Up 41 minutes
jenkins-mysql-1: Up 41 minutes (healthy)
jenkins-mongo-1: Up 41 minutes
In detail, here is the network configuration:
docker ps --format "{{.ID}}: {{.Names}} -> {{.Networks}}"
bef94007b3cc: jenkins-schema-registry-1 -> jenkins_monitoring
8c5dd97de847: jenkins-broker-1 -> jenkins_monitoring
525a4d21f146: jenkins-storage-1 -> jenkins_monitoring
790f5a91013b: jenkins-prometheus-1 -> jenkins_monitoring
cd1e964deed8: jenkins-acs-1 -> jenkins_default
315859268aa9: jenkins-zookeeper-1 -> jenkins_monitoring
a7229f21f3c5: jenkins-logaggregator-1 -> jenkins_monitoring
8ee8483ad5a0: jenkins-grafana-1 -> jenkins_monitoring
29f552f4d239: jenkins-loki-1 -> jenkins_loki-net,jenkins_monitoring
c08294688cec: jenkins-promtail-1 -> jenkins_loki-net,jenkins_monitoring
e3cf072659f0: jenkins-sql_acadacdb-1 -> jenkins_default
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> jenkins_default
01d046b685c8: jenkins-opcuasimulatormon-1 -> jenkins_default
7a978478f082: jenkins-logsimulator-1 -> jenkins_default
cd981c617974: jenkins-hmi_redis-1 -> jenkins_default
0ef9bee718a4: jenkins-mysql-1 -> jenkins_default
6cc8588a3910: jenkins-mongo-1 -> jenkins_default
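A quick way to cross-check which containers actually share the jenkins_monitoring network (a sketch using the standard Docker CLI):
docker network inspect jenkins_monitoring --format '{{range .Containers}}{{.Name}} {{end}}'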
And the exposed ports for each container:
docker ps --format "{{.ID}}: {{.Names}} -> {{.Ports}}"
bef94007b3cc: jenkins-schema-registry-1 -> 0.0.0.0:32800->8081/tcp, :::32800->8081/tcp
8c5dd97de847: jenkins-broker-1 -> 0.0.0.0:7072->7072/tcp, :::7072->7072/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
525a4d21f146: jenkins-storage-1 -> 0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7199->7199/tcp, :::7199->7199/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp
790f5a91013b: jenkins-prometheus-1 -> 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
cd1e964deed8: jenkins-acs-1 ->
315859268aa9: jenkins-zookeeper-1 -> 2888/tcp, 3888/tcp, 0.0.0.0:32796->2181/tcp, :::32796->2181/tcp
a7229f21f3c5: jenkins-logaggregator-1 -> 0.0.0.0:32798->5044/tcp, :::32798->5044/tcp
8ee8483ad5a0: jenkins-grafana-1 -> 3000/tcp, 0.0.0.0:3210->3210/tcp, :::3210->3210/tcp
29f552f4d239: jenkins-loki-1 -> 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp
c08294688cec: jenkins-promtail-1 ->
e3cf072659f0: jenkins-sql_acadacdb-1 -> 3306/tcp, 33060-33061/tcp
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> 0.0.0.0:32799->52522/tcp, :::32799->52522/tcp
01d046b685c8: jenkins-opcuasimulatormon-1 -> 0.0.0.0:32797->52520/tcp, :::32797->52520/tcp
7a978478f082: jenkins-logsimulator-1 ->
cd981c617974: jenkins-hmi_redis-1 -> 6379/tcp
0ef9bee718a4: jenkins-mysql-1 -> 3306/tcp, 33060-33061/tcp
6cc8588a3910: jenkins-mongo-1 -> 27017/tcp
Here are the prometheus, cassandra and kafka sections from the docker-compose file:
storage:
image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
ports:
- "7000:7000" # Gossip communication
- "7001:7001" # Intra-node TLS
- "7199:7199" # JMX port
- "9042:9042" # Native transport
- "9160:9160" # Thrift service
- "9100:9100" # Prometheus JMX Exporter
volumes:
- storage_cassandra:/var/lib/cassandra
- ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
- ./cassandra.yml:/opt/cassandra.yml
environment:
- JVM_OPTS=-javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7071:/opt/cassandra.yml -Dlog.level=DEBUG -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcassandra.jmx.remote.port=7199
cap_add:
- SYS_ADMIN
security_opt:
- seccomp:unconfined
networks:
- monitoring
broker:
image: oci-reg-cta.zeuthen.desy.de/acada/confluentinc/cp-kafka:5.4.0
depends_on:
- zookeeper
ports:
- "7072:7072" # Porta per Prometheus JMX Exporter
- "9091:9091" #Porta RMI
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 # must be set to 1 when running with a single-node cluster
KAFKA_DEFAULT_REPLICATION_FACTOR: 1 # cannot be larger than the number of brokers
KAFKA_NUM_PARTITIONS: 3 # default number of partitions per topic
KAFKA_OPTS: -javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7072:/opt/kafka.yml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=9091
KAFKA_CONFLUENT_SUPPORT_METRICS_ENABLE: "false"
#JVM_OPTS: -javaagent:/opt/jmx_prometheus_javaagent.jar=7072:/opt/kafka.yml
volumes:
- broker_data:/var/lib/kafka/data
- broker_secrets:/etc/kafka/secrets
- ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
- ./kafka.yml:/opt/kafka.yml
networks:
- monitoring
prometheus:
image: prom/prometheus:v2.53.1
ports:
- "9090:9090" # Prometheus web interface and API (TCP)
volumes:
- type: bind
source: ./prometheus.yml
target: /etc/prometheus/prometheus.yml
networks:
- monitoring
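Note that the jmx_prometheus_javaagent argument has the form -javaagent:<jar>=[host:]<port>:<config>, so in the storage service above the exporter listens on 0.0.0.0:7071 inside the container. Since 7071 is not listed under ports:, it is reachable from other containers on the monitoring network but not from the host, which may explain why curl against localhost:7071 returns nothing further below. A minimal (hypothetical) way to publish it would be:
storage:
  ports:
    - "7071:7071" # JMX exporter HTTP endpoint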
And the .yaml files:
cassandra.yml
---
startDelaySeconds: 0
hostPort: 0.0.0.0:7199
username: xxxxx
password: xxxxx
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
- pattern: 'org.apache.cassandra.metrics<type=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2
  type: GAUGE
  labels:
    mylabel: "myvalue"
  help: "Cassandra metric $1 $2"
- pattern: 'org.apache.cassandra.metrics<type=(\S*), scope=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2_$3
  type: GAUGE
  help: "Cassandra metric $1 $2 $3"
- pattern: 'org.apache.cassandra.metrics<type=(\S*), keyspace=(\S*), scope=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2_$3_$4
  type: GAUGE
  help: "Cassandra metric $1 $2 $3 $4"
prometheus.yml
global:
scrape_interval: 25s
scrape_timeout: 25s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090'] # Prometheus server
- job_name: 'storage'
metrics_path: /metrics
static_configs:
- targets: ['storage:7071'] # Storage/Cassandra JMX exporter
- job_name: 'broker'
static_configs:
- targets: ['broker:7072'] # Broker/Kafka JMX exporter
kafka.yml
#jmxUrl: "service:jmx:rmi:///jndi/rmi://localhost:7072/jmxrmi"
lowercaseOutputName: true
rules:
- pattern: kafka.serverValue
name: kafka_server_fetcher_bytes_consumed_total
labels:
client_id: "$1"
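As a sanity check, the scrape configuration can also be validated with promtool, which ships in the prom/prometheus image (a sketch):
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml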
Querying Prometheus for the status of its targets, the situation looks normal:
curl -s http://localhost:9090/api/v1/targets | grep '"health":"up"'
{"status":"success","data":{"activeTargets":[{"discoveredLabels":{"__address__":"broker:7072","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"broker"},"labels":{"instance":"broker:7072","job":"broker"},"scrapePool":"broker","scrapeUrl":"http://broker:7072/metrics ","globalUrl":"http://broker:7072/metrics ","lastError":"","lastScrape":"2025-01-28T06:13:23.641150824Z","lastScrapeDuration":0.089283854,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"},{"discoveredLabels":{"__address__":"localhost:9090","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"prometheus"},"labels":{"instance":"localhost:9090","job":"prometheus"},"scrapePool":"prometheus","scrapeUrl":"http://localhost:9090/metrics","globalUrl":"http://790f5a91013b:9090/metrics ","lastError":"","lastScrape":"2025-01-28T06:13:06.761760372Z","lastScrapeDuration":0.011126742,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"},{"discoveredLabels":{"__address__":"storage:7071","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"storage"},"labels":{"instance":"storage:7071","job":"storage"},"scrapePool":"storage","scrapeUrl":"http://storage:7071/metrics ","globalUrl":"http://storage:7071/metrics ","lastError":"","lastScrape":"2025-01-28T06:13:07.393065033Z","lastScrapeDuration":3.497353034,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"}],"droppedTargets":[],"droppedTargetCounts":{"broker":0,"prometheus":0,"storage":0}}}
Checking the Cassandra metrics fetched by Prometheus, I get (for example):
curl -s http://localhost:9090/api/v1/query?query=jvm_memory_bytes_used | grep 'result'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"jvm_memory_bytes_used","area":"heap","instance":"broker:7072","job":"broker"},"value":[1738044886.425,"443680528"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"nonheap","instance":"broker:7072","job":"broker"},"value":[1738044886.425,"67814792"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"heap","instance":"storage:7071","job":"storage"},"value":[1738044886.425,"438304896"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"nonheap","instance":"storage:7071","job":"storage"},"value":[1738044886.425,"75872616"]}]}}
It seems Prometheus is working and communicating with Cassandra and Kafka.
But with the following commands (querying specific ports directly) I don't get any result (except for port 7072, which seems to return only generic Java metrics and nothing SPECIFIC to Kafka):
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7071/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7199/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7071/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:9091/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7072/metrics
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 6275.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 6275.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
So I suspect there is some misconfiguration somewhere, because I am not sure that: 1. the JMX exporters are collecting information; 2. the information is what I want
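One way to check the second point, i.e. whether Cassandra-specific metrics (and not only generic JVM ones) are reaching Prometheus, is to list the ingested metric names and filter for the cassandra_ prefix used in the rules above (a sketch):
curl -s http://localhost:9090/api/v1/label/__name__/values | grep -o '"cassandra_[^"]*"' | sort -u | head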
And also, more seriously, I get the following exceptions:
KAFKA -->
[common@muoni-wn-15 jenkins]$ docker compose -f docker-compose.yml.prometheus-cassandra.v4.yml exec broker bash -l
root@3b8e9d856d3f:/# kafka-topics --bootstrap-server localhost:29092 --list
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)
CASSANDRA -->
[common@muoni-wn-15 jenkins]$ docker compose -f docker-compose.yml.prometheus-cassandra.v3.yml exec storage bash -l
root@a1c1e2c5e95c:/# nodetool status
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:461)
at sun.nio.ch.Net.bind(Net.java:453)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)
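Both stack traces fail in JavaAgent.premain with a BindException, which suggests (this is only an assumption) that the -javaagent option from JVM_OPTS / KAFKA_OPTS is also applied to the JVM started by nodetool and kafka-topics, which then tries to bind the exporter port already held by the server process. A possible way to test this hypothesis is to clear the variable just for the CLI invocation, for example:
docker exec -it jenkins-storage-1 sh -c 'JVM_OPTS= nodetool status'
docker exec -it jenkins-broker-1 sh -c 'KAFKA_OPTS= kafka-topics --bootstrap-server localhost:29092 --list'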
I hope someone will help me to leave this dark tunnel ...
Best regards,
Emilio
Hello All,
recently I decided to simplify the docker compose file in order to identify the source of the error. This is the storage section:
storage:
image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
ports:
- "1234:1234" # JMX Exporter HTTP port
- "7000:7000" # Gossip
- "7001:7001" # TLS
- "7199:7199" # JMX port (Cassandra native)
- "7198:7198" # RMI registry port (newly added)
- "9042:9042" # CQL
volumes:
- storage_cassandra:/var/lib/cassandra
- ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
- ./cassandra.v5.yml:/opt/cassandra.v5.yml
environment:
- JVM_OPTS=
-javaagent:/opt/jmx_prometheus_javaagent.jar=1234:/opt/cassandra.v5.yml
-Dcassandra.jmx.remote.port=7199
-Dcom.sun.management.jmxremote.rmi.port=7198
-Djava.rmi.server.hostname=storage
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
and this is the cassandra.v5.yml
startDelaySeconds: 0
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
- pattern: 'org.apache.cassandra.metrics<type=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2
  type: GAUGE
  help: "Cassandra metric $1 $2"
- pattern: 'org.apache.cassandra.metrics<type=(\S*), scope=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2_$3
  type: GAUGE
  help: "Cassandra metric $1 $2 $3"
And this is the prometheus.v5.yml
global:
scrape_interval: 25s
scrape_timeout: 25s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090'] # Prometheus server
- job_name: 'storage'
metrics_path: /metrics
static_configs:
- targets: ['storage:1234'] # Storage/Cassandra JMX exporter
Using this configuration I can scrape the Cassandra metrics exposed at http://localhost:1234/metrics, and there is no problem querying the kafka container, as you can see:
[common@muoni-wn-15 jenkins]$ docker exec -it jenkins-broker-1 kafka-topics --bootstrap-server localhost:29092 --list
__confluent.support.metrics
_schemas
logStorage
But when I try to invoke the nodetool command inside the storage container, I always get the same error, no matter which port I use (the error is the same even if I don't specify the -p 7197 option or if I change the jmx_prometheus_javaagent.jar port):
[common@muoni-wn-15 jenkins]$ docker exec -it jenkins-storage-1 nodetool -p 7197 status
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:461)
at sun.nio.ch.Net.bind(Net.java:453)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)
Please, I hope someone will help me to leave this tunnel ...
Hello again, I checked the logs as suggested by piotr:
root@8bf0bcba72c7:/var/log/cassandra# grep 1234 system.log
INFO [main] 2025-02-05 07:09:45,973 CassandraDaemon.java:507 - JVM Arguments: [-javaagent:/opt/jmx_prometheus_javaagent.jar=1234:/opt/cassandra.v5.yml, -Dcassandra.jmx.remote.port=7199, -Dcom.sun.management.jmxremote.rmi.port=7198, -Djava.rmi.server.hostname=storage, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -Xloggc:/opt/cassandra/logs/gc.log, -ea, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+UseNUMA, -XX:+PerfDisableSharedMem, -Djava.net.preferIPv4Stack=true, -Xms1G, -Xmx1G, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xmn2048M, -XX:+UseCondCardMark, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler, -javaagent:/opt/cassandra/lib/jamm-0.3.0.jar, -Dcassandra.jmx.local.port=7199, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password, -Djava.library.path=/opt/cassandra/lib/sigar-bin, -Dcassandra.libjemalloc=/usr/local/lib/libjemalloc.so, -XX:OnOutOfMemoryError=kill -9 %p, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/opt/cassandra/logs, -Dcassandra.storagedir=/opt/cassandra/data, -Dcassandra-foreground=yes]
which shows (if I am not wrong) that the configuration file is loaded, and the process appears to be running and listening:
root@8bf0bcba72c7:/var/log/cassandra# ss -tulp
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp UNCONN 0 0 127.0.0.11:43202 0.0.0.0:*
tcp LISTEN 0 50 0.0.0.0:7198 0.0.0.0:*
tcp LISTEN 0 50 0.0.0.0:7199 0.0.0.0:*
tcp LISTEN 0 128 127.0.0.11:36235 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:9042 0.0.0.0:*
tcp LISTEN 0 3 0.0.0.0:1234 0.0.0.0:*
tcp LISTEN 0 128 172.31.0.8:7000 0.0.0.0:*
Of course, both checks have been done inside the cassandra container.
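With the exporter listening on 0.0.0.0:1234 inside the container, it can also be probed from the Prometheus container over the monitoring network (a sketch, assuming the busybox wget bundled in the prom/prometheus image):
docker exec jenkins-prometheus-1 wget -qO- http://storage:1234/metrics | head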
Emilio Mastriani
(1 rep)
Jan 28, 2025, 06:44 AM
• Last activity: Feb 6, 2025, 08:35 AM
0
votes
1
answers
65
views
Prometheus cannot connect to Cassandra, both running as containers on the same Docker network
I hope someone will help me.
I am trying to run cassandra and prometheus as containers inside my docker compose stack, together with many other containers.
The stack starts up with no problems, but the prometheus and the storage containers are not able to communicate.
Here are some logs (JSON output formatted for readability):
$ curl -s http://localhost:9090/api/v1/targets | grep '"health":"up"'
{
"status":"success",
"data":{
"activeTargets":[
{
"discoveredLabels":{
"__address__":"broker:7072",
"__metrics_path__":"/metrics",
"__scheme__":"http",
"__scrape_interval__":"25s",
"__scrape_timeout__":"25s",
"job":"broker"
},
"labels":{
"instance":"broker:7072",
"job":"broker"
},
"scrapePool":"broker",
"scrapeUrl":"http://broker:7072/metrics ",
"globalUrl":"http://broker:7072/metrics ",
"lastError":"",
"lastScrape":"2025-01-20T07:23:35.117923579Z",
"lastScrapeDuration":0.076186592,
"health":"up",
"scrapeInterval":"25s",
"scrapeTimeout":"25s"
},
{
"discoveredLabels":{
"__address__":"localhost:9090",
"__metrics_path__":"/metrics",
"__scheme__":"http",
"__scrape_interval__":"25s",
"__scrape_timeout__":"25s",
"job":"prometheus"
},
"labels":{
"instance":"localhost:9090",
"job":"prometheus"
},
"scrapePool":"prometheus",
"scrapeUrl":"http://localhost:9090/metrics",
"globalUrl":"http://e71775d91730:9090/metrics ",
"lastError":"",
"lastScrape":"2025-01-20T07:23:32.411864452Z",
"lastScrapeDuration":0.007709083,
"health":"up",
"scrapeInterval":"25s",
"scrapeTimeout":"25s"
},
{
"discoveredLabels":{
"__address__":"storage:7200",
"__metrics_path__":"/metrics",
"__scheme__":"http",
"__scrape_interval__":"25s",
"__scrape_timeout__":"25s",
"job":"storage"
},
"labels":{
"instance":"storage:7200",
"job":"storage"
},
"scrapePool":"storage",
"scrapeUrl":"http://storage:7200/metrics ",
"globalUrl":"http://storage:7200/metrics ",
"lastError":"Get \"http://storage:7200/metrics\ ": context deadline exceeded",
"lastScrape":"2025-01-20T07:23:25.989659286Z",
"lastScrapeDuration":25.001000789,
"health":"down",
"scrapeInterval":"25s",
"scrapeTimeout":"25s"
}
],
"droppedTargets":[
],
"droppedTargetCounts":{
"broker":0,
"prometheus":0,
"storage":0
}
}
}
As we can see, prometheus is able to scrape other containers, but not the storage one.
Of course, the storage container is up and running:
$ docker ps | grep storage
61d26bad890e oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite "docker-entrypoint.s…" About an hour ago Up 41 minutes 0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7200->7200/tcp, :::7200->7200/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp, 7199/tcp jenkins-storage-1
And both the containers belong to the same network:
[common@muoni-wn-15 jenkins]$ docker ps --format "{{.ID}}: {{.Names}} ->
{{.Networks}}"
61d26bad890e: jenkins-storage-1 -> jenkins_monitoring
e71775d91730: jenkins-prometheus-1 -> jenkins_monitoring
Reading the jenkins-log file I got the following warning:
WARN [main] 2025-01-20 06:40:44,058 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
And, repeated many times, the following error:
INFO [main] 2025-01-20 06:40:58,881 CassandraDaemon.java:650 - Startup complete
Jan 20, 2025 6:41:55 AM io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector collect
SEVERE: JMX scrape failed: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
java.net.SocketTimeoutException: Read timed out]
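When jmx_exporter runs as a -javaagent it can read the MBeans of its own JVM in-process; setting hostPort (or jmxUrl) in its configuration instead makes it connect to a remote JMX server over RMI, which is one possible source of the "Failed to retrieve RMIServer stub" error above. A minimal agent-mode configuration sketch without hostPort (an assumption, not the configuration actually used here) would be:
startDelaySeconds: 0
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
# no hostPort / jmxUrl: the agent scrapes the local MBean server directly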
Here are the involved part of the docker-compose file:
storage:
image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
ports:
- "7000:7000" # Gossip communication
- "7001:7001" # Intra-node TLS
- "7200:7200" # JMX port
- "9042:9042" # Native transport
- "9160:9160" # Thrift service
- "9100:9100" # Prometheus JMX Exporter
volumes:
- storage_cassandra:/var/lib/cassandra
- ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
- ./cassandra.yml:/opt/cassandra.yml
environment:
- JVM_OPTS=-javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7200:/opt/cassandra.yml -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=7200 -Dcassandra.jmx.remote.port=7200
cap_add:
- SYS_ADMIN
security_opt:
- seccomp:unconfined
networks:
- monitoring
and the cassandra.yml file:
startDelaySeconds: 0
hostPort: 0.0.0.0:7200
username: xxxxxxxxx
password: xxxxxxxxx
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
- pattern: 'org.apache.cassandra.metrics<type=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2
  type: GAUGE
  labels:
    mylabel: "myvalue"
  help: "Cassandra metric $1 $2"
- pattern: 'org.apache.cassandra.metrics<type=(\S*), scope=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2_$3
  type: GAUGE
  help: "Cassandra metric $1 $2 $3"
- pattern: 'org.apache.cassandra.metrics<type=(\S*), keyspace=(\S*), scope=(\S*), name=(\S*)><>Value'
  name: cassandra_$1_$2_$3_$4
  type: GAUGE
  help: "Cassandra metric $1 $2 $3 $4"
After running the storage container, all the files are placed in the expected folder (/opt):
root@61d26bad890e:/opt# pwd
/opt
root@61d26bad890e:/opt# ll
total 416
drwxr-xr-x. 1 root root 63 Jan 20 06:19 ./
drwxr-xr-x. 1 root root 62 Jan 20 06:19 ../
drwxr-xr-x. 9 root root 232 Jun 18 2021 cassandra/
-rw-rw-r--. 1 1003 1003 775 Dec 4 03:18 cassandra.yml
drwxr-xr-x. 3 root root 21 Jun 18 2021 java/
-rw-rw-r--. 1 1003 1003 418240 Jan 25 2021 jmx_prometheus_javaagent.jar
Docker is running on Linux 61d26bad890e 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux.
Honestly, I am not an expert on JMX communication, so I have probably made some misconfiguration. I am sure someone will help me.
Thank you in advance for your support.
Emilio
Following Erick's comments, I report some information about the networking status of the storage container and of the other ones.
About the storage container, I checked the open ports and connections:
[common@muoni-wn-15 ~]$ docker inspect -f '{{.State.Pid}}' jenkins-storage-1
29964
[common@muoni-wn-15 ~]$ sudo nsenter -t 29964 -n netstat
[sudo] password for common:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 muoni-wn-15.oact.:fodms 172.23.0.6:45414 ESTABLISHED
tcp 0 0 muoni-wn-15.oact.:51880 muoni-wn-15.oact.:fodms TIME_WAIT
tcp 1 0 muoni-wn-15.oact.:fodms 172.23.0.6:45408 CLOSE_WAIT
tcp 0 0 muoni-wn-15.oact.:51886 muoni-wn-15.oact.:fodms ESTABLISHED
tcp 0 0 muoni-wn-15.oact.:51892 muoni-wn-15.oact.:fodms ESTABLISHED
tcp 0 0 muoni-wn-15.oact.:51870 muoni-wn-15.oact.:fodms TIME_WAIT
tcp 0 0 muoni-wn-15.oact.:fodms muoni-wn-15.oact.:51892 ESTABLISHED
tcp 0 0 muoni-wn-15.oact.:fodms muoni-wn-15.oact.:51886 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ] STREAM CONNECTED 489307
unix 2 [ ] STREAM CONNECTED 489251
Indeed, I cannot see any service listening on port 7200.
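Note that netstat without the -l flag only shows established connections, not listening sockets; to see listeners inside the container's network namespace one could run something like (a sketch):
sudo nsenter -t 29964 -n netstat -tlnp
sudo nsenter -t 29964 -n ss -tlnp | grep 7200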
Checking the network configuration, here is the port mapping for every container:
[common@muoni-wn-15 ~]$ docker ps --format "{{.ID}}: {{.Names}} -> {{.Ports}}"
291d28956c5b: jenkins-schema-registry-1 -> 0.0.0.0:32772->8081/tcp, :::32772->8081/tcp
2577ff5c41bf: jenkins-broker-1 -> 0.0.0.0:7072->7072/tcp, :::7072->7072/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
61d26bad890e: jenkins-storage-1 -> 0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7200->7200/tcp, :::7200->7200/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp, 7199/tcp
e71775d91730: jenkins-prometheus-1 -> 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
cd1e964deed8: jenkins-acs-1 ->
315859268aa9: jenkins-zookeeper-1 -> 2888/tcp, 3888/tcp, 0.0.0.0:32769->2181/tcp, :::32769->2181/tcp
a7229f21f3c5: jenkins-logaggregator-1 -> 0.0.0.0:32771->5044/tcp, :::32771->5044/tcp
8ee8483ad5a0: jenkins-grafana-1 -> 3000/tcp, 0.0.0.0:3210->3210/tcp, :::3210->3210/tcp
29f552f4d239: jenkins-loki-1 -> 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp
c08294688cec: jenkins-promtail-1 ->
e3cf072659f0: jenkins-sql_acadacdb-1 -> 3306/tcp, 33060-33061/tcp
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> 0.0.0.0:32768->52522/tcp, :::32768->52522/tcp
01d046b685c8: jenkins-opcuasimulatormon-1 -> 0.0.0.0:32770->52520/tcp, :::32770->52520/tcp
7a978478f082: jenkins-logsimulator-1 ->
cd981c617974: jenkins-hmi_redis-1 -> 6379/tcp
0ef9bee718a4: jenkins-mysql-1 -> 3306/tcp, 33060-33061/tcp
6cc8588a3910: jenkins-mongo-1 -> 27017/tcp
I thought that the lines "- JVM_OPTS=-javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7200:/opt/cassandra.yml -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=7200 -Dcassandra.jmx.remote.port=7200" and ""7200:7200" # JMX Port" in my docker-compose file, together with the line "hostPort: 0.0.0.0:7200" in the cassandra.yml file, would be enough to configure the JMX port for connection. But unfortunately I am not an expert on JMX either.
Maybe the simplest solution would be to set the JMX port to 7199, the default one (as you mentioned). Anyway, I appreciate any help.
Emilio
NB.:
I deployed both cassandra and prometheus with the docker-compose file. Here is the prometheus section
prometheus:
image: prom/prometheus:v2.53.1
ports:
- "9090:9090" # Prometheus web interface and API (TCP)
#volumes:
# - ./prometheus.yml:/etc/prometheus/prometheus.yml
#command:
# - '--config.file=/etc/prometheus.yml'
volumes:
- type: bind
source: ./prometheus.yml
target: /etc/prometheus/prometheus.yml
networks:
- monitoring
Here is the prometheus.yml
global:
scrape_interval: 25s
scrape_timeout: 25s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090'] # Prometheus server
- job_name: 'storage'
static_configs:
- targets: ['storage:7200'] # Storage/Cassandra JMX exporter
- job_name: 'broker'
static_configs:
- targets: ['broker:7072'] # Broker/Kafka JMX exporter
Emilio Mastriani
(1 rep)
Jan 20, 2025, 07:50 AM
• Last activity: Jan 21, 2025, 03:03 AM