
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
2 answers
51 views
Cassandra and Prometheus communication over docker compose
I am sorry to come back to this point, but it is still unsolved. Let me introduce the scenario. Working on a Linux machine (3.10.0-1160.el7.x86_64), I am using docker compose to deploy several containers such as cassandra, kafka, prometheus, grafana, etc. After deploying the docker-compose file, all the containers appear to be running:

docker ps --format "{{.Names}}: {{.Status}}"
jenkins-schema-registry-1: Up 41 minutes
jenkins-broker-1: Up 41 minutes
jenkins-storage-1: Up 41 minutes
jenkins-prometheus-1: Up 41 minutes
jenkins-acs-1: Up 41 minutes
jenkins-zookeeper-1: Up 41 minutes
jenkins-logaggregator-1: Up 41 minutes
jenkins-grafana-1: Up 41 minutes
jenkins-loki-1: Up 41 minutes
jenkins-promtail-1: Up 41 minutes
jenkins-sql_acadacdb-1: Up 41 minutes (healthy)
jenkins-opcuasimulatoraas-1: Up 41 minutes
jenkins-opcuasimulatormon-1: Up 41 minutes
jenkins-logsimulator-1: Up 41 minutes
jenkins-hmi_redis-1: Up 41 minutes
jenkins-mysql-1: Up 41 minutes (healthy)
jenkins-mongo-1: Up 41 minutes

In detail, here is the network configuration:

docker ps --format "{{.ID}}: {{.Names}} -> {{.Networks}}"
bef94007b3cc: jenkins-schema-registry-1 -> jenkins_monitoring
8c5dd97de847: jenkins-broker-1 -> jenkins_monitoring
525a4d21f146: jenkins-storage-1 -> jenkins_monitoring
790f5a91013b: jenkins-prometheus-1 -> jenkins_monitoring
cd1e964deed8: jenkins-acs-1 -> jenkins_default
315859268aa9: jenkins-zookeeper-1 -> jenkins_monitoring
a7229f21f3c5: jenkins-logaggregator-1 -> jenkins_monitoring
8ee8483ad5a0: jenkins-grafana-1 -> jenkins_monitoring
29f552f4d239: jenkins-loki-1 -> jenkins_loki-net,jenkins_monitoring
c08294688cec: jenkins-promtail-1 -> jenkins_loki-net,jenkins_monitoring
e3cf072659f0: jenkins-sql_acadacdb-1 -> jenkins_default
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> jenkins_default
01d046b685c8: jenkins-opcuasimulatormon-1 -> jenkins_default
7a978478f082: jenkins-logsimulator-1 -> jenkins_default
cd981c617974: jenkins-hmi_redis-1 -> jenkins_default
0ef9bee718a4: jenkins-mysql-1 -> jenkins_default
6cc8588a3910: jenkins-mongo-1 -> jenkins_default

And the exposed ports for each container:

docker ps --format "{{.ID}}: {{.Names}} -> {{.Ports}}"
bef94007b3cc: jenkins-schema-registry-1 -> 0.0.0.0:32800->8081/tcp, :::32800->8081/tcp
8c5dd97de847: jenkins-broker-1 -> 0.0.0.0:7072->7072/tcp, :::7072->7072/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
525a4d21f146: jenkins-storage-1 -> 0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7199->7199/tcp, :::7199->7199/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp
790f5a91013b: jenkins-prometheus-1 -> 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
cd1e964deed8: jenkins-acs-1 ->
315859268aa9: jenkins-zookeeper-1 -> 2888/tcp, 3888/tcp, 0.0.0.0:32796->2181/tcp, :::32796->2181/tcp
a7229f21f3c5: jenkins-logaggregator-1 -> 0.0.0.0:32798->5044/tcp, :::32798->5044/tcp
8ee8483ad5a0: jenkins-grafana-1 -> 3000/tcp, 0.0.0.0:3210->3210/tcp, :::3210->3210/tcp
29f552f4d239: jenkins-loki-1 -> 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp
c08294688cec: jenkins-promtail-1 ->
e3cf072659f0: jenkins-sql_acadacdb-1 -> 3306/tcp, 33060-33061/tcp
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> 0.0.0.0:32799->52522/tcp, :::32799->52522/tcp
01d046b685c8: jenkins-opcuasimulatormon-1 -> 0.0.0.0:32797->52520/tcp, :::32797->52520/tcp
7a978478f082: jenkins-logsimulator-1 ->
cd981c617974: jenkins-hmi_redis-1 -> 6379/tcp
0ef9bee718a4: jenkins-mysql-1 -> 3306/tcp, 33060-33061/tcp
6cc8588a3910: jenkins-mongo-1 -> 27017/tcp

Here are the prometheus, cassandra and kafka sections from the docker-compose file:

storage:
  image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
  ports:
    - "7000:7000" # Gossip communication
    - "7001:7001" # Intra-node TLS
    - "7199:7199" # JMX port
    - "9042:9042" # Native transport
    - "9160:9160" # Thrift service
    - "9100:9100" # Prometheus JMX Exporter
  volumes:
    - storage_cassandra:/var/lib/cassandra
    - ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
    - ./cassandra.yml:/opt/cassandra.yml
  environment:
    - JVM_OPTS=-javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7071:/opt/cassandra.yml -Dlog.level=DEBUG -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcassandra.jmx.remote.port=7199
  cap_add:
    - SYS_ADMIN
  security_opt:
    - seccomp:unconfined
  networks:
    - monitoring

broker:
  image: oci-reg-cta.zeuthen.desy.de/acada/confluentinc/cp-kafka:5.4.0
  depends_on:
    - zookeeper
  ports:
    - "7072:7072" # Port for the Prometheus JMX Exporter
    - "9091:9091" # RMI port
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 # must be set to 1 when running with a single-node cluster
    KAFKA_DEFAULT_REPLICATION_FACTOR: 1 # cannot be larger than the number of brokers
    KAFKA_NUM_PARTITIONS: 3 # default number of partitions per topic
    KAFKA_OPTS: -javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7072:/opt/kafka.yml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=9091
    KAFKA_CONFLUENT_SUPPORT_METRICS_ENABLE: "false"
    #JVM_OPTS: -javaagent:/opt/jmx_prometheus_javaagent.jar=7072:/opt/kafka.yml
  volumes:
    - broker_data:/var/lib/kafka/data
    - broker_secrets:/etc/kafka/secrets
    - ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
    - ./kafka.yml:/opt/kafka.yml
  networks:
    - monitoring

prometheus:
  image: prom/prometheus:v2.53.1
  ports:
    - "9090:9090" # Prometheus web interface and API (TCP)
  volumes:
    - type: bind
      source: ./prometheus.yml
      target: /etc/prometheus/prometheus.yml
  networks:
    - monitoring

And the .yaml files:

cassandra.yml:

---
startDelaySeconds: 0
hostPort: 0.0.0.0:7199
username: xxxxx
password: xxxxx
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2
    type: GAUGE
    labels:
      mylabel: "myvalue"
    help: "Cassandra metric $1 $2"
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2_$3
    type: GAUGE
    help: "Cassandra metric $1 $2 $3"
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2_$3_$4
    type: GAUGE
    help: "Cassandra metric $1 $2 $3 $4"

prometheus.yml:

global:
  scrape_interval: 25s
  scrape_timeout: 25s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090'] # Prometheus server
  - job_name: 'storage'
    metrics_path: /metrics
    static_configs:
      - targets: ['storage:7071'] # Storage/Cassandra JMX exporter
  - job_name: 'broker'
    static_configs:
      - targets: ['broker:7072'] # Broker/Kafka JMX exporter
kafka.yml:

#jmxUrl: "service:jmx:rmi:///jndi/rmi://localhost:7072/jmxrmi"
lowercaseOutputName: true
rules:
  - pattern: kafka.serverValue
    name: kafka_server_fetcher_bytes_consumed_total
    labels:
      client_id: "$1"

Querying Prometheus for the status of its targets, the situation looks normal:

curl -s http://localhost:9090/api/v1/targets | grep '"health":"up"'
{"status":"success","data":{"activeTargets":[{"discoveredLabels":{"__address__":"broker:7072","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"broker"},"labels":{"instance":"broker:7072","job":"broker"},"scrapePool":"broker","scrapeUrl":"http://broker:7072/metrics","globalUrl":"http://broker:7072/metrics","lastError":"","lastScrape":"2025-01-28T06:13:23.641150824Z","lastScrapeDuration":0.089283854,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"},{"discoveredLabels":{"__address__":"localhost:9090","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"prometheus"},"labels":{"instance":"localhost:9090","job":"prometheus"},"scrapePool":"prometheus","scrapeUrl":"http://localhost:9090/metrics","globalUrl":"http://790f5a91013b:9090/metrics","lastError":"","lastScrape":"2025-01-28T06:13:06.761760372Z","lastScrapeDuration":0.011126742,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"},{"discoveredLabels":{"__address__":"storage:7071","__metrics_path__":"/metrics","__scheme__":"http","__scrape_interval__":"25s","__scrape_timeout__":"25s","job":"storage"},"labels":{"instance":"storage:7071","job":"storage"},"scrapePool":"storage","scrapeUrl":"http://storage:7071/metrics","globalUrl":"http://storage:7071/metrics","lastError":"","lastScrape":"2025-01-28T06:13:07.393065033Z","lastScrapeDuration":3.497353034,"health":"up","scrapeInterval":"25s","scrapeTimeout":"25s"}],"droppedTargets":[],"droppedTargetCounts":{"broker":0,"prometheus":0,"storage":0}}}

Checking the Cassandra metrics picked up by Prometheus, I get (for example):

curl -s http://localhost:9090/api/v1/query?query=jvm_memory_bytes_used | grep 'result'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"jvm_memory_bytes_used","area":"heap","instance":"broker:7072","job":"broker"},"value":[1738044886.425,"443680528"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"nonheap","instance":"broker:7072","job":"broker"},"value":[1738044886.425,"67814792"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"heap","instance":"storage:7071","job":"storage"},"value":[1738044886.425,"438304896"]},{"metric":{"__name__":"jvm_memory_bytes_used","area":"nonheap","instance":"storage:7071","job":"storage"},"value":[1738044886.425,"75872616"]}]}}

It seems Prometheus is working and communicating with Cassandra and Kafka. But with the following commands (querying a specific port directly) I don't get any result at all (except for port 7072, which only returns some generic Java metrics, nothing SPECIFIC to Kafka):

[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7071/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7199/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7071/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:9091/metrics
[common@muoni-wn-15 jenkins]$ curl -s http://localhost:7072/metrics
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 6275.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 6275.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
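The empty responses are consistent with the port mappings shown earlier: the storage agent listens on 7071, but the compose file only publishes 7000-7001, 7199, 9042, 9100 and 9160 for that service, so nothing answers on the host side, while Prometheus reaches storage:7071 over the jenkins_monitoring network. A minimal way to check the exporters from inside that network instead of from the host (just a sketch, assuming the busybox wget shipped in the prom/prometheus image is available):

# Hypothetical in-network check: fetch the exporter endpoints from the
# Prometheus container, which sits on the same compose network.
docker compose exec prometheus wget -qO- http://storage:7071/metrics | head -n 20
docker compose exec prometheus wget -qO- http://broker:7072/metrics | head -n 20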
So I suspect there is some misconfiguration somewhere, because I am not sure that: 1. the JMX exporters are actually collecting information; 2. the information is what I want. And also, more seriously, I get the following exceptions:

KAFKA:

[common@muoni-wn-15 jenkins]$ docker compose -f docker-compose.yml.prometheus-cassandra.v4.yml exec broker bash -l
root@3b8e9d856d3f:/# kafka-topics --bootstrap-server localhost:29092 --list
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
        at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
        at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
        at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
        ... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)

CASSANDRA:

[common@muoni-wn-15 jenkins]$ docker compose -f docker-compose.yml.prometheus-cassandra.v3.yml exec storage bash -l
root@a1c1e2c5e95c:/# nodetool status
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:461)
        at sun.nio.ch.Net.bind(Net.java:453)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
        at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
        at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
        at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
        at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
        ... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)
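Both failures abort in JavaAgent.premain with "Address already in use", which suggests that kafka-topics and nodetool each start a second JVM that inherits the same -javaagent option and then tries to bind the exporter port already held by the long-running broker/Cassandra process. A quick way to test that hypothesis (only a sketch; it assumes the images forward KAFKA_OPTS / JVM_OPTS to every java invocation, which is what the stack traces indicate) is to clear the variable for the one-off command:

# Hypothetical workaround: run the CLI tools with the agent-carrying variable
# emptied, so only the server JVM loads the JMX exporter agent.
docker compose exec -e KAFKA_OPTS= broker kafka-topics --bootstrap-server localhost:29092 --list
docker compose exec -e JVM_OPTS= storage nodetool status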
I hope someone will help me out of this dark tunnel ... Best regards, Emilio

Hello all, recently I decided to simplify the docker-compose file in order to identify the source of the error. This is the storage section:

storage:
  image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
  ports:
    - "1234:1234" # JMX Exporter HTTP port
    - "7000:7000" # Gossip
    - "7001:7001" # TLS
    - "7199:7199" # JMX port (Cassandra native)
    - "7198:7198" # RMI registry port (newly added)
    - "9042:9042" # CQL
  volumes:
    - storage_cassandra:/var/lib/cassandra
    - ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
    - ./cassandra.v5.yml:/opt/cassandra.v5.yml
  environment:
    - JVM_OPTS= -javaagent:/opt/jmx_prometheus_javaagent.jar=1234:/opt/cassandra.v5.yml -Dcassandra.jmx.remote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7198 -Djava.rmi.server.hostname=storage -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

and this is the cassandra.v5.yml:

startDelaySeconds: 0
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2
    type: GAUGE
    help: "Cassandra metric $1 $2"
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2_$3
    type: GAUGE
    help: "Cassandra metric $1 $2 $3"

And this is the prometheus.v5.yml:

global:
  scrape_interval: 25s
  scrape_timeout: 25s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090'] # Prometheus server
  - job_name: 'storage'
    metrics_path: /metrics
    static_configs:
      - targets: ['storage:1234'] # Storage/Cassandra JMX exporter

Using this configuration I can scrape the cassandra metrics exposed at http://localhost:1234/metrics, and there is no problem querying the kafka container, as you can see:

[common@muoni-wn-15 jenkins]$ docker exec -it jenkins-broker-1 kafka-topics --bootstrap-server localhost:29092 --list
__confluent.support.metrics
_schemas
logStorage

But when I try to invoke the nodetool command inside the storage container, I always get the same error, no matter which port I use (the error is the same even if I don't specify the -p 7197 option or if I change the jmx_prometheus_javaagent.jar port):

[common@muoni-wn-15 jenkins]$ docker exec -it jenkins-storage-1 nodetool -p 7197 status
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:461)
        at sun.nio.ch.Net.bind(Net.java:453)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
        at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
        at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
        at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
        at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
        ... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted (core dumped)

Please, I hope someone will help me out of this tunnel ...

Hello again, I checked the logs as suggested by piotr:

root@8bf0bcba72c7:/var/log/cassandra# grep 1234 system.log
INFO [main] 2025-02-05 07:09:45,973 CassandraDaemon.java:507 - JVM Arguments: [-javaagent:/opt/jmx_prometheus_javaagent.jar=1234:/opt/cassandra.v5.yml, -Dcassandra.jmx.remote.port=7199, -Dcom.sun.management.jmxremote.rmi.port=7198, -Djava.rmi.server.hostname=storage, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -Xloggc:/opt/cassandra/logs/gc.log, -ea, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+UseNUMA, -XX:+PerfDisableSharedMem, -Djava.net.preferIPv4Stack=true, -Xms1G, -Xmx1G, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xmn2048M, -XX:+UseCondCardMark, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler, -javaagent:/opt/cassandra/lib/jamm-0.3.0.jar, -Dcassandra.jmx.local.port=7199, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password, -Djava.library.path=/opt/cassandra/lib/sigar-bin, -Dcassandra.libjemalloc=/usr/local/lib/libjemalloc.so, -XX:OnOutOfMemoryError=kill -9 %p, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/opt/cassandra/logs, -Dcassandra.storagedir=/opt/cassandra/data, -Dcassandra-foreground=yes]

which shows (if I am not wrong) that the configuration file is loaded, and the process appears to be running and waiting:

root@8bf0bcba72c7:/var/log/cassandra# ss -tulp
Netid State  Recv-Q Send-Q Local Address:Port    Peer Address:Port Process
udp   UNCONN 0      0      127.0.0.11:43202      0.0.0.0:*
tcp   LISTEN 0      50     0.0.0.0:7198          0.0.0.0:*
tcp   LISTEN 0      50     0.0.0.0:7199          0.0.0.0:*
tcp   LISTEN 0      128    127.0.0.11:36235      0.0.0.0:*
tcp   LISTEN 0      128    0.0.0.0:9042          0.0.0.0:*
tcp   LISTEN 0      3      0.0.0.0:1234          0.0.0.0:*
tcp   LISTEN 0      128    172.31.0.8:7000       0.0.0.0:*

Of course, both checks were done inside the cassandra container.
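Since the Cassandra daemon itself already holds the exporter port (ss shows 1234 in LISTEN), the nodetool failure again points at a second JVM loading the same -javaagent, and the -p option would not change that: it only selects which JMX port nodetool connects to, not the port the inherited agent tries to bind at startup. Two quick checks of that hypothesis (a sketch; the nodetool path is an assumption based on the /opt/cassandra layout visible in the logs):

# Hypothetical diagnostics: confirm the running daemon owns the exporter port,
# and check whether the nodetool wrapper forwards JVM_OPTS (and with it the
# -javaagent flag) to its own JVM.
docker exec -it jenkins-storage-1 ss -tlnp | grep 1234
docker exec -it jenkins-storage-1 grep -n "JVM_OPTS" /opt/cassandra/bin/nodetool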
Emilio Mastriani (1 rep)
Jan 28, 2025, 06:44 AM • Last activity: Feb 6, 2025, 08:35 AM
0 votes
1 answer
65 views
Prometheus cannot connect to Cassandra, both running as containers on the same Docker network
I hope someone will help me. I am trying to run cassandra and prometheus as containers in my docker compose stack, together with many other containers. The stack starts up with no problems, but the prometheus and storage containers are not able to communicate. Here are some logs (JSON output formatted for readability):
$ curl -s http://localhost:9090/api/v1/targets | grep '"health":"up"'
{
   "status":"success",
   "data":{
      "activeTargets":[
         {
            "discoveredLabels":{
               "__address__":"broker:7072",
               "__metrics_path__":"/metrics",
               "__scheme__":"http",
               "__scrape_interval__":"25s",
               "__scrape_timeout__":"25s",
               "job":"broker"
            },
            "labels":{
               "instance":"broker:7072",
               "job":"broker"
            },
            "scrapePool":"broker",
            "scrapeUrl":"http://broker:7072/metrics ",
            "globalUrl":"http://broker:7072/metrics ",
            "lastError":"",
            "lastScrape":"2025-01-20T07:23:35.117923579Z",
            "lastScrapeDuration":0.076186592,
            "health":"up",
            "scrapeInterval":"25s",
            "scrapeTimeout":"25s"
         },
         {
            "discoveredLabels":{
               "__address__":"localhost:9090",
               "__metrics_path__":"/metrics",
               "__scheme__":"http",
               "__scrape_interval__":"25s",
               "__scrape_timeout__":"25s",
               "job":"prometheus"
            },
            "labels":{
               "instance":"localhost:9090",
               "job":"prometheus"
            },
            "scrapePool":"prometheus",
            "scrapeUrl":"http://localhost:9090/metrics",
            "globalUrl":"http://e71775d91730:9090/metrics ",
            "lastError":"",
            "lastScrape":"2025-01-20T07:23:32.411864452Z",
            "lastScrapeDuration":0.007709083,
            "health":"up",
            "scrapeInterval":"25s",
            "scrapeTimeout":"25s"
         },
         {
            "discoveredLabels":{
               "__address__":"storage:7200",
               "__metrics_path__":"/metrics",
               "__scheme__":"http",
               "__scrape_interval__":"25s",
               "__scrape_timeout__":"25s",
               "job":"storage"
            },
            "labels":{
               "instance":"storage:7200",
               "job":"storage"
            },
            "scrapePool":"storage",
            "scrapeUrl":"http://storage:7200/metrics ",
            "globalUrl":"http://storage:7200/metrics ",
            "lastError":"Get \"http://storage:7200/metrics\ ": context deadline exceeded",
            "lastScrape":"2025-01-20T07:23:25.989659286Z",
            "lastScrapeDuration":25.001000789,
            "health":"down",
            "scrapeInterval":"25s",
            "scrapeTimeout":"25s"
         }
      ],
      "droppedTargets":[
         
      ],
      "droppedTargetCounts":{
         "broker":0,
         "prometheus":0,
         "storage":0
      }
   }
}
As we can see, prometheus is able to scrape the other containers, but not the storage one. Of course, the storage container is up and running:

$ docker ps | grep storage
61d26bad890e   oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite   "docker-entrypoint.s…"   About an hour ago   Up 41 minutes   0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7200->7200/tcp, :::7200->7200/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp, 7199/tcp   jenkins-storage-1

And both containers belong to the same network:

[common@muoni-wn-15 jenkins]$ docker ps --format "{{.ID}}: {{.Names}} -> {{.Networks}}"
61d26bad890e: jenkins-storage-1 -> jenkins_monitoring
e71775d91730: jenkins-prometheus-1 -> jenkins_monitoring

Reading the jenkins log file, I got the following warning:

WARN [main] 2025-01-20 06:40:44,058 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.

And, repeated many times, the following error:

INFO [main] 2025-01-20 06:40:58,881 CassandraDaemon.java:650 - Startup complete
Jan 20, 2025 6:41:55 AM io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector collect
SEVERE: JMX scrape failed: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketTimeoutException: Read timed out]

Here are the relevant parts of the docker-compose file:

storage:
  image: oci-reg-cta.zeuthen.desy.de/acada/loggingsystem/monstorage:lite
  ports:
    - "7000:7000" # Gossip communication
    - "7001:7001" # Intra-node TLS
    - "7200:7200" # JMX port
    - "9042:9042" # Native transport
    - "9160:9160" # Thrift service
    - "9100:9100" # Prometheus JMX Exporter
  volumes:
    - storage_cassandra:/var/lib/cassandra
    - ./jmx_prometheus_javaagent-0.15.0.jar:/opt/jmx_prometheus_javaagent.jar
    - ./cassandra.yml:/opt/cassandra.yml
  environment:
    - JVM_OPTS=-javaagent:/opt/jmx_prometheus_javaagent.jar=0.0.0.0:7200:/opt/cassandra.yml -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=7200 -Dcassandra.jmx.remote.port=7200
  cap_add:
    - SYS_ADMIN
  security_opt:
    - seccomp:unconfined
  networks:
    - monitoring

and the cassandra.yml file:

startDelaySeconds: 0
hostPort: 0.0.0.0:7200
username: xxxxxxxxx
password: xxxxxxxxx
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
rules:
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2
    type: GAUGE
    labels:
      mylabel: "myvalue"
    help: "Cassandra metric $1 $2"
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2_$3
    type: GAUGE
    help: "Cassandra metric $1 $2 $3"
  - pattern: 'org.apache.cassandra.metricsValue'
    name: cassandra_$1_$2_$3_$4
    type: GAUGE
    help: "Cassandra metric $1 $2 $3 $4"

After running the storage container, all the files are placed in the expected folder (/opt):

root@61d26bad890e:/opt# pwd
/opt
root@61d26bad890e:/opt# ll
total 416
drwxr-xr-x. 1 root root     63 Jan 20 06:19 ./
drwxr-xr-x. 1 root root     62 Jan 20 06:19 ../
drwxr-xr-x. 9 root root    232 Jun 18  2021 cassandra/
-rw-rw-r--. 1 1003 1003    775 Dec  4 03:18 cassandra.yml
drwxr-xr-x. 3 root root     21 Jun 18  2021 java/
-rw-rw-r--. 1 1003 1003 418240 Jan 25  2021 jmx_prometheus_javaagent.jar
Docker is running on Linux 61d26bad890e 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux. Honestly, I am not an expert on JMX communication, so I have probably made some misconfiguration. I am sure someone will help me. Thank you in advance for your support. Emilio

Following Erick's comments, I am reporting some information about the networking status of the storage container and the other ones. For the storage container, I checked the open ports and connections:

[common@muoni-wn-15 ~]$ docker inspect -f '{{.State.Pid}}' jenkins-storage-1
29964
[common@muoni-wn-15 ~]$ sudo nsenter -t 29964 -n netstat
[sudo] password for common:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 muoni-wn-15.oact.:fodms 172.23.0.6:45414        ESTABLISHED
tcp        0      0 muoni-wn-15.oact.:51880 muoni-wn-15.oact.:fodms TIME_WAIT
tcp        1      0 muoni-wn-15.oact.:fodms 172.23.0.6:45408        CLOSE_WAIT
tcp        0      0 muoni-wn-15.oact.:51886 muoni-wn-15.oact.:fodms ESTABLISHED
tcp        0      0 muoni-wn-15.oact.:51892 muoni-wn-15.oact.:fodms ESTABLISHED
tcp        0      0 muoni-wn-15.oact.:51870 muoni-wn-15.oact.:fodms TIME_WAIT
tcp        0      0 muoni-wn-15.oact.:fodms muoni-wn-15.oact.:51892 ESTABLISHED
tcp        0      0 muoni-wn-15.oact.:fodms muoni-wn-15.oact.:51886 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State       I-Node   Path
unix  2      [ ]         STREAM     CONNECTED   489307
unix  2      [ ]         STREAM     CONNECTED   489251

Indeed, I cannot see any service waiting for connections on port 7200. Checking the network configuration, here is the port mapping for every container:

[common@muoni-wn-15 ~]$ docker ps --format "{{.ID}}: {{.Names}} -> {{.Ports}}"
291d28956c5b: jenkins-schema-registry-1 -> 0.0.0.0:32772->8081/tcp, :::32772->8081/tcp
2577ff5c41bf: jenkins-broker-1 -> 0.0.0.0:7072->7072/tcp, :::7072->7072/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
61d26bad890e: jenkins-storage-1 -> 0.0.0.0:7000-7001->7000-7001/tcp, :::7000-7001->7000-7001/tcp, 0.0.0.0:7200->7200/tcp, :::7200->7200/tcp, 0.0.0.0:9042->9042/tcp, :::9042->9042/tcp, 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp, 0.0.0.0:9160->9160/tcp, :::9160->9160/tcp, 7199/tcp
e71775d91730: jenkins-prometheus-1 -> 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
cd1e964deed8: jenkins-acs-1 ->
315859268aa9: jenkins-zookeeper-1 -> 2888/tcp, 3888/tcp, 0.0.0.0:32769->2181/tcp, :::32769->2181/tcp
a7229f21f3c5: jenkins-logaggregator-1 -> 0.0.0.0:32771->5044/tcp, :::32771->5044/tcp
8ee8483ad5a0: jenkins-grafana-1 -> 3000/tcp, 0.0.0.0:3210->3210/tcp, :::3210->3210/tcp
29f552f4d239: jenkins-loki-1 -> 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp
c08294688cec: jenkins-promtail-1 ->
e3cf072659f0: jenkins-sql_acadacdb-1 -> 3306/tcp, 33060-33061/tcp
cb78b00c13fe: jenkins-opcuasimulatoraas-1 -> 0.0.0.0:32768->52522/tcp, :::32768->52522/tcp
01d046b685c8: jenkins-opcuasimulatormon-1 -> 0.0.0.0:32770->52520/tcp, :::32770->52520/tcp
7a978478f082: jenkins-logsimulator-1 ->
cd981c617974: jenkins-hmi_redis-1 -> 6379/tcp
0ef9bee718a4: jenkins-mysql-1 -> 3306/tcp, 33060-33061/tcp
6cc8588a3910: jenkins-mongo-1 -> 27017/tcp
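One caveat about the netstat output above: it was run without the -l flag, so it only lists established connections ("w/o servers"), and listening sockets are filtered out; a listener on 7200 would therefore not appear even if it existed. A sketch of the same check including listeners (PID 29964 taken from the docker inspect above):

# Hypothetical re-check in the container's network namespace, this time
# including listening (server) sockets.
sudo nsenter -t 29964 -n ss -tln
sudo nsenter -t 29964 -n netstat -tlnp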
-Dcassandra.jmx.remote.port=7200" and ""7200:7200" # JMX Port" in my docker-compose file and the line "hostPort: 0.0.0.0:7200" in the cassandra.yml file should be enough to configure the JMX port for connection. But, unfortunally I not expert on JMX too. Maybe the simplest solution will be to set the JMX port to 7199, the default one (as you mentioned). Anyway, I appreciate any help. Emilio NB.: I deployed both cassandra and prometheus with the docker-compose file. Here is the prometheus section prometheus: image: prom/prometheus:v2.53.1 ports: - "9090:9090" # Prometheus web interface and API (TCP) #volumes: # - ./prometheus.yml:/etc/prometheus/prometheus.yml #command: # - '--config.file=/etc/prometheus.yml' volumes: - type: bind source: ./prometheus.yml target: /etc/prometheus/prometheus.yml networks: - monitoring prometheus.yml Here is the prometheus.yml global: scrape_interval: 25s scrape_timeout: 25s scrape_configs: - job_name: 'prometheus' static_configs: - targets: ['localhost:9090'] # Prometheus server - job_name: 'storage' static_configs: - targets: ['storage:7200'] # Storage/Cassandra JMX exporter - job_name: 'broker' static_configs: - targets: ['broker:7072'] # Broker/Kafka JMX exporter
Emilio Mastriani (1 rep)
Jan 20, 2025, 07:50 AM • Last activity: Jan 21, 2025, 03:03 AM