I have a cluster of 11 nodes,
Apache Cassandra version 3.11.4
, where certain nodes are restarted at random times.
Nodes characteristics:
CPU(s): 32
Model name: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
RAM: 62G
Disk: 42T
Below, I am attaching system.log file of one node a.k.a node70 before restart:
WARN [CompactionExecutor:47] 2024-01-07 21:07:17,332 BigTableWriter.java:211 - Writing large partition xxx/xxx:128274211 (203.247MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495387-big-Data.db
INFO [HintsDispatcher:7] 2024-01-07 21:07:26,969 HintsStore.java:133 - Deleted hint file d75b9d99-e44b-4b91-8ba1-2b60852699da-1704638444850-1.hints
INFO [HintsDispatcher:7] 2024-01-07 21:07:26,970 HintsDispatchExecutor.java:282 - Finished hinted handoff of file d75b9d99-e44b-4b91-8ba1-2b60852699da-1704638444850-1.hints to endpoint /10.147.49.69: d75b9d99-e44b-4b91-8ba1-2b60852699da
WARN [CompactionExecutor:36] 2024-01-07 21:07:33,280 BigTableWriter.java:211 - Writing large partition xxx/xxx:20755 (172.373MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495376-big-Data.db
WARN [CompactionExecutor:43] 2024-01-07 21:07:55,185 BigTableWriter.java:211 - Writing large partition xxx/xxx:128274211 (116.939MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495383-big-Data.db
...
I am leaving key logs, so you can follow up. Whole file contains a lot of WARN
s about Writing large patition
.
WARN [CompactionExecutor:47] 2024-01-07 21:17:45,303 BigTableWriter.java:211 - Writing large partition xxx/xxx:26300 (191.685MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495387-big-Data.db
WARN [CompactionExecutor:40] 2024-01-07 21:18:00,395 BigTableWriter.java:211 - Writing large partition xxx/xxx:26663181 (145.598MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495380-big-Data.db
WARN [CompactionExecutor:42] 2024-01-07 21:18:14,510 BigTableWriter.java:211 - Writing large partition xxx/xxx:39741453 (111.068MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495416-big-Data.db
INFO [ScheduledTasks:1] 2024-01-07 21:18:41,355 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
WARN [CompactionExecutor:43] 2024-01-07 21:18:55,742 BigTableWriter.java:211 - Writing large partition xxx/xxx:332 (107.752MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495383-big-Data.db
WARN [CompactionExecutor:44] 2024-01-07 21:18:57,684 BigTableWriter.java:211 - Writing large partition xxx/xxx:770 (107.219MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495384-big-Data.db
Also here:
WARN [CompactionExecutor:44] 2024-01-07 21:22:19,816 BigTableWriter.java:211 - Writing large partition xxx/xxx:30096898 (103.169MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495384-big-Data.db
INFO [CompactionExecutor:42] 2024-01-07 21:22:26,206 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
WARN [CompactionExecutor:43] 2024-01-07 21:22:57,513 BigTableWriter.java:211 - Writing large partition xxx/xxx:26663181 (148.395MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495383-big-Data.db
INFO [IndexSummaryManager:1] 2024-01-07 21:22:59,163 IndexSummaryRedistribution.java:77 - Redistributing index summaries
WARN [CompactionExecutor:42] 2024-01-07 21:23:01,308 BigTableWriter.java:211 - Writing large partition xxx/xxx:20755 (350.542MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495416-big-Data.db
These kind of logs are repeating till **2024-01-7 22:49:00** and node just goes down and ups:
WARN [CompactionExecutor:43] 2024-01-07 22:49:26,242 BigTableWriter.java:211 - Writing large partition xxx/xxx:30269442 (109.637MiB) to sstable /home/user/apache-cassandra-3.11.4/data/data/xxx/xxx-5d4d2290f19811e99de8956ff675a115/md-1495383-big-Data.db
INFO [main] 2024-01-07 22:55:08,817 YamlConfigurationLoader.java:89 - Configuration location: file:/home/user/apache-cassandra-3.11.4/conf/cassandra.yaml
INFO [main] 2024-01-07 22:55:09,194 Config.java:496 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=0; client_encryption_options=; cluster_name=LBS Cluster; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=null; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=32; concurrent_compactors=16; concurrent_counter_writes=16; concurrent_materialized_view_writes=32; concurrent_reads=16; concurrent_replicates=null; concurrent_writes=16; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@7ec7ffd3; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_materialized_views=false; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=GossipingPropertyFileSnitch; file_cache_round_up=null; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=true; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=dc; internode_recv_buff_size_in_bytes=0; internode_send_buff_size_in_bytes=0; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=10.147.49.70; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=172800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=1024; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=0; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_flush_in_batches_legacy=true; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=DISABLED; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.147.49.70; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=10.147.49.69,10.147.49.70,10.147.49.71,10.147.49.74,10.147.49.75,10.147.49.73,10.147.49.72,10.147.49.67,10.147.49.68,10.147.49.64,10.147.49.65}; server_encryption_options=; slow_query_log_timeout_in_ms=500; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_keep_alive_period_in_secs=300; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions@5b239d7d; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO [main] 2024-01-07 22:55:09,195 DatabaseDescriptor.java:373 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2024-01-07 22:55:09,196 DatabaseDescriptor.java:431 - Global memtable on-heap threshold is enabled at 2508MB
INFO [main] 2024-01-07 22:55:09,196 DatabaseDescriptor.java:435 - Global memtable off-heap threshold is enabled at 2508MB
INFO [main] 2024-01-07 22:55:09,252 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO [main] 2024-01-07 22:55:09,252 DatabaseDescriptor.java:735 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}.
INFO [main] 2024-01-07 22:55:09,580 GossipingPropertyFileSnitch.java:64 - Loaded cassandra-topology.properties for compatibility
INFO [main] 2024-01-07 22:55:09,667 JMXServerUtils.java:246 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi
INFO [main] 2024-01-07 22:55:09,675 CassandraDaemon.java:476 - Hostname: server70
INFO [main] 2024-01-07 22:55:09,676 CassandraDaemon.java:483 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_222
INFO [main] 2024-01-07 22:55:09,677 CassandraDaemon.java:484 - Heap size: 9.800GiB/9.800GiB
INFO [main] 2024-01-07 22:55:09,678 CassandraDaemon.java:489 - Code Cache Non-heap memory: init = 2555904(2496K) used = 7550016(7373K) committed = 7602176(7424K) max = 251658240(245760K)
INFO [main] 2024-01-07 22:55:09,678 CassandraDaemon.java:489 - Metaspace Non-heap memory: init = 0(0K) used = 20864080(20375K) committed = 21495808(20992K) max = -1(-1K)
INFO [main] 2024-01-07 22:55:09,679 CassandraDaemon.java:489 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2402776(2346K) committed = 2621440(2560K) max = 1073741824(1048576K)
INFO [main] 2024-01-07 22:55:09,679 CassandraDaemon.java:489 - Par Eden Space Heap memory: init = 1718091776(1677824K) used = 481091008(469815K) committed = 1718091776(1677824K) max = 1718091776(1677824K)
INFO [main] 2024-01-07 22:55:09,679 CassandraDaemon.java:489 - Par Survivor Space Heap memory: init = 214695936(209664K) used = 0(0K) committed = 214695936(209664K) max = 214695936(209664K)
INFO [main] 2024-01-07 22:55:09,679 CassandraDaemon.java:489 - CMS Old Gen Heap memory: init = 8589934592(8388608K) used = 0(0K) committed = 8589934592(8388608K) max = 8589934592(8388608K)
**NB!**
When I want to check nodetool status in node70 it shows me: **Cassandra 3.0 and later require Java 8u40 or later**, but others can display even their versions are same.
Also, for some reason Snapshots does not work on this node, and data from 2020 and 2021 is stored there, although I configured it for two years.
I am also getting such logs in logs directory:
hs_err_pid1461756.log
hs_err_pid1502594.log
Here is my **gc.logs**:
2024-01-08T11:38:48.267+0600: 3533.985: Total time for which application threads were stopped: 0.0273467 seconds, Stopping threads took: 0.0017132 seconds
{Heap before GC invocations=5251 (full 22):
par new generation total 1887488K, used 1710170K [0x0000000540000000, 0x00000005c0000000, 0x00000005c0000000)
eden space 1677824K, 100% used [0x0000000540000000, 0x00000005a6680000, 0x00000005a6680000)
from space 209664K, 15% used [0x00000005b3340000, 0x00000005b52d68f0, 0x00000005c0000000)
to space 209664K, 0% used [0x00000005a6680000, 0x00000005a6680000, 0x00000005b3340000)
concurrent mark-sweep generation total 8388608K, used 3626506K [0x00000005c0000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 46509K, capacity 48029K, committed 48604K, reserved 1091584K
class space used 5645K, capacity 5969K, committed 6108K, reserved 1048576K
2024-01-08T11:38:48.753+0600: 3534.471: [GC (Allocation Failure) 2024-01-08T11:38:48.753+0600: 3534.471: [ParNew
Desired survivor size 107347968 bytes, new threshold 1 (max 1)
- age 1: 22025736 bytes, 22025736 total
: 1710170K->30151K(1887488K), 0.0262670 secs] 5336676K->3678067K(10276096K), 0.0266400 secs] [Times: user=0.46 sys=0.01, real=0.02 secs]
Heap after GC invocations=5252 (full 22):
par new generation total 1887488K, used 30151K [0x0000000540000000, 0x00000005c0000000, 0x00000005c0000000)
eden space 1677824K, 0% used [0x0000000540000000, 0x0000000540000000, 0x00000005a6680000)
from space 209664K, 14% used [0x00000005a6680000, 0x00000005a83f1f70, 0x00000005b3340000)
to space 209664K, 0% used [0x00000005b3340000, 0x00000005b3340000, 0x00000005c0000000)
concurrent mark-sweep generation total 8388608K, used 3647915K [0x00000005c0000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 46509K, capacity 48029K, committed 48604K, reserved 1091584K
class space used 5645K, capacity 5969K, committed 6108K, reserved 1048576K
}
2024-01-08T11:38:48.780+0600: 3534.498: Total time for which application threads were stopped: 0.0325483 seconds, Stopping threads took: 0.0013643 seconds
2024-01-08 11:38:49 GC log file has reached the maximum size. Saved as /home/user/apache-cassandra-3.11.4/logs/gc.log.0
Please, tell me if you need additional information. This is my first post in dba.stackexchange.com, so I may not take into account some forum rules
Thank you!
Asked by Midas Salimgali
(11 rep)
Jan 8, 2024, 07:31 AM
Last activity: Jul 16, 2025, 07:05 AM
Last activity: Jul 16, 2025, 07:05 AM