
Ceph mon on one node failed and won't start

1 vote
0 answers
282 views
I have a 3-node Ceph cluster that has been working for a year. I now get a HEALTH_WARN about:

    2 OSD have spurious read errors
    1/3 mons down, quorum ceph01,ceph03

I tried to start the mon on ceph02, but it is not working.

    xxxxxxx@ceph02:~# systemctl status ceph-mon@ceph02
    ● ceph-mon@ceph02.service - Ceph cluster monitor daemon
         Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
        Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
                 └─ceph-after-pve-cluster.conf
         Active: active (running) since Sat 2024-02-03 12:27:49 CST; 5 months 12 days ago
       Main PID: 1450 (ceph-mon)
          Tasks: 24
         Memory: 3.4G
            CPU: 2w 4d 14h 10min 5.925s
         CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@ceph02.service
                 └─1450 /usr/bin/ceph-mon -f --cluster ceph --id ceph02 --setuser ceph --setgroup ceph

    Jul 17 12:17:16 ceph02 ceph-mon: 2024-07-17T12:17:16.574+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:17:31 ceph02 ceph-mon: 2024-07-17T12:17:31.590+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:17:46 ceph02 ceph-mon: 2024-07-17T12:17:46.603+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:18:01 ceph02 ceph-mon: 2024-07-17T12:18:01.615+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:18:16 ceph02 ceph-mon: 2024-07-17T12:18:16.627+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:18:31 ceph02 ceph-mon: 2024-07-17T12:18:31.644+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:18:46 ceph02 ceph-mon: 2024-07-17T12:18:46.660+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:19:01 ceph02 ceph-mon: 2024-07-17T12:19:01.672+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:19:16 ceph02 ceph-mon: 2024-07-17T12:19:16.685+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
    Jul 17 12:19:31 ceph02 ceph-mon: 2024-07-17T12:19:31.697+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied

I did some googling on how to debug it:

    xxxxxx@ceph02:~# ceph tell mon.1 mon_status
    Error ENXIO: problem getting command descriptions from mon.1

And tried:

    sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph02.asok mon_status
    ceph-mon -i ceph02 --debug_mon 10
    ls /var/lib/ceph/mon/ceph-ceph02/

None of them produce any output or response. My system disk still has free space, and its health is OK with no errors.

It looks like the folder holding the mon store on this node has some issue. Should I rm it? Or just reboot the node?
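The repeated handle_auth_bad_method entries in the journal output can be summarized with a short script to confirm that they follow a fixed retry pattern. This is only a minimal sketch in Python: the regular expression and the names parse_auth_failures and intervals_seconds are illustrative assumptions, not part of Ceph, and the sample lines are the first three journal entries copied from this post.

```python
import re
from datetime import datetime

# First three journal entries copied from the post (sample input only).
LOG_LINES = [
    "Jul 17 12:17:16 ceph02 ceph-mon: 2024-07-17T12:17:16.574+0800 7f1ccdd33700 -1 "
    "mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied",
    "Jul 17 12:17:31 ceph02 ceph-mon: 2024-07-17T12:17:31.590+0800 7f1ccdd33700 -1 "
    "mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied",
    "Jul 17 12:17:46 ceph02 ceph-mon: 2024-07-17T12:17:46.603+0800 7f1ccdd33700 -1 "
    "mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied",
]

# Hypothetical pattern written against the lines above; other Ceph versions may log differently.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) \S+ -1 "
    r"(mon\.\S+) e\d+ handle_auth_bad_method .*\((\d+)\) (.+)$"
)


def parse_auth_failures(lines):
    """Return (timestamp, mon name, errno, message) for each auth failure line."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, match.group(2), int(match.group(3)), match.group(4)))
    return events


def intervals_seconds(events):
    """Return the gap between consecutive events, rounded to whole seconds."""
    return [
        round((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_auth_failures(LOG_LINES)
    print(len(events), "auth failures from", events[0][1])
    print("intervals (s):", intervals_seconds(events))
```

A constant interval would suggest the mon is retrying authentication on a timer and being rejected each time, rather than crashing and restarting; it does not by itself say why the request is denied.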
Asked by Abe Xu (11 rep)
Jul 17, 2024, 04:32 AM
Last activity: Jul 17, 2024, 04:44 AM