I/O wait/failure timeout on iSCSI device with multipath enabled
- I'm accessing a remote iSCSI-based SAN using multipath.
- The network on the server side has known intermittent issues that cause session failures, path failures, and I/O failures. I'm not trying to solve that problem here, as a fix is already in progress.
- The issue I have is this: when a process or service formats or partitions the device, the parted/mkfs command hangs, which causes a kernel panic. The timeout that leads to the panic is set to 240 seconds.
- What I want is to avoid the kernel panic: the parted/mkfs command should fail and return instead of causing a panic.
- I have searched and tried changing various parameters (iscsid, sysfs, multipath) to no avail.
This is my iscsid config:
iscsid.startup = /bin/systemctl start iscsid.socket iscsiuio.socket
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 30
node.conn.timeo.login_timeout = 30
node.conn.timeo.logout_timeout = 15
node.conn.timeo.noop_out_interval = 5
node.conn.timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 2
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 262144
node.conn.iscsi.MaxRecvDataSegmentLength = 262144
node.conn.iscsi.MaxXmitDataSegmentLength = 262144
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn.iscsi.HeaderDigest = CRC32C
node.conn.iscsi.DataDigest = CRC32C
node.session.nr_sessions = 1
node.session.reopen_max = 0
node.session.iscsi.FastAbort = Yes
node.session.scan = auto
This is my multipath config:
defaults {
    path_checker none
    user_friendly_names yes # To create 'mpathn' names for multipath devices
    path_grouping_policy multibus # To place all the paths in one priority group
    path_selector "round-robin 0" # To use round robin algorithm to determine path for next I/O operation
    failback immediate # For immediate failback to highest priority path group with active paths
    no_path_retry 1 # To disable I/O queueing after retrying once when all paths are down
}
I've also set the sysfs timeout value of every slave path to 30 seconds.
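For reference, here is a minimal sketch of how the per-path timeouts can be read back to confirm the value. It assumes the usual sysfs layout, where each path of a multipath device appears under `/sys/block/<dm-N>/slaves/<sdX>/device/timeout`. The function name `slave_timeouts` and the device name `dm-0` are hypothetical and not taken from the setup above.

```python
from pathlib import Path


def slave_timeouts(dm_name, sys_block=Path("/sys/block")):
    """Return {slave_name: timeout_seconds} for one device-mapper device.

    Assumes each slave exposes its SCSI command timeout in
    <sys_block>/<dm_name>/slaves/<slave>/device/timeout.
    """
    slaves_dir = Path(sys_block) / dm_name / "slaves"
    if not slaves_dir.is_dir():
        # Unknown device, or not a device-mapper device: nothing to report.
        return {}

    timeouts = {}
    for slave in sorted(slaves_dir.iterdir()):
        timeout_file = slave / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[slave.name] = int(timeout_file.read_text().strip())
    return timeouts
```

Calling `slave_timeouts("dm-0")` on the host should return one entry per path, each with the value 30 if the setting took effect.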
Even so, parted/mkfs still never fails and returns when there is a (simulated) network issue. What am I missing?
My multipath version is a tad old, but I can't upgrade because this is the supported version on Rocky 8.
multipath-tools v0.8.4 (05/04, 2020)
iscsid version 6.2.1.4-1
Asked by Neetz
(111 rep)
Jan 21, 2025, 09:38 PM