Text processing - Building a slurm topology.conf file from ibnetdiscover output
First things first: no knowledge of either slurm or Infiniband is required - this is a purely text processing problem.
Second - I'm aware of ib2slurm, but the code appears to be broken and quite possibly outdated - it core dumps every time it runs, regardless of whether a map file exists or how it is formatted.
I can reduce the output of ibnetdiscover to 37-line blocks, each of the form:
Switch 36 "S-0002c90200423e70" # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0
[1] "H-0002c903000c26f2"[1] (2c903000c26f3) # "compute061 HCA-1" lid 49 4xQDR
"H-0002c903000bf36e"[1] (2c903000bf36f) # "compute060 HCA-1" lid 1 4xQDR
"H-0002c903000bf35a"[1] (2c903000bf35b) # "compute063 HCA-1" lid 28 4xQDR
"H-0002c903000c2646"[1] (2c903000c2647) # "compute062 HCA-1" lid 25 4xQDR
"H-0002c903000bf35e"[1] (2c903000bf35f) # "compute064 HCA-1" lid 31 4xQDR
"H-0002c903000c26de"[1] (2c903000c26df) # "compute065 HCA-1" lid 47 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"S-0002c90200423ee0" # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
"H-0002c903000c26fa"[1] (2c903000c26fb) # "compute046 HCA-1" lid 112 4xQDR
"H-0002c903000c26e2"[1] (2c903000c26e3) # "compute047 HCA-1" lid 63 4xQDR
"H-0002c903000c263a"[1] (2c903000c263b) # "compute048 HCA-1" lid 59 4xQDR
"H-0002c903000c27c2"[1] (2c903000c27c3) # "compute049 HCA-1" lid 117 4xQDR
"H-0002c903000c27a6"[1] (2c903000c27a7) # "compute051 HCA-1" lid 34 4xQDR
"H-0002c903000c2732"[1] (2c903000c2733) # "compute050 HCA-1" lid 22 4xQDR
"H-0002c903000c265e"[1] (2c903000c265f) # "compute052 HCA-1" lid 29 4xQDR
"H-0002c903000c266a"[1] (2c903000c266b) # "compute055 HCA-1" lid 32 4xQDR
"H-0002c903000c264e"[1] (2c903000c264f) # "compute054 HCA-1" lid 26 4xQDR
"H-0002c903000c26ee"[1] (2c903000c26ef) # "compute056 HCA-1" lid 48 4xQDR
"H-0002c903000bf246"[1] (2c903000bf247) # "compute057 HCA-1" lid 33 4xQDR
"H-0002c903000c27ca"[1] (2c903000c27cb) # "compute053 HCA-1" lid 44 4xQDR
and can extract the node name, e.g. compute061 using awk or sed.
I would like to get a single row for each block, starting with the switch name followed by the node names, i.e.:
ibsw20 compute061 compute060 compute063 compute062 compute064 compute065 compute046 compute047 compute048 compute049 compute051 compute050 compute052 compute055 compute054 compute056 compute057 compute053
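A minimal Python sketch of that block-to-row step, offered as one possible approach rather than a definitive one. It assumes the switch name is the text between `;` and `:` in the header's quoted description (as in "MF0;ibsw20:SX6036/U1") and that a node name is the first word of the quoted description on an "H-" line. The names `SWITCH_RE`, `HOST_RE` and `block_to_row` are made up for this illustration.

```python
import re

# Header line: Switch <nports> "S-<guid>" # "MF0;ibsw20:SX6036/U1" ...
# Assumption: the switch name sits between ';' and ':' in the description.
SWITCH_RE = re.compile(
    r'^Switch\s+\d+\s+"S-[0-9a-fA-F]+"\s+#\s+"(?:[^";]*;)?([^":]+)')

# Body line: "H-<guid>"[port] (...) # "compute061 HCA-1" ...
# Assumption: the node name is the first word of the quoted description.
HOST_RE = re.compile(r'"H-[0-9a-fA-F]+"\[\d+\][^#]*#\s+"([^"\s]+)')


def block_to_row(lines):
    """Return 'switch node1 node2 ...' for one block, or None without a header."""
    switch = None
    nodes = []
    for raw in lines:
        line = raw.strip()
        header = SWITCH_RE.match(line)
        if header:
            switch = header.group(1)
            continue
        host = HOST_RE.search(line)
        if host:
            nodes.append(host.group(1))
        # Anything else (switch-to-switch links, blanks) is discarded.
    if switch is None:
        return None
    return " ".join([switch] + nodes)


sample = [
    'Switch 36 "S-0002c90200423e70" # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0',
    '[1] "H-0002c903000c26f2"[1] (2c903000c26f3) # "compute061 HCA-1" lid 49 4xQDR',
    '"H-0002c903000bf36e"[1] (2c903000bf36f) # "compute060 HCA-1" lid 1 4xQDR',
    '"S-0002c90200423e80" # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR',
]
print(block_to_row(sample))  # ibsw20 compute061 compute060
```

Header and body rows are matched by separate patterns, and any row that matches neither is dropped.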
I plan to use slurm's scontrol show hostlist " ..." to compress several nodes into a single entity, which I can then push into slurm's topology.conf file. Each line there must have the form:
SwitchName=ibsw20 Nodes=compute[046-057,060-065]
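For illustration only, here is a pure-Python stand-in for the compression step, useful where scontrol is not at hand. It is not slurm's implementation: it only handles names of the form prefix plus zero-padded number, and it sorts numerically, so its ordering may differ from what scontrol prints. `compress_hostlist` and `topology_line` are hypothetical helper names.

```python
import re
from collections import defaultdict


def compress_hostlist(names):
    """Collapse e.g. compute046, compute047 into compute[046-047]."""
    groups = defaultdict(set)  # (prefix, digit width) -> set of numbers
    plain = set()              # names without a trailing number
    for name in names:
        m = re.fullmatch(r"(.*?)(\d+)", name)
        if m:
            groups[(m.group(1), len(m.group(2)))].add(int(m.group(2)))
        else:
            plain.add(name)

    parts = sorted(plain)
    for prefix, width in sorted(groups):
        nums = sorted(groups[(prefix, width)])
        # Walk the sorted numbers and cut a new range at every gap.
        ranges = []
        start = prev = nums[0]
        for n in nums[1:]:
            if n != prev + 1:
                ranges.append((start, prev))
                start = n
            prev = n
        ranges.append((start, prev))

        body = ",".join(
            f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
            for a, b in ranges
        )
        single_host = len(ranges) == 1 and ranges[0][0] == ranges[0][1]
        parts.append(f"{prefix}{body}" if single_host else f"{prefix}[{body}]")
    return ",".join(parts)


def topology_line(row):
    """Turn 'switch node1 node2 ...' into a topology.conf style line."""
    switch, *nodes = row.split()
    return f"SwitchName={switch} Nodes={compress_hostlist(nodes)}"


print(topology_line("ibsw20 compute060 compute061 compute046 compute047"))
# SwitchName=ibsw20 Nodes=compute[046-047,060-061]
```

On a machine with slurm installed, passing the comma-separated node list to scontrol show hostlist is the more trustworthy route; the sketch above only mirrors the simple case.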
Any ideas?
I should mention that after all the switch mappings, the ibnetdiscover file continues with the reverse - a node-by-node mapping to switches, in the form:
vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000bf371
caguid=0x2c903000bf36e
Ca 1 "H-0002c903000bf36e" # "compute060 HCA-1"
[1] (2c903000bf36f) "S-0002c90200423e70" # lid 1 lmc 0 "MF0;ibsw20:SX6036/U1" lid 3 4xQDR
Blocks are separated by empty lines.
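The node-by-node section could serve as an alternative source for the same mapping. The sketch below is a rough illustration under the assumptions that blocks are split on blank lines, that the node name is the first word of the quoted description on the "Ca" line, and that the switch name is the text between `;` and `:` in the first quoted string after `#` on the port line. All names in it (`CA_RE`, `LINK_RE`, `nodes_by_switch`) are invented for the example.

```python
import re
from collections import defaultdict

# Ca 1 "H-<guid>" # "compute060 HCA-1"
CA_RE = re.compile(r'^Ca\s+\d+\s+"H-[0-9a-fA-F]+"\s+#\s+"([^"\s]+)')

# [1] (<portguid>) "S-<guid>" # lid 1 lmc 0 "MF0;ibsw20:SX6036/U1" lid 3 4xQDR
LINK_RE = re.compile(
    r'^\[\d+\].*"S-[0-9a-fA-F]+"\s+#[^"]*"(?:[^";]*;)?([^":]+)')


def nodes_by_switch(text):
    """Map switch name -> list of node names, read from the Ca blocks."""
    result = defaultdict(list)
    for block in re.split(r"\n\s*\n", text):
        node = None
        for raw in block.splitlines():
            line = raw.strip()
            ca = CA_RE.match(line)
            if ca:
                node = ca.group(1)
                continue
            link = LINK_RE.match(line)
            # Skip repeats in case a node has several ports on one switch.
            if link and node and node not in result[link.group(1)]:
                result[link.group(1)].append(node)
        # vendid=, devid= and similar lines match neither pattern and are ignored.
    return dict(result)


sample = '''vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000bf371
caguid=0x2c903000bf36e
Ca 1 "H-0002c903000bf36e" # "compute060 HCA-1"
[1] (2c903000bf36f) "S-0002c90200423e70" # lid 1 lmc 0 "MF0;ibsw20:SX6036/U1" lid 3 4xQDR
'''
print(nodes_by_switch(sample))  # {'ibsw20': ['compute060']}
```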
A reduced question that can get me started - how do I parse several lines of text into a single row, extracting different parts of each row (treating header and body rows differently) and discarding rows which do not contain relevant data?
EDIT:
The blocks might not be full - if nothing is connected to some of the ports in some of the switches, then the output will skip those lines, which can result in something like:
Switch 36 "S-0002c90200423e70" # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0
"H-0002c903000bf36e"[1] (2c903000bf36f) # "compute060 HCA-1" lid 1 4xQDR
"H-0002c903000bf35a"[1] (2c903000bf35b) # "compute063 HCA-1" lid 28 4xQDR
"H-0002c903000c2646"[1] (2c903000c2647) # "compute062 HCA-1" lid 25 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"H-0002c903000c264e"[1] (2c903000c264f) # "compute074 HCA-1" lid 26 4xQDR
"H-0002c903000c26ee"[1] (2c903000c26ef) # "compute076 HCA-1" lid 48 4xQDR
So I can't simply rely on there being 36 lines following each switch line, or on any particular line always being the last one in a switch block.
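One way to avoid counting lines is a small state machine that opens a block at each "Switch" header and closes it at the next header, a blank line, a "Ca" header, or end of input. The Python sketch below is only an illustration of that idea. It assumes the same name conventions as the samples above (switch name between `;` and `:`, node name as the first word of the description), and `iter_switch_blocks` is a hypothetical name.

```python
import re

SWITCH_RE = re.compile(
    r'^Switch\s+\d+\s+"S-[0-9a-fA-F]+"\s+#\s+"(?:[^";]*;)?([^":]+)')
HOST_RE = re.compile(r'"H-[0-9a-fA-F]+"\[\d+\][^#]*#\s+"([^"\s]+)')


def iter_switch_blocks(lines):
    """Yield (switch, [nodes]) per switch block, whatever its length."""
    switch, nodes = None, []
    for raw in lines:
        line = raw.strip()
        header = SWITCH_RE.match(line)
        if header:
            # A new header closes the previous block, if one is open.
            if switch is not None:
                yield switch, nodes
            switch, nodes = header.group(1), []
            continue
        if not line or line.startswith("Ca "):
            # Assumption: a blank line or a Ca header ends the switch block.
            if switch is not None:
                yield switch, nodes
            switch, nodes = None, []
            continue
        if switch is not None:
            host = HOST_RE.search(line)
            if host:
                nodes.append(host.group(1))
    if switch is not None:
        yield switch, nodes  # flush the block still open at end of input


partial = '''Switch 36 "S-0002c90200423e70" # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0
"H-0002c903000bf36e"[1] (2c903000bf36f) # "compute060 HCA-1" lid 1 4xQDR
"S-0002c90200423eb8" # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
"H-0002c903000c264e"[1] (2c903000c264f) # "compute074 HCA-1" lid 26 4xQDR
'''
for name, hosts in iter_switch_blocks(partial.splitlines()):
    print(name, *hosts)  # ibsw20 compute060 compute074
```

Because the block boundaries come from the content of the lines, a switch with only a handful of populated ports is handled the same way as a full one.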
Asked by Dani_l
(5157 rep)
Jan 15, 2016, 07:08 AM
Last activity: Feb 7, 2016, 09:55 AM