Slurm IO error: could not open stdout file

I am new to Slurm. I have set it up on a cluster. On some nodes of a partition, jobs run perfectly fine, but on other nodes of the same partition they do not run: they are cancelled the moment I submit them. When I ssh into a node on which the jobs fail and look at slurmd.log, I see:
[2024-08-30T01:49:30.986] [43095.batch] error: Could not open stdout file /data/vmahajan/larchtest/slurm-43095.out: No such file or directory
[2024-08-30T01:49:30.986] [43095.batch] error: _fork_all_tasks: IO setup failed: Slurmd could not connect IO
[2024-08-30T01:49:30.987] [43095.batch] get_exit_code task 0 died by signal: 53
[2024-08-30T01:49:30.989] [43095.batch] done with job
I can only conclude that Slurm is unable to write the output file, but I do not understand why it fails only on some nodes.
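
One way to narrow this down is to compare how the output directory looks from a working node and from a failing node. The "No such file or directory" message suggests, though the log alone does not prove it, that /data/vmahajan/larchtest is not visible on the failing nodes (for example, a shared filesystem that is not mounted there). The sketch below is only a diagnostic helper under that assumption; the function name check_output_dir is hypothetical and is not part of Slurm.

```python
import os
import socket
import sys


def check_output_dir(path):
    """Report whether `path` is usable as a job output directory on this node.

    Hypothetical helper, not a Slurm API. It only inspects the local view
    of the filesystem, which is what slurmd sees when it opens the stdout file.
    """
    exists = os.path.exists(path)
    is_dir = os.path.isdir(path)
    # Creating a file in a directory needs both write and search permission.
    writable = is_dir and os.access(path, os.W_OK | os.X_OK)
    return {
        "host": socket.gethostname(),
        "path": path,
        "exists": exists,
        "is_dir": is_dir,
        "writable": writable,
    }


if __name__ == "__main__":
    # Default to the current directory if no path is given on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    print(check_output_dir(target))
```

Running this over ssh on one working and one failing node, as the user who submits the job and with /data/vmahajan/larchtest as the argument, should show whether the two nodes see the directory differently. If "exists" is false only on the failing nodes, the mount configuration on those nodes is the first thing to check.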
Asked by Sanji Vinsmoke (1 rep)
Aug 30, 2024, 08:12 AM
Last activity: Sep 7, 2024, 09:27 PM