fuser -v /dev/nvidia* and lsof not responding

On our A100 machine, we frequently have zombie processes that still hold GPU memory after they have stopped. I usually used fuser -v /dev/nvidia* to find the PIDs of all such processes and then killed them, either with kill or with fuser -k /dev/nvidia*. fuser always took a short while to return a result.

Now, however, both fuser -v and fuser -k hang indefinitely and never return within any reasonable time. The last time, the command ran over an entire weekend without returning, and I ended up restarting the server. fuser -v /dev/nvidia0 behaves the same way, as does lsof /dev/nvidia0.

When searching online, I only find answers about the zombie-process problem itself, and nothing about fuser or lsof hanging. How can I debug or solve this, ideally without restarting the machine? The system runs Ubuntu 20.04.
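For context, fuser and lsof find these PIDs by walking /proc, and fuser also checks memory-mapped files (its "m" access type), not only open file descriptors. The following is a minimal Python sketch, not a fix, that performs a similar walk over file descriptors only, skipping any process it cannot read and optionally printing each PID to stderr before scanning it, so a stall points at a specific process. It assumes a Linux /proc layout and that reading the /proc/<pid>/fd symlinks does not itself block. The script name and the function pids_holding are hypothetical, chosen for illustration. Run it as root to see other users' processes.

```python
import argparse
import os
import sys
from typing import Dict, List


def pids_holding(prefix: str, proc_root: str = "/proc",
                 verbose: bool = False) -> Dict[int, List[str]]:
    """Return {pid: [targets]} for processes with an open fd whose target
    path starts with `prefix` (hypothetical helper, for illustration).

    Only /proc/<pid>/fd is inspected; unlike fuser, memory mappings
    (/proc/<pid>/maps) are deliberately skipped.
    """
    holders: Dict[int, List[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        if verbose:
            # Printed before scanning, so a hang shows which PID is stuck.
            print(f"scanning {pid}", file=sys.stderr, flush=True)
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission (run as root)
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if target.startswith(prefix):
                holders.setdefault(pid, []).append(target)
    return holders


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="List PIDs with open fds under a path prefix.")
    parser.add_argument("prefix", nargs="?", default="/dev/nvidia")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print each PID to stderr before scanning it")
    args = parser.parse_args()
    for pid, targets in sorted(pids_holding(args.prefix,
                                            verbose=args.verbose).items()):
        print(pid, sorted(set(targets)))
```

Example invocation: `sudo python3 find_holders.py /dev/nvidia --verbose`. If this sketch also stalls, the last "scanning" line names a candidate process to inspect further (for example, its state in /proc/<pid>/status). If it completes while fuser still hangs, the memory-map check that fuser performs and this sketch skips becomes a suspect, though that is only an inference.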
Asked by Green 绿色 (331 rep)
Mar 13, 2024, 06:41 AM
Last activity: Jun 5, 2024, 01:12 AM