fuser -v /dev/nvidia* and lsof not responding
0 votes, 1 answer, 561 views
On our A100 machine, we frequently get zombie processes that keep memory allocated after they have been stopped. I usually ran `fuser -v /dev/nvidia*` to determine the PIDs of all processes and then killed them, either with `kill` or with `fuser -k /dev/nvidia*`.

`fuser` always took a short while to return the result. Now, however, both `fuser -v` and `fuser -k` hang indefinitely. The last time, the command ran for a whole weekend without returning, and I ended up restarting the server. `fuser -v /dev/nvidia0` shows the same abnormal behavior, as does `lsof /dev/nvidia0`.

Searching online, I only find answers to the zombie process problem mentioned above, and nothing about the case where `fuser` or `lsof` themselves hang.
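For illustration, the PID lookup that `fuser -v /dev/nvidia*` normally performs can be approximated by reading the file-descriptor symlinks under `/proc` directly. This is only a sketch: the function name `list_nvidia_pids` and its two optional parameters are hypothetical, it needs root to see other users' processes, and I am assuming (not having verified it on the affected machine) that `readlink` on `/proc/<pid>/fd/*` is less likely to block than `fuser` or `lsof`.

```shell
#!/usr/bin/env bash
# list_nvidia_pids [PROC_ROOT] [DEVICE_PREFIX]
# Hypothetical helper: prints the PIDs that have an open file descriptor
# whose symlink target starts with DEVICE_PREFIX (default: /dev/nvidia).
# It only reads symlinks under PROC_ROOT and never opens the device itself.
list_nvidia_pids() {
    local proc_root="${1:-/proc}"
    local prefix="${2:-/dev/nvidia}"
    local fd target pid
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we are not allowed to read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*)
                # Extract the PID component from PROC_ROOT/<pid>/fd/<n>.
                pid=${fd#"$proc_root"/}
                pid=${pid%%/*}
                printf '%s\n' "$pid"
                ;;
        esac
    done | sort -un
}
```

Calling `list_nvidia_pids` with no arguments scans the real `/proc`; the optional first argument exists only so the function can be tried against a test directory.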
How can I debug or solve this problem, ideally without restarting the machine?
The system runs Ubuntu 20.04.
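One diagnostic that might narrow this down is to look for processes stuck in uninterruptible sleep (state `D`), since such processes cannot be killed and can stall tools that inspect them. Whether that is the cause here is an assumption, not something confirmed above. The helper name `filter_uninterruptible` is hypothetical; it expects `ps` output with the state in the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads ps-style output on stdin and keeps the header
# line plus every row whose second column (STAT) starts with "D",
# i.e. processes in uninterruptible sleep.
filter_uninterruptible() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Intended use (not run here). "comm" is used instead of the full command
# line on the assumption that it is cheaper for ps to read:
#   ps -eo pid,stat,wchan:32,comm | filter_uninterruptible
```

The `wchan` column shows the kernel function a process is sleeping in, which may hint at whether the NVIDIA driver is involved.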
Asked by Green 绿色
(331 rep)
Mar 13, 2024, 06:41 AM
Last activity: Jun 5, 2024, 01:12 AM