A process using CUDA gets stuck, then all others get stuck as well - what do I do?

I’m writing a program using CUDA 12.1, running on a Linux system (Devuan Daedalus, kernel version 6.1.27). For some reason (which may be a bug of mine, although I kind of doubt it), the process gets stuck at some point. Sending it SIGINT, SIGTERM or SIGKILL has no effect.

The details of what this process does shouldn’t really matter, but: it doesn’t do file I/O, it doesn’t use the network, and it doesn’t use any other peripherals. It just uses CUDA APIs (specifically, execution graphs), does some computation in memory, and prints messages to its standard output.

So, the first part of the question: How can I kill such a process (other than by rebooting the machine)?

Now, after this process gets stuck, any process using CUDA APIs seems to get stuck as well, (almost) immediately after it starts running. Thus, the second part of the question: Can I prevent other processes from getting stuck too?
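A process that ignores even SIGKILL is often blocked in uninterruptible sleep (scheduler state `D`) inside a kernel call; here that would presumably be a call into the NVIDIA driver, though that is an assumption, not something established above. The minimal Linux-only Python sketch below reads the state letter from `/proc/<pid>/stat` for one process and lists every process currently in state `D`. The helper names `parse_state`, `process_state` and `uninterruptible_pids` are hypothetical, chosen for illustration.

```python
import os
import sys


def parse_state(stat_text: str) -> str:
    """Extract the one-letter scheduler state from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split after the LAST ')' instead of on whitespace.
    rest = stat_text[stat_text.rindex(")") + 2:]
    return rest.split()[0]


def process_state(pid: int) -> str:
    """Return the state letter (R, S, D, Z, ...) of `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat", "r") as f:
        return parse_state(f.read())


def uninterruptible_pids() -> list[int]:
    """List the PIDs of all processes currently in state 'D' (uninterruptible sleep)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if process_state(int(entry)) == "D":
                pids.append(int(entry))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited between listdir() and open(), or is hidden; skip it.
            continue
    return pids


if __name__ == "__main__":
    # Usage: python3 dstate.py [PID]  (defaults to this script's own PID)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"state of {pid}: {process_state(pid)}")
    print("processes in D state:", uninterruptible_pids())
```

If the stuck process, and later the other CUDA processes, show up with state `D`, that is consistent with them all waiting inside the kernel rather than in user-space code, which would also explain why no signal takes effect.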
Asked by einpoklum (10753 rep)
Jul 13, 2023, 12:00 PM