
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

33 votes
12 answers
29787 views
Process descendants
I'm trying to build a process container. The container will trigger other programs. For example: a bash script that launches background tasks with '&'. The important feature I'm after is this: when I kill the container, everything that has been spawned under it should be killed. Not just direct children, but their descendants too. When I started this project, I mistakenly believed that when you killed a process its children were automatically killed too. I've sought advice from people who had the same incorrect idea. While it's possible to catch a signal and pass the kill on to children, that's not what I'm looking for here. I believe what I want to be achievable, because when you close an xterm, anything that was running within it is killed unless it was nohup'd. This includes orphaned processes. That's what I'm looking to recreate. I have an idea that what I'm looking for involves unix sessions. If there were a reliable way to identify all the descendants of a process, it would be useful to be able to send them arbitrary signals too, e.g. SIGUSR1.
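A minimal sketch of the process-group approach this question is circling around (hedged: the shell command and the sleep are stand-ins, and a descendant that calls setsid() itself can still escape the group; the xterm behaviour described above comes from SIGHUP being delivered through the controlling terminal and session):

~~~lang-c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        setpgid(0, 0);   /* child: become leader of a new process group */
        execl("/bin/sh", "sh", "-c", "sleep 100 & sleep 100 & wait", (char *)NULL);
        _exit(127);      /* only reached if exec fails */
    }

    setpgid(pid, pid);   /* also set from the parent, to close a race */
    sleep(1);            /* let the script spawn its background jobs */

    kill(-pid, SIGTERM); /* negative PID: signal every member of the group */
    while (waitpid(-pid, NULL, 0) > 0)
        ;                /* reap the direct children in the group */
    return 0;
}
~~~

kill(-pgid, sig) also covers the last sentence: any signal, including SIGUSR1, can be sent to the whole group this way. For an escape-proof enumeration of descendants on Linux, cgroups are the usual tool.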
Craig Turner (430 rep)
Jun 11, 2011, 10:59 AM • Last activity: Jul 24, 2025, 05:07 PM
0 votes
1 answer
43 views
flock not working between forks on the same fd
I am working on a process that forks several times. To debug it, I am using a debug file, for which I open an fd that stays the same across all forked children. Then I have a function print_debug that prints to that fd. Now, when the main process and the children print at the same time, the output is interleaved in the debug file. I tried to solve that via flock(fd, LOCK_EX), but that seems to have no effect. Here is an excerpt from the print_debug function:
void print_debug(int fd, char *msg)
{
    flock(fd, LOCK_EX);
    print_debug2("... LOCK ACQUIRED ...");

    ... printing msg ...
    flock(fd, LOCK_UN);
}
Now when several forks print at the same time, the output looks like this:
--12692-- ... LOCK ACQUIRED ...
--12692-- fork no wait with fd_in 4, fd_out 1 and fd_close
----121269694-2-- - ..... . LOLOCKCK A ACQCQUIUIRERED D .....
.
----121269694-2-- - exriecghutt e sitrdeee o
f pipe started
--12694---- 12.69..2- - LO..CK.  ALOCQCUIK RACED Q..UI.
RE--D 12..69.4-
- --fd12 69ou2-t-  ifs orck urwreaintt
ly: 1
----121269692-4-- - ...... L LOCOCK K ACACQUQUIRIREDED . .....

----121269692-4-- - fofdrk o nuto  owan itcm wdit ch atfd i_is n cu0,rr fend_tlouy:t  15 
and fd_close
--126--9412--69 .6-..-  L..OC. K LOACCKQU AICQREUD IR..ED. .
.--.
12--69124-69- 6-er- rnexo ec2
ute tree
--1269--412--69 6-..- . ..LO.CK  ALOCCKQU IRACEQUDIR ED.. ..
.--.
12--6129694-6- --c fmdd_e oxutec iuts edcu rrfrenomtl y:ex ec5
Clearly, the "LOCK ACQUIRED" messages overlap. I also checked the return value of flock, which is always 0 (success). Does flock work in a situation like this? Why do I still get garbled messages in the log file?
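A likely explanation, offered as a hedged note rather than a confirmed diagnosis: flock(2) locks belong to the open file description, and after fork() the parent and all children share that same description, so every flock(LOCK_EX) succeeds immediately and the processes never exclude one another. A minimal sketch of the usual fix, assuming a log file named debug.log: let each process open() the file itself, giving it its own open file description to lock.

~~~lang-c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Each process opens the log file itself; flock() then really blocks. */
static void print_debug(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return;
    flock(fd, LOCK_EX);           /* a real contest between processes now */
    write(fd, msg, strlen(msg));
    flock(fd, LOCK_UN);
    close(fd);                    /* closing releases the lock anyway */
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        if (fork() == 0) {
            print_debug("debug.log", "child line\n");
            _exit(0);
        }
    print_debug("debug.log", "parent line\n");
    while (wait(NULL) > 0)
        ;                         /* reap the children */
    return 0;
}
~~~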
Bastian (25 rep)
May 6, 2025, 08:09 PM • Last activity: May 11, 2025, 03:36 PM
3 votes
1 answer
2349 views
SSH fork kills connection
I am using a Linux script which has the task of forwarding control of the system to remote support. In this script, one of the commands is an ssh port-forward command that forwards the port of the video live stream of a remote camera. The system with the remote camera is an unknown, and thus assumed to always be behind a firewall; it also has a user who lacks the knowledge to port-forward their router or to acquire a dynamic DNS. To overcome this, the "CLIENT" system (the camera computer) executes the command below: ssh -R 8089:dc-bb7925e7:8085 -p 2250 user@remoteserver.com -fNT which forwards the CLIENT port for the camera feed, 8085, to port 8089 on the remote support server. Remote support is supposed to be able to go to localhost:8089 and view the live stream. The problem is that this does not work. Once I insert the -f flag into the command, the command breaks and forwards nothing. Regardless of the flag, the problem is that when this ssh command executes, all other scripts and processes which are supposed to be running get put on hold because of the TTY, which does not allow the script to exit until the connection is broken. So I tried using -f to fork the ssh into the background. This does not work, as the port does not get forwarded, and I cannot figure out why. What I need is for the port to be forwarded and then forgotten about while the connection remains open. It is important that remote support has control over ssh while the client system still operates normally. What am I doing wrong? If I do not use -fNT then this functions normally, only all other scripts are not executed. This is a Debian system.
RootWannaBe (131 rep)
Oct 19, 2014, 09:58 AM • Last activity: Apr 28, 2025, 02:05 AM
8 votes
2 answers
609 views
fork() Causes DMA Buffer in Physical Memory to Retain Stale Data on Subsequent Writes
I'm working on a C++ application on Ubuntu 20.04 that uses PCIe DMA to transfer data from a user-space buffer to hardware. The buffer is mapped to a fixed 1K physical memory region via a custom library (plib->copyDataToBuffer). After calling fork() and running a child process (which just calls an external program and exits), I notice that subsequent writes to the buffer by the parent process do not reflect in physical memory—the kernel still sees the old data from before the fork. **Key Details:** The 1K buffer is mapped specifically for DMA; it’s pinned and mapped to a known physical address. Before the fork(), a call to plib->copyDataToBuffer correctly updates physical memory. After the fork(), the parent process calls plib->copyDataToBuffer again with new data, and msync returns success, but the physical memory contents remain unchanged. The child process does not touch the buffer; it only runs an unrelated command via execvp. **My Assumptions & Concerns:** fork() causes COW (Copy-on-Write), but since only the parent writes to the buffer, I expected the updated contents to reflect in physical memory. Could the COW behavior or memory remapping post-fork be interfering with DMA-mapped memory regions? I confirmed that plib->copyDataToBuffer performs the write correctly from a software perspective, but the actual physical memory contents (verified from kernel space) remain stale. **Question:** Why does the physical memory backing my DMA buffer retain stale data after a fork() + exec in a child process, even though the parent writes new data afterward? What are the best practices to ensure consistent physical memory updates for DMA buffers across fork() calls?
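One common safeguard for exactly this pattern, offered as a hedged sketch rather than a diagnosis of plib (whose internals aren't shown): after fork(), copy-on-write can hand the *parent* a fresh physical page on its next write, while the device keeps DMA-ing from the old page, which matches the stale-data symptom. madvise(MADV_DONTFORK) keeps the range out of the child entirely, so the mapping is never COW-shared. Here buf is an anonymous stand-in for the library's pinned buffer:

~~~lang-c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_LEN 1024   /* the 1K DMA window from the question */

int main(void)
{
    /* Stand-in for the library's pinned DMA buffer. */
    unsigned char *buf = mmap(NULL, BUF_LEN, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Exclude this range from any fork() child: with no child reference
     * there is no copy-on-write, so the parent's writes keep landing on
     * the same physical pages the device was given. */
    if (madvise(buf, BUF_LEN, MADV_DONTFORK) != 0)
        perror("madvise(MADV_DONTFORK)");

    memset(buf, 0xAA, BUF_LEN);          /* data before the fork */

    if (fork() == 0) {                   /* child never touches buf */
        execlp("true", "true", (char *)NULL);
        _exit(127);
    }

    memset(buf, 0x55, BUF_LEN);          /* post-fork write by the parent */
    return 0;
}
~~~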
Nungesser Mcmindes (83 rep)
Apr 18, 2025, 01:28 AM • Last activity: Apr 18, 2025, 01:55 PM
0 votes
1 answer
715 views
can two running processes share the complete process image in physical memory, not just part of it?
Can two running processes share the complete process image in physical memory, not just part of it? Here I am talking about Linux operating systems (e.g. Ubuntu). **My thinking:** I think it is **False** in general, because the only time it is possible is with copy-on-write during fork() and before any writes have been made. **Question:** Can someone explain whether I am correct or not? If I am wrong, please give me some examples.
Deepesh Meena (101 rep)
Sep 8, 2018, 08:22 PM • Last activity: Jan 22, 2025, 08:00 AM
0 votes
0 answers
80 views
GDB doesn't hit catchpoint on the child process forked off from debuggee
I was re-doing what is described here about multiprocessing debugging in GDB. The weird thing is that GDB doesn't hit the exec catchpoint on the child process running the cat command (the latter is forked off from bash).

ubuntu@ubuntu:~$ echo $$
670639
ubuntu@ubuntu:~$ cat /etc/issue

root@ubuntu:~# gdb -q -p 670639
Attaching to process 670639
Reading symbols from /usr/bin/bash...
(No debugging symbols found in /usr/bin/bash)
Reading symbols from /lib/x86_64-linux-gnu/libtinfo.so.6...
(No debugging symbols found in /lib/x86_64-linux-gnu/libtinfo.so.6)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/49/0fef8403240c91833978d494d39e537409b92e.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/41/86944c50f8a32b47d74931e3f512b811813b64.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pselect64_syscall (sigmask=0x564025a0c820, timeout=, exceptfds=0x0, writefds=0x0, readfds=0x7ffe468736e0, nfds=1) at ../sysdeps/unix/sysv/linux/pselect.c:34
34      ../sysdeps/unix/sysv/linux/pselect.c: No such file or directory.
(gdb) catch fork
Catchpoint 1 (fork)
(gdb) catch exec
Catchpoint 2 (exec)
(gdb) c
Continuing.

Catchpoint 1 (forked process 719946), arch_fork (ctid=0x7fbccdc6fa10) at ../sysdeps/unix/sysv/linux/arch-fork.h:52
52      ../sysdeps/unix/sysv/linux/arch-fork.h: No such file or directory.
(gdb) info inferiors
  Num  Description        Connection   Executable
* 1    process 670639     1 (native)   /usr/bin/bash
(gdb) set detach-on-fork off
(gdb) nexti
[New inferior 2 (process 719946)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
52      in ../sysdeps/unix/sysv/linux/arch-fork.h
(gdb) c
Continuing.

What is the reason behind it? Thanks.
CarloC (385 rep)
Nov 26, 2024, 01:20 PM • Last activity: Nov 26, 2024, 01:52 PM
0 votes
1 answer
47 views
Are reads of /proc/pid/environ atomic in Linux 6.x (e.g: 6.1.99)?
When a process execs, looking at the kernel code for environ_read(), it seems that if the mm_struct doesn't yet exist / is null, or the env_end member of that mm_struct is null, environ_read() will return 0 ~immediately. My question is: are there protections WRT fork/exec races such that (pseudo-code ahead)

if ((pid = fork()) == 0)
    execv*("/bin/exe", {"exe", "-l"}, &envp)    /* child */
else
    read("/proc/${pid}/environ")                /* parent */

cannot:

**A:** erroneously read a zero-length env due to races between execve and the subsequent read (e.g. assuming the user-space program that issues the read is multi-threaded or performing asynchronous IO)

**B:** erroneously read a partial env (assuming user-space code is not causing a short read due to a bug in user-space code)

**C:** erroneously read the parent's env

Are reads from /proc/<pid>/environ atomic?
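For reference, a compilable version of the pseudo-code, in case it helps frame the race being asked about; /bin/echo and the FOO=bar environment are stand-ins for the asker's /bin/exe and envp:

~~~lang-c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *const argv[] = { "echo", "-n", NULL };
    char *const envp[] = { "FOO=bar", NULL };
    pid_t pid = fork();

    if (pid == 0) {
        execve("/bin/echo", argv, envp);   /* child swaps in a tiny new env */
        _exit(127);
    }

    /* The parent races against the child's execve: scenarios A-C above
     * ask whether this read can see nothing, a torn env, or the
     * inherited (pre-exec) env, depending on when it lands. */
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/proc/%d/environ", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, buf, sizeof buf);
        printf("read %zd bytes\n", n);
        close(fd);
    }
    return 0;
}
~~~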
Gregg Leventhal (109 rep)
Nov 3, 2024, 02:51 PM • Last activity: Nov 19, 2024, 02:23 PM
6 votes
3 answers
6170 views
Why is forking used in a unit file of a service?
My nginx unitfile is following,
[root@arif ~]# cat /usr/lib/systemd/system/nginx.service
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
# Nginx will fail to start if /run/nginx.pid already exists but has the wrong
# SELinux context. This might happen when running nginx -t from the cmdline.
# https://bugzilla.redhat.com/show_bug.cgi?id=1268621 
ExecStartPre=/usr/bin/rm -f /run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
KillSignal=SIGQUIT
TimeoutStopSec=5
KillMode=process
PrivateTmp=true

[Install]
WantedBy=multi-user.target
Here, in the [Service] portion, the value of Type is equal to forking, which means, from here:

> The process started with ExecStart spawns a child process that becomes the main process of the service. The parent process exits when the startup is complete.

My questions are:

- Why does a service do that?
- What are the advantages of doing this?
- What's wrong with Type=simple or other similar options?
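For context on what Type=forking describes, here is a generic sketch of the classic daemonization protocol, not nginx's actual code (/run/mydaemon.pid is a made-up path, and the readiness synchronization a real daemon performs before the parent exits is omitted): the process forks, the child detaches with setsid() and writes the PID file, and the original parent exits once startup is complete. That parent exit is precisely the readiness signal systemd waits for with Type=forking.

~~~lang-c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* The startup protocol a Type=forking service is expected to follow. */
int main(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(1);
    if (pid > 0)
        exit(0);        /* parent exits: systemd takes this as "ready" */

    setsid();           /* child: new session, detached from any tty */
    umask(0);
    if (chdir("/") != 0)
        exit(1);

    FILE *f = fopen("/run/mydaemon.pid", "w");  /* what PIDFile= points at */
    if (f) { fprintf(f, "%d\n", (int)getpid()); fclose(f); }

    /* ... the real service loop would run here ... */
    pause();
    return 0;
}
~~~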
arif (1589 rep)
Mar 4, 2019, 07:43 PM • Last activity: Jul 22, 2024, 09:49 AM
0 votes
0 answers
74 views
Understanding AFL behaviour for fork and execv; Why `/proc/<pid>/maps` does not show the loaded binary
TL;DR: Why does the process map in /proc/<pid>/maps not show where the executed binary is loaded? I am trying to do some post-mortem analysis of the fuzzed program once it finishes. Basically, what I am seeing is that /proc/<pid>/maps shows what looks to be the memory map of the parent instead of the child. I was unable to replicate the behavior on a smaller scale, but I'll provide a patch to [github.com/google/AFL](https://github.com/google/AFL). (Add an empty row after the else on the last row if the patch fails.)
diff --git a/afl-fuzz.c b/afl-fuzz.c
index 46a216c..e31125f 100644
--- a/afl-fuzz.c
+++ b/afl-fuzz.c
@@ -2283,6 +2283,14 @@ EXP_ST void init_forkserver(char** argv) {

 }

+static void read_map(int pid, char *map) {
+    FILE *proc;
+    char path[64];
+    sprintf(path, "/proc/%d/maps", pid);
+    proc = fopen(path, "r");
+    fread(map, 4096, 1, proc);
+    fclose(proc);
+}

 /* Execute target application, monitoring for timeouts. Return status
    information. The called program will update trace_bits[]. */
@@ -2423,7 +2431,14 @@ static u8 run_target(char** argv, u32 timeout) {

   if (dumb_mode == 1 || no_forkserver) {

-    if (waitpid(child_pid, &status, 0)  input/test1
./afl-fuzz -i input -o output -n -- /bin/ls
What you'll probably see is the map for afl-fuzz. The patch waits for the execv to finish while repeatedly reading /proc/<pid>/maps; once it's finished, it prints the last map that was read. I'm curious what I am missing in my reading of the map, or what triggers this. (Also, I don't think this is about AFL itself, it's just where I've seen it; it would be nice to see a smaller program that replicates the behavior, since I was unable to replicate it.)
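A smaller reproducer in the spirit of what the asker could not build (a hypothetical sketch, not the AFL code path): fork a child that execs /bin/ls and read its maps file straight away. Until the child's execve has completed, /proc/<pid>/maps legitimately describes the forked copy of the parent, which is one plausible reading of what the patched run_target observes while polling:

~~~lang-c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(127);
    }

    /* Read the child's map immediately: the child may not have reached
     * (or completed) execve yet, so this can still print a copy of the
     * parent's image instead of /bin/ls's. */
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(path, "r");
    if (f) {
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
    }
    return 0;
}
~~~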
sorin the turtle (1 rep)
May 19, 2024, 05:31 PM • Last activity: May 22, 2024, 09:37 AM
0 votes
1 answer
53 views
What Causes PHP Forks to Consolidate on a Single CPU Core in FreeBSD 13.3?
I'm using a PHP 8.x script to process a series of images, performing various conversions and resizing tasks. To make the most of the server's multiple cores, I employ the pcntl_fork() function to create child processes that can simultaneously handle different images. This means instead of processing images sequentially, each image can be processed concurrently on separate cores. For instance, if I have 10 images to process and each takes 3 seconds individually, without parallel processing, it would take a total of 30 seconds. However, with parallel processing, all 10 images can finish processing simultaneously in just 3 seconds. This approach has been effective until we updated to FreeBSD 13.3. After the update, the forked processes no longer distribute across different cores; instead, they all run on a single core. Consequently, if I have 10 forked processes running, each is constrained to using only 10% of a single core, resulting in a 10-fold increase in processing time. We've conducted tests with FreeBSD versions ranging from 9.x up to 13.2-RELEASE-p11 and found that the issue doesn't occur. Additionally, when using a 13.2 userland and temporarily booting the 13.3 kernel, the problem still doesn't manifest. However, when both the userland and kernel are updated to 13.3, the problem consistently occurs. Further tests with a fresh installation of FreeBSD 14.0 on a separate system confirm that the issue persists there as well. We've also ruled out PHP version as a factor, as testing across versions 8.0 to 8.3 yields the same results. Do you have any insights into what might be causing this issue, or suggestions for resolving it?

Edit: Adding MRE code, as suggested in the comments:

readImage($fullLocalImagePath);
echo " → Finished reading image $i into Imagick.\n";
$imagick->setImageCompressionQuality(88);
$imagick->resizeImage(4800, 4800, imagick::FILTER_LANCZOS, .9, false);
$imagick->writeImage($finalImagePath);
echo " → → Finished resizing and saving image $i into Imagick.\n";
$imagick->clear();
exit(0); // Exit the child process after processing the image
    }
}

// Wait for the forked processes to finish
while ($childPID = pcntl_waitpid(0, $status)) {
    if ($childPID == -1) {
        // No child processes left to wait for
        break;
    }
    echo " → → → Child process $childPID has finished.\n";
    // Handle the exit status based on the child process PID
    if (in_array($childPID, $forkedProcessIds)) {
        // Remove the child process ID from the tracking array
        $forkedProcessIds = array_diff($forkedProcessIds, array($childPID));
    }
}
?>
Adam Ellis (31 rep)
May 2, 2024, 10:21 PM • Last activity: May 8, 2024, 10:02 AM
-3 votes
1 answer
130 views
Why don't we create processes from scratch in Linux, instead of creating a "fork"?
Why, in Linux, do we not create a process from scratch, the way "init" is created, but instead create it as a "fork" of an existing process?
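Part of any answer here is that the fork()/exec() split is a deliberate design: between the two calls the child can rearrange itself (file descriptors, working directory, privileges) with ordinary code, instead of the kernel needing a spawn call with one parameter per knob. A small illustration (listing.txt is an arbitrary example name):

~~~lang-c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child-side setup before exec: redirect stdout to a file
         * using nothing but ordinary fd calls. */
        int fd = open("listing.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0) {
            dup2(fd, STDOUT_FILENO);
            close(fd);
        }
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
~~~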
Шмига Дарина (5 rep)
Apr 17, 2024, 08:54 PM • Last activity: Apr 18, 2024, 05:13 AM
0 votes
0 answers
140 views
fork() is very slow
I have a Linux server running Ubuntu 18.04 on a VM. Executing any task like ls -l or w frequently takes several seconds to finish. strace -c ls -l says ls only takes a few milliseconds, but running strace -c strace -c ls -l a bunch of times until the problem occurred told me that the clone() syscall is causing the problem:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.33    1.374858      687429         2           clone
  0.31    0.004273           8       548         1 wait4
  0.20    0.002701           2      1088           ptrace
  0.10    0.001416           1      1096           rt_sigprocmask
  0.01    0.000130           6        22           mmap
...
------ ----------- ----------- --------- --------- ----------------
100.00    1.384096                  2893        15 total
Every diagnostic I've run seemed OK: RAM/CPU usage, process/thread count, IO performance, etc. Everything is up to date. The problem randomly started occurring a few weeks ago and persisted after rebooting. Does anybody know of anything that could be causing this issue? The server is pretty much useless in this current state.
Call of Guitar (1 rep)
Jan 17, 2024, 12:33 AM • Last activity: Jan 17, 2024, 12:37 AM
1 votes
4 answers
10497 views
What happens after exec() in the ls command: does the parent process print the output to the console, or the child?
I have a simple doubt about the execution of the command ls. As per my understanding from the research I have done on the internet, I understood the points below.

1. When we type the ls command, the shell interprets that command.
2. Then the shell process forks, creating the child process, and the parent (shell) executes the wait() system call, effectively putting itself to sleep until the child exits.
3. The child process inherits all the open file descriptors and the environment.
4. The child process (shell) executes an exec() of the ls program, causing the ls binary to be loaded from the disk (filesystem) and executed in the same process.
5. When the ls program runs to completion, it calls exit(), and the kernel sends a signal to its parent indicating the child has terminated.

My doubt starts from here: as soon as ls finishes its tasks, does it send the result back to the parent process, or does it display the output on the screen? If it sends the output back to the parent, is it using pipe() implicitly?
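The crux is point 3 above: the child inherits the shell's open file descriptors, so ls writes to the terminal through its inherited stdout; nothing flows back to the parent, and no implicit pipe() is involved. A minimal sketch:

~~~lang-c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* fd 1 here is the same terminal the parent has: the output of
         * ls goes straight to the screen, not "back" to the parent. */
        execlp("ls", "ls", (char *)NULL);
        _exit(127);
    }
    wait(NULL);   /* the parent only collects the exit status */
    printf("child is done; its output never passed through me\n");
    return 0;
}
~~~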
Subi Suresh (513 rep)
Mar 14, 2013, 06:28 PM • Last activity: Nov 28, 2023, 04:46 PM
26 votes
5 answers
32719 views
After fork(), where does the child begin its execution?
I'm trying to learn UNIX programming and came across a question regarding fork(). I understand that fork() creates an identical copy of the currently running process, but where does it start? For example, if I have the code

int main (int argc, char **argv)
{
    int retval;

    printf ("This is most definitely the parent process\n");
    fflush (stdout);
    retval = fork ();
    printf ("Which process printed this?\n");

    return (EXIT_SUCCESS);
}

the output is:

~~~lang-text
This is most definitely the parent process
Which process printed this?
Which process printed this?
~~~

I thought that fork() creates an identical process, so I initially thought that in that program, the fork() call would be called recursively forever. I guess the new process created by fork() starts after the fork() call? If I add the following code, to differentiate between the parent and child process,

if (child_pid = fork ())
    printf ("This is the parent, child pid is %d\n", child_pid);
else
    printf ("This is the child, pid is %d\n", getpid ());

after the fork() call, where does the child process begin its execution?
thomas1234
Nov 28, 2010, 02:36 AM • Last activity: Nov 24, 2023, 06:23 AM
0 votes
3 answers
3702 views
Parent process always printing output after child
Consider the following code running under Solaris 11.3:

int main(void)
{
    pid_t pid = fork();

    if (pid > 0) {
        printf("[%ld]: Writing from parent process\n", getpid());
    }

    if (pid == 0) {
        execl("/usr/bin/cat", "/usr/bin/cat", "file.c", (char *) 0);
        perror("exec failed");
        exit(1);
    }
}

Whenever I run it, the "Writing from parent" line is always output last. I would not be surprised by that result if my school task weren't to use wait(2) in order to print that line only after the child process has finished. Why does this happen, and how do I ensure that this line is printed before the child process executes cat (or that the order is at least undefined), so I can safely use wait(2) or waitpid(2) to tackle that?
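One way to satisfy the stated task, sketched with waitpid(2) (hedged: there is no portable way to force the parent's line out *before* an unsynchronized child, short of printing before fork() or adding explicit synchronization such as a pipe; waiting first makes the ordering deterministic the other way around):

~~~lang-c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        execl("/usr/bin/cat", "cat", "file.c", (char *)0);
        perror("exec failed");
        exit(1);
    }

    int status;
    waitpid(pid, &status, 0);   /* block until cat has finished */
    printf("[%ld]: Writing from parent process\n", (long)getpid());
    return 0;
}
~~~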
Dmitry Serov (103 rep)
Mar 9, 2016, 05:21 PM • Last activity: Nov 20, 2023, 02:48 PM
4 votes
0 answers
58 views
Perl's `kill` is using `$! == Errno::EINTR` unexpectedly
I wrote a network daemon that forks off children to handle TCP connections. On SIGINT the main process triggers a kill for each child in order to clean up and to collect some final statistics. In almost all cases that works fine, and the child processes terminate really fast. However occasionally a child process just refuses to die within a short timeout (like 5 seconds). I had no idea what happened then, so I added some verbose output to diagnose that case. I found out that using netcat to open a connection, then suspending that netcat process, *sometimes* causes the effect. When I was able to reproduce the effect the debug output was:

~~~lang-text
REST-server(cleanup_queue): deleting children
REST-server(cleanup_queue): deleting PID 23344 handling localhost:48114
child_delete: Killing child 23344
child_delete: killed child with PID 23344
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting up to 5 seconds for condition
_limited_wait(PID 23344 terminated): waiting 0.02 (of 5 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.04 (of 4.98 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.08 (of 4.94 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.16 (of 4.86 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.32 (of 4.7 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.64 (of 4.38 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 1.28 (of 3.74 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 2.46 (of 2.46 remaining) seconds
(r1, r2) = (1, Interrupted system call)
child_delete: PID 23344 refused to terminate within 5s
failed to delete child PID 23344
~~~

The "condition" to wait for in that case was the result of this closure:

~~~lang-perl
sub {
    my $r1 = kill(0, $child_pid);
    my $r2 = $!;
    print "(r1, r2) = ($r1, $r2)\n";
    $r1 != 1 && $r2 == Errno::ESRCH;
}
~~~

So the expected outcome would be that the main process is unable to "kill" the PID, because it no longer exists (and not because of a "permission denied"). However, for some reason I get an "Interrupted system call" repeatedly. The main process uses signal handlers like this:

~~~lang-perl
$SIG{'INT'} = $SIG{'TERM'} = sub ($) {
    my $signal = 'SIG' . $_;
    my $me = "signal handler[$$, $signal]";

    print "$me: cleaning up\n" if ($verbose > 0);
    cleanup();
    print "$me: executing default action\n" if ($verbose > 1);
    $SIG{$_} = 'DEFAULT';
    kill($_, $$); # execute default action
};
~~~

And when forking a child process, I reset the signal handlers like this:

~~~lang-perl
sub child_create($)
{
    my ($child) = @_;
    my $pid;

    reaper(0); # disable for the child
    if ($pid = fork()) { # parent
        reaper(1); # enable for the parent
    } elsif (defined($pid)) { # child
        my ($child_fun, @child_param) = @$child;
        my $ret;

        # prevent double-cleanup
        $SIG{'INT'} = $SIG{'TERM'} = $SIG{'__DIE__'} = 'DEFAULT';
        $ret = $child_fun->(@child_param);
        exit($ret); # avoid returning from function call
    } else { # error
        print STDERR "child_create: fork(): $!\n";
    }
    return $pid;
}
~~~

The reaper() just handles SIGCHLD. What could cause the effect seen?

The child processes basically do a while (defined(my $req = $conn->get_request)) {...} (using HTTP::Daemon), so they should be waiting for input in the netcat case.

Additional info
---------------

Just in case it might matter: OS is SLES12 SP5 (using Perl 5.18.2) running on VMware. The code in the main server loop looks like this:

~~~lang-perl
while (defined(my $conn = $daemon->accept) || $! == Errno::EINTR) {
    my $errno = $!;

    if ($quit_flag != 0) {
        last;
    }
    if ($errno == Errno::EINTR) {
        next;
    }
    #... handle $req->uri->path()
}
~~~
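A side note that may explain the puzzling "(1, Interrupted system call)" pairs, hedged since it doesn't identify which call was interrupted: errno (and hence Perl's $!) is only meaningful after a call has failed; a successful call leaves whatever stale value was already there, so inspecting it when kill returned 1 reads leftovers. In C terms:

~~~lang-c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    errno = EINTR;                 /* stale value from some earlier failure */

    if (kill(getpid(), 0) == 0) {  /* succeeds: the process exists */
        /* The successful call did NOT reset errno; printing it here shows
         * "Interrupted system call" even though nothing failed. */
        printf("kill succeeded, stale errno: %s\n", strerror(errno));
    }
    return 0;
}
~~~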
U. Windl (1715 rep)
Nov 20, 2023, 12:32 PM • Last activity: Nov 20, 2023, 12:57 PM
0 votes
0 answers
53 views
Forking from a systemd service
I have a python program that runs as a systemd user service. From that program, I launch external commands via subprocess.Popen(cmd, close_fds=True, start_new_session=True). My intention is for these new processes to stay running even after the parent service stops or restarts. This works completely fine when I run my program directly from the terminal, but when it's run as a systemd service, these processes get killed along with the parent program upon service restart. I tried nohup and double forking to no avail. I noticed that when double forking, the child process ends up having the systemd --user as a parent, not init as is the case when run simply from a terminal. Am I missing something obvious? Is there a better way for a systemd service to launch external programs so that they're independent from it? Let me know if there's more info I should provide.
czert (1 rep)
Nov 8, 2023, 04:07 PM
55 votes
4 answers
23975 views
What's the difference between running a program as a daemon and forking it into background with '&'?
What are the practical differences from a sysadmin point of view when deploying services on a unix based system?
user1561108 (1081 rep)
Nov 23, 2012, 07:24 PM • Last activity: Oct 15, 2023, 02:38 PM
62 votes
4 answers
98942 views
Creating threads fails with “Resource temporarily unavailable” with 4.3 kernel
I am running a docker server on Arch Linux (kernel 4.3.3-2) with several containers. Since my last reboot, both the docker server and random programs within the containers crash with a message about not being able to create a thread, or (less often) to fork. The specific error message is different depending on the program, but most of them seem to mention the specific error Resource temporarily unavailable. See the end of this post for some example error messages.

Now there are plenty of people who have had this error message, and plenty of responses to them. What's really frustrating is that everyone seems to speculate about how the issue could be resolved, but no one seems to point out how to _identify_ which of the many possible causes of the problem is present. I have collected these 5 possible causes for the error and how to verify that they are not present on my system:

1. There is a system-wide limit on the number of threads configured in /proc/sys/kernel/threads-max ([source](https://stackoverflow.com/a/22570554/242365)). In my case this is set to 60613.
2. Every thread takes some space in the stack. The stack size limit is configured using ulimit -s ([source](https://stackoverflow.com/a/9211891/242365)). The limit for my shell used to be 8192, but I have increased it by putting * soft stack 32768 into /etc/security/limits.conf, so ulimit -s now returns 32768. I have also increased it for the docker process by putting LimitSTACK=33554432 into /etc/systemd/system/docker.service ([source](https://sskaje.me/systemd-ulimit/)), and I verified that the limit applies by looking into /proc/<pid>/limits and by running ulimit -s inside a docker container.
3. Every thread takes some memory. A virtual memory limit is configured using ulimit -v. On my system it is set to unlimited, and 80% of my 3 GB of memory are free.
4. There is a limit on the number of processes using ulimit -u. Threads count as processes in this case ([source](https://superuser.com/a/568943/23403)). On my system, the limit is set to 30306, and for the docker daemon and inside docker containers, the limit is 1048576. The number of currently running threads can be found out by running ls -1d /proc/*/task/* | wc -l or by running ps -elfT | wc -l ([source](http://rudametw.github.io/blog/posts/2014.04.10/not-enough-threads.html)). On my system they are between 700 and 800.
5. There is a limit on the number of open files, which according to some [sources](http://dimitrik.free.fr/blog/archives/2010/11/mysql-performance-hitting-error-cant-create-a-new-thread-errno-11-on-a-high-number-of-connections.html) is also relevant when creating threads. The limit is configured using ulimit -n. On my system and inside docker, the limit is set to 1048576. The number of open files can be found out using lsof | wc -l ([source](http://www.cyberciti.biz/tips/linux-procfs-file-descriptors.html)); on my system it is about 30000.

It looks like before the last reboot I was running kernel 4.2.5-1, and now I'm running 4.3.3-2. Downgrading to 4.2.5-1 fixes all the problems. Other posts mentioning the problem are [this](https://lkml.org/lkml/2015/11/24/203) and [this](https://bbs.archlinux.org/viewtopic.php?pid=1593364). I have opened a [bug report for Arch Linux](https://bugs.archlinux.org/task/47662). What has changed in the kernel that could be causing this?
----

Here are some example error messages:

Crash dump was written to: erl_crash.dump
Failed to create aux thread

Jan 07 14:37:25 edeltraud docker: runtime/cgo: pthread_create failed: Resource temporarily unavailable

dpkg: unrecoverable fatal error, aborting: fork failed: Resource temporarily unavailable
E: Sub-process /usr/bin/dpkg returned an error code (2)

test -z "/usr/include" || /usr/sbin/mkdir -p "/tmp/lib32-popt/pkg/lib32-popt/usr/include"
/bin/sh: fork: retry: Resource temporarily unavailable
/usr/bin/install -c -m 644 popt.h '/tmp/lib32-popt/pkg/lib32-popt/usr/include'
test -z "/usr/share/man/man3" || /usr/sbin/mkdir -p "/tmp/lib32-popt/pkg/lib32-popt/usr/share/man/man3"
/bin/sh: fork: retry: Resource temporarily unavailable
/bin/sh: fork: retry: No child processes
/bin/sh: fork: retry: Resource temporarily unavailable
/bin/sh: fork: retry: No child processes
/bin/sh: fork: retry: No child processes
/bin/sh: fork: retry: Resource temporarily unavailable
/bin/sh: fork: retry: Resource temporarily unavailable
/bin/sh: fork: retry: No child processes
/bin/sh: fork: Resource temporarily unavailable
/bin/sh: fork: Resource temporarily unavailable
make: *** [install-man3] Error 254

Jan 07 11:04:39 edeltraud docker: time="2016-01-07T11:04:39.986684617+01:00" level=error msg="Error running container: System error: fork/exec /proc/self/exe: resource temporarily unavailable"

[Wed Jan 06 23:20:33.701287 2016] [mpm_event:alert] [pid 217:tid 140325422335744] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread
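A small probe in the spirit of the checks above (a diagnostic sketch, not a fix): create idle threads until the first failure and report the count and errno, which at least pins down which limit bites and at what number. Compile with -pthread; note it deliberately exhausts a resource and is meant to be run inside the affected container:

~~~lang-c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Park each thread so the count reflects concurrently live threads. */
static void *idle(void *arg)
{
    (void)arg;
    pause();
    return NULL;
}

int main(void)
{
    pthread_t t;
    unsigned long n = 0;
    int err;

    while ((err = pthread_create(&t, NULL, idle, NULL)) == 0)
        n++;

    /* EAGAIN is what programs report as "Resource temporarily unavailable". */
    printf("created %lu threads, then: %s\n", n, strerror(err));
    return 0;
}
~~~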
cdauth (1487 rep)
Jan 7, 2016, 03:16 PM • Last activity: Aug 17, 2023, 10:48 AM
0 votes
1 answer
203 views
What's the difference between "-dm" and "-Dm" in GNU Screen?
The GNU Screen manual says:
`-d -m'
          Start `screen' in _detached mode. This creates a new session
          but doesn't attach to it. This is useful for system startup
          scripts.

    `-D -m'
          This also starts `screen' in _detached_ mode, but doesn't fork
          a new process. The command exits if the session terminates.
-dm is pretty clear to me:

- screen forks a new process to run the provided command (or a shell if nothing was specified).
- By "fork" it means that weird Schrödinger's system call in which the source code doesn't know if it's the parent or the child until the return value is observed.
- And this new process is recognized by screen as something that can be attached.

I noticed that -dm returns control to the shell, but -Dm blocks. So my questions are:

- Why does -Dm block? And how is that related to its lack of forking?
- What does it do instead of forking? I think it still creates a new process, because "detached mode" suggests a process identifiable by a PID which can be attached.
- What's the use case of -Dm instead of -dm?

Thanks!
Sebastian Carlos (262 rep)
Jul 31, 2023, 09:08 PM • Last activity: Aug 1, 2023, 09:24 AM