Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
3 votes, 1 answer, 49 views
File access permissions missing after setuid() system call
I have a file access problem in a self-developed daemon process after a setuid() system call. I already posted this question to SO, but the impression there is that the problem is not C++-related but Linux-related, so maybe someone here can help me solve it.
My daemon program cannot access a configuration file after a setuid(iUid) system call, even though iUid is the owner of the configuration file. Why?
I am writing a controller daemon in C++ for home automation which will eventually run on a Raspberry Pi with Raspberry Pi OS. It is started with root permissions because, right after start, it has to read an SSL certificate that only root can read. After the SSL certificate is read, the daemon should switch to user 'pvmonitor', as root permissions are no longer needed. This is done by
setuid( iUid );
and I have checked with ps that the process runs as user 'pvmonitor'.
The configuration file for this daemon is located at /etc/SmartHome/coverd.conf and is owned by user pvmonitor.
ls -la /etc/SmartHome/
total 24
drwxrwx---+ 2 pvmonitor www-data 4096 Jul 17 20:07 .
drwxr-xr-x+ 107 root root 4096 Jul 17 20:07 ..
-rw-r-----+ 1 pvmonitor www-data 705 Jul 17 20:07 coverd.conf
The Raspberry Pi is booted from the network and the file system is mounted from a NAS which provides ACLs. The ACL also grants access permission to user pvmonitor:
getfacl /etc/
getfacl: Removing leading '/' from absolute path names
# file: etc/
# owner: root
# group: root
user::rwx
[...]
group::---
group:users:rwx #effective:r-x
group:www-data:r-x
mask::r-x
other::r-x
[...]
getfacl /etc/SmartHome/
getfacl: Removing leading '/' from absolute path names
# file: etc/SmartHome/
# owner: pvmonitor
# group: www-data
user::rwx
[...]
user:pvmonitor:rwx
[...]
group::---
[...]
group:www-data:r-x
mask::rwx
other::---
[...]
getfacl /etc/SmartHome/coverd.conf
getfacl: Removing leading '/' from absolute path names
# file: etc/SmartHome/coverd.conf
# owner: pvmonitor
# group: www-data
user::rw-
[...]
user:pvmonitor:rwx #effective:r--
[...]
group::---
[...]
group:www-data:r-x #effective:r--
mask::r--
other::---
In addition, here is the output of stat:
stat /etc
File: /etc
Size: 4096 Blocks: 16 IO Block: 4096 directory
Device: 0,22 Inode: 74579976 Links: 107
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-12-03 22:14:03.809660810 +0100
Modify: 2025-07-17 20:07:13.645754180 +0200
Change: 2025-07-17 20:07:13.645754180 +0200
Birth: -
stat /etc/SmartHome/
File: /etc/SmartHome/
Size: 4096 Blocks: 16 IO Block: 4096 directory
Device: 0,22 Inode: 74581572 Links: 2
Access: (0770/drwxrwx---) Uid: ( 1004/pvmonitor) Gid: ( 133/www-data)
Access: 2025-07-17 20:06:03.525754180 +0200
Modify: 2025-07-17 20:07:08.395754180 +0200
Change: 2025-07-17 20:35:52.235754180 +0200
Birth: -
stat /etc/SmartHome/coverd.conf
File: /etc/SmartHome/coverd.conf
Size: 705 Blocks: 16 IO Block: 131072 regular file
Device: 0,22 Inode: 74581810 Links: 1
Access: (0640/-rw-r-----) Uid: ( 1004/pvmonitor) Gid: ( 133/www-data)
Access: 2025-07-17 20:07:08.395754180 +0200
Modify: 2025-07-17 20:07:08.395754180 +0200
Change: 2025-07-18 09:33:38.783696180 +0200
Birth: -
With
sudo -u pvmonitor less /etc/SmartHome/coverd.conf
I can read the configuration file without any problem.
But when I try to open the configuration file in my daemon process after the setuid() call, I get a "permission denied" error. Here is a minimal reproducible example based on excerpts of my daemon's code:
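#include <cstdio>      // FILE, fopen, fgets, perror
#include <iostream>    // std::cout, std::cerr
#include <sys/types.h> // uid_t
#include <unistd.h>    // getuid, setuid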
const char *ptConfigFile = "/etc/SmartHome/coverd.conf";
void printConfig( void )
{
std::cout << "Try to open file " << ptConfigFile << std::endl;
FILE *ptfTest;
ptfTest = fopen( ptConfigFile, "r" );
if (ptfTest != nullptr)
{
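char sLine[1024];
// Loop on fgets() rather than feof(): feof() only becomes true after a read has already failed
while (fgets(sLine, sizeof(sLine), ptfTest) != nullptr)
{
std::cout << sLine;
}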
fclose( ptfTest );
}
else
perror( "Failed to open file" );
}
int main(int argc, char **argv )
{
int iUid = 1004;
std::cout << "User id is now " << getuid() << std::endl;
printConfig();
std::cout << "Switch to user id " << iUid << std::endl;
if (iUid == 0 || setuid(iUid)== 0)
{
std::cout << "User id is now " << getuid() << std::endl;
printConfig();
return 0;
}
std::cerr << "Could not switch user id." << std::endl;
return -1;
}
1004 is the user id of user pvmonitor. The output of this example is:
sudo ./test
User id is now 0
Try to open file /etc/SmartHome/coverd.conf
CERTFILE=[...]
[...]
Switch to user id 1004
User id is now 1004
Try to open file /etc/SmartHome/coverd.conf
Failed to open file: Permission denied
In addition, here is the output when I run the test program with strace:
sudo strace ./test
execve("./test", ["./test"], 0x7fc90538b0 /* 13 vars */) = 0
[...]
setuid(1004) = 0
getuid() = 1004
write(1, "User id is now 1004\n", 20User id is now 1004
) = 20
write(1, "Try to open file /etc/SmartHome/"..., 44Try to open file /etc/SmartHome/coverd.conf
) = 44
openat(AT_FDCWD, "/etc/SmartHome/coverd.conf", O_RDONLY) = -1 EACCES (Permission denied)
dup(2) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
newfstatat(3, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}, AT_EMPTY_PATH) = 0
write(3, "Failed to open file: Permission "..., 39Failed to open file: Permission denied
) = 39
close(3) = 0
exit_group(0) = ?
What am I doing wrong?
Holger (33 rep), Jul 17, 2025, 06:37 PM • Last activity: Jul 18, 2025, 12:24 PM
-1 votes, 1 answer, 2182 views
How does an open(at) syscall result in a file being written to disk?
I'm trying to learn as much as I can about the interplay between syscalls, the VFS, device driver handling and, ultimately, having the end device do some operation. I thought I would look at a fairly trivial example, creating a file, and try to understand the underlying process in as much detail as possible.
I wrote a bit of C that does little more than open a (non-existent) file for writing, compiled it (without optimization), and took a peek at it with strace when I ran it. In particular, I wanted to focus on the openat syscall, and why and how this call was ultimately able not only to create the file object / file description, but also to actually do the writing to disk (for reference: EXT4 file system, SATA HDD).
Broadly speaking, excluding some of the checks and ancillary bits and pieces, my understanding of the process is as follows (and please correct me if I'm way off!):
- ELF is mapped into memory
- libc is mapped into memory
- fopen is called
- libc does its open
- openat syscall is called, with the O_CREAT flag among others
- Syscall number is put into the RAX register
- Syscall args (e.g. file path, etc.) are put into the RDI register (and RSI, RDX, etc. as appropriate)
- syscall instruction is issued, and the CPU transitions to ring 0
- The system_call code pointed to by the MSR_LSTAR register is invoked
- Registers are pushed to the kernel stack
- The function pointer at index RAX in sys_call_table is called
- The asmlinkage wrapper for the actual openat syscall code is invoked
And at that point my understanding is lacking, but ultimately, I know that:
1. The open call returns a file descriptor, which is unique to the process, and maintained globally within the kernel's file descriptor table
2. The FD maps to a file description file object
3. The file object is populated with, among other structures, inode structure, inode_operations, file_operations, etc.
4. The file operations table should map generic syscalls to the respective device drivers that handle those calls (such that, for example, when a write syscall is called, the respective driver's write call is called instead for the device on which the file resides, e.g. a SCSI driver)
5. This mapping is based on the major/minor numbers for that file/device
6. Somewhere along the line, code is invoked which causes instructions to be sent to the device driver for the hard drive, which get sent to the disk controller, which causes the file to be written to the hard disk, though whether this is via interrupts, DMA, or some other method of I/O I'm not sure
7. Ultimately, the disk controller sends a message back to the kernel to say it's done, and the kernel returns control to user space.
I'm not too good at following the kernel source, though I've tried a little, but feel there's a lot I'm missing. My questions are as follows:
I've found some functions which return, and destroy FDs in the kernel source, but can't find where the code is which actually populates the file object / file description for the file.
A) On an open or openat syscall, when a new file is created, how is the file structure populated? Where does the data come from? Specifically, how are the file_operations, inode_operations, etc. for that file populated? How does the kernel know, when populating that structure, that the file operations for this particular file need to be those of the SCSI driver, for instance?
B) Where in the process, and particularly with reference to the source, does the switch to the device driver happen? For example, if an ioctl or similar was called, I would expect to see some reference to the instruction to be called for the respective device, and some memory address for the data to be passed on, but I can't find where that happens.
From looking at the kernel source, all I can really find is code that assigns a new FD, but nothing that populates the file structure, nor anything that calls the respective file operations to transfer control to a device driver.
Apologies that this is a really long-winded description, but I'm really trying to learn as much as possible, and although I have a basic grasp of C, I really struggle to understand others' code.
Hoping someone with greater knowledge than I can help clarify some of these things for me, as I seem to have hit a figurative brick wall. Any advice would be greatly appreciated.
**Edit:**
Hopefully the following points will clarify what technical detail I'm after.
- The open or openat syscalls take a file path and flags (openat is additionally passed an FD referring to a directory)
- When the O_CREAT flag is also passed, the file is 'created' if it doesn't exist
- Based on the file path, the kernel is able to identify the device type this file should be
- The device type is ordinarily identified from the major/minor numbers; for a file that already exists, these are stored in the inode structure for the file (as member i_rdev) and in the stat structure for the file (as members st_dev for the device type of the file system on which the file resides, and st_rdev for the device type of the file itself)
So really, my questions are:
1. When a file is created with either of the open syscalls, the respective inode and stat structures must also be created and populated. How do the open syscalls do this, when all they have to go on at this point is the file path and flags? Do they look at the inode or stat structure of the parent directory and copy the relevant structure members from it?
2. At which point (i.e. where in the source) does this happen?
3. It's my understanding that when these open syscalls are invoked, the kernel needs to know the device type in order for the VFS to know which device driver code to invoke. On creating a new file, where the device type has yet to be set in the file object structures, what happens? What code is called?
4. Is the sequence more like:
user process tries to open new file -> open('/tmp/foo', O_CREAT)
open -> look up structure for '/tmp', get its device type -> get unused FD -> populate inode/stat structure, including setting device type to that of parent -> based on device type, map file operations / inode operations to device driver code -> call device driver code for open syscall -> send appropriate instruction to disk controller to write new file to disk -> tidy up things, do checks, etc. -> return FD to user calling process?
genericuser99 (119 rep), Aug 25, 2020, 10:19 PM • Last activity: Jul 6, 2025, 03:04 PM
7 votes, 1 answer, 473 views
Syscalls required by glibc calls
Are there any compiled lists of the Linux system calls used by each function in a standard glibc build?
For example, free() requires mmap, munmap, mprotect, prlimit64, and brk.
If necessary I can figure this out by grepping the source code or some strace wizardry, but I don't want to reinvent the wheel. I've been searching the web for about a week to no avail, mostly just turning up info on system call wrappers.
I am aware that officially there is no such certainty, but I know from practical experience that this info changes for most functions very rarely.
I asked this on Stack Overflow, but this forum seems more appropriate, since I am looking for documentation (which may or may not exist).
Thanks
user30972097 (73 rep), Jul 5, 2025, 11:20 PM • Last activity: Jul 6, 2025, 01:56 PM
178 votes, 6 answers, 663363 views
How to find an application's path from the command line?
For example, I have git installed on my system, but I don't remember where I installed it. Which command can I use to find this out?
Anders Lind (2525 rep), Jan 7, 2012, 07:33 PM • Last activity: Jun 17, 2025, 10:08 PM
0 votes, 1 answer, 2501 views
Get executable name in syscall?
So I am writing a system call in Linux. I want to print a message that looks like
printk(KERN_DEBUG "PROC_DEBUG [%s, %d]: %s", executable, current->pid, message);
Where executable is the name of the executable that is created when I link a source file against the library used to call the syscall. So if I run the command "cc -o sourcefile.c -L ./a -lfilename", is what I want to print for the executable. (The pid is the process ID of the process that is running the executable, and message is a parameter I pass to the system call.)
I was trying to use this code to get the executable name
struct task_struct *task = get_current();
char task_com[TASK_COM_LEN];
get_task_com(task_com, task);
But I am getting an error that 'TASK_COM_LEN' is undeclared, so what am I missing?
Is there an easier way to get the executable name? Something like "current->executable"? I'm not finding any great matches on the web.
TheIncrediblyStupidOne (1 rep), Oct 1, 2022, 02:39 PM • Last activity: May 27, 2025, 01:08 AM
0 votes, 2 answers, 142 views
What is the difference between user-space and kernel-space program/application?
I am currently learning about kernels in operating systems, and I often come across the terms "user-space applications" and "programs", especially in the context of the kernel's System Call Interface (SCI), which provides services to these user-space entities.
Additionally, while studying the Linux boot process, I read that the kernel starts the init process (or systemd in modern distributions), which in turn starts user-space applications.
I'm a bit confused about what exactly qualifies as a user-space application or program, and likewise what counts as a kernel-space application or program.
lost_decimal (9 rep), May 21, 2025, 12:56 PM • Last activity: May 21, 2025, 01:43 PM
40 votes, 4 answers, 8053 views
Why are Linux system call numbers in x86 and x86_64 different?
I know that the system call interface is implemented on a low level and hence architecture/platform dependent, not "generic" code.
Yet I cannot see why the system call numbers of 32-bit x86 Linux kernels were not kept the same in the similar 64-bit x86_64 architecture. What is the motivation behind this decision?
My first guess was that the reason was to keep 32-bit applications runnable on an x86_64 system, so that, via a suitable offset added to the system call number, the kernel would know whether user space is 32-bit or 64-bit. This is, however, not the case: read() being system call number 0 on x86_64 cannot be reconciled with this idea.
Another guess has been that changing the system call numbers might have a security/hardening background, something I was not able to confirm myself.
Being unaware of the challenges of implementing the architecture-dependent parts of the code, I still wonder **how changing the system call numbers**, when there seems to be no need (even a 16-bit register could store far more than the roughly 346 numbers currently needed to represent all calls), **would help to achieve anything other than breaking compatibility** (though using the system calls through a library, libc, mitigates this).
humanityANDpeace (15072 rep), Jan 19, 2017, 04:12 PM • Last activity: May 19, 2025, 02:53 AM
16 votes, 4 answers, 4052 views
Call a Linux syscall from a scripting language
I want to call a Linux syscall (or at least the libc wrapper) directly from a scripting language. I don't care what scripting language - it's just important that it not be compiled (the reason basically has to do with not wanting a compiler in the dependency path, but that's neither here nor there). Are there any scripting languages (shell, Python, Ruby, etc) that allow this?
In particular, it's the [getrandom](http://man7.org/linux/man-pages/man2/getrandom.2.html) syscall.
joshlf (395 rep), Mar 24, 2017, 08:58 PM • Last activity: Apr 24, 2025, 01:39 PM
2 votes, 2 answers, 1943 views
select(2) on FIFO on macOS
On Linux the included program returns from select and exits:
$ gcc -Wall -Wextra select_test.c -o select_test
$ ./select_test
reading from read end
closing write end
first read returned 0
second read returned 0
selecting with read fd in fdset
select returned
On OS X, the select blocks forever and the program does not exit. The Linux behavior matches my expectation and appears to conform to the following part of the POSIX manual page for select:
> A descriptor shall be considered ready for reading when a call to an input function with O_NONBLOCK clear would not block, whether or not the function would transfer data successfully. (The function might return data, an end-of-file indication, or an error other than one indicating that it is blocked, and in each of these cases the descriptor shall be considered ready for reading.)
Since read(2) on the read end of the FIFO will always return EOF, my reading is that it should always be considered ready by select.
Is macOS's behavior here well-known or expected? Is there something else in this example that leads to the behavior difference?
A further note is that if I remove the read calls, then macOS's select returns. This and some other experiments seem to indicate that once an EOF has been read from the file, it will no longer be marked as ready if select is called on it later.
## Example Program
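#include <fcntl.h>       /* open, O_RDONLY, O_WRONLY */
#include <stdio.h>       /* printf, perror */
#include <stdlib.h>      /* exit */
#include <sys/select.h>  /* select, fd_set, FD_ZERO, FD_SET */
#include <sys/stat.h>    /* mkfifo */
#include <sys/types.h>   /* pid_t */
#include <unistd.h>      /* fork, read, close, unlink */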
#define FILENAME "select_test_tmp.fifo"
int main()
{
pid_t pid;
int r_fd, w_fd;
unsigned char buffer[10]; /* must hold the 10 bytes each read() may return */
fd_set readfds;
mkfifo(FILENAME, S_IRWXU);
pid = fork();
if (pid == -1)
{
perror("fork");
exit(1);
}
if (pid == 0)
{
w_fd = open(FILENAME, O_WRONLY);
if (w_fd == -1)
{
perror("open");
exit(1);
}
printf("closing write end\n");
close(w_fd);
exit(0);
}
r_fd = open(FILENAME, O_RDONLY);
if (r_fd == -1)
{
perror("open");
exit(1);
}
printf("reading from read end\n");
if (read(r_fd, &buffer, 10) == 0)
{
printf("first read returned 0\n");
}
else
{
printf("first read returned non-zero\n");
}
if (read(r_fd, &buffer, 10) == 0)
{
printf("second read returned 0\n");
}
else
{
printf("second read returned non-zero\n");
}
FD_ZERO(&readfds);
FD_SET(r_fd, &readfds);
printf("selecting with read fd in fdset\n");
if (select(r_fd + 1, &readfds, NULL, NULL, NULL) == -1)
{
perror("select");
exit(1);
}
printf("select returned\n");
unlink(FILENAME);
exit(0);
}
Steven D (47418 rep), Aug 2, 2017, 12:48 AM • Last activity: Apr 19, 2025, 02:01 AM
35 votes, 7 answers, 84230 views
Where do you find the syscall table for Linux?
I see a lot of people online referencing arch/x86/entry/syscalls/syscall_64.tbl for the syscall table, and that works fine. But a lot of others reference /include/uapi/asm-generic/unistd.h, which is commonly found in the headers package. How come syscall_64.tbl shows
0 common read sys_read
which is the right answer, while unistd.h shows
#define __NR_io_setup 0
__SC_COMP(__NR_io_setup, sys_io_setup, compat_sys_io_setup)
And then it shows __NR_read as
#define __NR_read 63
__SYSCALL(__NR_read, sys_read)
Why is that 63, and not 0? How do I make sense of /include/uapi/asm-generic/unistd.h? Also, in /usr/include/asm/ there are
/usr/include/asm/unistd_x32.h
#define __NR_read (__X32_SYSCALL_BIT + 0)
#define __NR_write (__X32_SYSCALL_BIT + 1)
#define __NR_open (__X32_SYSCALL_BIT + 2)
#define __NR_close (__X32_SYSCALL_BIT + 3)
#define __NR_stat (__X32_SYSCALL_BIT + 4)
/usr/include/asm/unistd_64.h
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_stat 4
/usr/include/asm/unistd_32.h
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
Could someone explain the difference between these unistd files and how unistd.h works? And what is the best method for finding the syscall table?
Evan Carroll (34663 rep), Feb 4, 2018, 04:40 AM • Last activity: Mar 27, 2025, 04:03 AM
2 votes, 0 answers, 41 views
Wrong attributes bitmask in READDIR requests on NFSv4.1
I'm struggling with the following problem.
I have an NFS v4.1 mount with a directory containing a couple of thousand files. I'm trying to list their names and types. Even with a minimal example program taken from the getdents man page, I see strange behaviour from the NFS client:
- the first few READDIR RPCs have the attribute bitmask set to what you would expect, i.e. all the attributes that the server supports
- after the second call to getdents (or readdir, it doesn't matter), the NFS READDIR RPCs change: the attribute bitmask is set to just RDAttr_Error and FileId
Can someone explain what could cause the change, and why ls seems not to cause it?
dmk (21 rep), Feb 17, 2025, 05:53 PM • Last activity: Feb 17, 2025, 05:55 PM
0 votes, 0 answers, 35 views
How to trace recvfrom and sendto syscalls each time apache2/httpd handles an incoming HTTP request?
So, I decided to start learning about system calls with strace and want to observe network-related system calls in apache2 processes. Here's how I attach it:
pidof -s apache2
pstree -sTp
strace -f -e trace=%network -p
While observing, I notice that strace prints some syscalls; however, I can't find the associated recvfrom or sendto syscall with the file descriptor that corresponds to the accept syscall containing the IP address of the client (my browser) when I make an HTTP request.
My assumption is that when a request is handled by Apache, it spawns new child processes. Since I attach strace to the parent apache2 process, why doesn't strace follow its children even though I specify the -f option?
ReYuki (33 rep), Feb 6, 2025, 07:24 AM
0 votes, 1 answer, 176 views
How to better understand and reverse-engineer system calls within processes given a specific example
I am very new to Linux and as such would appreciate any pointers with respect to understanding system calls and having the ability, knowledge and tools to reverse-engineer their origin or their process flow.
As the title suggests, I present an example: my analysis of an Xorg process that I traced in my Linux desktop environment. I am attempting to understand the process flow of DRM_IOCTL calls, in this case a specific DRM_IOCTL_CURSOR2 system call that takes place within the process. My goal is to understand what triggers this call within this desktop environment, or rather what steps I can take in general to investigate questions like this.
From my limited understanding, I am aware that Xorg is spawned as a subprocess of SDDM, but beyond the Xorg server being started, I am at a loss as to how to walk through the process and identify what triggers certain calls, or which tools could help. So this is a conceptual question about how to approach analyses like this in general. Would this require specific knowledge of the process at hand and its architecture? Are there general approaches I can take on my system to trace system calls, much like deducing the PPIDs of processes, for my own interest?
As of now I have some familiarity with tools like strace and bpftrace and general command-line tools like ps and lsof. Apologies if this is a broad question; if so, I will be happy to narrow it further.
N S (1 rep), Dec 28, 2024, 02:32 PM • Last activity: Dec 28, 2024, 05:28 PM
0 votes, 0 answers, 29 views
BPF program attached to `getname` won't get called when calling the `renameat2` syscall
I'm fiddling with a BPF program that needs to attach to the two "getname" functions that are called from the renameat2 syscall, defined in linux/fs/namei.c as:
SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
int, newdfd, const char __user *, newname, unsigned int, flags)
{
return do_renameat2(olddfd, getname(oldname), newdfd, getname(newname),
flags);
}
getname calls getname_flags, which in turn calls strncpy_from_user. I need to access the char __user * name parameter, so I tried creating kprobes, fentries and fexits (with a simple "print" program) to intercept all three of those functions.
With getname*, I get a lot of output, meaning that my BPF programs are actually being run. However, when calling renameat2 (e.g. when using the Linux mv command), I get no output at all.
This is, in essence, the program I'm currently using, which doesn't get called when using the mv command:
SEC("fentry/getname_flags")
int BPF_PROG(hijack_getname, char *filename) {
uid_t uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
if (uid == 1002) { //hardcoded uid
bpf_printk(" [%s]", filename);
}
return 0; // BPF programs must return a value; without it R0 may be left uninitialized
}
If I create a BPF tracepoint program that attaches to the entry and exit of renameat2, I can clearly see that there's no "getname" call between entry and exit.
As I said, I also tried with kprobe and fexit. I can't manage to attach to strncpy_from_user without getting some weird errors about "Os: 22 - invalid argument".
I really can't figure out what's happening, thus any help would be appreciated :,)
(P.S. I also posted this on stackoverflow)
Dennis Orlando (81 rep), Dec 4, 2024, 05:18 PM
3 votes, 1 answer, 466 views
Why doesn't Linux support mmap by path?
The mmap syscall needs an fd as a parameter, but when you close that fd, the mapping is still alive in the process's address space.
Therefore keeping an mmap doesn't need an open fd, so why does Linux only support creating an mmap of a file using an fd, but not a file path? Wouldn't it be nice if we could have an mmapat syscall, just like openat and execveat?
If mmap creates an extra reference to that file, why can't we have an mmapat which atomically creates such a reference in the first place, without taking an fd in the process and releasing it later?
Is there any historical or security reason for not having such syscall on Linux kernel?
炸鱼薯条德里克 (1435 rep), Feb 2, 2019, 01:32 AM • Last activity: Oct 17, 2024, 08:51 AM
7 votes, 3 answers, 7174 views
Filter out failed syscalls from strace log
I can run strace on a command like sleep 1 and see what files it's accessing like this:
strace -e trace=file -o strace.log sleep 1
However, on my machine, many of the calls have a return value of -1, indicating that the file does not exist. For example:
$ grep '= -1 ENOENT' strace.log | head
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
I'm not really interested in the files that don't exist; I want to know what files the process actually found and read from.
Aside from `grep -v '= -1 ENOENT'`, how can I reliably filter out failed calls?
# Addendum #
I was surprised to learn that `strace` has had this feature in the works since 2002 in the form of the `-z` flag, which is an alias for `-e status=successful`, fully functional [since version 5.2](https://github.com/strace/strace/commit/e45a594cb08394c96f71105db9bacf08aa4c734d) ([2019-07-12](https://github.com/strace/strace/releases/tag/v5.2)), also available as `--successful-only` [since version 5.6](https://github.com/strace/strace/commit/092724f8041cdfb64dcaf68a2d8ba877b509ea83) ([2020-04-07](https://github.com/strace/strace/releases/tag/v5.6)).
Also available since version 5.2 is the complement of `-z`, the `-Z` flag, which is an alias for `-e status=failed`, available as `--failed-only` since version 5.6.
The `-z` flag was [first added in a commit from 2002](https://github.com/strace/strace/commit/17f8fb3484e94976882f65b7a3aaffc6f24cd75d) and released in version 4.5.18 ([2008-08-28](https://github.com/strace/strace/releases/tag/v4.5.18)), but it had never been [documented](https://github.com/strace/strace/commit/de6e53308ca58da7d357f8114afc74fff7a18043) because it was not working properly.
Relevant links:
- only seeing successful system calls
  Sat Nov 2 23:07:23 UTC 2002
  > When using strace I sometimes like to see the system calls which work (instead of all the system calls).
  >
  > I've been porting this patch for years, it seems very useful.
  >
  > With the -z option, you don't see opens on files which aren't there (very useful tracking down what a program actually does, instead of trying to do).

  https://lists.strace.io/pipermail/strace-devel/2002-November/000232.html
- strace: -z option doesn't work properly
Date: Sun, 12 Jan 2003 09:33:01 UTC
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=176376
- tracing only failing syscalls
Created: 2004-03-19
https://sourceforge.net/p/strace/feature-requests/3/
- [strace-4.15] Proposal: Output Staging for -z Option (print successful syscalls only) / Patch included
Tue Jan 17 09:35:54 UTC 2017
https://lists.strace.io/pipermail/strace-devel/2017-January/005941.html
- [PATCH v1] Implemented output staging for failed/successful syscalls
Wed Jan 18 16:01:20 UTC 2017
https://lists.strace.io/pipermail/strace-devel/2017-January/005950.html
- Fix -z option
Feb 28, 2018
https://github.com/strace/strace/issues/49
- [PATCH 0/3] Stage output for -z and new -Z options
Mon Apr 1 21:13:02 UTC 2019
https://lists.strace.io/pipermail/strace-devel/2019-April/008706.html
- strace -z flag
Mon Jun 10 05:29:19 UTC 2019
https://lists.strace.io/pipermail/strace-devel/2019-June/008808.html
Nathaniel M. Beaver
(1398 rep)
Apr 6, 2018, 08:26 PM
• Last activity: Sep 13, 2024, 04:18 PM
1
votes
0
answers
24
views
Retrieving the process descriptor during syscall
In Linux, there is a per-process kernel stack that stores at the bottom of it (or top if the stack grows upwards) a small struct named thread_info, which in turn points to the task_struct of the related process. This way it is easy to retrieve the pointer to the process's descriptor when handling a syscall in kernel-mode.
But how does the kernel switch to this per-process stack in the first place? At which step of the context switch does the kernel check or store data about the underlying process?
Can someone please provide a good explanation of the steps involved in such a user-space -> syscall -> kernel-space context switch?
A lot of sources online try to explain how context switching works, but most of them only describe general concepts rather than the procedure in detail.
Idan Rosenzweig
(11 rep)
Sep 4, 2024, 06:07 PM
1
votes
2
answers
420
views
Is systemd the first process that runs in user mode in linux?
I know that switching from user mode to kernel mode occurs continuously via system calls. My question is whether systemd is the exact point during the startup of a Linux system where the first fundamental transition from kernel mode to user mode occurs. Is this the point where the operating system says: "From now on (starting from systemd and all its direct or indirect children), every process except me that wants to run in kernel mode must do so via system calls"?
Kode1000
(11 rep)
Aug 19, 2024, 07:22 AM
• Last activity: Aug 19, 2024, 09:56 AM
378
votes
8
answers
54084
views
How can I find the implementations of Linux kernel system calls?
I am trying to understand how a function, say `mkdir`, works by looking at the kernel source. This is an attempt to understand the kernel internals and navigate between various functions. I know `mkdir` is declared in `sys/stat.h`. I found the prototype:
/* Create a new directory named PATH, with permission bits MODE. */
extern int mkdir (__const char *__path, __mode_t __mode)
__THROW __nonnull ((1));
Now I need to see in which C file this function is implemented. From the source directory, I tried
ack "int mkdir"
which displayed
security/inode.c
103:static int mkdir(struct inode *dir, struct dentry *dentry, int mode)
tools/perf/util/util.c
4:int mkdir_p(char *path, mode_t mode)
tools/perf/util/util.h
259:int mkdir_p(char *path, mode_t mode);
But none of them matches the declaration in `sys/stat.h`.
*Questions*
1. Which file has the `mkdir` implementation?
2. With a function declaration like the above, how can I find out which file has the implementation? Is there any pattern which the kernel follows in defining and implementing methods?

NOTE: I am using kernel 2.6.36-rc1.
Navaneeth K N
(3998 rep)
Aug 19, 2010, 02:26 PM
• Last activity: Aug 16, 2024, 01:13 PM
4
votes
1
answers
839
views
How to get the current cgroup ID from C/C++?
The [eBPF helper functions](https://man7.org/linux/man-pages/man7/bpf-helpers.7.html) define `bpf_get_current_cgroup_id` for eBPF programs, which does the obvious thing:
u64 bpf_get_current_cgroup_id(void)
Return A 64-bit integer containing the current cgroup id
based on the cgroup within which the current task
is running.
However, I can't find an equivalent system call (something similar to [getpid](https://man7.org/linux/man-pages/man2/getppid.2.html)) that I can use in a regular old C program.
Am I just completely missing the relevant function? Or does userspace need to do something different to get the cgroup ID for the current task?
user547386
(41 rep)
Oct 31, 2022, 08:15 PM
• Last activity: Aug 16, 2024, 09:24 AM
Showing page 1 of 20 total questions