How does an open(at) syscall result in a file being written to disk?
I'm trying to learn as much as I can about the interplay between syscalls, the VFS, device driver handling and, ultimately, having the end device do some operation. I thought I would look at a fairly trivial example - creating a file - and try to understand the underlying process in as much detail as possible.
I created a bit of C to do little more than open a (non-existing) file for writing, compiled it (without optimization), and took a peek at it with strace when I ran it. In particular, I wanted to focus on the `openat` syscall, and why and how this call was ultimately able to not only create the file object / file description, but also actually do the writing to disk (for reference, EXT4 file system, SATA HDD).
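For reference, a minimal sketch of the kind of test program described above. The helper name `create_with_fopen` and any path passed to it are hypothetical, not the original code. On a current glibc, `fopen(path, "w")` should show up in strace as an `openat` call with `O_WRONLY|O_CREAT|O_TRUNC`.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file for writing through
 * the stdio layer. Returns 0 on success, -1 on failure. */
int create_with_fopen(const char *path)
{
    assert(path != NULL);

    /* Mode "w" asks libc to open write-only, creating the file if it
     * does not exist and truncating it if it does. */
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }

    /* fclose flushes stdio buffers and closes the underlying descriptor. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling this helper from a small `main` and running the binary under `strace -e trace=openat` should isolate the syscall of interest.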
Broadly speaking, excluding some of the checks and ancillary bits and pieces, my understanding of the process is as follows (and please correct me if I'm way off!):
- ELF is mapped into memory
- libc is mapped into memory
- `fopen` is called
- libc does its open
- `openat` syscall is called, with the `O_CREAT` flag among others
- Syscall number is put into the RAX register
- Syscall args (e.g. file path, etc.) are put into the RDI register (and RSI, RDX, etc. as appropriate)
- Syscall instruction is issued, and the CPU transitions to ring 0
- System_call code pointed to by the MSR_LSTAR register is invoked
- Registers are pushed to the kernel stack
- Function pointer at the offset given by RAX in sys_call_table is called
- `asmlinkage` wrapper for the actual `openat` syscall code is invoked
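The user-space half of the steps above can be sketched by invoking the syscall directly through glibc's generic `syscall()` wrapper. This is a minimal sketch for Linux; the helper name `raw_openat_create` and the mode `0644` are assumptions made for illustration, and the register names in the comments apply to x86-64 only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call openat without going through fopen/open.
 * Returns a file descriptor on success, -1 on failure (errno is set). */
int raw_openat_create(const char *path)
{
    assert(path != NULL);

    /* On x86-64 the wrapper places SYS_openat in RAX and the arguments in
     * RDI (dirfd), RSI (path), RDX (flags) and R10 (mode), then executes
     * the `syscall` instruction. */
    long fd = syscall(SYS_openat, AT_FDCWD, path,
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
    return (int)fd;
}
```

Note that for `openat` the path is the second argument, so on x86-64 it travels in RSI rather than RDI; RDI carries the directory descriptor (`AT_FDCWD` here).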
And at that point my understanding is lacking, but ultimately, I know that:
1. The open call returns a file descriptor, which is unique to the process, and maintained globally within the kernel's file descriptor table
2. The FD maps to a file description (file object)
3. The file object is populated with, among other structures, inode structure, inode_operations, file_operations, etc.
4. The file operations table should map generic syscalls to the respective device drivers to handle the respective calls (such that, for example, when a `write` syscall is called, the respective driver write call is called instead for the device on which the file resides, e.g. a SCSI driver)
5. This mapping is based on the major/minor numbers for that file/device
6. Somewhere along the line, code is invoked which causes instructions to be sent to the device driver for the hard drive, which get sent to the disk controller, which causes a file to be written to the hard disk, though whether this is via interrupts or DMA, or some other method of I/O, I'm not sure
7. Ultimately, the disk controller sends a message back to the kernel to say it's done, and the kernel returns control back to user space.
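One detail relevant to points 6 and 7: on a typical Linux setup, `open` and `write` normally complete against in-memory structures (the inode and page caches), and the device I/O may happen later during writeback. A program that wants the data on disk before continuing has to ask for it, for example with `fsync`. The sketch below assumes that behaviour; the helper name `create_and_sync` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a file, write some text, and ask the
 * kernel to flush it to the storage device before returning.
 * Returns 0 on success, -1 on failure. */
int create_and_sync(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);   /* usually lands in the page cache */
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* fsync blocks until the kernel reports the file's data and metadata
     * as written out by the underlying device. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```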
I'm not too good at following the kernel source, though I've tried a little, but feel there's a lot I'm missing. My questions are as follows:
I've found some functions which return, and destroy FDs in the kernel source, but can't find where the code is which actually populates the file object / file description for the file.
A) On an `open` or `openat` syscall, when a new file is created, how is the file structure populated? Where does the data come from? Specifically, how are the file_operations and inode_operations, etc. for that file populated? How does the kernel know, when populating that structure, that the file operations for this particular file need to be those of the SCSI driver, for instance?
B) Where in the process - and particularly with reference to the source - does the switch to the device driver happen? For example, if an `ioctl` or similar was called, I would expect to see some reference to the instruction to be called for the respective device, and some memory address for the data to be passed on, but I can't find where that happens.
From looking at the kernel source, all I can really find is code that assigns a new FD, but nothing that populates the file structure, nor anything that calls the respective file operations to transfer control to a device driver.
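To make the dispatch pattern being asked about concrete, here is a toy user-space model. It is not kernel code, and every `toy_`/`toyfs_` name is hypothetical. It is modelled on my understanding that in Linux the filesystem sets `inode->i_fop` when it creates or loads an inode, and the VFS open path copies that pointer into `file->f_op`. For a regular file on ext4 those would be ext4's operations; the disk driver is reached further down through the block layer rather than directly through `file_operations`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Table of function pointers, loosely analogous to struct file_operations. */
struct toy_file_operations {
    long (*write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_inode {
    const struct toy_file_operations *i_fop; /* chosen by the "filesystem" */
    char data[64];
    size_t size;
};

struct toy_file {
    struct toy_inode *f_inode;
    const struct toy_file_operations *f_op;  /* copied from the inode at open */
    size_t f_pos;
};

/* The "filesystem" write: copies into the inode's in-memory buffer. */
static long toyfs_write(struct toy_file *f, const char *buf, size_t len)
{
    struct toy_inode *ino = f->f_inode;
    if (f->f_pos + len > sizeof ino->data)
        len = sizeof ino->data - f->f_pos;   /* truncate to remaining space */
    memcpy(ino->data + f->f_pos, buf, len);
    f->f_pos += len;
    if (f->f_pos > ino->size)
        ino->size = f->f_pos;
    return (long)len;
}

static const struct toy_file_operations toyfs_file_ops = {
    .write = toyfs_write,
};

/* "Filesystem" create: initialise the inode and attach its operations. */
void toyfs_create(struct toy_inode *ino)
{
    assert(ino != NULL);
    memset(ino, 0, sizeof *ino);
    ino->i_fop = &toyfs_file_ops;
}

/* "VFS" open: populate the file object from the inode. */
void toy_vfs_open(struct toy_file *f, struct toy_inode *ino)
{
    f->f_inode = ino;
    f->f_op = ino->i_fop;
    f->f_pos = 0;
}

/* "VFS" write: generic entry point that dispatches via the table. */
long toy_vfs_write(struct toy_file *f, const char *buf, size_t len)
{
    if (f->f_op == NULL || f->f_op->write == NULL)
        return -1;
    return f->f_op->write(f, buf, len);
}
```

The point of the indirection is that `toy_vfs_write` never needs to know which filesystem owns the file; it only follows the pointer that was attached when the inode was set up.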
Apologies that this is a really long-winded description, but I'm really trying to learn as much as possible, and although I have a basic grasp of C, I really struggle to understand others' code.
Hoping someone with greater knowledge than I can help clarify some of these things for me, as I seem to have hit a figurative brick wall. Any advice would be greatly appreciated.
**Edit:**
Hopefully the following points will clarify what technical detail I'm after.
- The `open` or `openat` syscalls take a file path, and flags (with the latter also being passed an FD pointing to a directory)
- When the `O_CREAT` flag is also passed, the file is 'created' if it doesn't exist
- Based on the file path, the kernel is able to identify the device type this file should be
- The device type is identified from the major/minor numbers ordinarily - for a file that already exists, these are stored in the inode structure for the file (as member `i_rdev`) and the stat structure for the file (as members `st_dev` for the device type of the file system on which the file resides, and `st_rdev` for the device type of the file itself)
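The `st_dev` and `st_rdev` members mentioned above can be inspected from user space with `stat`. This is a minimal sketch assuming Linux with glibc, where `major()` and `minor()` come from `<sys/sysmacros.h>`; the helper name `print_dev_numbers` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: print the device numbers stat reports for a path.
 * Returns 0 on success, -1 if stat fails. */
int print_dev_numbers(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }

    const char *kind = S_ISREG(st.st_mode) ? "regular file"
                     : S_ISDIR(st.st_mode) ? "directory"
                     : S_ISBLK(st.st_mode) ? "block device"
                     : S_ISCHR(st.st_mode) ? "character device"
                     : "other";

    /* st_dev: device holding the filesystem the file lives on.
     * st_rdev: device the file itself represents; only meaningful for
     * block and character device nodes. */
    printf("%s (%s): st_dev=%u:%u st_rdev=%u:%u\n", path, kind,
           major(st.st_dev), minor(st.st_dev),
           major(st.st_rdev), minor(st.st_rdev));
    return 0;
}
```

Running it on a newly created regular file, its parent directory, and a node such as `/dev/sda` should show which of the two fields differs between them.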
So really, my questions are:
1. When a file is created with either of the open syscalls, the respective inode and stat structure must also be created and populated - how do the open syscalls do this, when all they have to go on at this point is the file path and flags? Do they look at the inode or stat structure of the parent directory, and copy the relevant structure members from this?
2. At which point (i.e. where in the source) does this happen?
3. It's my understanding that when these open syscalls are invoked, it needs to know the device type, in order for the VFS to know what device driver code to invoke. On creating a new file, where the device type has yet to be set in the file object structures, what happens? What code is called?
4. Is the sequence more like:
user process tries to open new file -> open('/tmp/foo', O_CREAT)
open -> look up structure for '/tmp', get its device type -> get unused FD -> populate inode/stat structure, including setting device type to that of parent -> based on device type, map file operations / inode operations to device driver code -> call device driver code for `open` syscall -> send appropriate instruction to disk controller to write new file to disk -> tidy up things, do checks, etc. -> return FD to user calling process?
Asked by genericuser99
(119 rep)
Aug 25, 2020, 10:19 PM
Last activity: Jul 6, 2025, 03:04 PM