
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

3 votes
2 answers
133 views
What explains this very odd behavior of GNU grep interacting with buffering and pipes and how to stop it?
This is best illustrated with an example, I feel:

```
{ printf 'foo\nbar\n' ; sleep 2 ; } | grep -m1 foo
{ printf 'foo\n' ; sleep 2 ; printf 'bar\n' ; sleep 2 ; } | grep -m1 foo
```

Both of these commands, run in Bash with GNU coreutils, for whatever reason behave in exactly the same way:

1. First grep prints out “foo” with a newline behind it but keeps blocking.
2. Then grep waits for 2 seconds, and exits.

The behavior I expected, in both cases, is that grep should print “foo” with a newline and then immediately exit, without waiting two seconds. After all, it has already satisfied its condition of exactly one match and it knows that whatever further input it receives can't change this. Indeed, if I do this:

```
{ printf 'foo' ; sleep 2 ; } | grep -m1 foo
```

without the newline after “foo”, it first waits for two seconds doing nothing, and then exits after printing “foo” with a newline behind it. This makes sense to me: grep has not yet received any newline, so it doesn't know what could follow after those two seconds and can't yet print what would be on the matching line.

But I in particular don't understand why the first two commands behave as they do. In the second case, GNU grep exits immediately upon receiving the second line and does not wait out the further 2 seconds of sleep for the command before the pipe to exit; whereas in the first command, it receives foo\nbar\n right away, over two lines, and yet it does not exit immediately once it has received the second line. I assume this has something to do with how the buffering works.

If you want my real use case and why I'm investigating this: I'm using this in a script with udevadm monitor to grep for a specific event, and when that event is reached I want to stop blocking, so I use udevadm monitor | egrep -m1. So far so good, except I noticed that it does not stop blocking when the specifically searched-for event is the last one reported by udevadm; then it only stops blocking after a new event is sent. For whatever reason, in that case grep only exits and prints the first match upon receiving the line after, or receiving an end of file. Why does it do that, and how do I make it stop doing that?
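A workaround often suggested for this class of problem (an editorial sketch, not something the question establishes; `my-event` is a placeholder pattern): sed exits as soon as it has processed the line that matches, so the pipeline stops blocking on the first hit:

```
# sed prints the first matching line and quits immediately, without
# trying to read any further input or waiting for EOF
udevadm monitor | sed -n '/my-event/{p;q}'
```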
Zorf (171 rep)
Mar 9, 2025, 09:25 PM • Last activity: Mar 10, 2025, 11:59 AM
6 votes
2 answers
766 views
Why is mawk's output (STDOUT) buffered even though it is the terminal?
I am aware that `STDOUT` is usually buffered by commands like `mawk` (but not `gawk`), `grep`, `sed`, and so on, unless they are used with the appropriate options (i.e. `mawk -Winteractive`, or `grep --line-buffered`, or `sed --unbuffered`). But the buffering doesn't happen when `STDOUT` is a terminal/tty, in which case it is line-buffered. Now, what I don't get is why `STDOUT` is buffered outside of a loop sent to a pipe, even though the final destination is the terminal. A basic example:
```
$ while sleep 3; do echo -n "Current Time is ";date +%T; done | mawk '{print $NF}'
^C
```
Nothing happens for a long time, because mawk seems to be buffering its output. I wasn't expecting that. **mawk's output is the terminal, so why is its STDOUT buffered?** Indeed, with the `-Winteractive` option the output is rendered every 3 seconds:
```
$ while sleep 3; do echo -n "Current Time is ";date +%T; done | mawk -Winteractive '{print $NF}'
10:57:05
10:57:08
10:57:11
^C
```
Now, this behavior is clearly mawk-related, because it isn't reproduced if I use, for example, grep. Even without its `--line-buffered` option, grep doesn't buffer its STDOUT, which is the expected behavior given that grep's STDOUT is the terminal:
```
$ while sleep 3; do echo -n "Current Time is ";date +%T; done | grep Current
Current Time is 11:01:44
Current Time is 11:01:47
Current Time is 11:01:50
^C
```
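An editorial note on the general tooling (not part of the question): GNU coreutils' stdbuf adjusts stdio buffering through an LD_PRELOAD shim, so it only helps programs that use C stdio for their output; for a tool that manages its own output buffers, its native flag (like mawk's -Winteractive above) is the reliable knob. A sketch of the stdbuf pattern on a stdio-based filter:

```
# the trailing "| cat" makes sed's stdout a pipe, which would normally be
# fully buffered; stdbuf -oL requests line buffering instead, so each
# line appears promptly (this works for stdio-based tools such as sed)
while sleep 3; do echo -n "Current Time is ";date +%T; done |
  stdbuf -oL sed 's/^Current/Latest/' | cat
```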
ChennyStar (1969 rep)
Jun 25, 2021, 11:46 AM • Last activity: Jan 18, 2025, 03:56 PM
2 votes
3 answers
1110 views
File order on FAT/FAT32/VFAT file systems
I have several audio devices (car radio, portable radio, MP3 player) that take SD cards and USB sticks with a FAT file system on them. Because these devices have limited intelligence, they do not sort filenames on the FAT FS by name but merely play them in the order in which they were copied to the SD card.

In MS-DOS and MS Windows this was not a problem; using a simple utility that sorted files alphabetically and then copied them across in that order did the trick. However, on Linux the files copied from the ext4 file system do not end up on the FAT FS in the same order in which they were read and copied across, presumably because there is a buffering mechanism in the way which improves efficiency but does not worry too much about the physical order in which the files end up on the target device. I have also tried to use Windows in a VirtualBox VM, but still the files end up being written in a different order than the one in which they were read from the Linux file system.

Is there a way (short of copying them across manually one by one and waiting for all write buffers to be flushed) to ensure that files end up on the FAT SD target in the order in which they were read from the ext4 file system?
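A tool often recommended for exactly this situation (naming it is an editorial addition; the question doesn't mention it) is fatsort, which rewrites the directory tables of an unmounted FAT/FAT32/VFAT filesystem so the entries are stored in sorted order:

```
# a sketch, assuming the stick shows up as /dev/sdX1 and fatsort is installed
sudo umount /dev/sdX1    # fatsort requires the filesystem to be unmounted
sudo fatsort /dev/sdX1   # sorts the directory entries in place, by name
```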
Frank van Wensveen (123 rep)
Feb 18, 2020, 10:29 AM • Last activity: Dec 21, 2024, 03:06 PM
4 votes
1 answer
594 views
Can I make a pipe save data to disk instead of blocking until the reader can continue?
Are there any established tools where the pipe spills to disk rather than blocking the upstream process?

As an example, in a traditional pipeline A | B, we get the following behavior when B does not read from stdin:

- A emits output until stdout fills up
- Then A is blocked until B reads from stdin

I'd like the following behavior instead:

- A emits output until its stdout buffer fills up
- Further output from A is written to an on-disk cache file, so A is not blocked
- When B ingests data from stdin, new data from the on-disk cache file is read (FIFO) into the buffer

Is there any existing tool that accomplishes this? Thank you!
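mbuffer is one established tool in this space (an editorial suggestion; whether its semantics match the exact spill-to-disk behavior described above is an assumption worth checking against its man page):

```
# give the pipe a large intermediate buffer so A keeps running while B lags
A | mbuffer -m 2G | B

# assumption: the local mbuffer build supports file-backed buffers
# (-t uses a memory-mapped temporary file, -T names the file explicitly)
A | mbuffer -t -T /tmp/pipe-spool -m 20G | B
```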
Autodidactyle (151 rep)
Nov 26, 2024, 04:53 PM • Last activity: Nov 27, 2024, 09:33 AM
7 votes
2 answers
1686 views
Configure Linux to regularly sync cached data to disk
The [sync](https://linux.die.net/man/8/sync) command "writes any data buffered in memory out to disk". As far as I understand, data may be buffered in memory for very long time, even if the disks have no activity. How can I properly configure Linux to automatically sync regularly? The scenario is a storage server, where I would like to prevent cached data being prevented from being written to disks for too long, which leads to corrupted data in case of system crash or power loss.
xuhdev (597 rep)
Nov 26, 2024, 03:43 AM • Last activity: Nov 26, 2024, 03:22 PM
1 vote
0 answers
51 views
Radix Tree vs Multi-level bitmap
The Linux kernel (at least before using XArrays, as far as I'm aware; to my knowledge XArrays are wrappers around radix trees anyway) uses radix trees in the address_space struct that every file has. This allows a file to efficiently find all the dirty buffers it has written to, for easy flushing. However, what is the advantage of a radix tree over a multi-level bitmap in this scenario? Multi-level bitmaps seem so much easier to implement; is there a difference?
Mike (11 rep)
Nov 4, 2024, 04:08 PM
38 votes
4 answers
11606 views
Is there truth to the philosophy that you should sync; sync; sync; sync?
When I was first introduced to Linux, working at Cisco Systems in 2000, I was taught the merits of the sync command, used to flush buffers to disk to prevent filesystem corruption / data loss. I was told not only by coworkers there, but by friends in college to always run sync "a few" or "a bunch" of times, that is, maybe 5 - 10 times, instead of just once. I've continued this habit ever since, but, is there any merit to this? Has anyone else ever heard this? And most importantly, can anyone provide good rationale / empirical evidence for/against the idea that you need to run sync more than once for it to be effective?
Josh (8728 rep)
Dec 30, 2010, 10:07 PM • Last activity: Oct 20, 2024, 01:53 PM
0 votes
0 answers
33 views
How can I get the _real_ progress of dd, ignoring fast/staging buffers?
I'm using the `dd` command to write to a USB stick. My command is pretty straightforward:

```
dd if=myimage.iso of=/dev/sdd bs=1M status=progress
```

and indeed, I seem to be getting the progress reported:

```
1234567890 bytes (1.2 GB, 1.2 GiB) copied, 15 s, 101 MB/s
```

... but after a few seconds, I'm told that the full amount of data has been copied - without dd actually concluding. Likely, the data has been written into some temporary buffer (in memory perhaps?) - but the process of physically writing it to the USB drive is ongoing. Only when it is actually done does the dd command return. Is there a way to obtain the real progress, in terms of the number of bytes actually written and/or the estimated time remaining, for such dd executions?
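With GNU dd, two standard flags make the reported progress track the device instead of the page cache (sketched on the question's own command):

```
# oflag=direct bypasses the page cache, so status=progress reflects real
# device writes; conv=fsync additionally flushes everything before dd exits
dd if=myimage.iso of=/dev/sdd bs=1M status=progress oflag=direct conv=fsync
```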
einpoklum (10753 rep)
Jul 12, 2024, 06:54 PM
0 votes
1 answer
332 views
Is my SSD dead? can't mount. I/O errors, 29k+ fsck errors
So I work with a couple of people and we ssh into a Linux server for work. There are several SSDs, and one of the non-boot Samsung 870 SSDs crashed the other day in the morning. This led to an inability to SSH into the server for those whose home directories are on this SSD. I ran `dmesg` and saw the following errors. What can I do at this point, besides restoring from a backup (I didn't make any recently... oops)?

```
[Thu Jun 27 09:50:12 2024] RTL8226 2.5Gbps PHY r8169-0-4600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-4600:00, irq=MAC)
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2 res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:13 2024] ata3.00: status: { DRDY ERR }
[Thu Jun 27 09:50:13 2024] ata3.00: error: { ABRT }
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: configured for UDMA/133 (device error ignored)
[Thu Jun 27 09:50:13 2024] ata3.00: device reported invalid CHS sector 0
[Thu Jun 27 09:50:13 2024] ata3: EH complete
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10 res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:14 2024] ata3: EH complete
[Thu Jun 27 09:50:14 2024] ata3.00: Enabling discard_zeroes_data
[Thu Jun 27 09:50:16 2024] atlantic 0000:44:00.0 enp68s0: atlantic: link change old 0 new 1000
[Thu Jun 27 09:50:16 2024] IPv6: ADDRCONF(NETDEV_CHANGE): enp68s0: link becomes ready
[Thu Jun 27 09:50:23 2024] rfkill: input handler disabled
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM TTY layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM socket layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM ver 1.11
[Thu Jun 27 09:50:36 2024] rfkill: input handler enabled
[Thu Jun 27 09:50:39 2024] rfkill: input handler disabled
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:11 2024] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:16 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:51:16 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:51:16 2024] ata3.00: failed command: WRITE DMA
[Thu Jun 27 09:51:16 2024] ata3.00: cmd ca/00:08:00:00:00/00:00:00:00:00/e0 tag 9 dma 4096 out res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
...(repeats until sector 800)...
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 80, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 90 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 656 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 82, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 b0 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 688 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 86, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 e0 00 00 00 10 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 736 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 92, lost async page write
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 93, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 10 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 784 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 98, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
......
[Thu Jun 27 09:52:19 2024] ata2.00: Enabling discard_zeroes_data
[Thu Jun 27 09:52:19 2024] ata7.00: Enabling discard_zeroes_data
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): error count since last fsck: 29674
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): initial error at time 1677717172: __ext4_get_inode_loc_noinmem:4410: inode 179610110: block 718277183
```

and then there was one of these every 10 minutes throughout the day:

```
[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
```

As far as I'm aware, the things to avoid are:

- fsck

Things I'm not sure I should avoid (please let me know what the proper safe procedure is if I want to recover data):

- fsck -n
- smartctl

My options (which should I choose?):

- ddrescue with the right options to make it fast and read good sectors first (see the sketch below). Can this damage the SSD further?
- Send to data recovery people
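For reference, a minimal ddrescue invocation of the kind alluded to above (the image path is a placeholder, and whether reading can stress a failing SSD further is drive-dependent, so treat this as a sketch rather than a recommendation):

```
# first pass: copy everything readable, skip problem areas (-n = no scrape);
# the map file records progress so later passes can resume and refine
sudo ddrescue -n /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
# second pass: retry the bad areas up to 3 times
sudo ddrescue -r3 /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
```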
Derek Xiao (1 rep)
Jul 1, 2024, 07:56 PM • Last activity: Jul 1, 2024, 10:03 PM
21 votes
2 answers
8934 views
`unbuffer` or `stdbuf` for removing stdout buffering?
Is there a difference between unbuffer(1) and stdbuf(1)? From what I gather, unbuffer does more than the "best effort" of calling the libc function set(X)buf at the beginning and then letting things be?
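In short (an editorial note, consistent with each tool's documentation): stdbuf(1) preloads a small library that calls setvbuf(3) on the standard streams at program start, so it only affects dynamically linked programs that use C stdio; unbuffer(1), from the expect package, runs the program on a pseudo-terminal so it believes stdout is a tty and line-buffers on its own. A sketch of both (`./producer` and `./consumer` are hypothetical names):

```
# stdbuf: request line-buffered stdout via an LD_PRELOAD shim
stdbuf -oL ./producer | ./consumer

# unbuffer: run the producer on a pty so its stdio thinks it is interactive
unbuffer ./producer | ./consumer
```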
dan3 (700 rep)
Oct 23, 2013, 05:25 PM • Last activity: Jun 27, 2024, 09:39 AM
0 votes
2 answers
139 views
Terminating the whole pipeline at once when its last process ends
Given the pipeline

```
printf '%s\n' 1 2 3 4 5 6 7 8 9 |
while read -r num
do
    echo "$num" > /dev/stderr
    echo "$num"
done |
while read -r num
do
    echo $(( $num * 10 ))
    [ "$num" -eq 5 ] && break
done
```

bash will output something like

```
1
2
10
3
20
4
30
5
40
50
6
7
8
9
```

to the terminal. The places in the single-digit sequence where the two-digit numbers are inserted will be different each time you run the pipeline, because of how buffering and/or scheduling work. I would like the first while-read loop to terminate at once when the second loop terminates; that is, I want the pipeline to not output any single digits after it has printed the number 50.

I tried using the command stdbuf -o0 to set the buffer to 0, as suggested by this article and Force line-buffering of stdout in a pipeline:

```
stdbuf -o0 printf '%s\n' 1 2 3 4 5 6 7 8 9 |
while read -r num
do
    echo "$num" > /dev/stderr
    echo "$num"
done |
while read -r num
do
    echo $(( $num * 10 ))
    [ "$num" -eq 5 ] && break
done
```

hoping to achieve this, but the printf command and the first loop still stay alive and can print their output after the last loop has printed the number 50. I also tried set -o pipefail and turning the second loop into the function:

```
g () {
    while read -r num
    do
        echo $(( $num * 10 ))
        [ "$num" -eq 5 ] && break
    done
    return 1
}
```

and modifying the pipeline accordingly:

```
printf '%s\n' 1 2 3 4 5 6 7 8 9 |
while read -r num
do
    echo "$num" > /dev/stderr
    echo "$num"
done | g
```

(and then calling set +o pipefail afterwards), but that also fails to achieve what I want. Apparently there is some fundamental error in my understanding of how pipelines and buffering work. Please explain to me what I am missing and how I can get the output I want.

"Turning Off Buffer in Pipe With stdbuf at Baeldung on Linux"
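The underlying reason (a well-known pipe property): a writer only dies of SIGPIPE on its next write after the reader has gone, so it can keep running and printing to stderr in the meantime. One way to stop the producer at once is to give it a known PID and kill it explicitly when the consumer is done; a minimal sketch under that assumption:

```
#!/bin/bash
# run the producer in the background with a known PID, then kill it as
# soon as the consuming loop breaks, instead of waiting for SIGPIPE
fifo=$(mktemp -u) && mkfifo "$fifo"
printf '%s\n' 1 2 3 4 5 6 7 8 9 > "$fifo" &
producer=$!
while read -r num
do
    echo $(( num * 10 ))
    [ "$num" -eq 5 ] && break
done < "$fifo"
kill "$producer" 2>/dev/null   # terminate the producer immediately
rm -f "$fifo"
```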
fuumind (449 rep)
Jun 4, 2024, 08:58 AM • Last activity: Jun 17, 2024, 07:37 AM
5 votes
3 answers
3079 views
Flush the pipe/printf buffers externally for already running process with known PID
I am writing a data logging app, all programs are started like:
```
./program > out.bin
```
The data collector periodically polls the stdout output files and reads the data. The issue is that the IO streams are buffered, and if a program outputs data at, say, 1 byte per second, it takes a long time (up to 4k seconds with the default 4 kB buffer size) before the data are actually flushed out. My question is how to force the stdout/pipe/printf buffer to flush externally, i.e. to call externally something like fflush(stdout). I have read various sites like https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/25378 , but I cannot disable the buffers, as that has a huge IO performance impact (measured). I am looking for a high-performance solution for production, and these following conditions are always met:

- the program (data producer) PID is always known
- the output is always a file with known path
- the data logging process has full root access
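Given root access and a known PID, one commonly cited (and intrusive) trick is to attach a debugger and call fflush from outside the process; treat this as a sketch to adapt, not a production recipe:

```
# assumption: $PID is the producer's PID and gdb is allowed to attach.
# fflush(NULL) flushes all of the process's open stdio output streams;
# the (int) cast helps when no debug symbols provide the signature
gdb -p "$PID" --batch -ex 'call (int) fflush(0)'
```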
mvorisek (251 rep)
Jun 10, 2019, 09:22 AM • Last activity: Apr 19, 2024, 11:35 AM
0 votes
1 answer
107 views
Is it possible to set larger buffers for file access on linux
I have a process that's reading the whole filesystem and hashing files. It slows down (by 4x or so) because the reads cause a lot of seeking. Small chunks of each file are being read by each of 4 threads, but if I test a sequential read by copying (cp) I can read far faster. CPU utilisation is at 25%, so it's not CPU-bound. I am fairly sure seeking is the problem.

I've read that the kernel has quite sophisticated disk-reading strategies to speed up access, so I wonder if kernel buffers are restricting their use here, and if they could be increased to allow more buffering. I assume the program I am using only requests a fairly small chunk of data with each read call, so I don't know if this would be effective. I imagine reading each file fully into memory one by one would be most effective, but I can't rewrite the application at the moment (it's not mine, and it's large and bloated, in my opinion). But could I get the OS to read each file, as it is opened, into buffers entirely (or even partially, like 100 - 500 MB at a time), sequentially, so that the application threads' small reads are served from memory rather than hitting the disk (and causing a seek)?

ADDED LATER: @Artem, the cache does not seem to do the job here, and I guess I can understand why. The kernel is trying to be 'sensible' and saying "I'm not going to read a whole 500 MB file into memory just because the user has requested the first MB". Which makes sense. What is loaded in will indeed be cached, so if it is used again (for example by another process) it can be fetched from memory. But what I want is for the kernel to load the whole file into cache on the first read (that first read being what, 2 MB maybe?).

So the system call is read(fd, buf, size). If I were programming in C I would never pass a huge buffer as size, and I doubt many programmers would. So the program was probably written using a more normal sort of buffer size, a meg or two. So the user process gets a MB or two and enters the hashing function, which keeps it busy for a while, and it stops pestering the kernel for disk reads. Meanwhile there's another disk read queued by a different thread for a different part of the disk. So the kernel services that now, and the disk seeks to a different part of the disk, taking ~15 ms.

What's a shame is that files are generally held in quite large extents of sequential blocks on disk, so a sustained read of the disk for that first file would probably have read hundreds of thousands, even a million blocks, tens or hundreds of MB, without any seeking. That is high-performance disk reading, and it's what I want to encourage. But the way things are working, the processes request small chunks of data, the kernel tries to be sensible and not read massive amounts of data that no one has asked for (holding up waiting processes by doing so), and as a result it spends all its time seeking.

Contrast this with 'cp -r': only one thread asks the kernel to read files, so nothing tells the disk head to seek to a different part of the disk every MB or two, and when subsequent reads come in the drive is in a position to fetch the data quickly. The code could be rewritten with much larger buffers, so that's one option for me. But as I say, I was wondering if it is possible to instruct the kernel to buffer 'ahead' much more; a kind of 'read-ahead caching': predicting that files, once opened, are going to be read in their entirety, and so filling read buffers with at least n bytes before stopping the physical disk read to kernel buffers for each file read.
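There are standard knobs for exactly this kind of hint, without changing the application (the device name and values below are illustrative):

```
# raise the block device's readahead window (units: 512-byte sectors,
# so 16384 = 8 MiB); the kernel then fetches much larger sequential runs
sudo blockdev --setra 16384 /dev/sda

# per file, a wrapper process could hint the kernel before the app reads:
# posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) starts asynchronous
# readahead of the whole file into the page cache
```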
Pete (153 rep)
Apr 7, 2024, 10:00 AM • Last activity: Apr 8, 2024, 09:27 AM
0 votes
1 answer
108 views
How do Kernel use pagecache?
I have a problem with the page cache that I don't understand. As I understand it, the page cache serves as a disk cache for reading from and writing to disk. But I don't see how the kernel can map 10 GB of page-cache memory to a 200 GB disk. When reading, the kernel can only cache data that is regularly read from disk; when writing, data is first written to a memory buffer and then the kernel writes it to disk. But how can memory have enough buffer space when the amount of data written to disk is very large?
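The short answer (an editorial note): the cache never has to hold everything at once; it maps only the parts of files currently cached, and writers are throttled once too much of memory is dirty, which forces writeback before more pages can be dirtied. The relevant knobs are standard Linux sysctls:

```
# once dirty pages pass dirty_background_ratio (% of RAM), background
# writeback starts; past dirty_ratio, writing processes are blocked
# until data has been flushed, so the buffer can never "run out"
sysctl vm.dirty_background_ratio vm.dirty_ratio
```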
Tai Nguyen Huu (11 rep)
Mar 19, 2024, 08:59 AM • Last activity: Mar 20, 2024, 10:43 AM
5 votes
3 answers
52546 views
How to reduce buffers/cache
A monitoring system keeps alerting that my machine is reaching/breaking through its RAM utilization threshold, which is **15 GB**. I've done some reading and understood that the apparent RAM utilization is not actual, and that the extra RAM is used for caching/buffering of disk I/O operations to improve the performance of the server. I'm running MySQL on that server; that's the only notable service running.

- So how can I reduce the disk I/O caching/buffering RAM so as not to break through the threshold? Could this be a MySQL issue and not Linux's?

That's the output of `free -gt`:

```
[root@ipk ~]# free -gt
             total       used       free     shared    buffers     cached
Mem:            15         15          0          0          0          9
-/+ buffers/cache:          5         10
Swap:            5          0          5
Total:          21         15          6
```

Linux version is:

```
[root@ipk ~]# uname -rmo
2.6.32-220.el6.x86_64 x86_64 GNU/Linux
```
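For completeness (an editorial note): the kernel reclaims cache automatically under memory pressure, so the usual fix is to make the monitoring look at the `-/+ buffers/cache` line instead; but caches can also be dropped on demand:

```
# flush dirty data first, then drop the page cache, dentries and inodes;
# this only discards clean cache, so it is safe, but it hurts performance
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
```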
Muhammad Gelbana (1683 rep)
Sep 15, 2013, 01:39 PM • Last activity: Feb 18, 2024, 05:26 AM
7 votes
1 answer
8064 views
What should be the buffer size for the sort command?
I have a machine with 2 TB of RAM and I am running a sort command on a file of size 150G, where I have specified the buffer size as 1000G. After doing my bit of research on Google, I found this piece of information: "the bigger the buffer size, the better the performance". This is the command that I ran:

```
sort -rk2 --buffer-size=1000G master_matrix_unsorted.csv > master_matrix_sorted.csv
```

But this is taking a lot of time and I have no clue about the progress of the task. Any idea what the best buffer size for this operation would be? I am planning to re-run this task with a new buffer size.
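As a point of comparison (a sketch, not a recommendation for this exact dataset): with GNU sort, a buffer somewhat larger than the input avoids temp-file merge passes entirely, and the parallelism and temp-directory options often matter as much as the buffer size:

```
# a ~200G buffer comfortably holds the 150G input in RAM on a 2 TB box;
# --parallel spreads the in-memory sort, -T picks a fast temp location
sort -rk2 -S 200G --parallel=8 -T /dev/shm \
     master_matrix_unsorted.csv > master_matrix_sorted.csv
```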
Sambit Tripathy (171 rep)
Jul 22, 2014, 10:55 PM • Last activity: Feb 14, 2024, 06:51 PM
556 votes
15 answers
339351 views
How to turn off stdout buffering in a pipe?
I have a script which calls two commands:

```
long_running_command | print_progress
```

The long_running_command prints progress, but I'm unhappy with it. I'm using print_progress to make it nicer (namely, I print the progress in a single line).

**The problem:** Connecting a pipe to stdout also activates a 4K buffer, so the nice print program gets nothing ... nothing ... nothing ... *a whole lot* ... :)

How can I disable the 4K buffer for the long_running_command (no, I do not have the source)?
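The two stock answers, sketched side by side (both tools are standard: stdbuf from GNU coreutils, unbuffer from the expect package):

```
# stdbuf: preload a shim that line-buffers the command's stdio stdout
stdbuf -oL long_running_command | print_progress

# unbuffer: run the command on a pseudo-terminal so it thinks it is
# writing to a tty and line-buffers by itself
unbuffer long_running_command | print_progress
```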
Aaron Digulla (6288 rep)
Jun 16, 2009, 10:27 AM • Last activity: Jan 11, 2024, 06:39 PM
20 votes
1 answer
1646 views
Why does unbuffer -p mangle its input?
```
$ seq 10 | unbuffer -p od -vtc
0000000   1  \n   2  \n   3  \n   4  \n   5  \n   6  \n   7  \n   8  \n
```
Where did `9` and `10` go?
```
$ printf '\r' | unbuffer -p od -An -w1 -vtc
  \n
```
Why was `\r` changed to `\n`?
```
$ : | unbuffer -p printf '\n' | od -An -w1 -vtc
  \r
  \n
$ unbuffer -p printf '\n' | od -An -w1 -vtc
  \r
      \n
```
WTF?
```
$ printf foo | unbuffer -p cat
$
```
Why no output (and a one second delay)?
```
$ printf '\1\2\3foo bar\n'  | unbuffer -p od -An -w1 -vtc
$
```
Why no output?
```
$ (printf '\23'; seq 10000) | unbuffer -p cat
```
Why does it hang with no output?
```
$ unbuffer -p sleep 10
```
Why can't I see what I type (and why is it discarded even though sleep didn't read it)? Incidentally, also:
```
$ echo test | unbuffer -p grep foo && echo found foo
found foo
```
How come grep found foo but didn't print the lines that contain it?
```
$ unbuffer -p ls /x 2> /dev/null
ls: cannot access '/x': No such file or directory
```
Why didn't the error go to /dev/null? See also https://unix.stackexchange.com/q/521212
```
$ echo ${(l[foo])} | unbuffer -p cat | wc -c
4095
```
That's with:
```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux trixie/sid
Release:        n/a
Codename:       trixie
$ uname -rsm
Linux 6.5.0-3-amd64 x86_64
$ expect -c 'puts "expect [package require Expect] tcl [info patchlevel]"'
expect 5.45.4 tcl 8.6.13
$ /proc/self/exe --version
zsh 5.9 (x86_64-debian-linux-gnu)
```
Same on Ubuntu 22.04 or on FreeBSD 12.4-RELEASE-p5 (except the od commands have to be adapted there, and I get 2321 (all BEL characters there) instead of 4095 above).
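An editorial observation rather than part of the question (the diagnosis is an assumption): all of the symptoms above are consistent with unbuffer's pseudo-terminal staying in its default "cooked" mode. A possible way to inspect what the wrapped command's pty looks like:

```
# stty reports on its stdin, which under `unbuffer -p` should be the pty;
# modes worth looking for: icanon (canonical mode and its 4095-byte line
# limit), icrnl (\r -> \n on input), onlcr (\n -> \r\n on output), and
# ixon (^S/^Q flow control, which would explain the '\23' hang)
unbuffer -p stty -a
```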
Stéphane Chazelas (579282 rep)
Oct 29, 2023, 03:20 PM • Last activity: Oct 31, 2023, 08:58 PM
189 votes
17 answers
112475 views
How can I do a 'change word' in Vim using the current paste buffer?
I have some text in my paste buffer, e.g. I did a `yw` (yank word) and now I have 'foo' in my buffer. I now go to the word 'bar', and I want to replace it with my paste buffer. To replace the text manually I could do `cw` and then type the new word. How can I do a 'change word' but use the contents of my paste buffer instead of manually typing out the replacement word? The best option I have right now is to go to the beginning of the word I want to replace and do `dw` (delete word), go to the other place, and do the `yw` (yank word). Then go back to the replacement area and do `p` (paste), which is kind of clumsy, especially if they are not on the same screen.
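Two stock Vim idioms that do this (both are standard Vim behavior, noted here with their register effects):

```
" visually select the inner word and paste over it; note the replaced
" word then occupies the unnamed register, so a repeat paste needs "0p
viwp

" change inner word, then insert register 0 (the most recent yank),
" which deletes and changes do not clobber
ciw<C-r>0
```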
Michael Durrant (43563 rep)
Aug 29, 2013, 02:45 PM • Last activity: Oct 3, 2023, 11:04 AM
0 votes
1 answer
833 views
Linux: Getting the kernel buffer size for a socket
I have a C application which receives a lot of data over a TCP socket. Is it somehow possible to get the kernel buffer size for that file descriptor / socket? I would like to know how much data is still in the kernel for the file-descriptor I have. Thanks a lot
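For what it's worth (standard kernel interfaces, though matching them to this exact use case is an editorial reading of the question): the SIOCINQ and SIOCOUTQ ioctls report the bytes currently queued in a TCP socket's kernel receive and send buffers, and ss exposes the same counters from the shell:

```
# Recv-Q: bytes received by the kernel but not yet read by the application
# Send-Q: bytes sent by the application but not yet acknowledged by the peer
ss -tn
# in C: ioctl(fd, SIOCINQ, &n) and ioctl(fd, SIOCOUTQ, &n), <linux/sockios.h>
```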
Kevin Meier (213 rep)
Sep 27, 2023, 03:45 PM • Last activity: Sep 27, 2023, 05:42 PM