kworker/u16:4+flush-252:4 slows down the system, is it a fragmentation issue?
If I restore a tar archive or perform any other bulky filesystem operation, I observe kworker running at close to 100% CPU utilization, and operations that used to finish in 10-15 minutes take a week. Once the operation completes, the system behaves normally again.
This is an ext4 filesystem on top of LVM on top of mdadm.
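For context, this is roughly how I inspect the layering (a sketch; md0 and vg0 stand in for my actual device names):

lsblk                           # device tree: ext4 on the LV, LVM on md
mdadm --detail /dev/md0         # RAID level, state, member disks
lvs -o +devices vg0             # which physical volumes back the LV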
I triggered a few all-CPU backtraces:
echo l > /proc/sysrq-trigger
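(For anyone reproducing this: the trigger only works when sysrq is enabled; echo 1 > /proc/sys/kernel/sysrq enables all sysrq functions.)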
And I observed the following:
[15858776.231727] Sending NMI from CPU 3 to CPUs 0-2,4-7:
[15858776.231738] NMI backtrace for cpu 6
[15858776.231744] CPU: 6 PID: 32516 Comm: kworker/u16:4 Tainted: G T 6.6.13-gentoo #1
[15858776.231751] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a 03/06/2012
[15858776.231755] Workqueue: writeback wb_workfn (flush-252:4)
[15858776.231769] RIP: 0010:ext4_get_group_info+0x12/0x60
[15858776.231780] Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 8b 87 38 05 00 00 3b 70 40 73 49 41 54 41 89 f4 55 89 fd 53 8b 88 b0 00 00 00 89 f3 48 8b 40 38 41 d3 ec 48 83 e8
[15858776.231785] RSP: 0018:ffffa91005d7f6e8 EFLAGS: 00000283
[15858776.231790] RAX: ffff901dd559f000 RBX: 0000000000000002 RCX: 0000000000000002
[15858776.231794] RDX: 0000000000000002 RSI: 000000000001dad5 RDI: ffff901dd55b1000
[15858776.231798] RBP: 000000000001dad5 R08: 0000000000000009 R09: 0000000000000300
[15858776.231801] R10: 0000000000000001 R11: 0000000000000000 R12: 000000000001dad5
[15858776.231804] R13: ffff901dc36fe098 R14: 0000000000000002 R15: ffff901dc8d4fc38
[15858776.231808] FS: 0000000000000000(0000) GS:ffff902cffd80000(0000) knlGS:0000000000000000
[15858776.231813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15858776.231817] CR2: 000018c403504000 CR3: 0000000af6c2c005 CR4: 00000000000606e0
[15858776.231821] Call Trace:
[15858776.231825] <NMI>
[15858776.231828] ? nmi_cpu_backtrace+0x84/0xf0
[15858776.231837] ? nmi_cpu_backtrace_handler+0x8/0x10
[15858776.231846] ? nmi_handle+0x58/0x150
[15858776.231853] ? ext4_get_group_info+0x12/0x60
[15858776.231860] ? default_do_nmi+0x69/0x170
[15858776.231870] ? exc_nmi+0xfe/0x130
[15858776.231878] ? end_repeat_nmi+0x16/0x67
[15858776.231889] ? ext4_get_group_info+0x12/0x60
[15858776.231896] ? ext4_get_group_info+0x12/0x60
[15858776.231903] ? ext4_get_group_info+0x12/0x60
[15858776.231910] </NMI>
[15858776.231912] <TASK>
[15858776.231914] ext4_mb_good_group+0x24/0xf0
[15858776.231922] ext4_mb_find_good_group_avg_frag_lists+0x89/0xe0
[15858776.231929] ext4_mb_regular_allocator+0x44e/0xe60
[15858776.231939] ext4_mb_new_blocks+0x9db/0x1040
[15858776.231948] ? ext4_find_extent+0x3bd/0x410
[15858776.231955] ext4_ext_map_blocks+0x382/0x1890
[15858776.231962] ? release_pages+0x122/0x3e0
[15858776.231971] ? filemap_get_folios_tag+0x1c5/0x1f0
[15858776.231983] ext4_map_blocks+0x18a/0x610
[15858776.231990] ? ext4_alloc_io_end_vec+0x15/0x50
[15858776.231997] ext4_do_writepages+0x74d/0xc80
[15858776.232007] ? preempt_count_add+0x65/0xa0
[15858776.232016] ext4_writepages+0xbd/0x1a0
[15858776.232026] do_writepages+0xc6/0x1a0
[15858776.232032] ? __schedule+0x2fb/0x890
[15858776.232041] __writeback_single_inode+0x3b/0x360
[15858776.232048] ? _raw_spin_lock+0xe/0x30
[15858776.232055] writeback_sb_inodes+0x1f9/0x4d0
[15858776.232064] __writeback_inodes_wb+0x47/0xe0
[15858776.232072] wb_writeback+0x265/0x2d0
[15858776.232080] wb_workfn+0x32c/0x4b0
[15858776.232087] ? _raw_spin_unlock+0xd/0x30
[15858776.232095] ? finish_task_switch.isra.0+0x8c/0x270
[15858776.232106] process_one_work+0x134/0x2f0
[15858776.232115] worker_thread+0x2f2/0x410
[15858776.232123] ? preempt_count_add+0x65/0xa0
[15858776.232130] ? _raw_spin_lock_irqsave+0x12/0x40
[15858776.232139] ? __pfx_worker_thread+0x10/0x10
[15858776.232146] kthread+0xf1/0x120
[15858776.232153] ? __pfx_kthread+0x10/0x10
[15858776.232158] ret_from_fork+0x2b/0x40
[15858776.232164] ? __pfx_kthread+0x10/0x10
[15858776.232169] ret_from_fork_asm+0x1b/0x30
[15858776.232180] </TASK>
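For what it's worth, a trace like this can be resolved to source lines with the kernel tree's scripts/decode_stacktrace.sh, assuming a vmlinux with debug info from the same build (the /usr/src/linux path below is a placeholder for wherever the build tree lives):

dmesg | /usr/src/linux/scripts/decode_stacktrace.sh /usr/src/linux/vmlinux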
Is the call to ext4_mb_find_good_group_avg_frag_lists an indication of a fragmentation issue? If not, how can I debug this?
No problems are reported in /proc/mdstat, and I see no /dev/sdX- or /dev/mdX-related errors in the kernel log.
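In case fragmentation is the right track, these are the checks I know of. e2freefrag and e4defrag ship with e2fsprogs; /dev/vg0/data and /mnt/data are placeholders for my volume and mount point, and dm-4 assumes device 252:4 from the workqueue name is that device-mapper node:

e2freefrag /dev/vg0/data           # free-space fragmentation histogram
e4defrag -c /mnt/data              # read-only per-file fragmentation score
df -i /mnt/data                    # rule out inode exhaustion
cat /proc/fs/ext4/dm-4/mb_groups   # mballoc per-group free space and fragment counts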
Asked by d.signer, Dec 3, 2024, 09:39 AM