Bug 217700

Summary: Regression in 6.4.4 regarding high iowait percentage
Product: IO/Storage
Component: Block Layer
Reporter: Mike Cloaked (mike.cloaked)
Assignee: Jens Axboe (axboe)
Status: RESOLVED DUPLICATE
Severity: high
Priority: P3
CC: oleksandr
Hardware: All
OS: Linux
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=217699
Kernel Version: 6.4.4
Subsystem:
Regression: Yes
Bisected commit-id: bd4f737b145d85c7183ec879ce46b57ce64362e1
Attachments: journal with 6.4.4 where the high iowait time is present

Description Mike Cloaked 2023-07-23 07:30:48 UTC
Created attachment 304686
journal with 6.4.4 where the high iowait time is present

After updating to kernel 6.4.4, one of the cores shows a high iowait percentage (CPU 3 in the output below, at 94.46%).

$ mpstat -P ALL
Linux 6.4.4-stable-1 (lenovo2)  22/07/23        _x86_64_        (8 CPU)

21:55:25     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
21:55:25     all    2.31    0.00    0.76   11.98    0.19    0.04    0.00    0.00    0.00   84.71
21:55:25       0    2.51    0.00    0.77    0.08    0.11    0.04    0.00    0.00    0.00   96.50
21:55:25       1    1.89    0.00    0.59    0.05    0.38    0.04    0.00    0.00    0.00   97.05
21:55:25       2    2.43    0.00    0.80    0.07    0.09    0.03    0.00    0.00    0.00   96.59
21:55:25       3    2.05    0.00    0.57   94.46    0.07    0.04    0.00    0.00    0.00    2.81
21:55:25       4    2.57    0.01    0.77    0.06    0.09    0.03    0.00    0.00    0.00   96.47
21:55:25       5    2.62    0.00    1.20    0.08    0.14    0.04    0.00    0.00    0.00   95.92
21:55:25       6    2.29    0.00    0.83    0.98    0.13    0.11    0.00    0.00    0.00   95.66
21:55:25       7    2.14    0.00    0.59    0.05    0.48    0.02    0.00    0.00    0.00   96.71
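For reference, mpstat derives these percentages from the per-CPU tick counters in /proc/stat; run without an interval, as above, it reports averages since boot. The sketch below is a minimal Python approximation, not sysstat's actual code: it computes %iowait per CPU from two /proc/stat samples and flags CPUs above a threshold, mirroring the single-core pattern seen on CPU 3. The helper names, the 50% threshold, and the sample counter values are hypothetical and chosen only for illustration.

```python
"""Sketch: per-CPU %iowait from two /proc/stat samples (approximation of mpstat)."""

# Field order of a "cpuN" line in /proc/stat (see proc(5)); values are USER_HZ ticks.
# guest and guest_nice are already included in user and nice, so they are
# left out of the total to avoid double counting.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

# Hypothetical counter values, not taken from the attached journal.
SAMPLE_BEFORE = """cpu  202 0 101 900 710 10 4 0 0 0
cpu0 100 0 50 800 10 5 2 0 0 0
cpu3 100 0 50 100 700 5 2 0 0 0
"""
SAMPLE_AFTER = """cpu  212 0 106 983 804 10 4 0 0 0
cpu0 110 0 55 880 10 5 2 0 0 0
cpu3 102 0 51 103 794 5 2 0 0 0
"""


def read_proc_stat(path: str = "/proc/stat") -> str:
    """Return the raw contents of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.read()


def parse_cpu_lines(text: str) -> dict[str, dict[str, int]]:
    """Map 'cpu' (aggregate), 'cpu0', 'cpu1', ... to their tick counters."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime and other non-CPU lines
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        result[parts[0]] = dict(zip(FIELDS, values))
    return result


def iowait_percent(before: dict[str, int], after: dict[str, int]) -> float:
    """Share of the elapsed ticks between two samples that was counted as iowait."""
    delta = {f: after[f] - before[f] for f in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["iowait"] / total if total else 0.0


def flag_high_iowait(before_text: str, after_text: str,
                     threshold: float = 50.0) -> dict[str, float]:
    """Return {cpu: %iowait} for individual CPUs whose iowait exceeds threshold."""
    before = parse_cpu_lines(before_text)
    after = parse_cpu_lines(after_text)
    flagged = {}
    for cpu, counters in after.items():
        if cpu == "cpu" or cpu not in before:
            continue  # skip the all-CPU aggregate and CPUs missing from the first sample
        pct = iowait_percent(before[cpu], counters)
        if pct > threshold:
            flagged[cpu] = round(pct, 2)
    return flagged
```

On a live Linux system, two readings taken with read_proc_stat() about a second apart (for example with time.sleep(1) in between) can be passed to flag_high_iowait(). Note that proc(5) describes the iowait counter as not reliable, so a high value on its own does not show that the CPU is actually busy.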

After downgrading the kernel to 6.4.3, the problem disappears.

I am running Arch Linux, and a bisection carried out in a thread on the Arch Linux forums identified commit bd4f737b145d85c7183ec879ce46b57ce64362e1 as the likely cause of this issue.

A 6.4.4 kernel built with that commit reverted also fixes the issue.
Comment 1 Mike Cloaked 2023-07-23 07:44:24 UTC
This is the same issue as in https://bugzilla.kernel.org/show_bug.cgi?id=217699
Comment 2 Oleksandr Natalenko 2023-07-23 09:41:57 UTC
Reported as https://lore.kernel.org/lkml/12251678.O9o76ZdvQC@natalenko.name/
Comment 3 Mike Cloaked 2023-07-23 12:33:13 UTC
I can confirm the issue is resolved for me when running kernel 6.4.4 built with commit bd4f737b145d85c7183ec879ce46b57ce64362e1 reverted.
Comment 4 Mike Cloaked 2023-07-23 19:59:50 UTC
The LKML thread linked to this BZ indicates that the high CPU utilisation shown in CPU monitors is only a display issue, and that the kernel is performing correctly even when iowait is high while mariadbd is running.
Comment 5 Mike Cloaked 2023-07-23 20:01:03 UTC
The LKML thread linked to this BZ indicates that the high CPU utilisation shown in CPU monitors is only a display issue, and that the kernel is performing correctly even when iowait is high while mariadbd is running. In that case the original commit should be retained.
Comment 6 Oleksandr Natalenko 2023-07-24 17:10:36 UTC
Would you be able to test the patch from here: [1]?

[1] https://lore.kernel.org/lkml/11ded843-ac08-2306-ad0f-586978d038b1@kernel.dk/raw
Comment 7 Jens Axboe 2023-07-24 18:23:08 UTC

*** This bug has been marked as a duplicate of bug 217699 ***
Comment 8 Mike Cloaked 2023-07-24 18:36:46 UTC
I tested kernel 6.4.6 built with the referenced patch: the anomalous 100% CPU activity on a single core is no longer displayed in the System Monitor widget, and mpstat no longer shows >94% iowait. This resolves the issue on my systems.