Bug 217699 - Linux 6.4.4 Regression: MariaDB (mysqld) causes one core of the CPU to use 100% with io-wait operations.
Summary: Linux 6.4.4 Regression: MariaDB (mysqld) causes one core of the CPU to use 10...
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: Intel Linux
Importance: P3 high
Assignee: Jens Axboe
URL: https://bbs.archlinux.org/viewtopic.p...
Keywords:
Duplicates: 217700
Depends on:
Blocks:
 
Reported: 2023-07-23 07:26 UTC by Fijxu
Modified: 2023-08-04 06:04 UTC
CC List: 6 users

See Also:
Kernel Version: 6.4.4 6.1.39
Subsystem:
Regression: Yes
Bisected commit-id: bd4f737b145d85c7183ec879ce46b57ce64362e1



Description Fijxu 2023-07-23 07:26:18 UTC
OS: Arch Linux

Linux 6.4.4 introduces 100% io-wait usage on one core of the CPU whenever MariaDB 11.0.2 (mysqld process) is running. The problem occurs even if the database is completely empty.

To avoid this bug it is necessary to downgrade to Linux 6.4.3 or earlier. Linux 6.4.4 without the Arch Linux patches, as well as mainline 6.5-rc1 and 6.5-rc2, do not fix the problem either.

Strangely, there is also no sign that the mysqld process itself is responsible for the io-wait on the processor core. It was found purely by elimination, killing processes one by one until the core's usage returned to normal with no io-wait. No I/O monitoring tool I tried (not even top) was able to identify mysqld as the cause of this bug.

I also tested different commits prior to the 6.4.4 release using `git bisect`, and none of them introduced this bug. There are no journal or dmesg logs related to it. It appears to be independent of the file system used (it happens on both EXT4 and BTRFS).

How to reproduce:
- Install and boot the Linux 6.4.4 release
- Install MariaDB, initialize an empty database and start the service with your preferred init system
- Use htop or top to see the high io-wait usage on one core of the CPU
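
For anyone who wants to check the last step without htop, here is a minimal sketch that computes per-core io-wait from two snapshots of /proc/stat. It assumes the usual field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal). The function names are mine and are not part of any tool mentioned in this report.

```python
import time


def parse_cpu_iowait(stat_text):
    """Return {"cpuN": (iowait_ticks, total_ticks)} for each per-core line."""
    result = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep cpu0, cpu1, ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(value) for value in fields[1:]]
        if len(ticks) < 5:
            continue
        # Sum only the first eight counters: guest time is already
        # accounted for inside user/nice, so adding it would double count.
        result[fields[0]] = (ticks[4], sum(ticks[:8]))
    return result


def iowait_percent(before, after):
    """Percentage of elapsed ticks spent in iowait, per core."""
    percentages = {}
    for cpu, (iowait_after, total_after) in after.items():
        if cpu not in before:
            continue
        iowait_before, total_before = before[cpu]
        elapsed = total_after - total_before
        if elapsed > 0:
            percentages[cpu] = 100.0 * (iowait_after - iowait_before) / elapsed
        else:
            percentages[cpu] = 0.0
    return percentages


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as stat_file:
        first = parse_cpu_iowait(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        second = parse_cpu_iowait(stat_file.read())
    return iowait_percent(first, second)
```

Calling `sample_iowait()` on an affected kernel with mysqld running should show one core with a very high value while the others stay near zero, if the behaviour matches what is described here.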

Arch Linux forums thread (where we tested different kernel commits and where other people report the same problem): https://bbs.archlinux.org/viewtopic.php?id=287343

If I need to provide more information, just ask for it...

(This bug was originally found because I use Akonadi, a PIM layer made by KDE Plasma developers that uses MariaDB/MySQL to store data)
Comment 1 Fijxu 2023-07-23 07:59:46 UTC
I have to note that I just tested Linux LTS 6.1.39
Comment 2 Fijxu 2023-07-23 08:08:12 UTC
Sorry, ignore the previous message...

I just tested Linux LTS 6.1.39 and it introduces the same io-wait bug. If I downgrade back to Linux LTS 6.1.38, the bug is gone and everything works as intended.
Comment 3 Oleksandr Natalenko 2023-07-23 09:42:05 UTC
Reported as https://lore.kernel.org/lkml/12251678.O9o76ZdvQC@natalenko.name/
Comment 4 Gene 2023-07-23 09:45:18 UTC
I confirm the same issue is present in Linus' tree as of git HEAD at

    commit c2782531397f5cb19ca3f8f9c17727f1cdf5bee8.

The Arch forum thread suggests 6.4.4 stable is fixed by reverting

    commit bd4f737b145d85c7183ec879ce46b57ce64362e1

The corresponding commit in mainline is 

    8a796565cec3601071cbbd27d6304e202019d014
Comment 5 Mike Cloaked 2023-07-23 12:32:47 UTC
I confirm this is resolved for me running kernel 6.4.4 built with commit bd4f737b145d85c7183ec879ce46b57ce64362e1 reverted.
Comment 6 Oleksandr Natalenko 2023-07-24 17:10:42 UTC
Would you be able to test the patch from here: [1]?

[1] https://lore.kernel.org/lkml/11ded843-ac08-2306-ad0f-586978d038b1@kernel.dk/raw
Comment 7 Gene 2023-07-24 18:14:25 UTC
Tested and acked on LKML.

I tested 6.4.6 from stable with Jens' patch applied; everything works fine now.
Comment 8 Jens Axboe 2023-07-24 18:23:08 UTC
*** Bug 217700 has been marked as a duplicate of this bug. ***
Comment 9 Mike Cloaked 2023-07-24 18:37:22 UTC
Tested kernel 6.4.6 built with the referenced patch, and the anomalous 100% cpu activity on a single core is no longer displayed in the System Monitor widget, and mpstat no longer shows >94% io-wait. So this resolves the issue for my systems.
Comment 10 Fijxu 2023-07-25 18:03:46 UTC
I tested https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=io_uring-6.5&id=7b72d661f1f2f950ab8c12de7e2bc48bdac8ed69

It works without problems; no false io-wait is reported on the CPU anymore.
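
The commit above sits on the io_uring-6.5 branch, which suggests (the thread does not say so outright) that the io-wait comes from a process waiting on io_uring. Assuming that is the case, and assuming io_uring file descriptors show up in /proc/<pid>/fd as links to `anon_inode:[io_uring]`, a rough sketch like this could list the processes that hold one. The function names are hypothetical, and reading other users' fd directories normally requires root.

```python
import os

# Assumed link target for io_uring file descriptors under /proc/<pid>/fd.
IO_URING_LINK = "anon_inode:[io_uring]"


def count_io_uring_fds(link_targets):
    """Count how many of the given fd link targets are io_uring instances."""
    return sum(1 for target in link_targets if target == IO_URING_LINK)


def fd_targets(pid, proc_root="/proc"):
    """Return the link targets of all open fds of `pid` that we may read."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        # Process exited, or we lack permission to inspect it.
        return targets
    for name in names:
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
    return targets


def processes_using_io_uring(proc_root="/proc"):
    """Map pid -> number of io_uring fds, for processes that have any."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        count = count_io_uring_fds(fd_targets(entry, proc_root))
        if count:
            found[int(entry)] = count
    return found
```

On a system running MariaDB, `processes_using_io_uring()` run as root would be expected to include the mysqld pid only if the assumptions above hold.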
Comment 11 Fijxu 2023-07-25 18:06:31 UTC
Also, this is my first Bugzilla report. Since it's fixed in the "Linux 4.x block layer trees" but not in mainline, should I mark this bug as RESOLVED?
Comment 12 Shmerl 2023-07-27 23:55:30 UTC
I tested the fix on top of 6.5.0-rc3 and it helps. Is this going to be added to 6.5.0-rc4?
Comment 13 Shmerl 2023-07-30 01:37:29 UTC
Looks like it landed in mainline, so the fix should be included in 6.5.0-rc4.
