Bug 202353 - CPU softlocks with 100% IOWAIT but no disk activity since 4.19.1
Summary: CPU softlocks with 100% IOWAIT but no disk activity since 4.19.1
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer (show other bugs)
Hardware: Intel Linux
: P1 high
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-21 00:38 UTC by candyangel
Modified: 2019-08-18 01:15 UTC (History)
5 users (show)

See Also:
Kernel Version: Introduced 4.19.1
Tree: Mainline
Regression: Yes


Attachments
dmesg of blocked tasks during softlock (15.88 KB, text/plain)
2019-01-21 00:38 UTC, candyangel
Details
list-pending-block-requests script (1.47 KB, text/plain)
2019-01-23 04:53 UTC, Bart Van Assche
Details
list-pending-block-requests script output during stall (2.38 KB, text/plain)
2019-01-24 11:39 UTC, candyangel
Details
lsscsi -L (787 bytes, text/plain)
2019-01-25 00:10 UTC, candyangel
Details
lsscsi -v (532 bytes, text/plain)
2019-01-25 00:10 UTC, candyangel
Details
Journal entries during freeze state (22.01 KB, text/plain)
2019-01-25 22:12 UTC, brian
Details

Description candyangel 2019-01-21 00:38:25 UTC
Created attachment 280605 [details]
dmesg of blocked tasks during softlock

Since upgrading from 4.18.16 to 4.19.1, the affected machine will periodically stop responding on any processes that requires disk access for some time (3 minutes and up).

Checking 'top' show that every core is in 100% IOWAIT but there is no disk activity at all (according to 'iotop', HDD activity indicator on the machine and just listening to the drives themselves, they are clearly idle).

The issue persists to the latest kernel available (4.20.3) for this distro (Arch Linux). It appears the issue got worse since its introduction as the first time it happened (on 4.20.2), it stalled for over 30 minutes, but 4.19.1 stalls for much shorter times (e.g. 3-5 minutes).

The setup is 2 SATA HDDs, set up in RAID10f2 with ext4 filesystems. I don't believe it to be a hardware error as the fault can be reliably "toggled" by down/up-grading between 4.18.16 and 4.19.1 kernel versions.
Comment 1 brian 2019-01-21 13:33:44 UTC
I'm also having this issue. Also on Arch Linux with ext4 on a RAID array.
Comment 2 Bart Van Assche 2019-01-21 16:13:45 UTC
Please have a look at https://bugzilla.kernel.org/show_bug.cgi?id=201685 and close this bug report if it is a duplicate of that bug report.
Comment 3 candyangel 2019-01-21 21:36:47 UTC
I read through the comments on #201685 and I don't believe it is the same bug.

There is no mention of the machine experiencing stall/pause/slowdown and I am not seeing any filesystem corruption as reported there (with the exception of systemd's journald complaining the log is corrupt after it killed the process during a stall, but that's expected and not filesystem caused..).

This is a fairly low IO machine though in its general usage (mostly just playing music, browsing the internet, watching youtube) so it may not be triggering any corruption.

If I try some IO operations (touch a file, write contents to existing, append to file) during a stall and see if those changes make it to disk when the stall ends, would that help narrow it down?
Comment 4 Bart Van Assche 2019-01-21 21:52:26 UTC
There have been reports of I/O stalls caused by write-back throttling (CONFIG_BLK_WBT). Is write-back throttling enabled in your kernel config? If so, have you already tried to disable write-back throttling in the kernel config?
Comment 5 candyangel 2019-01-23 00:28:46 UTC
I compiled 4.19.1 with CONFIG_BLK_WBT=n (it is enabled in the default configuration for this distro) and the stalling happened there.

I suspected the latency controller that was introduced in 4.19.1 may be a cause so I compiled with that disabled instead (CONFIG_BLK_CGROUP_IOLATENCY=n) and the issue still occurs with that too.

I appreciate your patience with me. This is the first time I've had a kernel issue (that I can remember) so I've never had to debug it. I'm totally willing to learn how though!

What is the next step?
Comment 6 Bart Van Assche 2019-01-23 04:53:07 UTC
Created attachment 280685 [details]
list-pending-block-requests script

Are you using the legacy block layer or blk-mq? If you are using the legacy block layer, can you try to reproduce this issue with blk-mq? If this hang is reproducible with blk-mq, can you run the list-pending-block-requests script after having reproduced the hang and share the output of that script? This will reveal in which block driver a request got stuck. Note: CONFIG_BLK_DEBUG_FS must be enabled in the kernel config for the attached script to work.
Comment 7 candyangel 2019-01-24 10:13:37 UTC
It is using blk-mq as far as I can tell (CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=y are set).

I updated the machine back to one of the stalling kernels last night so I could use the script (CONFIG_BLK_DEBUG_FS=y is set) but the issue didn't occur before I went to bed.

However, this morning it was stalled but I wasn't able to run the script due to being remote (SSH connection works, but shell can't start) so I asked someone local to try opening a terminal to run the script (expecting it to also not start) but the opposite happened. Opening the terminal freed the stall!

This isn't the first time that taking certain actions (opening or closing an application) seems to clear the stall. I didn't originally mention it as I thought it was just co-incidence..

I will still try to get some output from this script during a stall, but thought that this (seeming) behaviour might be a clue to the problem?
Comment 8 candyangel 2019-01-24 11:39:15 UTC
Created attachment 280711 [details]
list-pending-block-requests script output during stall

Just caught it in the stalled state and got the following output from the list-pending-block-requests script given.

Again, the system was stalled until I ran wgetpaste, which seemed to clear the stall..
Comment 9 Bart Van Assche 2019-01-24 16:34:59 UTC
This makes me wonder what kind of device /dev/sda is ... can you share the output of lsscsi -L?
Comment 10 Bart Van Assche 2019-01-24 21:43:29 UTC
In addition to lsscsi -L, please also provide the lsscsi -v output.
Comment 11 candyangel 2019-01-25 00:10:25 UTC
Created attachment 280745 [details]
lsscsi -L

sda is just a standard, run-of-the-mill consumer SATA HDD (WD Blue 3TB).
There's nothing special about it (or sdb, the other drive in the RAID array, which is the same make/model).
Comment 12 candyangel 2019-01-25 00:10:52 UTC
Created attachment 280747 [details]
lsscsi -v
Comment 13 Bart Van Assche 2019-01-25 00:54:29 UTC
Thank you for having provided all this information. This information shows that /dev/sda is controlled by the ATA driver, a driver I'm not familiar with. I hope that someone who is familiar with ATA can take over. If not, you will have to run a bisect to identify the patch that introduced the soft lockup problem.
Comment 14 candyangel 2019-01-25 15:33:11 UTC
No, thank *you* for your guidance, it is very appreciated!

I had a quick look at what might have changed in the ATA driver and I can see some things that may be a cause (to my untrained eye :]). Just knowing it is at the ATA driver level (not LVM, RAID or FS) really narrows down what commits to bisect against, so hopefully it will be relatively easy to find.
Comment 15 brian 2019-01-25 22:12:18 UTC
Created attachment 280775 [details]
Journal entries during freeze state

Attached are some journal entries from freezes today. I created a wrapper to run the list-pending-block-requests script and it wrote every minute prior and then after the freeze, but not during the freeze. It writes to a ramdisk but the system was unrecoverable and had to be restarted.
Comment 16 candyangel 2019-01-26 17:29:56 UTC
I didn't have any problem running the script during the stall. Did you put the script itself in the ramdisk?
Comment 17 Bart Van Assche 2019-01-30 16:04:13 UTC
Can you try the following patch: https://marc.info/?l=linux-kernel&m=154883900500866 ?
Comment 18 brian 2019-01-31 13:32:48 UTC
(In reply to Bart Van Assche from comment #17)
> Can you try the following patch:
> https://marc.info/?l=linux-kernel&m=154883900500866 ?

I compiled 4.20.5 with that patch and using blk-mq. I have had a few high IOWAIT events but not that locked up the machine or approached 100%. I will continue to monitor.
Comment 19 brian 2019-02-07 13:20:25 UTC
(In reply to Bart Van Assche from comment #17)
> Can you try the following patch:
> https://marc.info/?l=linux-kernel&m=154883900500866 ?

This patch seems to fix the problem. I have not experienced any of the stalls with high IO wait since.
Comment 20 candyangel 2019-02-11 10:15:19 UTC
I applied the patch to 4.19.1 and had no issues for a while, so I went ahead and compiled the latest kernel for Arch Linux (4.20.7) and that has been running the past 3 days without issue!

So it looks like this patch resolves it. I guess using RAID triggered the race condition more often..?

Thanks so much for directing me(/us) to this patch!
Comment 21 marshalled 2019-08-17 23:41:48 UTC
Hi, can someone explain to me (not a coder) what the above patch actually does?  A growing number of us are having ongoing issues that sound similar to this on Unraid (Kernel 4.19.56-unraid).  Specifically, while large amounts of data are being read from SSD and written to HDD via, other processes are stopping /stuttering and generally not running or not running well.

The previous version on kernel 4.18.x does not exhibit these symptoms.

From what I can understand that patch resolves an I/o hang cause by data entering a block queue, then being flushed via fsync?  Is there a way of testing this without compiling into the kernel, i.e. a temporary flag that can be set that's lost on reboot?

Many thanks.
Comment 22 marshalled 2019-08-17 23:44:37 UTC
I should add, unraid has a beta available with kernel 4.19.60, I can't quite understand if this patch is contained within that.  I believe this has been applied in 4.19.1 which would be prior to 4.19.60?  Many thanks.
Comment 23 Bart Van Assche 2019-08-18 01:15:07 UTC
The patch mentioned above has been assigned commit ID 85bd6e61f34d ("blk-mq: fix a hung issue when fsync") and went upstream in kernel v5.0. It has been backported as commit 80c8452ad45f to the v4.19.x kernel series and that commit is present in kernel v4.19.24 and later v4.19.x kernel versions. Marshalled, I think that you hit another issue. Please report the issue that you encountered on the linux-raid@vger.kernel.org mailing list.

Note You need to log in before you can comment on or make changes to this bug.