Created attachment 305822: Screenshot of hung task message. The root volume is an MD RAID1 array consisting of two SATA devices. The system seems to hang indefinitely during boot (screenshot of log attached). Bisection identified commit 1b0a2d950ee2a54aa04fb31ead32144be0bbf690 as the first appearance of the problem. All kernels I've tried prior to that commit start the array and then mount the volume without issue.
There is a patch that was meant to fix something in 1b0a2d950ee2 but was never applied: https://lore.kernel.org/all/20231221071109.1562530-3-linan666@huaweicloud.com/ I just asked about its status there and pointed the developers here.
I'm quite happy to provide additional information or try out fixes myself when I am able to do so. I'll see if I can give that patch a try in the near future and report back as to whether it seems to help.
We had some discussions on that patch set. Matthew, could you please try with that set and see whether it fixes the problem? https://patchwork.kernel.org/project/linux-raid/list/?series=812045
The patch did not seem to affect the problem. The boot process hung as it did before, eventually indicating a hung task with the same stack context that I had previously observed.
It seems that mddev_suspend_and_lock is waiting for I/O to complete. Are there any other processes hung? Can you provide commands for triggering this issue? I will try to replicate this issue in my environment.
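For the question about other hung processes, one rough way to check is to scan /proc for tasks in uninterruptible sleep (state D), which is the state a task stuck waiting on I/O would normally be in. The sketch below is only an illustration and assumes a shell is reachable on the affected system, which may not be possible when the hang happens before the root file system is mounted. The names parse_stat and blocked_tasks are made up for this example. It only looks at the /proc/<pid> entries, so threads listed under /proc/<pid>/task are not covered.

```python
from pathlib import Path


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    # The state letter is the first field after the closing parenthesis.
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

If this prints more than the one task shown in the hung task message, those names would be worth adding to the report.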
I'm not sure. It's happening during boot, presumably when the md driver is loaded but before the root file system is mounted (which is on one of the md volumes itself). As such, I don't have many straightforward paths to extract more information about the system's operating state at the time. I'll give it some thought and see if I can think of any ways to glean more information. Perhaps I'll try to reproduce the whole configuration myself on a different set of hardware to troubleshoot for issues involving drivers or other parts of the kernel. Knowing that it's apparently a matter of I/O that is never completing might give me a direction to look, too. I'll try to provide more information soon if I'm able to gather any.
Hi Matthew, Could you please check whether patches 1/14 through 5/14 of this set fix this issue? https://patchwork.kernel.org/project/linux-raid/list/?series=822030 They should apply to the linux-6.7.y branch of the stable tree. Or you can use the md-6.7-fix branch from the md tree: https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.7-fix Thanks, Song
Actually, it is probably not enough. I will test more. Thanks, Song
OK, now md-6.7-fix passes my tests. Matthew, could you please give it a try? https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.7-fix Thanks, Song
I built and tested your md-6.7-fix on my hardware and experienced no problems. System booted normally.
Confirmed that I'm no longer experiencing the issue as of 6.7.7.