Bug 217798

Summary: Infinite systemd loop when powering off a machine with multiple MD RAIDs
Product: IO/Storage
Reporter: AceLan Kao (acelan)
Component: MD
Assignee: io_md
Status: RESOLVED CODE_FIX
Severity: high
CC: bagasdotme, hch, mariusz.tkaczyk, max.lee, song, srinidhi.s
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 12a6caf27324
Attachments: lsblk, minicom reboot hang log, attachment-17270-0.html

Description AceLan Kao 2023-08-16 02:23:30 UTC
Created attachment 304860
lsblk

To reproduce, build at least two different RAID arrays (e.g. RAID0 and RAID10, or RAID5 and RAID10). On power-off, the errors below then repeat endlessly (a serial console is needed to see them):

[ 205.360738] systemd-shutdown[1]: Stopping MD devices.
[ 205.366384] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
[ 205.373327] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
[ 205.380427] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
[ 205.388257] systemd-shutdown[1]: Stopping MD /dev/md127 (9:127).
[ 205.394880] systemd-shutdown[1]: Failed to sync MD block device /dev/md127, ignoring: Input/output error
[ 205.404975] md: md127 stopped.
[ 205.470491] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
[ 205.770179] md: md126: resync interrupted.
[ 205.776258] md126: detected capacity change from 1900396544 to 0
[ 205.783349] md: md126 stopped.
[ 205.862258] systemd-shutdown[1]: Stopping MD /dev/md125 (9:125).
[ 205.862435] md: md126 stopped.
[ 205.868376] systemd-shutdown[1]: Failed to sync MD block device /dev/md125, ignoring: Input/output error
[ 205.872845] block device autoloading is deprecated and will be removed.
[ 205.880955] md: md125 stopped.
[ 205.934349] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:7).
[ 205.947707] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
[ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
[ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
[ 205.973155] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
[ 205.979789] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
[ 205.988475] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
Comment 1 AceLan Kao 2023-08-16 02:26:29 UTC
I didn't encounter this issue with the v5.19 kernel, but it reproduces from v6.0 onward.
Bisecting the kernel shows that the following commit introduces the issue:
12a6caf27324 md: only delete entries from all_mddevs when the disk is freed

I'm not sure whether the issue should be fixed in systemd or in the md driver; please help check what went wrong. Thanks.
Comment 2 AceLan Kao 2023-08-16 02:27:21 UTC
Created attachment 304861
minicom reboot hang log
Comment 3 Mariusz Tkaczyk 2023-08-18 08:26:14 UTC
Hello,
The issue is reproducible with IMSM metadata in a production environment and affects multiple platforms.
Reproduction ratio: around 20%.
Base system functionality is broken; can we consider bumping the severity to high?

+ Adding MD raid folks.
Comment 4 Bagas Sanjaya 2023-08-18 11:01:23 UTC
Hi AceLan,

Christoph [1] posted a proposed workqueue-flushing diff on the
linux-regressions list. Can you please test it?

[1]: https://lore.kernel.org/regressions/1b2166ab-e788-475e-a8e2-a6cef26f2524@suse.de/
Comment 5 Bagas Sanjaya 2023-08-18 11:02:24 UTC
(In reply to Bagas Sanjaya from comment #4)
> Hi AceLan,
> 
> Christoph [1] on linux-regressions list posted a proposed workqueue flushing
> diff. Can you please test it?
> 

Oops, it was Hannes Reinecke who posted the diff, not Christoph.
Comment 6 Srinidhi S 2023-08-21 01:50:02 UTC
Created attachment 304919
attachment-17270-0.html

Hi,
I am out of the office; please expect delays in responses.


Thanks,
Comment 7 Mariusz Tkaczyk 2023-09-08 14:07:39 UTC
I found the cause of the problem:

@@ -8280,8 +8289,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
                next_mddev = list_entry(tmp, struct mddev, all_mddevs);
                if (mddev_get(next_mddev))
                        break;
-               mddev = next_mddev;
-               tmp = mddev->all_mddevs.next;
+               tmp = next_mddev->all_mddevs.next;
        }
        spin_unlock(&all_mddevs_lock);


We keep decrementing "mddev->active" of the failed array, and as a result it cannot be deleted (deletion requires the counter to be 0).
Shutdown only succeeds if we happen to hit a moment when the kernel does not call mddev_put() on the failed array in md_seq_next().
This change should lower the reproduction ratio a lot, or the issue may no longer be reproducible at all.

AceLan, could you test this on your side?
Comment 8 AceLan Kao 2023-09-13 00:16:42 UTC
I can't reproduce the issue with this patch.
Comment 9 AceLan Kao 2023-09-15 09:28:24 UTC
The issue has been fixed by this patch, thanks:
https://patchwork.kernel.org/project/linux-raid/patch/20230914152416.10819-1-mariusz.tkaczyk@linux.intel.com/