Created attachment 306092 [details]
dmesg of 6.9-rc2

This is definitely configuration-specific: I have 3 RAID 6 devices, and 1 of the 3 fails to start. During boot, the 2 working RAID 6 devices initialize, assemble, and run. The third one (md1), however, does not:

> WARNING: CPU: 2 PID: 2064 at block/blk-settings.c:192
> blk_validate_limits+0x165/0x180

All devices (including the external journal) get detected, but the array does not start. Restarting the mdadm service causes a segmentation fault, after which the 2 working arrays come up again and the third one still doesn't:

> mdadm: /dev/md0 has been started with 4 drives and 1 spare and 1 journal.
> mdadm: /dev/md3 has been started with 6 drives and 2 spares and 1 journal.
> mdadm: failed to RUN_ARRAY /dev/md1: Invalid argument

This did not happen in the 6.8 series; reverting to 6.8 makes the array work properly again. I'll attach the .config and mdadm --detail of the failing array; let me know if you'd like more info.
Created attachment 306093 [details] mdadm --detail /dev/md1
Can you test the following patch?

diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4b..864f86e98f01 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -187,12 +187,7 @@ static int blk_validate_limits(struct queue_limits *lim)
 	 * page (which might not be identical to the Linux PAGE_SIZE). Because
 	 * of that they are not limited by our notion of "segment size".
 	 */
-	if (lim->virt_boundary_mask) {
-		if (WARN_ON_ONCE(lim->max_segment_size &&
-				 lim->max_segment_size != UINT_MAX))
-			return -EINVAL;
-		lim->max_segment_size = UINT_MAX;
-	} else {
+	if (!lim->virt_boundary_mask) {
 		/*
 		 * The maximum segment size has an odd historic 64k default that
 		 * drivers probably should override. Just like the I/O size we
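To spell out what the patch changes, here is a rough stand-alone model of the segment-size handling in blk_validate_limits() before and after. The struct, constants and return values below are simplified stand-ins for illustration only, not the real kernel definitions:

/* Stand-alone sketch of the before/after check - simplified, not kernel code. */
#include <stdio.h>
#include <limits.h>

struct fake_limits {
	unsigned long virt_boundary_mask;
	unsigned int max_segment_size;
};

static int validate_old(struct fake_limits *lim)
{
	if (lim->virt_boundary_mask) {
		/* 6.9-rc2: a boundary mask plus any other segment size is rejected */
		if (lim->max_segment_size && lim->max_segment_size != UINT_MAX)
			return -1;	/* stands in for -EINVAL */
		lim->max_segment_size = UINT_MAX;
	} else if (!lim->max_segment_size) {
		lim->max_segment_size = 65536;	/* historic 64k default */
	}
	return 0;
}

static int validate_new(struct fake_limits *lim)
{
	/* patched: the 64k default only applies when there is no boundary mask */
	if (!lim->virt_boundary_mask && !lim->max_segment_size)
		lim->max_segment_size = 65536;
	return 0;
}

int main(void)
{
	/* the combination a stacked md device can end up with */
	struct fake_limits a = { .virt_boundary_mask = 4095, .max_segment_size = 65536 };
	struct fake_limits b = a;

	printf("old check: %s\n", validate_old(&a) ? "rejected (-EINVAL)" : "accepted");
	printf("new check: %s\n", validate_new(&b) ? "rejected (-EINVAL)" : "accepted");
	return 0;
}

With the old check, a queue carrying both a boundary mask and a 64k segment size is refused with -EINVAL, which is exactly the "Invalid argument" that mdadm reports when it tries to RUN_ARRAY.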
Created attachment 306094 [details] kernel .config
(In reply to Ming Lei from comment #2)
> Can you test the following patch?
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index cdbaef159c4b..864f86e98f01 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -187,12 +187,7 @@ static int blk_validate_limits(struct queue_limits *lim)
>  	 * page (which might not be identical to the Linux PAGE_SIZE). Because
>  	 * of that they are not limited by our notion of "segment size".
>  	 */
> -	if (lim->virt_boundary_mask) {
> -		if (WARN_ON_ONCE(lim->max_segment_size &&
> -				 lim->max_segment_size != UINT_MAX))
> -			return -EINVAL;
> -		lim->max_segment_size = UINT_MAX;
> -	} else {
> +	if (!lim->virt_boundary_mask) {
>  		/*
>  		 * The maximum segment size has an odd historic 64k default that
>  		 * drivers probably should override. Just like the I/O size we

That works ...
Hello,

It seems I have the same problem and was about to post to LKML about it.

On my laptop, I have a RAID1 array on top of an NVMe drive and a SATA SSD. This RAID array fails to get assembled on Linux 6.9-rc2, with the following warning in dmesg:

WARNING: CPU: 0 PID: 170 at block/blk-settings.c:191 blk_validate_limits+0x233/0x270

I have bisected it down to commit 97894f7d3c29 ("md/raid1: use the atomic queue limit update APIs").

It appears that one of the component devices in the array uses a virtual boundary mask, while the other uses max_segment_size. On Linux 6.8 I have:

/sys/block/md0/queue/max_segment_size:65536
/sys/block/md1/queue/max_segment_size:65536
/sys/block/nvme0n1/queue/max_segment_size:4294967295
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sdb/queue/max_segment_size:65536

/sys/block/md0/queue/virt_boundary_mask:4095
/sys/block/md1/queue/virt_boundary_mask:4095
/sys/block/nvme0n1/queue/virt_boundary_mask:4095
/sys/block/sda/queue/virt_boundary_mask:0
/sys/block/sdb/queue/virt_boundary_mask:0

So it appears that the md0/md1 devices get a combination of values: virt_boundary_mask comes from /dev/nvme0n1, max_segment_size from /dev/sdb. This no longer works on Linux 6.9 with the new atomic queue limit update API.

Greetings,
Mateusz
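For reference, here is a rough user-space model of how the stacked device ends up with that mix. The stacking rules below (take the minimum of the members' segment sizes, OR together their boundary masks) are a simplification of what the kernel does when combining limits, and the struct is only an illustration, not kernel code:

/* Rough model of limit stacking for a two-member RAID1 - simplified. */
#include <stdio.h>

struct fake_limits {
	unsigned long virt_boundary_mask;
	unsigned int max_segment_size;
};

static void stack(struct fake_limits *top, const struct fake_limits *member)
{
	if (member->max_segment_size < top->max_segment_size)
		top->max_segment_size = member->max_segment_size;
	top->virt_boundary_mask |= member->virt_boundary_mask;
}

int main(void)
{
	/* values taken from the sysfs output above */
	struct fake_limits nvme0n1 = { .virt_boundary_mask = 4095,
				       .max_segment_size = 4294967295u };
	struct fake_limits sdb     = { .virt_boundary_mask = 0,
				       .max_segment_size = 65536 };
	struct fake_limits md0     = { .virt_boundary_mask = 0,
				       .max_segment_size = 4294967295u };

	stack(&md0, &nvme0n1);
	stack(&md0, &sdb);

	/* prints 4095 and 65536 - the combination the 6.9-rc2 check rejects */
	printf("md0: virt_boundary_mask=%lu max_segment_size=%u\n",
	       md0.virt_boundary_mask, md0.max_segment_size);
	return 0;
}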
Created attachment 306103 [details] dmesg on 6.9-rc2 - hybrid RAID
I will test your patch.
Yeah, this patch fixes it.

Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
For me, this bug was fixed in mainline before Linux 6.9 was released, most probably by the following commit:

commit b561ea56a26415bf44ce8ca6a8e625c7c390f1ea
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sun Apr 7 21:19:31 2024 +0800

    block: allow device to have both virt_boundary_mask and max segment size

    When one stacking device is over one device with virt_boundary_mask and
    another one with max segment size, the stacking device have both limits
    set. This way is allowed before d690cb8ae14b ("block: add an API to
    atomically update queue limits").

    Relax the limit so that we won't break such kind of stacking setting.

    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
    Reported-by: janpieter.sollie@edpnet.be
    Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
    Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
    [...]
    Link: https://lore.kernel.org/r/20240407131931.4055231-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

@Janpieter, can you confirm that it is indeed fixed, so that this bug report can be closed?
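Once a fixed kernel is running, a quick way to double-check is to read the two attributes back from sysfs and confirm the stacked md devices again show both a boundary mask and a finite segment size, matching the 6.8 values I quoted earlier. A small sketch (the device names are just examples from this thread; adjust them to your setup):

/* Verification sketch: dump the two queue attributes for a few devices. */
#include <stdio.h>

static void show(const char *dev, const char *attr)
{
	char path[256], buf[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f)) {
		printf("%s: <unreadable>\n", path);
		if (f)
			fclose(f);
		return;
	}
	printf("%s: %s", path, buf);	/* sysfs values already end in '\n' */
	fclose(f);
}

int main(void)
{
	const char *devs[] = { "md0", "md1", "nvme0n1", "sdb" };

	for (unsigned int i = 0; i < sizeof(devs) / sizeof(devs[0]); i++) {
		show(devs[i], "virt_boundary_mask");
		show(devs[i], "max_segment_size");
	}
	return 0;
}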
Indeed, the bug report can be closed; mainline contains the necessary fixes to resolve the problem.