Bug 218687

Summary: blk_validate_limits warning: md device fails to start
Product: IO/Storage
Reporter: Janpieter Sollie (janpieter.sollie)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: RESOLVED CODE_FIX
Severity: high
CC: mat.jonczyk, tom.leiming
Priority: P3
Hardware: AMD
OS: Linux
Kernel Version: 6.9-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg of 6.9-rc2
mdadm --detail /dev/md1
kernel .config
dmesg on 6.9-rc2 - hybrid RAID

Description Janpieter Sollie 2024-04-06 12:33:57 UTC
Created attachment 306092 [details]
dmesg of 6.9-rc2

This is definitely configuration-specific: I have 3 RAID 6 arrays, and 1 of the 3 fails to start.

During bootup, the 2 working RAID 6 arrays initialize, assemble, and run.
The third one (md1), however, does not:

> WARNING: CPU: 2 PID: 2064 at block/blk-settings.c:192
> blk_validate_limits+0x165/0x180

All member devices (including the external journal) get detected, but the array does not start.
Restarting the mdadm service causes a segmentation fault; the 2 working arrays come up again, but the 3rd one doesn't:
> mdadm: /dev/md0 has been started with 4 drives and 1 spare and 1 journal.
> mdadm: /dev/md3 has been started with 6 drives and 2 spares and 1 journal.
> mdadm: failed to RUN_ARRAY /dev/md1: Invalid argument    

This did not happen in the 6.8 series; reverting to 6.8 makes the array work properly again.

I'll attach the .config and mdadm --detail output of the failing array; let me know if you'd like more info.
Comment 1 Janpieter Sollie 2024-04-06 12:35:44 UTC
Created attachment 306093 [details]
mdadm --detail /dev/md1
Comment 2 Ming Lei 2024-04-06 12:37:29 UTC
Can you test the following patch?

diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4b..864f86e98f01 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -187,12 +187,7 @@ static int blk_validate_limits(struct queue_limits *lim)
         * page (which might not be identical to the Linux PAGE_SIZE).  Because
         * of that they are not limited by our notion of "segment size".
         */
-       if (lim->virt_boundary_mask) {
-               if (WARN_ON_ONCE(lim->max_segment_size &&
-                                lim->max_segment_size != UINT_MAX))
-                       return -EINVAL;
-               lim->max_segment_size = UINT_MAX;
-       } else {
+       if (!lim->virt_boundary_mask) {
                /*
                 * The maximum segment size has an odd historic 64k default that
                 * drivers probably should override.  Just like the I/O size we
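
For context, the check being removed here is the one the reported WARN points at: blk_validate_limits() refused any queue_limits that set a virt_boundary_mask together with a max_segment_size other than 0 or UINT_MAX. A minimal standalone sketch of that rule (plain userspace C with toy types and illustrative values, not the kernel sources):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the two struct queue_limits fields involved. */
struct toy_limits {
	unsigned long virt_boundary_mask;
	unsigned int  max_segment_size;
};

/*
 * The rule enforced on 6.9-rc2 and dropped by the patch above: a queue
 * with a virt_boundary_mask may only have an unset or "unlimited"
 * (UINT_MAX) max_segment_size; anything else is rejected with -EINVAL
 * after a WARN_ON_ONCE.
 */
static bool rejected_by_6_9_rc2(const struct toy_limits *lim)
{
	return lim->virt_boundary_mask &&
	       lim->max_segment_size &&
	       lim->max_segment_size != UINT_MAX;
}

int main(void)
{
	/* Illustrative values of the kind a stacked md device can inherit. */
	struct toy_limits stacked = {
		.virt_boundary_mask = 4095,
		.max_segment_size   = 65536,
	};

	printf("rejected: %s\n", rejected_by_6_9_rc2(&stacked) ? "yes" : "no");
	return 0;
}

With the patch applied, this combination is simply accepted; only the !virt_boundary_mask branch keeps its historic 64k default.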
Comment 3 Janpieter Sollie 2024-04-06 12:38:17 UTC
Created attachment 306094 [details]
kernel .config
Comment 4 Janpieter Sollie 2024-04-06 12:56:21 UTC
(In reply to Ming Lei from comment #2)
> Can you test the following patch?
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index cdbaef159c4b..864f86e98f01 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -187,12 +187,7 @@ static int blk_validate_limits(struct queue_limits *lim)
>          * page (which might not be identical to the Linux PAGE_SIZE).  Because
>          * of that they are not limited by our notion of "segment size".
>          */
> -       if (lim->virt_boundary_mask) {
> -               if (WARN_ON_ONCE(lim->max_segment_size &&
> -                                lim->max_segment_size != UINT_MAX))
> -                       return -EINVAL;
> -               lim->max_segment_size = UINT_MAX;
> -       } else {
> +       if (!lim->virt_boundary_mask) {
>                 /*
>          * The maximum segment size has an odd historic 64k default that
>          * drivers probably should override.  Just like the I/O size we

that works ...
Comment 5 Mateusz Jończyk 2024-04-07 13:27:25 UTC
Hello,

It seems I also have the same problem and was about to post to LKML about it.

On my laptop, I have a RAID1 array on top of NVMe and SATA SSDs.
This RAID array fails to get assembled on Linux 6.9-rc2 with the following warning in dmesg:

        WARNING: CPU: 0 PID: 170 at block/blk-settings.c:191 blk_validate_limits+0x233/0x270

I have bisected it down to

commit 97894f7d3c29 ("md/raid1: use the atomic queue limit update APIs").

It appears that one of the component devices in the array uses a virtual boundary (virt_boundary_mask),
while the other one uses max_segment_size.

On Linux 6.8 I have:

/sys/block/md0/queue/max_segment_size:65536
/sys/block/md1/queue/max_segment_size:65536
/sys/block/nvme0n1/queue/max_segment_size:4294967295
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sdb/queue/max_segment_size:65536

/sys/block/md0/queue/virt_boundary_mask:4095
/sys/block/md1/queue/virt_boundary_mask:4095
/sys/block/nvme0n1/queue/virt_boundary_mask:4095
/sys/block/sda/queue/virt_boundary_mask:0
/sys/block/sdb/queue/virt_boundary_mask:0

so it appears that the md0/md1 devices get a combination of values:
virt_boundary_mask comes from /dev/nvme0n1, max_segment_size from /dev/sdb.

This appears not to be working on Linux 6.9 with the new atomic queue limit update API.

Greetings,
Mateusz
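
To make the inherited combination concrete, here is a small standalone sketch (plain userspace C; the stack() helper is a deliberately simplified illustration, not the kernel's blk_stack_limits()) of how the stacked md device ends up carrying both limits, using the sysfs values listed above:

#include <stdio.h>

/* Toy stand-in for the two queue_limits fields being combined. */
struct toy_limits {
	unsigned long virt_boundary_mask;
	unsigned int  max_segment_size;
};

/*
 * Simplified stacking rule, for illustration only: the md device keeps
 * the strictest value from each component, so it inherits nvme0n1's
 * boundary mask and the SATA disks' smaller segment size at once.
 */
static struct toy_limits stack(struct toy_limits a, struct toy_limits b)
{
	struct toy_limits t;

	t.virt_boundary_mask = a.virt_boundary_mask | b.virt_boundary_mask;
	t.max_segment_size = a.max_segment_size < b.max_segment_size ?
			     a.max_segment_size : b.max_segment_size;
	return t;
}

int main(void)
{
	/* Component values taken from the sysfs listing above. */
	struct toy_limits nvme0n1 = { 4095, 4294967295u };
	struct toy_limits sda     = { 0, 65536 };

	struct toy_limits md = stack(nvme0n1, sda);

	/* Prints: md: virt_boundary_mask=4095 max_segment_size=65536 */
	printf("md: virt_boundary_mask=%lu max_segment_size=%u\n",
	       md.virt_boundary_mask, md.max_segment_size);
	return 0;
}

The resulting pair (virt_boundary_mask=4095 with max_segment_size=65536) is exactly the combination the 6.9-rc2 blk_validate_limits() check rejects, matching the warning at block/blk-settings.c:191.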
Comment 6 Mateusz Jończyk 2024-04-07 13:29:27 UTC
Created attachment 306103 [details]
dmesg on 6.9-rc2 - hybrid RAID
Comment 7 Mateusz Jończyk 2024-04-07 13:49:36 UTC
I will test your patch.
Comment 8 Mateusz Jończyk 2024-04-07 16:18:26 UTC
Yeah, this patch fixes it.

Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Comment 9 Mateusz Jończyk 2024-07-07 19:27:07 UTC
For me, this bug was fixed in mainline before Linux 6.9 was released, most probably by commit

commit b561ea56a26415bf44ce8ca6a8e625c7c390f1ea
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sun Apr 7 21:19:31 2024 +0800

    block: allow device to have both virt_boundary_mask and max segment size
    
    When one stacking device is over one device with virt_boundary_mask and
    another one with max segment size, the stacking device have both limits
    set. This way is allowed before d690cb8ae14b ("block: add an API to
    atomically update queue limits").
    
    Relax the limit so that we won't break such kind of stacking setting.
    
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
    Reported-by: janpieter.sollie@edpnet.be
    Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
    Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
    [...]
    Link: https://lore.kernel.org/r/20240407131931.4055231-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

@Janpieter, can you confirm that it is indeed fixed, so that this bug report can be closed?
Comment 10 Janpieter Sollie 2024-07-08 09:16:51 UTC
The bug report can indeed be closed ... mainline contains the necessary fixes to resolve the problem.