Bug 218184

Summary: block/badblocks.c warning in 6.7-rc2
Product: IO/Storage
Component: MD
Assignee: Jens Axboe (axboe)
Reporter: Janpieter Sollie (janpieter.sollie)
Status: RESOLVED
Resolution: INSUFFICIENT_DATA
Severity: normal
Priority: P3
CC: colyli, song
Hardware: AMD
OS: Linux
Kernel Version: 6.7-rc2
Regression: Yes
Bisected commit-id: 3ea3354cb9f03e34ee3fab98f127ab8da4131eee
Attachments: dmesg | scripts/decode_stacktrace.sh
             kernel .config

Description Janpieter Sollie 2023-11-23 21:58:50 UTC
Created attachment 305464 [details]
dmesg | scripts/decode_stacktrace.sh

When booting 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284)
The reason isn't clear, but the stack trace points to an error in md error handling.
This bug didn't happen in 6.6.
There are 3 commits in 6.7-rc2 which may have caused it.
In attachment:
- decoded stacktrace of dmesg
- kernel .config
Comment 1 Janpieter Sollie 2023-11-23 22:01:44 UTC
Created attachment 305465 [details]
kernel .config
Comment 2 Janpieter Sollie 2023-11-23 22:03:19 UTC
To be clear: the bcachefs volume is located on an md device.
I have 3 md devices, each with a bcachefs volume on it;
2 of them seem to be fine,
the 3rd one seems to struggle.
Comment 3 Janpieter Sollie 2023-11-24 05:50:18 UTC
Added Coly Li, as he's the committer of the patches according to git.
Comment 4 Coly Li 2023-11-24 14:25:24 UTC
Yes, let me look into this.
Comment 5 Coly Li 2023-11-24 14:32:42 UTC
The warning comes from this line of code in _badblocks_check():
1284         WARN_ON(bb->shift < 0 || sectors == 0);

It means the caller passed an invalid range to check. In the register dump of the oops,
"RDX: 0000000000000000" means the parameter 'sectors' is 0.

So the question is: why does the md raid code send a 0-length range for a badblocks check? Is this behavior intentional, or improper?

Coly Li
Comment 6 Coly Li 2023-11-24 14:35:39 UTC
IMHO, it doesn't make sense for a caller to check a zero-length LBA range. The warning works as expected to detect an improper call to badblocks_check().
Comment 7 Janpieter Sollie 2023-11-24 15:14:33 UTC
If I understand you correctly, should we ask the md people why they are checking a zero-length LBA range?
Whether a check with 0 sectors should be allowed at all is purely a question of code style ... and I'm not a stylist.
Comment 8 Coly Li 2023-11-24 16:38:38 UTC
I added Song Liu (md maintainer) to CC. Maybe he can give us a hint why a 0-length LBA range is sent into badblocks_check().
Comment 9 Janpieter Sollie 2023-11-27 07:02:20 UTC
Reassigned as an md bug;
adding the commit that caused the regression.
Comment 10 Song Liu 2023-11-29 00:20:34 UTC
Looking into this.
Comment 11 Song Liu 2023-11-29 04:46:22 UTC
The md layer only calls badblocks_check() when it thinks
there are bad blocks on the device (md.h: is_badblock()).
This is likely the reason we only see the issue on 1 of
the 3 arrays.
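
For reference, the is_badblock() wrapper in drivers/md/md.h looks roughly
like this in 6.7 (paraphrased, comments added):

    static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
                                  sector_t *first_bad, int *bad_sectors)
    {
            /* only consult the table when this rdev has recorded bad blocks */
            if (unlikely(rdev->badblocks.count)) {
                    int rv = badblocks_check(&rdev->badblocks,
                                             rdev->data_offset + s,
                                             sectors, first_bad, bad_sectors);
                    /* convert back from raw device sector to data-area sector */
                    if (rv)
                            *first_bad -= rdev->data_offset;
                    return rv;
            }
            return 0;
    }

So badblocks_check() is only reached on an array whose member device
already has entries in its badblocks table.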

However, I don't think the md layer is creating a zero-size
bio. I wonder whether the zero-sized bio comes from an upper
layer. Could we try to reproduce the issue with the
following patch?

Thanks!


diff --git i/drivers/md/md.c w/drivers/md/md.c
index 4ee4593c874a..967e51ec0169 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -372,6 +372,7 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)

 void md_handle_request(struct mddev *mddev, struct bio *bio)
 {
+       WARN_ON((bio_data_dir(bio) == READ) && bio_sectors(bio) == 0);
 check_suspended:
        if (is_suspended(mddev, bio)) {
                DEFINE_WAIT(__wait);
@@ -406,6 +407,7 @@ static void md_submit_bio(struct bio *bio)
        const int rw = bio_data_dir(bio);
        struct mddev *mddev = bio->bi_bdev->bd_disk->private_data;

+       WARN_ON((rw == READ) && bio_sectors(bio) == 0);
        if (mddev == NULL || mddev->pers == NULL) {
                bio_io_error(bio);
                return;
diff --git i/drivers/md/raid5.c w/drivers/md/raid5.c
index dc031d42f53b..924631f65fdf 100644
--- i/drivers/md/raid5.c
+++ w/drivers/md/raid5.c
@@ -5534,6 +5534,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
        atomic_inc(&rdev->nr_pending);
        rcu_read_unlock();

+       WARN_ON(bio_sectors(raid_bio) == 0);
        if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
                        &bad_sectors)) {
                rdev_dec_pending(rdev, mddev);
@@ -5589,11 +5590,15 @@ static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
        unsigned chunk_sects = mddev->chunk_sectors;
        unsigned sectors = chunk_sects - (sector & (chunk_sects-1));

+       WARN_ON(bio_sectors(raid_bio) == 0);
+
        if (sectors < bio_sectors(raid_bio)) {
                struct r5conf *conf = mddev->private;
                split = bio_split(raid_bio, sectors, GFP_NOIO, &conf->bio_split);
                bio_chain(split, raid_bio);
                submit_bio_noacct(raid_bio);
+               WARN_ON(bio_sectors(raid_bio) == 0);
+               WARN_ON(bio_sectors(split) == 0);
                raid_bio = split;
        }
Comment 12 Janpieter Sollie 2023-11-29 07:05:47 UTC
So it's not a bug; the warning is actually a feature.

Sadly, I can't reproduce: in bug #218200, as a last resort, I tried to recreate the whole md array. It did not help, but it also makes this bug #218184 impossible to debug further.

If you have a suggestion like "apply this patch and do (set of commands to be pasted here, probably some dd if=/dev/zero of=/dev/sd... to corrupt a sector)", feel free to reopen.

Sorry for abandoning this too early.
Comment 13 Song Liu 2023-11-29 11:52:42 UTC
Posting some observations on this for the record. Maybe this will ring a
bell for somebody (Kent?).

In the stack trace:

[   74.070877] raid5_make_request (drivers/md/md.h:224) 
[   74.070879] ? wait_woken (kernel/sched/wait.c:439) 
[   74.070881] ? sched_clock_cpu (kernel/sched/clock.c:394) 
[   74.070882] ? raw_spin_rq_lock_nested (kernel/sched/core.c:574) 
[   74.070884] ? newidle_balance (kernel/sched/fair.c:12329) 
[   74.070886] md_handle_request (./include/linux/percpu-refcount.h:174) 
[   74.070887] ? wake_bit_function (kernel/sched/wait.c:383) 
[   74.070888] md_submit_bio (drivers/md/md.c:?) 
[   74.070889] __submit_bio (block/blk-core.c:604) 
[   74.070890] submit_bio_noacct_nocheck (./include/linux/bio.h:693) 
[   74.070892] submit_bio_noacct (block/blk-core.c:?) 
[   74.070893] ? journal_write_endio (fs/bcachefs/journal_io.c:1669) 
[   74.070894] submit_bio (block/blk-core.c:835) 
[   74.070895] bch2_journal_write (./include/linux/percpu-refcount.h:174) 

From raid5_make_request, we know the call path is:

    raid5_make_request => 
      chunk_aligned_read =>
        raid5_read_one_chunk =>
          is_badblock =>
            badblocks_check

So the bio is a READ.
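
For context, raid5_make_request() only calls chunk_aligned_read() on the
read path; the guard looks roughly like this in 6.7:

    if (rw == READ && mddev->degraded == 0 &&
        mddev->reshape_position == MaxSector) {
            bi = chunk_aligned_read(mddev, bi);
            if (!bi)
                    return true;
    }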

However, it was triggered by bch2_journal_write(). I cannot figure out
why bch2_journal_write() issues a read here. It appears to me that it
should only do writes or flushes.
should only do write or flush.