Bug 218184 - block/badblocks.c warning in 6.7-rc2
Summary: block/badblocks.c warning in 6.7-rc2
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD
Hardware: AMD Linux
Importance: P3 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-23 21:58 UTC by Janpieter Sollie
Modified: 2023-11-29 11:52 UTC
CC List: 2 users

See Also:
Kernel Version: 6.7-rc2
Subsystem:
Regression: Yes
Bisected commit-id: 3ea3354cb9f03e34ee3fab98f127ab8da4131eee


Attachments
dmesg | scripts/decode_stacktrace.sh (198.64 KB, text/plain)
2023-11-23 21:58 UTC, Janpieter Sollie
kernel .config (128.10 KB, text/plain)
2023-11-23 22:01 UTC, Janpieter Sollie

Description Janpieter Sollie 2023-11-23 21:58:50 UTC
Created attachment 305464 [details]
dmesg | scripts/decode_stacktrace.sh

When booting 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284) 
The reason isn't clear, but the stack trace points to an error in md error handling.
This bug didn't happen in 6.6; there are 3 commits in 6.7-rc2 which may have caused it.
In attachment:
- decoded stacktrace of dmesg
- kernel .config
Comment 1 Janpieter Sollie 2023-11-23 22:01:44 UTC
Created attachment 305465 [details]
kernel .config
Comment 2 Janpieter Sollie 2023-11-23 22:03:19 UTC
To be clear: the bcachefs volume is located on an md device.
I have 3 md devices, each with a bcachefs volume on it;
2 of them seem to be fine,
the 3rd one seems to struggle.
Comment 3 Janpieter Sollie 2023-11-24 05:50:18 UTC
Added Coly Li, as he's the committer of the patches according to git diff.
Comment 4 Coly Li 2023-11-24 14:25:24 UTC
Yes, let me look into this.
Comment 5 Coly Li 2023-11-24 14:32:42 UTC
The warning is from this line of code in _badblocks_check(),
1284         WARN_ON(bb->shift < 0 || sectors == 0);

It means the caller sent an invalid range to check. From the oops information,
"RDX: 0000000000000000" means parameter 'sectors' is 0.

So the question is: why does the md raid code send a 0-length range for a badblocks check? Is this behavior intentional, or improper?

Coly Li
Comment 6 Coly Li 2023-11-24 14:35:39 UTC
IMHO, it doesn't make sense for a caller to check a zero-length LBA range. The warning works as expected to detect improper calls to badblocks_check().
Comment 7 Janpieter Sollie 2023-11-24 15:14:33 UTC
If I understand you correctly, should we ask the md people why they are checking a zero-length LBA range?
The question of whether a check with 0 sectors should be allowed is purely code style ... and I'm not a stylist.
Comment 8 Coly Li 2023-11-24 16:38:38 UTC
I've added Song Liu (md maintainer) to CC. Maybe he can give us a hint as to why a 0-length LBA range is sent into badblocks_check().
Comment 9 Janpieter Sollie 2023-11-27 07:02:20 UTC
Reset to be an md bug;
adding the commit that caused the regression.
Comment 10 Song Liu 2023-11-29 00:20:34 UTC
Looking into this.
Comment 11 Song Liu 2023-11-29 04:46:22 UTC
The md layer only calls badblocks_check() when it thinks
there are badblocks on the device (md.h: is_badblock()).
This is likely the reason we only see the issue in 1 of
the 3 arrays.
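
For reference, a sketch of is_badblock() as it appears in drivers/md/md.h: it only calls into badblocks_check() when the rdev has recorded badblocks, which is why only the array with a non-empty badblocks table trips the warning:

/* Sketch of is_badblock() (drivers/md/md.h): gate on the per-device
 * badblocks count, then query the table with the rdev's data offset
 * applied, translating first_bad back on a hit.
 */
static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
                              sector_t *first_bad, int *bad_sectors)
{
        if (unlikely(rdev->badblocks.count)) {
                int rv = badblocks_check(&rdev->badblocks,
                                         rdev->data_offset + s,
                                         sectors, first_bad, bad_sectors);

                if (rv)
                        *first_bad -= rdev->data_offset;
                return rv;
        }
        return 0;
}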

However, I don't think the md layer is creating a zero-size
bio. I wonder whether the zero-sized bio comes from an upper
layer. Could we try to reproduce the issue with the
following patch?

Thanks!


diff --git i/drivers/md/md.c w/drivers/md/md.c
index 4ee4593c874a..967e51ec0169 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -372,6 +372,7 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)

 void md_handle_request(struct mddev *mddev, struct bio *bio)
 {
+       WARN_ON((bio_data_dir(bio) == READ) && bio_sectors(bio) == 0);
 check_suspended:
        if (is_suspended(mddev, bio)) {
                DEFINE_WAIT(__wait);
@@ -406,6 +407,7 @@ static void md_submit_bio(struct bio *bio)
        const int rw = bio_data_dir(bio);
        struct mddev *mddev = bio->bi_bdev->bd_disk->private_data;

+       WARN_ON((rw == READ) && bio_sectors(bio) == 0);
        if (mddev == NULL || mddev->pers == NULL) {
                bio_io_error(bio);
                return;
diff --git i/drivers/md/raid5.c w/drivers/md/raid5.c
index dc031d42f53b..924631f65fdf 100644
--- i/drivers/md/raid5.c
+++ w/drivers/md/raid5.c
@@ -5534,6 +5534,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
        atomic_inc(&rdev->nr_pending);
        rcu_read_unlock();

+       WARN_ON(bio_sectors(raid_bio) == 0);
        if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
                        &bad_sectors)) {
                rdev_dec_pending(rdev, mddev);
@@ -5589,11 +5590,15 @@ static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
        unsigned chunk_sects = mddev->chunk_sectors;
        unsigned sectors = chunk_sects - (sector & (chunk_sects-1));

+       WARN_ON(bio_sectors(raid_bio) == 0);
+
        if (sectors < bio_sectors(raid_bio)) {
                struct r5conf *conf = mddev->private;
                split = bio_split(raid_bio, sectors, GFP_NOIO, &conf->bio_split);
                bio_chain(split, raid_bio);
                submit_bio_noacct(raid_bio);
+               WARN_ON(bio_sectors(raid_bio) == 0);
+               WARN_ON(bio_sectors(split) == 0);
                raid_bio = split;
        }
Comment 12 Janpieter Sollie 2023-11-29 07:05:47 UTC
So it's not a bug; the warning is actually a feature.

Sadly, I can't reproduce it: in bug #218200, as a last resort, I tried to recreate the whole md array. It did not help, but it also makes this bug #218184 impossible to debug further.

If you have a suggestion like "apply this patch and do (set of commands to be pasted here, probably some dd if=/dev/zero of=/dev/sd... to corrupt a sector)", feel free to reopen.

Sorry for abandoning this too early.
Comment 13 Song Liu 2023-11-29 11:52:42 UTC
Posting some observations on this for the record. Maybe this will ring a
bell for somebody (Kent?).

In the stack trace:

[   74.070877] raid5_make_request (drivers/md/md.h:224) 
[   74.070879] ? wait_woken (kernel/sched/wait.c:439) 
[   74.070881] ? sched_clock_cpu (kernel/sched/clock.c:394) 
[   74.070882] ? raw_spin_rq_lock_nested (kernel/sched/core.c:574) 
[   74.070884] ? newidle_balance (kernel/sched/fair.c:12329) 
[   74.070886] md_handle_request (./include/linux/percpu-refcount.h:174) 
[   74.070887] ? wake_bit_function (kernel/sched/wait.c:383) 
[   74.070888] md_submit_bio (drivers/md/md.c:?) 
[   74.070889] __submit_bio (block/blk-core.c:604) 
[   74.070890] submit_bio_noacct_nocheck (./include/linux/bio.h:693) 
[   74.070892] submit_bio_noacct (block/blk-core.c:?) 
[   74.070893] ? journal_write_endio (fs/bcachefs/journal_io.c:1669) 
[   74.070894] submit_bio (block/blk-core.c:835) 
[   74.070895] bch2_journal_write (./include/linux/percpu-refcount.h:174) 

From raid5_make_request, we know the call path is:

    raid5_make_request => 
      chunk_aligned_read =>
        raid5_read_one_chunk =>
          is_badblock =>
            badblocks_check

so the bio is READ. 
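
In support of that, a sketch of the gate in raid5_make_request() (paraphrased from drivers/md/raid5.c; only READ bios on a non-degraded, non-reshaping array take this path):

        /* Paraphrased read-path gate in raid5_make_request(): only
         * READ bios are routed through chunk_aligned_read(), which
         * is what leads to is_badblock()/badblocks_check().
         */
        if (rw == READ && mddev->degraded == 0 &&
            mddev->reshape_position == MaxSector) {
                bi = chunk_aligned_read(mddev, bi);
                if (!bi)
                        return true;    /* bio fully handled here */
        }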

However, it was triggered by bch2_journal_write(). I cannot figure out
why bch2_journal_write() would issue a read here. It appears to me that it
should only do writes or flushes.
