Bug 200813 - lib/sbitmap.c:406 warning
Summary: lib/sbitmap.c:406 warning
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-14 15:36 UTC by János Tóth F.
Modified: 2019-05-06 12:53 UTC
CC List: 5 users

See Also:
Kernel Version: 4.18.0
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
BFQ update depth (2.07 KB, patch)
2019-01-18 15:06 UTC, Jens Axboe

Description János Tóth F. 2018-08-14 15:36:20 UTC
I got this warning after a reboot (previous working kernel version: 4.17.14). Everything seems to work fine afterward (filesystems are accessible, and no more warnings or bugs so far). The warning is preceded and followed by network initialization messages (bridge ports switching states and such), which are probably irrelevant.

[   66.095330] WARNING: CPU: 0 PID: 80 at lib/sbitmap.c:406 __sbitmap_queue_get_shallow+0x84/0xb0
[   66.095337] CPU: 0 PID: 80 Comm: kworker/u4:1 Not tainted 4.18.0-gentoo #3
[   66.095339] Hardware name: Gigabyte Technology Co., Ltd. X150M-PRO ECC/X150M-PRO ECC-CF, BIOS F22e 01/11/2018
[   66.095345] Workqueue: btrfs-worker btrfs_worker_helper
[   66.095352] RIP: 0010:__sbitmap_queue_get_shallow+0x84/0xb0
[   66.095353] Code: 11 48 83 c4 08 5b 5d 41 5c c3 48 8b 53 18 65 c7 02 00 00 00 00 48 83 c4 08 5b 5d 41 5c c3 31 ed 45 85 e4 75 09 65 89 28 eb 9b <0f> 0b eb 88 89 74 24 04 e8 ef 26 fb ff 31 d2 41 f7 f4 8b 74 24 04
[   66.095429] RSP: 0018:ffffae61c02ebb38 EFLAGS: 00010282
[   66.095434] RAX: ffff909f50109780 RBX: ffff909f50109790 RCX: ffff909f50109790
[   66.095436] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff909f50109790
[   66.095438] RBP: ffffae61c02ebc28 R08: ffff909ef2827b40 R09: ffff909ef04166d8
[   66.095441] R10: ffff909ef0416700 R11: 0000000000000000 R12: ffff909f0fe7a400
[   66.095443] R13: ffff909f50109790 R14: 0000000000000000 R15: 0000000000000001
[   66.095447] FS:  0000000000000000(0000) GS:ffff909f5ec00000(0000) knlGS:0000000000000000
[   66.095450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.095453] CR2: 00007f21c823a000 CR3: 000000020d20a005 CR4: 00000000002606f0
[   66.095455] Call Trace:
[   66.095465]  ? mempool_alloc+0x60/0x190
[   66.095471]  blk_mq_get_tag+0x24b/0x280
[   66.095478]  ? elv_merge+0x5e/0xe0
[   66.095484]  ? __wake_up_common_lock+0xb0/0xb0
[   66.095491]  blk_mq_get_request+0x30f/0x3c0
[   66.095496]  blk_mq_make_request+0x112/0x450
[   66.095500]  generic_make_request+0x1c7/0x460
[   66.095505]  submit_bio+0x40/0x130
[   66.095512]  ? rbio_orig_end_io+0xc0/0xc0
[   66.095516]  finish_rmw+0x350/0x4e0
[   66.095520]  full_stripe_write+0xa0/0xb0
[   66.095524]  raid56_parity_write+0xd3/0x130
[   66.095529]  btrfs_map_bio+0x325/0x330
[   66.095536]  btrfs_submit_bio_done+0x1d/0x50
[   66.095540]  normal_work_helper+0x13d/0x1d0
[   66.095547]  process_one_work+0x1d1/0x310
[   66.095553]  worker_thread+0x28/0x3c0
[   66.095557]  kthread+0x105/0x120
[   66.095563]  ? process_one_work+0x310/0x310
[   66.095567]  ? __kthread_bind_mask+0x60/0x60
[   66.095572]  ret_from_fork+0x35/0x40
[   66.095576] ---[ end trace 4b2b7d4424186a77 ]---
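For context: "lib/sbitmap.c:406" is the sanity check in __sbitmap_queue_get_shallow() requiring that no caller ever ask for a shallow allocation depth below the minimum the queue was told to expect; the wakeup batch is sized from that minimum, so violating it can lose wakeups. A minimal sketch of the 4.18-era check (paraphrased, with the per-CPU hint handling elided; not the verbatim source):

int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
				unsigned int shallow_depth)
{
	unsigned int hint = 0;	/* the real code manages a per-CPU alloc hint */

	/*
	 * Callers must first announce the smallest shallow depth they will
	 * ever use via sbitmap_queue_min_shallow_depth(), because the wakeup
	 * batch size is derived from it. This is the WARN that the traces
	 * in this report hit.
	 */
	WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);

	return sbitmap_get_shallow(&sbq->sb, hint, shallow_depth);
}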
Comment 1 János Tóth F. 2018-08-14 15:41:30 UTC
Some scheduler parameters get modified after boot (I didn't test if this makes a difference):

~ # cat /etc/local.d/io.start
echo bfq > /sys/block/sda/queue/scheduler
echo bfq > /sys/block/sdb/queue/scheduler
echo bfq > /sys/block/sdc/queue/scheduler
echo bfq > /sys/block/sdd/queue/scheduler
echo bfq > /sys/block/sde/queue/scheduler
echo bfq > /sys/block/sdf/queue/scheduler
echo bfq > /sys/block/sdg/queue/scheduler
echo 64 > /sys/block/sda/queue/nr_requests
echo 256 > /sys/block/sdb/queue/nr_requests
echo 256 > /sys/block/sdc/queue/nr_requests
echo 256 > /sys/block/sdd/queue/nr_requests
echo 256 > /sys/block/sde/queue/nr_requests
echo 256 > /sys/block/sdf/queue/nr_requests
echo 64 > /sys/block/sdg/queue/nr_requests
echo 1024 > /sys/block/sda/queue/max_sectors_kb
echo 1024 > /sys/block/sdb/queue/max_sectors_kb
echo 1024 > /sys/block/sdc/queue/max_sectors_kb
echo 1024 > /sys/block/sdd/queue/max_sectors_kb
echo 1024 > /sys/block/sde/queue/max_sectors_kb
echo 1024 > /sys/block/sdf/queue/max_sectors_kb
echo 1024 > /sys/block/sdg/queue/max_sectors_kb
echo 1024 > /sys/block/sda/queue/read_ahead_kb
echo 1024 > /sys/block/sdb/queue/read_ahead_kb
echo 1024 > /sys/block/sdc/queue/read_ahead_kb
echo 1024 > /sys/block/sdd/queue/read_ahead_kb
echo 1024 > /sys/block/sde/queue/read_ahead_kb
echo 1024 > /sys/block/sdf/queue/read_ahead_kb
echo 1024 > /sys/block/sdg/queue/read_ahead_kb
Comment 2 Kai Krakow 2018-12-27 20:22:02 UTC
I'm seeing this, too, but only after my system crashed a few days ago. File systems seem to be okay, but I'm now seeing data corruption during heavy write workloads on my xfs backup partition (the backup fails integrity checks every now and then).

Kernel 4.19.2-ck

[    5.558938] WARNING: CPU: 2 PID: 449 at lib/sbitmap.c:406 __sbitmap_queue_get_shallow+0x85/0xb0
[    5.558939] Modules linked in: snd ecdh_generic irqbypass lpc_ich soundcore uas usb_storage vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_drm(PO) nvidia_uvm(O) nvidia_modeset(PO) nvidia(PO) nct6775 hwmon_vid coretemp hwmon efivarfs
[    5.558967] CPU: 2 PID: 449 Comm: kworker/2:4 Tainted: P           O      4.19.2-ck #6
[    5.558968] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[    5.558971] Workqueue: bcache cached_dev_read_done
[    5.558974] RIP: 0010:__sbitmap_queue_get_shallow+0x85/0xb0
[    5.558976] Code: 11 48 83 c4 08 5b 5d 41 5c c3 48 8b 53 18 65 c7 02 00 00 00 00 48 83 c4 08 5b 5d 41 5c c3 31 ed 45 85 e4 75 09 65 89 28 eb 9a <0f> 0b eb 87 89 74 24 04 e8 3e 12 fa ff 31 d2 8b 74 24 04 41 f7 f4
[    5.558977] RSP: 0018:ffffc90003217c90 EFLAGS: 00010286
[    5.558978] RAX: ffff880401808540 RBX: ffff880401808550 RCX: ffff880401808550
[    5.558979] RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff880401808550
[    5.558980] RBP: ffffc90003217d78 R08: 0000000000000000 R09: ffff88040c781ab0
[    5.558981] R10: 0000000000001000 R11: 0000000000001000 R12: ffff880401808550
[    5.558982] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[    5.558983] FS:  0000000000000000(0000) GS:ffff88040ea80000(0000) knlGS:0000000000000000
[    5.558984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.558985] CR2: 000055f0ec0b5000 CR3: 000000000320a006 CR4: 00000000001606e0
[    5.558986] Call Trace:
[    5.558992]  blk_mq_get_tag+0xc9/0x280
[    5.558995]  ? wait_woken+0x80/0x80
[    5.558996]  blk_mq_get_request+0xef/0x3b0
[    5.558998]  blk_mq_make_request+0xdb/0x430
[    5.559000]  generic_make_request+0x1e7/0x460
[    5.559002]  ? bch_data_insert_start+0xb0/0x4c0
[    5.559003]  bch_data_insert_start+0xb0/0x4c0
[    5.559014]  cached_dev_read_done+0x13b/0x190
[    5.559016]  process_one_work+0x1c0/0x340
[    5.559019]  worker_thread+0x28/0x3c0
[    5.559021]  ? set_worker_desc+0xb0/0xb0
[    5.559022]  kthread+0x109/0x120
[    5.559024]  ? kthread_create_worker_on_cpu+0x70/0x70
[    5.559026]  ret_from_fork+0x35/0x40
[    5.559028] ---[ end trace 0b6268e3d074dc52 ]---

This seems to involve bcache; I disconnected bcache from the partitions, but the backtrace remains (this time involving no "bch_"-prefixed functions).

I also applied the btrfs- and blk-mq-related commits from 4.19.3 through 4.19.12, which didn't help with this problem. I cannot currently go to a newer kernel because the CK patchset conflicts, but I guess I can re-test with 4.20 as soon as the CK patchset is released for it.


I'm also applying nr_requests changes:

# cat /etc/udev/rules.d/99-io-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/nr_requests}="512"


My scheduler defaults to mq-bfq (except SSD/Flash):

# grep ^ /sys/block/*/queue/scheduler
/sys/block/bcache0/queue/scheduler:none
/sys/block/bcache1/queue/scheduler:none
/sys/block/bcache2/queue/scheduler:none
/sys/block/bcache3/queue/scheduler:none
/sys/block/sda/queue/scheduler:mq-deadline [kyber] bfq none
/sys/block/sdb/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sdc/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sdd/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sde/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sdf/queue/scheduler:mq-deadline [kyber] bfq none
/sys/block/sdg/queue/scheduler:mq-deadline kyber [bfq] none
Comment 3 Kai Krakow 2018-12-27 20:25:14 UTC
(In reply to Kai Krakow from comment #2)
> I'm seeing this, too. But only after my system crashed a few days ago.
> [...snip full quote of comment #2...]

BTW: FS is also btrfs here. Detaching bcache then changes the backtrace to more closely match that of the OP.
Comment 4 János Tóth F. 2018-12-28 05:06:52 UTC
For me, these errors seem to have stopped. I completely forgot about this bug, so I can't tell for sure which was the last version I saw these errors with. But virtually all the seemingly btrfs- or blk-related problems I had (not just this one) went away around 4.19.5 (if I recall correctly; I'm not sure exactly). 4.20.0 also seems fine so far.
Comment 5 Kai Krakow 2018-12-31 05:23:50 UTC
Did you enable multiqueue before, and now you don't? For me, the problem went away after I disabled mq-bfq. But the remaining schedulers are not very useful here (the system was very sluggish), so I switched MQ off and returned to cfq.
Comment 6 János Tóth F. 2018-12-31 11:47:19 UTC
I have been using bfq-mq (and consequently blk-mq - which wasn't a viable option for me before it got some HDD-friendly schedulers) ever since it became available in the mainline kernel.

Do you modify any scheduler parameters?
In the meantime I changed the script from Comment 1 to these udev rules:

~ # cat /etc/udev/rules.d/60-ioschedulers.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/read_ahead_kb}="0"

I think I dropped the nr_requests lines for a good reason. I guess that might be the setting which triggered the warnings for me, especially when I touched the parameters of the USB drive (which is yet another variable, since a distro patch recently disabled UAS mode for my USB3 enclosure, so it might have started behaving differently at some point).
Comment 7 Kai Krakow 2018-12-31 15:45:29 UTC
I also migrated from /etc/local.d to /etc/udev/rules.d a long time ago, because it gets applied very early in the boot process rather than only after all the other boot scripts have run:

# cat /etc/udev/rules.d/00-fix-rotational.rules
ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{device/model}=="SD/MMC", ATTR{queue/rotational}="0"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/iosched/slice_idle}="0"

# cat /etc/udev/rules.d/01-io-schedulers.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/nr_requests}="512"

#ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="kyber"
#ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="cfq"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="cfq"

# cat /etc/udev/rules.d/01-bcache-writeback.rules
ACTION=="add|change", KERNEL=="bcache*", ATTR{bcache/writeback_rate_minimum}="8192"

I'll look into the nr_requests thing next time I get a chance to try bfq again; thanks for the hint. That said, I think I found that this problem may have been fixed in the out-of-tree bfq patches (which also add bfq-sq, which never worked well for me). Currently I'm back on 4.18.y and will stay there until the nvidia-drivers are ready for 4.20 and Gentoo has released the ck-kernel for it. Applying the ck patchset manually through /etc/portage/patches worked well, but the nvidia module is missing a kernel symbol and refused to load.

Meanwhile I figured out that the kernel backtrace shows up as soon as the bfq udev rule is loaded and the scheduler thus switches to bfq. The system is not stable that way: it eventually freezes (without hints in "dmesg -w") sooner or later under heavy IO. I can still move windows around at first, but after a while even kwin freezes, requiring Alt+SysRq+REISUB.

As coincidence would have it, one of my RAM modules went bad around the same time, which made it hard to separate the different error sources... I restored from backup several times (which took 10+ hours per run) until I decided to re-test my RAM with memtest86. I never expected RAM to fail after years of usage (and I always test new RAM modules after installation), but it now shows two bitflips in the same byte address. This totally destroyed btrfs with random errors, prevented the backup from being restored correctly, and even destroyed parts of the backup. Luckily, I could take a backup of the recovered data, and borg-backup was able to heal the missing file chunks with that. Rebuilding my borg repository took about 15 hours and was only successful after isolating and removing the failing RAM module. :-(

Yeah, fun, happy new year. :-/
Comment 8 Kai Krakow 2019-01-02 09:31:44 UTC
Looks like setting the queue depth to 64 prevents the trace from occurring when activating bfq...

So maybe the title should be amended to add "when setting bfq with nr_requests > 64" or something similar.
Comment 9 Kai Krakow 2019-01-02 15:02:07 UTC
(In reply to Kai Krakow from comment #8)
> Looks like setting the queue depth to 64 prevents the trace from occurring
> when activating bfq...
> 
> So maybe the title should be amended to add "when setting bfq with
> nr_requests > 64" or something similar.

Or 128 that is... I just remembered the default value wrong...
Comment 10 Kai Krakow 2019-01-02 21:13:54 UTC
I'm not sure what the defaults are... Some devices have 64, some have 128, some even have 2.

But I can confirm on a second system now that leaving nr_requests at the default gets rid of the trace - and in consequence it probably also gets rid of the random system freezes during high IO loads.
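A plausible mechanism for the nr_requests correlation (my reading of the 4.18/4.19 code, so treat the details as an assumption): raising nr_requests above the current scheduler tag depth makes the block layer allocate a fresh sched_tags sbitmap whose min_shallow_depth starts out at UINT_MAX, and nothing re-runs bfq's depth setup on it. Roughly:

/*
 * Assumed failure path (my reconstruction, not verbatim kernel code):
 *
 *   echo 512 > /sys/block/sdX/queue/nr_requests
 *     -> queue_requests_store()
 *     -> blk_mq_update_nr_requests()
 *     -> blk_mq_tag_update_depth()
 *        grows sched_tags by allocating a new sbitmap_queue, whose
 *        min_shallow_depth is initialized to UINT_MAX
 *
 * bfq_init_hctx() ran only once, at elevator init, so nobody calls
 * sbitmap_queue_min_shallow_depth() on the new map. The next allocation
 * limited by bfq_limit_depth() then trips
 *
 *   WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth)
 *
 * since any finite shallow depth is smaller than UINT_MAX.
 */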

Should we CC the mq-bfq devs somehow on this?
Comment 11 Paolo Valente 2019-01-08 09:22:55 UTC
This blk-mq bug was fixed last year by this patch series from Jens:
https://www.spinics.net/lists/linux-block/msg25832.html

The mainline commit for bfq is:
483b7bf2e402336577 ("bfq-iosched: update shallow depth to smallest one used")

This commit is available in mainline from 4.18 on.

I guess this fix has not been backported to 4.17.14.

So, if you do want to use bfq:
- backport these fix commits manually (or maybe ask for a backport), or
- use a newer kernel version, containing them, or
- use my out-of-tree bfq-mq
Comment 12 Kai Krakow 2019-01-08 09:39:51 UTC
I'll look into it, great info. Thanks. :-)
Comment 13 Kai Krakow 2019-01-13 06:00:11 UTC
@Paolo

The commit you mentioned is part of my kernel (4.18.19-ck).

But the error can still be seen as soon as you increase `nr_requests` to a higher value, e.g. 512.

As a result, the system may freeze during high IO load and probably lose important metadata changes; at least, I'm seeing csum and/or transaction errors in btrfs then, often resulting in a filesystem damaged beyond repair.

I cross-checked with a non-CK kernel, and the problem exists there, too.

It actually has been merged into 4.17-rc4:

$ git describe 483b7bf2e40233657713279b6f98a9225ea0ff84; git show 483b7bf2e40233657713279b6f98a9225ea0ff84
v4.17-rc4-36-g483b7bf2e402
commit 483b7bf2e40233657713279b6f98a9225ea0ff84
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed May 9 15:26:55 2018 -0600

    bfq-iosched: update shallow depth to smallest one used

    If our shallow depth is smaller than the wake batching of sbitmap,
    we can introduce hangs. Ensure that sbitmap knows how low we'll go.

    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Reviewed-by: Omar Sandoval <osandov@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 10294124d597..b622e73a326a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5081,10 +5081,13 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)

 /*
  * See the comments on bfq_limit_depth for the purpose of
- * the depths set in the function.
+ * the depths set in the function. Return minimum shallow depth we'll use.
  */
-static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+static unsigned int bfq_update_depths(struct bfq_data *bfqd,
+                                     struct sbitmap_queue *bt)
 {
+       unsigned int i, j, min_shallow = UINT_MAX;
+
        /*
         * In-word depths if no bfq_queue is being weight-raised:
         * leaving 25% of tags only for sync reads.
@@ -5115,14 +5118,22 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
        bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
        /* no more than ~37% of tags for sync writes (~20% extra tags) */
        bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
+
+       for (i = 0; i < 2; i++)
+               for (j = 0; j < 2; j++)
+                       min_shallow = min(min_shallow, bfqd->word_depths[i][j]);
+
+       return min_shallow;
 }

 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
 {
        struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
        struct blk_mq_tags *tags = hctx->sched_tags;
+       unsigned int min_shallow;

-       bfq_update_depths(bfqd, &tags->bitmap_tags);
+       min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
+       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
        return 0;
 }
Comment 14 Paolo Valente 2019-01-18 11:50:06 UTC
@Kai

> The commit you mentioned is part of my kernel (4.18.19-ck).

ok


> But the error can still be seen as soon as you increase `nr_requests` to a
> higher value, e.g. 512.

ok, sorry, I read this thread too quickly; when I got to the end of it, I recalled the tag 4.17.14, but no longer that 4.17.14 was the last working version rather than the first non-working one.

I reproduced your failure, and probably found the culprit: a commit by Jens. I'm about to report this on linux-block, CCing you too.

In addition, I'll report a link to my email on linux-block here, once it gets archived somewhere.

> It actually has been merged into 4.17-rc4:
> $ git describe 483b7bf2e40233657713279b6f98a9225ea0ff84

Unfortunately this is not the right command to use for your goal. You could try, e.g.,

git tag -l --contains 483b7bf2e40233657713279b6f98a9225ea0ff84

and see that that commit is available only from 4.18 on.
Comment 15 Kai Krakow 2019-01-18 14:18:25 UTC
(In reply to Paolo Valente from comment #14)
> [...snip...]
> 
> I reproduced your failure, and probably found the culprit: a commit by Jens.
> I'm about to report this on linux-block, CCing you too.
> 
> In addition, I'll report here a link to my email on linux-block, as it gets
> archived somewhere.

Thanks for CCing, I'd happily try proposed patches for testing.


> > It actually has been merged into 4.17-rc4:
> > $ git describe 483b7bf2e40233657713279b6f98a9225ea0ff84
> 
> Unfortunately this is not the right command to use for your goal. You could
> try, e.g.,

(facepalm) Oh yeah, you're right. :-\


> git tag -l --contains 483b7bf2e40233657713279b6f98a9225ea0ff84
> 
> and see that that commit is available only from 4.18 on.

Yes. Makes more sense.

But still, I think I also saw it in 4.17, though I'm not sure. I purged the old kernels after restoring from backup with a kernel that works stably for me. The other kernels had been compiled with a potentially defective memory module showing bitflip errors, so I just removed them.

Thanks
Kai
Comment 16 Paolo Valente 2019-01-18 14:21:12 UTC
Here's the thread too:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1903590.html
Comment 17 Jens Axboe 2019-01-18 15:05:59 UTC
(In reply to Kai Krakow from comment #15)
 
> Thanks for CCing, I'd happily try proposed patches for testing.

Can you try the one I'm about to attach? Also posted in the thread.
Comment 18 Jens Axboe 2019-01-18 15:06:26 UTC
Created attachment 280579 [details]
BFQ update depth
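For reference, the shape of the change, inferred from the linux-block posting and the discussion that follows (a sketch, not a verbatim copy of the attachment): a new depth_updated elevator hook that blk_mq_update_nr_requests() invokes, which bfq implements by recomputing its word depths and re-announcing the minimum shallow depth:

/* block/bfq-iosched.c: recompute depths whenever the tag map changes */
static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
{
	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
	struct blk_mq_tags *tags = hctx->sched_tags;
	unsigned int min_shallow;

	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
}

static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
{
	bfq_depth_updated(hctx);	/* the init path reuses the same hook */
	return 0;
}

/* block/blk-mq.c, in blk_mq_update_nr_requests(): notify the scheduler */
	if (q->elevator && q->elevator->type->ops.depth_updated)
		q->elevator->type->ops.depth_updated(hctx);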
Comment 19 Kai Krakow 2019-01-18 23:18:09 UTC
@Jens

I tried to apply this to 4.18.19 (as this is what I'm currently working with; 4.19 showed FS problems for me, and 4.20 showed problems when loading the nvidia driver).

Applying the patch worked without conflicts but I cannot compile the kernel:

kakra@jupiter /usr/src/linux $ sudo LANG=C make -s
block/blk-mq.c: In function 'blk_mq_update_nr_requests':
block/blk-mq.c:2839:44: error: 'union <anonymous>' has no member named 'depth_updated'
   if (q->elevator && q->elevator->type->ops.depth_updated)
                                            ^
block/blk-mq.c:2840:26: error: 'union <anonymous>' has no member named 'depth_updated'
    q->elevator->type->ops.depth_updated(hctx);
                          ^
make[1]: *** [scripts/Makefile.build:318: block/blk-mq.o] Error 1
make: *** [Makefile:1034: block] Error 2

I'll take a deeper look but if you have an easy fix, I'd appreciate an answer.
Comment 20 Jens Axboe 2019-01-18 23:21:37 UTC
You didn't apply the whole patch, notably the elevator.h parts. And those will definitely conflict; you need to hand-copy that one line into just the mq_ops part.
Comment 21 Kai Krakow 2019-01-18 23:57:12 UTC
I used the patch sent to the linux-block list and applied it to linux/master (git am), then cherry-picked it back into v4.18.19. The resulting patch looks identical to the attachment here (vimdiff shows no diff in code lines).

I now cross-checked if the line you mentioned is included, and it is.

I hard-reset my branch and directly applied your attachment from Bugzilla; it applied without conflicts, resulting in an identical commit (minus the different commit id due to the different commit date).

$ git co v4.18.19
$ curl "https://bugzilla.kernel.org/attachment.cgi?id=280579&action=diff&collapsed=&headers=1&format=raw" | git apply

# --> no conflicts, it just applied
Comment 22 Jens Axboe 2019-01-19 00:05:58 UTC
Ah, I forgot, you just need to change any:

q->elevator->type->ops.depth_updated

to

q->elevator->type->ops.mq.depth_updated

since the older kernels have both sq and mq ops. So that's why it applies cleanly yet doesn't compile.
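Concretely, that tweak amounts to this at the blk-mq call site (a sketch, assuming a 4.18/4.19 elevator_type whose mq ops sit under ops.mq):

/* block/blk-mq.c, blk_mq_update_nr_requests(), 4.18/4.19 backport:
 * these kernels still carry both legacy (sq) and mq elevator ops, so
 * the new hook lives under ops.mq rather than directly under ops.
 */
if (q->elevator && q->elevator->type->ops.mq.depth_updated)
	q->elevator->type->ops.mq.depth_updated(hctx);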
Comment 23 Kai Krakow 2019-01-19 00:27:25 UTC
Yep, that's it. Now it compiles, thanks. I'll report back after a reboot with the nr_requests changes back in place.

Next step would be to update to 4.20 with this patch. Does it need the mq/sq version of this patch, or the one you originally posted?
Comment 24 Jens Axboe 2019-01-19 01:52:19 UTC
Yes, 4.20 will require the same ->ops.mq.depth_updated() treatment.
Comment 25 Kai Krakow 2019-01-19 11:06:22 UTC
Works here, maybe update the bug status?
Comment 26 Jens Axboe 2019-01-19 14:37:08 UTC
Moving to RESOLVED, I'll queue up the fix for 5.0-rc4.
Comment 27 Jan Steffens 2019-03-04 12:56:13 UTC
(In reply to Jens Axboe from comment #26)
> Moving to RESOLVED, I'll queue up the fix for 5.0-rc4.

What happened to this fix? It's not in 5.0. Is it still relevant?
Comment 28 Timofey Titovets 2019-05-03 00:22:21 UTC
Same warning on 4.19.38; the upstream fix looks to be 77f1e0a52d26242b6c2dba019f6ebebfb9ff701e.
Comment 29 Timofey Titovets 2019-05-06 12:53:36 UTC
Backported to 4.19.38+
https://github.com/Nefelim4ag/linux/commit/645dabdc2d484c8f6b87cfa7d10ca1f41bfa6102

That fixes the problem.
