Bug 196675 - BFQ scheduler hangs system
Summary: BFQ scheduler hangs system
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All Linux
Importance: P1 high
Assignee: Jens Axboe
Depends on:
Reported: 2017-08-16 06:23 UTC by Vladimir Lomov
Modified: 2017-09-24 04:24 UTC

See Also:
Kernel Version: 4.12
Tree: Mainline
Regression: No

This is copy-pasted from terminal (output of journalctl -k -f) (20.56 KB, text/plain)
2017-08-16 06:23 UTC, Vladimir Lomov
This is collected using netconsole (87.25 KB, text/plain)
2017-08-16 06:28 UTC, Vladimir Lomov

Description Vladimir Lomov 2017-08-16 06:23:59 UTC
Created attachment 257941
This is copy-pasted from terminal (output of journalctl -k -f)

I'm using Arch Linux x86_64, which ships Linux kernel 4.12.7. I enable the BFQ scheduler using kernel parameters and a udev rule:

[grub, kernel parameter]
systemd.unified_cgroup_hierarchy=1 scsi_mod.use_blk_mq=1

[udev rule]
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="bfq"

After boot I checked that the disks use the bfq scheduler:

$ cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
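
The same check can be scripted for every disk matched by the udev rule. The following is only a minimal sketch: it assumes the sysfs layout shown above, where the active scheduler is the name in square brackets, and the helper names `active_scheduler` and `schedulers_by_disk` are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Optional


def active_scheduler(text: str) -> Optional[str]:
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def schedulers_by_disk(sys_block: Path = Path("/sys/block")) -> Dict[str, Optional[str]]:
    """Map each sd* disk name to its active scheduler (hypothetical helper)."""
    result = {}
    for sched_file in sorted(sys_block.glob("sd*/queue/scheduler")):
        disk = sched_file.parent.parent.name  # e.g. "sda"
        result[disk] = active_scheduler(sched_file.read_text())
    return result


if __name__ == "__main__":
    # Print one line per disk, e.g. "sda: bfq"
    for disk, sched in schedulers_by_disk().items():
        print(f"{disk}: {sched}")
```

Any disk that does not report `bfq` here was probably not matched by the udev rule, or the rule did not run for it.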

Now I start a systemd-nspawn instance whose files are located on a disk governed by bfq. This instance runs Yandex.Disk to synchronize files and directories. A few seconds later the system hangs. I was able to capture kernel messages, although capturing them does not always work.

If I use another scheduler, kyber for example, everything works fine.

Let me know if more details are needed.

P.S. I already filed a bug report on the distro bug tracker.

WBR, Vladimir Lomov
Comment 1 Vladimir Lomov 2017-08-16 06:24:55 UTC
Forgot to add the URL of the distro bug report (it has some details)
Comment 2 Vladimir Lomov 2017-08-16 06:28:18 UTC
Created attachment 257943
This is collected using netconsole

This data is taken from netconsole. I started the systemd-nspawn instance several times to try to get something meaningful, but when the kernel gets stuck nothing is printed. After 5 minutes the watchdog restarts the system. Finally I was able to obtain something useful when I started a second systemd-nspawn instance and then the first one.
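
For anyone who wants to collect similar data: netconsole sends kernel messages as UDP datagrams, so the receiving host only needs something listening on the target port. The following is a minimal sketch of such a receiver, not the setup used for the attachment. It assumes the default netconsole target port 6666, and the function names `open_listener`, `iter_messages` and `main`, as well as the log file name, are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is assumed to be the netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def iter_messages(sock, max_packets=None):
    """Yield each received datagram as text; stop after max_packets if given."""
    count = 0
    while max_packets is None or count < max_packets:
        data, _addr = sock.recvfrom(65535)
        yield data.decode("utf-8", errors="replace")
        count += 1


def main(log_path="netconsole.log"):
    """Append every message to log_path (hypothetical file name) until interrupted."""
    sock = open_listener()
    with open(log_path, "a", encoding="utf-8") as log:
        for message in iter_messages(sock):
            log.write(message)
            log.flush()  # keep the file current in case the sender hangs
```

Calling `main()` on the receiving machine keeps appending whatever the hanging host manages to send before it stops responding.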
Comment 3 Jens Axboe 2017-08-16 14:49:17 UTC
I have forwarded this report to Paolo; it doesn't look like he has a Bugzilla account.
Comment 4 Vladimir Lomov 2017-08-22 13:14:41 UTC
A kernel built with the patches from these messages

1. https://www.spinics.net/lists/linux-block/msg14113.html
2. https://www.spinics.net/lists/linux-block/msg14303.html
3. https://www.spinics.net/lists/linux-block/msg15016.html
4. https://www.spinics.net/lists/linux-block/msg15222.html
5. https://www.spinics.net/lists/linux-block/msg15514.html
6. https://www.spinics.net/lists/linux-block/msg15516.html
7. https://www.spinics.net/lists/linux-block/msg15626.html
8. https://www.spinics.net/lists/linux-block/msg15625.html
9. https://www.spinics.net/lists/linux-block/msg16172.html

works fine.

I didn't try to find the exact patch that solves the problem (sorry, I don't have time right now); I just searched the linux-block mailing list and collected all patches related to BFQ (except the trivial ones and the module aliasing one).

As I understand it, all these patches will be in kernel 4.14, so I think this bug can be closed once kernel 4.14 is released.
Comment 5 Sami Farin 2017-09-23 06:47:10 UTC
Vladimir, do you use DM and/or LUKS?

I have XFS, SLUB, LUKS (for root and media partitions). I started testing bfq with the 4.12.5 kernel. The system just hangs after anywhere from 5 minutes to one day, and I have to power cycle. I have tried 4.12.5, 4.12.6, 4.12.7, 4.12.8, 4.12.10, 4.12.11, 4.12.13, 4.12.14. With 4.12.12 I got 11 days of uptime when I used scsi_mod.use_blk_mq=N.
The 4.9 series was very stable (it didn't have bfq :-P ).

Unfortunately I don't have any logs. I run Xorg, and after a reboot there is nothing in the logs about the crash.

On the next reboot I will check whether kyber is more stable...
Comment 6 Vladimir Lomov 2017-09-24 04:24:39 UTC
Hello Sami Farin,

No, I don't use DM or LUKS.

I partially resolved my BFQ problems using the patches published on linux-block (I hope they will be in 4.14), but I still sometimes have problems with recent kernels (4.12.12, 4.12.13, 4.12.14). From time to time some of my systems hang and reboot (thanks to the watchdog), but this is rather rare, so it is difficult to get logs. To get kernel logs I set up netconsole for my hosts, but I caught only two hangs, and only one of them is related to bfq.

If you want to check kernel 4.9, you may try the CK patches (I use linux-ck, but until 4.12 BFQ was not in the mainline kernel).
